NsJail: A light-weight process isolation tool for Linux

(nsjail.dev)

64 points | by rzk 257 days ago

6 comments

qbane 257 days ago
I forked this project long ago and built an online judge that uses its BPF integration to filter out unwanted syscalls. The fork implements time/memory usage reporting to satisfy the judge's needs, and it has improved my knowledge of modern Linux kernels.
There were some rough edges back then, but it had been my go-to tool to run user-provided code in isolation.
https://github.com/NeoHOJ/nsjail
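For readers wondering what "time/mem usage reporting" involves, here is a minimal sketch. It is not the fork's code: `run_and_measure` is a hypothetical helper, it does no sandboxing at all, and it only shows how a parent process can collect a child's exit code, wall time, CPU time, and peak memory via `os.wait4`.

```python
import os
import time


def run_and_measure(argv):
    """Run argv as a child process and report its resource usage.

    Returns (exit_code, wall_seconds, cpu_seconds, max_rss).
    On Linux, max_rss is reported in kilobytes.
    """
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the target program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # Parent: wait4 returns the exit status plus the child's rusage.
    _, status, usage = os.wait4(pid, 0)
    wall = time.monotonic() - start
    cpu = usage.ru_utime + usage.ru_stime
    return os.waitstatus_to_exitcode(status), wall, cpu, usage.ru_maxrss
```

A judge would typically compare `cpu` and `max_rss` against per-problem limits; enforcing those limits (rather than only reporting them) is what the sandbox's rlimit and cgroup settings are for.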
5- 257 days ago
it'd be interesting to see a comparison of these -- the building blocks are (mostly) the same, but the interfaces differ in interesting ways:
- nsjail
- firejail
- bubblewrap
- runc
etc.
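To make the "interfaces differ" point concrete, here is a rough sketch of how the same trivial command might be launched under each tool. The flag choices are illustrative assumptions rather than equivalent security policies, the runc bundle path and container name are hypothetical, and each tool's man page should be checked before relying on any of this.

```python
# Hypothetical example: run `/bin/echo hi` under each tool.
COMMAND = ["/bin/echo", "hi"]

INVOCATIONS = {
    # nsjail: run once (-Mo), using / as the chroot; the command follows "--".
    "nsjail": ["nsjail", "-Mo", "--chroot", "/", "--"] + COMMAND,
    # firejail: private home directory, no network.
    "firejail": ["firejail", "--quiet", "--private", "--net=none"] + COMMAND,
    # bubblewrap: every mount is spelled out explicitly.
    "bubblewrap": [
        "bwrap", "--ro-bind", "/", "/", "--dev", "/dev", "--proc", "/proc",
        "--unshare-all", "--die-with-parent",
    ] + COMMAND,
    # runc: no command on the CLI; it is read from config.json in an
    # OCI bundle directory (path and container name are made up here).
    "runc": ["runc", "run", "--bundle", "/tmp/bundle", "demo"],
}


def show_invocations():
    """Return one printable command line per tool."""
    return [f"{tool}: {' '.join(argv)}" for tool, argv in INVOCATIONS.items()]
```

The contrast is mostly in where the policy lives: on the command line (nsjail, bwrap), in profiles (firejail), or in a bundle's config file (runc).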
- a-french-anon 257 days ago
  As a bubblewrap user: beware that https://github.com/containers/bubblewrap/pull/586 is still missing. The usual ^C doesn't work with your sandboxed stuff, which is very annoying.
  A cursory look at NSjail tells me its filesystem stuff is less granular than bwrap's bind mounting.
  Firejail can't handle : in some paths (at all, no escaping provided) which made me dump it.
  - selendym 256 days ago
    > Firejail can't handle : in some paths (at all, no escaping provided) which made me dump it.
    This doesn't match my experience. For example, the following works just fine in a profile file:
```
  blacklist /sys/devices/pci0000:00/*
```
    Can you give an example of what you had problems with?
    - a-french-anon 256 days ago
      Looks like this specific character got fixed. But still a lot of forbidden ones.
      cf https://github.com/netblue30/firejail/issues/4614, https://github.com/netblue30/firejail/blob/master/src/fireja... and https://github.com/netblue30/firejail/blob/master/src/lib/co...
- pveierland 257 days ago
  Another interesting and modern alternative is Syd written in Rust.
  https://gitlab.exherbo.org/sydbox/sydbox
- beardedwizard 257 days ago
  One is not like the others - firejail is aimed more at desktop-type applications you interact with, whereas the others can do that but are better suited to arbitrary workloads.
  A parent comment mentions BPF syscall filtering; many end up combining gVisor, nsjail, and seccomp.
- anonzzzies 257 days ago
  Me too, for me the ease of use is rather important. NSJail is very easy to use, I am not sure which ones I tried when looking for these tools but some of them were an absolute pain to get going.
  Edit: funnily, ChatGPT o3-mini tells me nsjail is the second hardest to use (first = systemd) of these...
- sushidev 257 days ago
  And jailer from Firecracker, and systemd itself, which has some similar capabilities.
- yamrzou 257 days ago
  And pledge(): https://justine.lol/pledge/
  - kennysoona 257 days ago
    pledge is the OpenBSD version of Landlock, a pretty different category from the other namespace-based solutions listed.
    - bjackman 257 days ago
      It's still a reasonable comparison though. The seccomp-bpf part of nsjail is achieving the same thing; one way to look at it is that Landlock/pledge are just a better implementation of the same approximate feature.
      - kennysoona 257 days ago
        I don't really find it reasonable; Landlock-type functionality is a tiny subset of what namespace-based sandboxing offers. It's like comparing a scanner to authenticate ID cards against a fortified house.
        - l0kod 255 days ago
          Namespaces are very useful for building virtual environments, but I think it's important to keep in mind that they are not designed for sandboxing and don't provide security guarantees (e.g. mount point propagation), nor fine-grained access rights, nor security events (e.g. logs)... which might be OK depending on the use case. Also, namespaces increase the attack surface of the kernel (e.g. vulnerabilities that can be reached through user namespaces). That being said, even if Landlock can control the most important filesystem access rights, not all of them are supported yet. New kernel releases bring new Landlock features (e.g. IPC, network control). It takes some time to build a new and safe access control system but we'll get there!
        - bjackman 257 days ago
          Oh yeah I was just talking specifically about the seccomp-bpf bit. It's not comparable to nsjail as a whole.
aa-jv 257 days ago
A few decades back we had the ability to cryogenically freeze processes, save them to storage, move the bins to another system, and defrost them to be run again. This was a great feature that I had hoped would make its way into mainstream kernels, but it seems to have disappeared off the face of the earth.
I wonder if the expansion of process isolation tooling will ever lead us back to this situation again, anyone know? It seems to me that strict isolation would be a vital rudimentary requirement for cryofreezing processes...
- losfair 257 days ago
  You might be looking for CRIU (https://criu.org/) - it works perfectly on the current kernel.
  - bjackman 257 days ago
    IIUC this even has logic to reconstitute TCP connections - https://criu.org/TCP_connection
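    As a hedged sketch of what checkpoint/restore with the `criu` CLI tends to look like, the helpers below only build argument lists and run nothing. The function names are hypothetical; `--shell-job` is for processes started from a terminal, `--tcp-established` opts in to saving established TCP connections, and CRIU generally needs elevated privileges.

```python
def criu_dump_cmd(pid, images_dir, shell_job=True, tcp=False):
    """Build a `criu dump` command that checkpoints the process tree at pid."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir]
    if shell_job:
        cmd.append("--shell-job")        # process was launched from a shell/tty
    if tcp:
        cmd.append("--tcp-established")  # also save established TCP connections
    return cmd


def criu_restore_cmd(images_dir, shell_job=True, tcp=False):
    """Build the matching `criu restore` command for the same image directory."""
    cmd = ["criu", "restore", "-D", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    if tcp:
        cmd.append("--tcp-established")
    return cmd
```

    The image directory is what you would copy to another machine between the dump and the restore; whether the restore succeeds there depends on matching files, PIDs, and network state being available.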
- yamrzou 257 days ago
  A bit of tangent, but reminds me of the Deep Freeze Windows app: https://www.faronics.com/products/deep-freeze
  I wonder if a similar tool exists for Linux.
  - yjftsjthsd-h 257 days ago
    It would be easy enough using an overlay file system, but I'm not aware of anybody having nicely packaged it up.
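    A minimal sketch of the overlay idea, assuming a read-only lower directory plus a throwaway upper directory (for instance on a tmpfs, so changes vanish on reboot). `overlay_mount_cmd` is a hypothetical helper that only builds the `mount` arguments; actually running it needs root, and the upper and work directories must live on the same filesystem.

```python
def overlay_mount_cmd(lower, upper, work, merged):
    """Build a `mount` command for an overlay filesystem.

    lower:  read-only base tree (the "frozen" state)
    upper:  where all writes land; discard it to reset the system
    work:   scratch directory required by overlayfs
    merged: mount point presenting lower + upper as one tree
    """
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, merged]
```

    Resetting to the frozen state then amounts to unmounting and emptying the upper and work directories.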
  - q2dg 257 days ago
    Guest user account??
- shanemhansen 256 days ago
  Well at the VM level live migration and vmotion have been around for a while. I've watched a VM get migrated while ping is running without missing a single packet.
  CRIU is used in lots of places for Linux processes, but in my experience it is far more low-level and finicky, and it tends to do things that require root permissions. It's used in production, but I would be shocked if, for example, someone made it so k8s could just live migrate any pod with CRIU.
  Just think of the possible ways apps might break if you changed their hostname or pid out from under them. And that's not even including stuff like connections to localhost or shared memory.
- mfashby 257 days ago
  Yeah there is some capability for this, for example https://criu.org
oulipo 257 days ago
Is there an equivalent for macOS?
- Alifatisk 257 days ago
  Isn't macOS already isolating each app?
  - BoingBoomTschak 257 days ago
    Isn't that for graphical apps (.app) only? How do I sandbox ffmpeg I installed via MacPorts, for example?
    - Alifatisk 257 days ago
      Oh you're right, I remember something about sandbox-exec but I'm not sure if it's similar to jails.
Alifatisk 257 days ago
So this is like jails for BSD?
hbddvvbj 256 days ago
[flagged]