I forked this project long ago and built an online judge that uses its BPF integration to filter out unwanted syscalls. The fork implements time/memory usage reporting to meet the judge's needs, and working on it improved my knowledge of modern Linux kernels.
There were some rough edges back then, but it had been my go-to tool to run user-provided code in isolation.
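Time and memory reporting of the kind a judge needs can be sketched in Python with `os.wait4`, which returns the child's `rusage` together with its exit status. This is a minimal illustration that assumes a Linux host. `run_and_measure` and its limit parameters are hypothetical names, not the fork's actual code, and the sketch does no syscall filtering at all.

```python
import os
import resource
import sys
import time


def run_and_measure(argv, cpu_limit_s=2, mem_limit_bytes=1024 * 1024 * 1024):
    """Run argv in a child process and report exit code, wall time, CPU time and peak RSS.

    Hypothetical helper: limits are applied with setrlimit in the child before exec.
    """
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: cap CPU seconds and address space, then replace ourselves with the target.
        try:
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_limit_s, cpu_limit_s))
            resource.setrlimit(resource.RLIMIT_AS, (mem_limit_bytes, mem_limit_bytes))
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if setrlimit or exec failed
    # Parent: wait4 hands back the child's resource usage alongside its status.
    _, status, usage = os.wait4(pid, 0)
    wall = time.monotonic() - start
    return {
        "exit_code": os.waitstatus_to_exitcode(status),  # negative if killed by a signal
        "wall_s": wall,
        "cpu_s": usage.ru_utime + usage.ru_stime,
        "max_rss_kib": usage.ru_maxrss,  # reported in kilobytes on Linux
    }


if __name__ == "__main__":
    # Example: measure a trivial Python program.
    print(run_and_measure([sys.executable, "-c", "print('hello')"]))
```

A CPU-limit overrun shows up as a negative exit code (the child is killed by `SIGXCPU`/`SIGKILL`), which is one simple way a judge can tell "time limit exceeded" apart from a normal non-zero exit.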
One is not like the others: firejail is aimed more at desktop applications you interact with, whereas the others can handle those but are better suited to arbitrary workloads.
A parent comment mentions eBPF syscall interception; many end up combining gVisor, nsjail and seccomp.
Me too; for me, ease of use is rather important. NSJail is very easy to use. I'm not sure which ones I tried when looking for these tools, but some of them were an absolute pain to get going.
Edit: funnily, ChatGPT o3-mini tells me nsjail is the second hardest of these to use (the first being systemd)...
It's still a reasonable comparison, though. The seccomp-bpf part of nsjail achieves the same thing; one way to look at it is that Landlock/pledge are just a better implementation of roughly the same feature.
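To make "the seccomp-bpf part" concrete: a seccomp filter is a small classic-BPF program that the kernel runs on every syscall, reading fields of `struct seccomp_data` (syscall number at offset 0, architecture at offset 4) and returning a verdict. The sketch below hand-assembles an allowlist filter and includes a tiny interpreter to check it. It is an illustration only, not nsjail's own policy compiler; the helper names are hypothetical, and the syscall numbers used in the example are x86_64 ones.

```python
import struct

# Classic BPF opcodes (linux/filter.h) and seccomp/audit constants (linux/seccomp.h, linux/audit.h).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06     # BPF_RET | BPF_K
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
# Offsets into struct seccomp_data: int nr; __u32 arch; ...
OFF_NR, OFF_ARCH = 0, 4


def allowlist_filter(allowed_nrs, arch=AUDIT_ARCH_X86_64):
    """Return (code, jt, jf, k) instructions: allow listed syscalls, kill on anything else."""
    prog = [
        (BPF_LD_W_ABS, 0, 0, OFF_ARCH),
        (BPF_JEQ_K, 1, 0, arch),  # expected arch: skip over the kill below
        (BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        (BPF_LD_W_ABS, 0, 0, OFF_NR),
    ]
    for nr in allowed_nrs:
        # Match: fall through to the ALLOW; no match: skip it and test the next number.
        prog.append((BPF_JEQ_K, 0, 1, nr))
        prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS))
    return prog


def pack(prog):
    """Serialize to the struct sock_filter layout: u16 code, u8 jt, u8 jf, u32 k (native order)."""
    return b"".join(struct.pack("=HBBI", *ins) for ins in prog)


def simulate(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Tiny interpreter for the opcode subset above; jumps are relative to the next instruction."""
    data = {OFF_NR: nr, OFF_ARCH: arch}
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        if code == BPF_LD_W_ABS:
            acc = data[k]
            pc += 1
        elif code == BPF_JEQ_K:
            pc += 1 + (jt if acc == k else jf)
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

For example, `allowlist_filter([0, 1, 231])` allows only `read`, `write` and `exit_group` on x86_64. Actually installing the packed bytes would mean wrapping them in a `struct sock_fprog` and calling `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...)` after `prctl(PR_SET_NO_NEW_PRIVS, 1, ...)`. A real x86_64 policy should also reject x32 syscalls (`__X32_SYSCALL_BIT`), and returning `SECCOMP_RET_ERRNO` instead of killing is the gentler choice for unknown syscalls.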
I don't really find it reasonable: Landlock-type functionality is a tiny subset of what namespace-based sandboxing offers. It's like comparing an ID-card scanner to a fortified house.
Namespaces are very useful for building virtual environments, but I think it's important to keep in mind that they are not designed for sandboxing and don't provide security guarantees (e.g. mount point propagation), fine-grained access rights, or security events (e.g. logs)... which might be OK depending on the use case. Also, namespaces increase the kernel's attack surface (e.g. vulnerabilities that can be reached through user namespaces). That being said, even though Landlock can control the most important filesystem access rights, not all of them are supported yet. New kernel releases bring new Landlock features (e.g. IPC, network control). It takes some time to build a new and safe access control system, but we'll get there!
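The mount-propagation caveat can be inspected on a live system: `/proc/self/mountinfo` records each mount's propagation state in its optional fields (`shared:N`, `master:N`, or nothing for a private mount), as described in proc(5). With the raw `unshare(2)` call, shared mounts in the new mount namespace can remain peers of the originals, which is why sandbox setup typically changes propagation (for example `mount --make-rprivate /`) before mounting anything. The helpers below are hypothetical and only read state; they do not create namespaces.

```python
def propagation_by_mount(mountinfo_text):
    """Map each mount point to its propagation tags from /proc/<pid>/mountinfo.

    Line format per proc(5):
    ID parentID major:minor root mountpoint options [optional fields...] - fstype source superoptions
    Spaces inside paths are octal-escaped (\\040), so whitespace splitting is safe.
    """
    result = {}
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        left, _, _ = line.partition(" - ")  # everything before the separator
        fields = left.split()
        mount_point = fields[4]
        result[mount_point] = fields[6:]  # e.g. ["shared:1"], ["master:2"], or [] if private
    return result


def shared_mounts(mountinfo_text):
    """Mount points whose mount/unmount events propagate within a peer group."""
    return sorted(
        mp
        for mp, tags in propagation_by_mount(mountinfo_text).items()
        if any(tag.startswith("shared:") for tag in tags)
    )
```

On a Linux box, `shared_mounts(open("/proc/self/mountinfo").read())` lists the mounts that would still propagate events if a namespace were created without changing propagation first.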
A few decades back we had the ability to cryogenically freeze processes, save them to storage, move the bins to another system, and defrost them to be run again. This was a great feature that I had hoped would make its way into mainstream kernels, but it seems to have disappeared off the face of the earth.
I wonder if the expansion of process isolation tooling will ever lead us back to this; does anyone know? It seems to me that strict isolation would be a vital, rudimentary requirement for cryo-freezing processes...
Well, at the VM level, live migration and vMotion have been around for a while. I've watched a VM get migrated while ping was running without missing a single packet.
CRIU is used in lots of places for Linux processes, but in my experience it is far more low-level and finicky, and it tends to do things that require root permissions. It's used in production, but I would be shocked if, for example, someone made it so k8s could just live-migrate any pod with CRIU.
Just think of all the ways apps might break if you changed their hostname or PID out from under them. And that's not even including stuff like connections to localhost or shared memory.
https://github.com/NeoHOJ/nsjail
- nsjail
- firejail
- bubblewrap
- runc
etc.
A cursory look at NSjail tells me its filesystem stuff is less granular than bwrap's bind mounting.
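bwrap's granularity comes from taking one `--bind`/`--ro-bind` per path, applied in command-line order. A minimal sketch that assembles such a command line follows; the helper and the `/srv/job1` path are hypothetical, while `--unshare-all`, `--die-with-parent`, `--ro-bind`, `--bind`, `--proc`, `--dev`, `--tmpfs` and `--chdir` are standard bwrap options.

```python
def bwrap_argv(cmd, ro_binds=(), rw_binds=(), tmpfs=("/tmp",), chdir=None):
    """Build a bubblewrap command line with one bind mount per listed (src, dest) pair.

    Hypothetical helper; bwrap applies these operations in the order given.
    """
    argv = ["bwrap", "--unshare-all", "--die-with-parent"]
    for src, dest in ro_binds:
        argv += ["--ro-bind", src, dest]  # read-only, per path
    for src, dest in rw_binds:
        argv += ["--bind", src, dest]  # writable, per path
    # Mount /proc, /dev and tmpfs after the binds so a broad bind cannot shadow them.
    argv += ["--proc", "/proc", "--dev", "/dev"]
    for path in tmpfs:
        argv += ["--tmpfs", path]
    if chdir:
        argv += ["--chdir", chdir]
    argv += ["--", *cmd]
    return argv
```

For instance, `bwrap_argv(["python3", "main.py"], ro_binds=[("/usr", "/usr")], rw_binds=[("/srv/job1", "/work")], chdir="/work")` exposes only `/usr` read-only and one writable job directory. On merged-/usr distros you would usually also need entries such as `--symlink usr/lib /lib` for the dynamic loader to be found.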
Firejail can't handle ':' in some paths at all (no escaping is provided), which made me dump it.
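The general failure mode is easy to reproduce with any format that uses ':' as a field separator and offers no escape: a path containing ':' splits into extra fields (`"/home/u/a:b:/tmp/out".split(":")` yields three pieces, not two). As a hypothetical illustration, not firejail's parser or syntax, a splitter that honours a backslash escape keeps such paths intact.

```python
def split_unescaped(spec, sep=":", esc="\\"):
    """Split spec on sep, treating esc+sep or esc+esc as a literal character.

    Hypothetical escaping convention for illustration only.
    """
    parts, buf, i = [], [], 0
    while i < len(spec):
        ch = spec[i]
        if ch == esc and i + 1 < len(spec) and spec[i + 1] in (sep, esc):
            buf.append(spec[i + 1])  # escaped separator or escape char: keep it literally
            i += 2
            continue
        if ch == sep:
            parts.append("".join(buf))  # unescaped separator ends the current field
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts
```

With this convention, `split_unescaped(r"/home/u/a\:b:/tmp/out")` yields the two intended paths, while unescaped input behaves exactly like a plain split.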
This doesn't match my experience. For example, the following works just fine in a profile file:
Can you give an example of what you had problems with?
cf https://github.com/netblue30/firejail/issues/4614, https://github.com/netblue30/firejail/blob/master/src/fireja... and https://github.com/netblue30/firejail/blob/master/src/lib/co...
https://gitlab.exherbo.org/sydbox/sydbox
I wonder if a similar tool exists for Linux.