CapROS: Capability-Based Reliable Operating System

(capros.org)

102 points | by gjvc 22 hours ago

11 comments

mfedderly 20 hours ago
I had the privilege of taking two classes with Dr Shapiro while I was in undergrad. The second class revolved around a related operating system named Coyotos. One of the most memorable classes was a 3 hour session where we worked through the boot sequence step by step [1]. The single lecture helped us all appreciate the delicate dance to bring up an x86 processor, a history lesson in the various features that had been bolted onto x86 over time, and a bunch of helpful debugging tips when your options are limited (it prints "Co" "yo" and "tos" in different stages!).
This was easily one of my most memorable lectures from undergrad, and it really helped to show me that even your operating system is just more software that you can read and understand.
1. https://github.com/vsrinivas/coyotos/blob/c68719b851e253aa11...
- ajb 24 minutes ago
  Dr Shapiro has "open to work" on his LinkedIn right now, FWIW. Don't know what kind of work he's interested in today.
  I followed his work on BitC for a while (it was his alternative to Rust).
- ryanjshaw 18 hours ago
  I was a nerdy kid living in the middle of nowhere in Africa. I think we’d had dialup for about 2 years at that point, and I emailed him with some questions about how to understand the mathematical notations used in his EROS work. He was very kind and helpful in his response, even though my questions were probably very naive.
- kragen 20 hours ago
  Coyotos and CapROS are two continuations of EROS.
  - naasking 8 hours ago
    CapROS is a literal successor taking the EROS source code and continuing development. Coyotos is more of a spiritual successor, a redesign of the core OS but in the same spirit.
kragen 21 hours ago
Seems like Charlie hasn't been merging pull requests in three years: https://github.com/capros-os/capros
And the list has been idle since then: https://sourceforge.net/p/capros/mailman/capros-devel/
I wonder if something has happened to him? I hope he's okay.
btilly 20 hours ago
The fact that we went with access control lists instead of true capabilities has long been a disappointment to me.
For people who understand OO, capabilities are the simplest model in the world. You hand out objects. You can call methods on the object. What that method call has access to depends on the permissions on the object, not your permissions. Entire classes of security mistakes (most notably the "confused deputy") become impossible.
The only commercial success that was a true capability system was the AS/400. Not coincidentally, single standalone machines averaged 99.99%-99.999% uptime. And it never had a significant security compromise. (Individual systems did, of course, have problems due to weak passwords and poor configuration. But they were still remarkably resistant.)
Capability systems work so well that when people wanted to improve security on Linux, they called it capabilities. Even though it wasn't.
Unfortunately, the world went with ACLs. That's baked in to the design of things like Windows and POSIX. Which means that all of the consumer software out there expects ACLs. In order to get them to run on a pure capability system, you have to do things like create a POSIX subsystem. At which point, you've just thrown away the whole reason to use capabilities in the first place.
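To make the "confused deputy" point concrete, here is a minimal sketch in Python. All names (File, AmbientCompiler, CapabilityCompiler) are hypothetical and chosen only for illustration, and Python itself is not a capability-safe language, so this shows the shape of the idea rather than real enforcement. The first deputy resolves a caller-supplied name using its own authority; the second only ever writes to objects it was handed.

```python
class File:
    """Toy in-memory file. Holding a reference to it stands in for a capability."""

    def __init__(self):
        self.data = ""

    def write(self, text):
        self.data = text


class AmbientCompiler:
    """Deputy that looks files up by name, using its own (broad) authority."""

    def __init__(self, namespace):
        # Hypothetical namespace: every file the deputy itself may touch.
        self.namespace = namespace

    def compile(self, source, out_name):
        self.namespace["billing.log"].write("charged 1 unit")
        # The caller chose the name, but the deputy's rights are what get used.
        self.namespace[out_name].write("compiled:" + source)


class CapabilityCompiler:
    """Deputy that writes only to the object the caller hands over."""

    def __init__(self, billing_file):
        self._billing = billing_file  # held privately by the deputy

    def compile(self, source, out_file):
        self._billing.write("charged 1 unit")
        out_file.write("compiled:" + source)


# ACL-style: the caller names the deputy's billing file and clobbers it.
billing, user_out = File(), File()
ambient = AmbientCompiler({"billing.log": billing, "out.o": user_out})
ambient.compile("x=1", "billing.log")

# Capability-style: the caller can only pass objects it already holds.
billing2, user_out2 = File(), File()
safe = CapabilityCompiler(billing2)
safe.compile("x=1", user_out2)
```

In the second version the caller has no way to designate the billing file, because designating an object and being authorized to use it are the same act.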
- Findecanor 17 hours ago
  The big problem is that you'd need to be able to change permissions over time. With ACLs that is simple and direct: if you have the access right, you just change the ACL. Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.
  Some systems have revocation as a core feature, but a cascading revocation (every delegation as a branch in a tree, and revoke a whole subtree of delegated capabilities) is often complex and takes time, especially if they are on disk. There have also been protocols (for EROS-like OS:es) for setting up systems with additional capabilities to revoke individual capabilities but they are even more complex IMHO. So, in most capability systems the only way to revoke capabilities to a resource is to remove the resource itself.
  In CHERI, where every pointer is a capability, revocation of capabilities into a memory object relies on what is effectively a parallel garbage collector process that finds all pointers to revoked objects and overwrites them with an invalid pointer that traps on use. [0]
  In the fantasy OS of my mind, ACLs have instead been promoted to "access-control trees" that include a "grant option", allowing a user to grant the permission she has to someone else. But once the first user's permissions are revoked, the sub-tree of re-granted permissions gets revoked as well. I think that could be achieved with existing file system ACLs, with added topology info and enforcement by the OS. Then actual capabilities would be created first when a file is opened, as file handles, but unlike Unix file handles they could be revoked, revoked in a cascading manner, and revoked automatically if the underlying ACT gets changed.
  Authorization Certificates (as in X.509) are a type of distributed cryptographic capabilities, but require complex distribution of "revocation lists". In recent years, new types of distributed "authorization tokens" have been introduced, such as "Biscuits" [1].
  [0] https://www.semanticscholar.org/paper/Cornucopia-Reloaded%3A...
  [1] https://www.biscuitsec.org
  - naasking 8 hours ago
    > Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.
    Revocation is very straightforward in EROS/CapROS and previous systems: it's just incrementing a version number on the capability target. Since the new version number doesn't match any existing capabilities, all of those capabilities are effectively revoked. Revocation is really a non-issue, it's been solved since the 1970s.
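    The version-number scheme described above can be sketched roughly as follows. This is an illustration only: Target, Capability, and revoke_all are hypothetical names, not the actual EROS/CapROS data structures. Each capability records the target's version at the time it was issued, and an invocation succeeds only while the two still match.

```python
class Revoked(Exception):
    """Raised when a capability's recorded version no longer matches its target."""


class Target:
    """An object that capabilities point at, with a version counter."""

    def __init__(self, payload):
        self.payload = payload
        self.version = 0

    def revoke_all(self):
        # Bumping the version invalidates every capability issued so far.
        self.version += 1


class Capability:
    """Remembers the target's version at the time of issue."""

    def __init__(self, target):
        self._target = target
        self._version = target.version

    def invoke(self):
        if self._version != self._target.version:
            raise Revoked("stale capability")
        return self._target.payload


page = Target("hello")
old_cap = Capability(page)
first_result = old_cap.invoke()  # works: versions match

page.revoke_all()                # every outstanding capability is now stale
new_cap = Capability(page)       # capabilities issued afterwards work again
```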
    - Findecanor 5 hours ago
      You're missing the problem. With OS/object capabilities, you'd want to revoke only some (and those derived from them), and keep the rest in place. Otherwise they would not be a viable alternative to ACLs.
      For pointers-as-capabilities, and version counter as protection against use-after-free, you can't assign it or the object ID too many bits because you don't want to make the size of pointers unwieldy. I've read articles of such systems that use random numbers or encrypted counters to get more randomness but at the end of the day, the safety is still only probabilistic.
      - naasking 3 hours ago
        > Otherwise it would not be a viable alternative to ACLs.
        I just want to point out that this is vastly overblown IMO. Typical ACL systems are just organized differently than capability systems that follow POLA. In the former, permissions to a number of objects are widely shared across many different subjects, and so fine-grained grant and revocation for subjects seem natural. It thus also seems natural to think that you will need fine-grained revocation in a capability system, but that's typically not true, because there is virtually no system-wide sharing of any objects in comparable fashion to ACL systems.
        In POLA and capability systems, any given object is almost always only reachable by one or two other objects, and of course this must be the case because otherwise it wouldn't be POLA! The need for fine-grained revocation is thus practically nil, and for those rare occasions that you do want some kind of fine-grained revocation, proxy and membrane patterns enable this.
        > It is also not "solved" problem for capabilities-as-pointers.
        I'm not clear on what you think isn't solved exactly. KeyKOS solved this issue in the 1970s.
        > I've read articles of systems using random numbers or an encrypted counter to get more randomness but at the end of the day, the safety is still only probabilistic.
        To be clear, cryptographic capabilities are not the same as object capabilities.
        Findecanor 3 hours ago
        The counter in a pointer is still a kind of key, even if not cryptographic. What I mean is the risk of hitting the limit of number of available counter bits.
        Some approaches on this take the probabilistic route and reuse counter values. Others invalidate the object ID when the counter wraps. I've myself designed a system that did the latter, but I think that was viable in that case only because the churn was expected to be extremely low. But for arbitrary pointers in arbitrary programs you can not make such an assumption.
  - LoganDark 12 hours ago
    > Traditional capabilities last forever, unless there is some sort of support for revoking already issued capabilities, and those mechanisms tend to be far from straightforward.
    Capabilities don't have to hold the actual permission to access the object. Capabilities can simply hold a provenance that can be used to verify the source of the access. If that access is then revoked from that source, the capability doesn't need to change at all. This is similar to how generational arenas work in some game engines, IMO.
    AFAIK Android performs something similar to this with the storage URLs that are provided to apps, which will be different depending on which picker provided the file/media, etc. Apple probably also does something similar, but I'd imagine with objects rather than strings.
    - bheadmaster 11 hours ago
      > Capabilities don't have to hold the actual permission to access the object. Capabilities can simply hold a provenance that can be used to verify the source of the access. If that access is then revoked from that source, the capability doesn't need to change at all.
      Which complicates the initial premise that
      > capabilities are the simplest model in the world. You hand out objects. You can call methods on the object. What that method call has access to depends on the permissions on the object, not your permissions.
      Which is exactly what the parent said. Capabilities sound simple at first, but require complex machinery to work.
    - naasking 8 hours ago
      > Capabilities can simply hold a provenance that can be used to verify the source of the access. If that access is then revoked from that source, the capability doesn't need to change at all
      This is basically using access control lists to mimic a capability system [1]. The capability folks did something similar in "Polaris" [2], their layer atop Windows XP that enforced principle of least authority by default. If only MS had taken that and run with it.
      [1] A Distributed Capability Computing System (DCCS), http://www.webstart.com/jed/papers/DCCS/
      [2] Polaris: Virus-Safe Computing For Windows XP, https://cacm.acm.org/research/polaris-2/
      - btilly 7 hours ago
        How equivalent this is to ACLs depends on what "provenance" means here.
        One of the strategies with capabilities is that I do not hand you the capability that I own to X. Instead, I create a proxy Y that can make requests of X, and then hand you the capability to make requests of Y.
        If I later stop Y, you lose access.
        This can be viewed as a kind of provenance. The history of how the access came to be is reflected in the actual capability. The downside, obviously, is that we've added overhead. But this strategy can allow us to do a number of interesting things. Like split an existing capability into multiple finer grained ones.
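        A rough sketch of that proxy strategy, with hypothetical names throughout (Counter plays X, RevocableProxy plays Y). The proxy forwards only an allowed subset of methods, which illustrates the "finer grained" split, and stopping it cuts off the holder without touching X.

```python
class Revoked(Exception):
    """Raised when the proxy has been stopped."""


class Counter:
    """The underlying object X that the owner holds directly."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

    def reset(self):
        self.value = 0


class RevocableProxy:
    """The proxy Y: forwards a chosen subset of methods until stopped."""

    def __init__(self, target, allowed):
        self._target = target
        self._allowed = frozenset(allowed)

    def call(self, method, *args):
        if self._target is None:
            raise Revoked("proxy stopped")
        if method not in self._allowed:
            raise PermissionError(method)
        return getattr(self._target, method)(*args)

    def stop(self):
        # Dropping the reference revokes access for every holder of this proxy.
        self._target = None


x = Counter()
y = RevocableProxy(x, {"increment"})  # hand out y, never x
result = y.call("increment")          # forwarded to x
```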
        naasking 7 hours ago
        "Construct proxy" need not be a primitive in the core system though. If message passing is the only means of communication, then interposition to create facets or attenuate permissions naturally follows. This works with ACLs too. All you need to do is restrict the rights amplification authority to discriminate what a capability actually points to, but this is a rights amplification operation and so should itself be a capability that's closely held. DCCS did this right, IIRC.
        btilly 6 hours ago
        Yes. Just as we can build an ACL on a capability system, we can build a capability system on an ACL.
        But this approach is more natural in a capability system. You have to write software differently for dealing with "I got permission through an ACL" versus "I got information through a capability". So when the default expectation is, "I get a capability," the right abstraction is already there for "...and this capability has something more behind it."
- mikewarot 7 hours ago
  The thing that worries me about WASM is this exact conflict between compatibility with ACLs and security. It's like handing over your banking account authorization for every possible financial transaction, even if all you want to do is buy an ice cream cone. In a real-world capability-based system (aka a wallet with cash), you hand them a $5 bill and wait for change.
  - btilly 7 hours ago
    You mean handing over everything needed to drain your bank account, like a check, instead of handing over a number that can be used to deposit money in your account, like the giro system that was developed in Amsterdam.
    After seeing the giro system in practice, I couldn't believe that we still use checks...
- ryanjshaw 18 hours ago
  It’s bizarre to me that not one megawealthy tech nerd has thrown 8 figures at some smart people in an attempt to solve the capabilities-based OS UX problem. The payoff would be remarkable.
  - csrse 15 hours ago
    I guess Fuchsia is an attempt at a capability-based OS with wider appeal. The architecture seems interesting, wish there was quicker progress on it.
    - surajrmal 7 hours ago
      It's honestly moving fast. OSes are just complicated these days and it takes a long time to get to parity on all fronts.
  - Jyaif 10 hours ago
    The tricky part is not doing the capability-based OS, it's getting adoption.
    Linux is good enough, so a slightly better OS is not going to cut it.
- EGreg 20 hours ago
  I guess you must really love Capnproto then: https://github.com/iguazio/go-capnproto2
  - btilly 19 hours ago
    Not sure why you think that my opinions about operating systems would predict my opinions about an RPC system.
  - ocdtrekkie 18 hours ago
    You may want to change this link, this is an extremely old fork of the Go capnp implementation. It's neither official nor current!
    I'd recommend just pointing to capnproto.org
Hexayurt 7 hours ago
Waterken was the same kind of logic, applied at web API scale.
https://shiftleft.com/mirrors/www.hpl.hp.com/techreports/201...
The failures of this system and the HP ESpeak system are what left the gap that the blockchain smart contract model filled.
I have complex thoughts about that.
- Hexayurt 7 hours ago
  Specifically: a globally visible distributed database is a fantastic resource for managing namespaces, as demonstrated by DNS and SSL Certificate Authorities.
  But when we start essentially doing _transactions_ by writing into such a database, it starts to look like buying a domain name every time you want to make a credit card payment.
  There is an architectural problem here.
pyrolistical 20 hours ago
https://en.wikipedia.org/wiki/Capability-based_security
It’s like sharing a Google Doc link. You configure the link to be read-only or read/write.
Now imagine you can create as many links as you want with all possible permission combinations. Then you have a capability-based system.
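The link analogy can be sketched in a few lines of Python. Doc and mint_link are hypothetical names, not any real document-sharing API, and an unguessable token like this is closer to a cryptographic capability than to an OS-level object capability. Each link carries its own set of rights, and possession of the link is the only thing checked.

```python
import secrets


class Doc:
    """A document that hands out links, each with its own rights."""

    def __init__(self, text=""):
        self.text = text
        self._links = {}  # token -> frozenset of rights

    def mint_link(self, rights):
        # An unguessable token; whoever holds it holds the rights attached to it.
        token = secrets.token_urlsafe(16)
        self._links[token] = frozenset(rights)
        return token

    def _check(self, token, right):
        if right not in self._links.get(token, frozenset()):
            raise PermissionError(right)

    def read(self, token):
        self._check(token, "read")
        return self.text

    def write(self, token, text):
        self._check(token, "write")
        self.text = text


doc = Doc("draft")
read_only = doc.mint_link({"read"})
read_write = doc.mint_link({"read", "write"})
doc.write(read_write, "final")
```

Note that nothing here asks who the caller is; the only question is which link was presented.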
ahlCVA 15 hours ago
There is also a relatively modern capability-based kernel in the L4 family of microkernels, called Fiasco.OC: https://os.inf.tu-dresden.de/fiasco/overview.html
There are also a bunch of components for building a functional userspace (such as L4Re or Genode).
- NooneAtAll3 13 hours ago
  what does L4 mean here?
  - sirwhinesalot 13 hours ago
    L4 was a microkernel design by Jochen Liedtke (RIP). It was notable for proving that microkernels can perform much better than was thought at the time (L4 performed 20x better than the Mach microkernel).
    The work was so influential it got the ACM SIGOPS Hall of Fame Award in 2015. A whole family of microkernels based on that original design have since been developed, hence the "L4 microkernel family".
  - unwind 13 hours ago
    It's a family of microkernels.
    https://en.wikipedia.org/wiki/L4_microkernel_family
retrac 16 hours ago
I've written a little bit before about KeyKOS/GNOSIS, which is the capability operating system used by Tymshare to host their timesharing language services on IBM mainframes, in the 70s and 80s. From a comment 3 years ago I'll just repost the relevant part:
> KeyKOS (developed by Tymshare for their commercial computing services in the 1970s) - A capability operating system. If everything in UNIX was a file, then everything in KeyKOS was a memory page and capabilities (keys) to access those pages. The kernel has no state that isn't calculated from values in the virtual memory storage. The system snapshots the virtual memory state regularly. There are subtle consequences from this. Executing processes are effectively memory-mapped files that constantly rewrite themselves, with only the snapshots being written out. Snapshotting the virtual memory state of the system snapshots everything -- including the state of running processes. There's no need for a file system, just a means to map names to sets of pages, which is done by an ordinary process. After a crash, processes and their state are internally consistent, and continue running from their last snapshot. For those who are intrigued, there's a good introduction, written in 1979, by the system's designers available here: http://cap-lore.com/CapTheory/upenn/Gnosis/Gnosis.html (It was GNOSIS before being renamed KeyKOS.) And a later document written in the 90s aimed at UNIX users making the case: http://cap-lore.com/CapTheory/upenn/NanoKernel/NanoKernel.ht... Some work on capability systems continues, but it seems the lessons learned have largely been forgotten.
The core abstraction is simpler than the Unix process model or that of many other operating systems. Processes have keys which access virtual memory pages. All of storage including persistent secondary storage is just one big pool of virtual memory pages. These can be shared between processes. That's all that's necessary to implement things like filesystems and networking which are often thought to require special handling. A filesystem is just names and addresses of pages in storage. Give a process a capability to do shared memory with a process that maintains such a structure. I find the emphasis on minimizing process and kernel state, such that processes can be snapshot and frozen at any time and are inherently persistent, handled as the set of the relevant pages, to be genius. Though the architecture does have the classic microkernel/nanokernel performance penalties, as have been long debated.
iberator 17 hours ago
Intel did this in 1981 with the iAPX 432. Super interesting and SUPER complex (just check out the documentation of the CPU architecture), so it failed hard.
A flat memory model always wins vs. a Star Trek-like architecture that no one understands.
- gnufx 9 hours ago
  1970s-ish capability systems with support in hardware/firmware include CAP, Flex, System/38, Plessey System 250 (which a former colleague worked on) -- the last two commercial; see https://en.wikipedia.org/wiki/Capability-based_security.
  I'd like to think their time has come, given vulnerabilities I see.
silasdavis 16 hours ago
Most of the links seem to be broken on https://www.capros.org/overview.html
mikewarot 21 hours ago
Why is it that every Capability based system seems to be a toolkit for running a single program instead of an OS ready for daily use? Is it just me?
- kragen 21 hours ago
  It's just you. seL4, CheriBSD, etc., do not fit your description. Neither did KeyKOS itself. You're presumably looking at research prototypes.
  - ratmice 20 hours ago
    I'd also note capros doesn't fit that description either. I don't know that there were examples that ran more than a single process.
    That's probably not true for anything relying on drivers, since user mode drivers are basically processes there... but not in the way that people might think of a process.
    - kragen 20 hours ago
      I mean, there isn't exactly a thriving ecosystem of existing software built for CapROS. Right now I don't think anybody even has CapROS itself building.
      The problem has gotten a lot easier since the EROS days, thanks to Xen, QEMU, UEFI (?), and the explosion of cheap hardware, but it looks like maybe Charlie got sick or lost interest or something?
      - ratmice 20 hours ago
        Yeah, I did see an email on a capabilities list from him about him no longer working on it because of lack of feedback & wanting to just enjoy his retirement. That was the impression I got.
        When he had resumed his work on it, I personally had been going through a back injury. I still feel bad that I didn't get a chance to contribute any of the hardware ports and software I wrote for it.
        kragen 20 hours ago
        Hmm, do you know when?
        ratmice 19 hours ago
        I wasn't able to google it, or find a public link to the email (but it was posted on a public list), so here are some relevant snippets from it.
        Nov 20 2022 titled CapROS status
        "When I retired a year ago I hoped to correct some of those issues, but I want to enjoy retirement and not just have a full-time unpaid job.", ...
        "I am considering just abandoning CapROS. I believe there are some useful ideas in the system, but so far no one seems to have known or cared about them."
        ryukafalz 19 hours ago
        Since it is a public list, here's the link: https://groups.google.com/g/cap-talk/c/Box4XXhSevw/m/18pUqAQ...
        He posted on the list recently too if folks were worried: https://groups.google.com/g/cap-talk/c/XCBwf-zpJWA/m/6CWsNA-...
- wmf 20 hours ago
  A lot of OS projects develop the kernel then run out of steam. It's especially hard for capabilities because there's no established standard like Unix/Posix to copy. Capability OSes are still a research topic.
- spencerflem 20 hours ago
  Check out Genode Sculpt for a vision of a workable desktop !
  It’s capable of dynamic flows, adding and removing programs, has ports of Chromium and Virtual Box. The devs daily drive it :)
- naasking 7 hours ago
  Capability-based operating systems are sufficiently dissimilar to standard ACL operating systems that ordinary software cannot be directly ported without losing some or many of the capability advantages. Furthermore, they are typically very security focused, and so they've spent a lot of time researching security-focused interfaces and idioms for end users, rather than just re-implementing the hodge-podge of poorly thought out user interfaces that seem to reintroduce the same security vulnerabilities again and again, e.g. CSRF is just the "confused deputy" attack known since the 1980s.
  I suggest reading some of their stuff [1], it's pretty interesting and accessible.
  [1] The EROS Trusted Window System, https://srl.cs.jhu.edu/pubs/SRL2003-05.pdf
contrarian1234 20 hours ago
Most of my "wtf is going on" moments on Linux have to do with permissions. I loathe the industry move to even more security. I want a more Emacs-like experience. Multiuser systems have become the exception and most people have a personal computer with one user. Dealing with evil apps is a losing battle b/c the attack surface is too large.
I think the counter argument to more security is Distro Repos. When was the last time you apt-get'ed some software and had your documents stolen?
If you add blocks then you need to somehow communicate to the user when it's failing and that's hard... You see the shitshow that is Android security where apps have mysterious access to some directories and not others and it's impossible to understand what's going on. Maybe capabilities will work better, it's unclear to me.
- iberator 17 hours ago
  Just link statically compiled emacs into /sbin/init and you are done
- krautburglar 19 hours ago
  Absolutely! Most of it is there to protect their moats from us, not us from “hackers”.