Unfashionably secure: why we use isolated VMs

(blog.thinkst.com)

305 points | by mh_ 84 days ago

30 comments

  • PedroBatista 84 days ago
    As a permanent "out of style" curmudgeon for the last ~15 years, I like that people are discovering that maybe VMs are in fact the best approach for a lot of workloads, and that the LXC cottage industry and Docker industrial complex that developed around solving problems they created themselves (or that were solved decades ago) might need to take a hike.

    Modern "containers" were invented to make things more reproducible ( check ) and simplify dev and deployments ( NOT check ).

    Personally FreeBSD Jails / Solaris Zones are the thing I like to dream are pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow. I didn't dig too deep into this in practice; maybe I'm afraid to learn the contrary, but I hope not.

    Either way Docker is "fine" but WAY overused and overrated IMO.

    • compsciphd 84 days ago
      As the person who created docker (well, before docker - see https://www.usenix.org/legacy/events/atc10/tech/full_papers/... and compare to docker), I argued that it wasn't just good for containers, but could be used to improve VM management as well (i.e. a single VM per running image - see https://www.usenix.org/legacy/events/lisa11/tech/full_papers...)

      I then went on to build a system with Kubernetes that enabled one to run "kubernetes pods" in independent VMs - https://github.com/apporbit/infranetes (as well as create hybrid "legacy" VM / "modern" container deployments all managed via kubernetes.)

      - As a total aside (while I toot my own horn on the topic of papers I wrote or contributed to), note that this paper - https://www.usenix.org/legacy/events/osdi02/tech/full_papers... - originally used the term Pod for a running container, which is where Kubernetes got the term from, according to one of its reviewers.

      I'd argue that FreeBSD Jails / Solaris Zones (Solaris Zone/ZFS inspired my original work) really aren't any more secure than containers on linux, as they all suffer from the same fundamental problem of the entire kernel being part of one's "tcb", so any security advantage they have is due to a lack of bugs, not a better design.

      • bombela 84 days ago
        > As the person who created docker (well, before docker - see https://www.usenix.org/legacy/events/atc10/tech/full_papers/... and compare to docker)

        I picked the name and wrote the first prototype (Python 2) of Docker in 2012. I had not read your document (dated 2010). I didn't really read English that well at the time; I probably wouldn't have been able to understand it anyway.

        https://en.wikipedia.org/wiki/Multiple_discovery

        More details for the curious: I wrote the design doc and implemented the prototype. But not in a vacuum. It was a lot of work with Andrea, Jérôme and Gabriel. Ultimately, we all liked the name Docker. The prototype already had the notion of layers, lifetime management of containers and other fundamentals. It exposed an API (over TCP with zerorpc). We were working on container orchestration, and we needed a daemon to manage the life cycle of containers on every machine.

        • compsciphd 84 days ago
          I'd note I didn't say you copied it, just that I created it first (i.e. "compare the paper to docker"). Also, as you note, it's possible someone else did it too, but at least my conception got through academic peer review / the patent office (yes, there's a patent; it has never been enforced, to my knowledge).

          When I describe my work (I actually should have used quotes here), I generally give air quotes when saying it, or say "proto docker", as it provides context for what I did (there are also a lot of people who view docker as synonymous with containerization as a whole, and I say that containers existed way before me). I generally try to approach it humbly, but I am proud that I predicted and built what the industry seemingly needed (or at least is heavily using).

          People have asked me why I didn't pursue it as a company, and my answer is a) I'm not much of an entrepreneur (the main answer), and b) I felt it was a feature, not a "product", and would therefore only really be profitable for those that had a product that could use it as a feature (one could argue that product turned out to be clouds, i.e. they are the ones really making money off this feature). Or, as someone once said, a feature isn't necessarily a product and a product isn't necessarily a company.

          • bombela 84 days ago
            I understood your point. I wanted to clarify, and in some ways connect with you.

            At the time, I didn't know what I was doing. Maybe my colleagues knew a bit more, but I doubt it. I just wanted to stop waking up at night because our crappy container management code was broken again. The most brittle part was the lifecycle of containers (and their filesystem). I recall being very adamant about the layered filesystem, because it allowed sharing storage and RAM across running (containerized) processes. This saves not just storage and RAM, but also CPU time, because the same code (like the libc, for example) is cached across all processes. Of course this only works if you have a lot of common layers. But I remember at the time, it made for very noticeable savings. Anyways, fun tidbits.

            I wonder how much faster/better it would have been if inspired by your academic research. Or maybe not knowing anything made it so we solved the problems at hand in order. I don't know. I left the company shortly after. They renamed to Docker, and made it what it is today.

            • chatmasta 84 days ago
              I like to say that Docker wouldn’t exist if the Python packaging and dependency management system weren’t complete garbage. You can draw a straight line from “run Python” to dotCloud to Docker.

              Does that jive with your experience/memory at all? How much of your motivation for writing Docker could have been avoided if there were a sane way to compile a Python application into a single binary?

              It’s funny, this era of dotCloud type IaaS providers kind of disappeared for a while, only to be semi-revived by the likes of Vercel (who, incidentally, moved away from a generic platform for running containers, in favor of focusing on one specific language runtime). But its legacy is containerization. And it’s kind of hard to imagine the world without containers now (for better or worse).

              • bombela 82 days ago
                I do not think the mess of dependency management in Python got us to Docker/containers. Rather Docker/containers standardized deploying applications to production. Which brings reproducibility without having to solve dependency management.

                Long answer with context follows.

                I was focused on container allocation and lifecycle. So my experience, recollection, and understanding of what we were doing is biased with this in mind.

                dotCloud was offering a cheaper alternative to virtual machines. We started with pretty much full Linux distributions in the containers. I think some still had a /boot with the unused Linux kernel in there.

                I came to the job with some experience testing and deploying Linux at scale quickly by preparing images with chroot and then making a tarball to distribute over the network (via multicast from a seed machine) with a quick grub update. This was for quickly installing Counter-Strike servers for tournaments in Europe. In those days it was one machine per game server. I was also used to running those tarballs as virtual machines for thorough testing. To save storage space on my laptop at the time, I would hard-link together all the common files across my various chroot directories. I would only tarball to ship it out.

                It turned out my counter-strike tarballs from 2008 would run fine as containers in 2011.

                The main competition was Heroku. They did not use containers at the beginning. And they focused on running one language stack very well. It was Ruby and a database I forget.

                At dotCloud we could run anything. And we wanted to be known for serving everything. All your languages, not just one. So early on we started offering base images ready-made for specific languages and databases. It was so much work to support. We had a few base images per team member to maintain, while still trying to develop the platform.

                The layered filesystem was there to pack resources more efficiently on our servers. We definitely liked that it saved build time on our laptops when testing (we still had small and slow spinning disks in 2011).

                So I wouldn't say that Docker wouldn't exist without the mess of dependency management in software. It just happened to offer a standardized interface between application developers, and the person running it in production (devops/sre).

                The fact that you could run the container on your local (Linux) machine was great for testing. Then people realized they could work around dependency hell and non-reproducible development environments by using containers.

            • compsciphd 84 days ago
              they did it "simpler", i.e. academic work has to be "perfect" in a way a product does not. so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution" and just created layers "on demand" (via RUN syntax of dockerfiles).

              From an academic perspective, it's "terrible" - so many duplicate layers out in the world - but from a practical perspective of delivering a product, it makes a lot of sense.

              It's also simpler in that I was trying to make it work both for what I call "persistent" containers (the "pets" in the usual terminology), which could be upgraded in place, and for "ephemeral" containers (the "cattle"). In practice, I'm not sure the work to enable upgrading in place (replacing layers on demand) for "persistent" containers is that useful (it's technologically interesting, but that's different from useful).

              My argument for this was that it actually improves runtime upgrading of systems. With dpkg/rpm, if you upgrade libc, your system is temporarily in a state where it can't run any applications (in the window of time when the old libc .so is deleted and the new one is created in its place, or completely overwrites it); any program that attempts to start in that (very) short period of time will fail (due to libc not really existing). By having a mechanism where layers can be swapped in an essentially atomic manner, no delete / overwrite of files occurs and therefore there is zero time when programs won't run.
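
              A minimal sketch of that atomic-swap idea (my own illustration, not the actual mechanism from the paper): keep each layer version in its own directory and flip a symlink with an atomic rename, so a path like the libc layer always resolves to either the old or the new version, never to nothing.

                import os

                def swap_layer(link_path: str, new_layer_dir: str) -> None:
                    """Atomically repoint `link_path` at `new_layer_dir`.

                    rename(2) is atomic on POSIX, so readers always see either the
                    old layer or the new one, never a missing file in between.
                    """
                    tmp = link_path + ".tmp"
                    if os.path.lexists(tmp):       # leftover from an interrupted swap
                        os.remove(tmp)
                    os.symlink(new_layer_dir, tmp)
                    os.replace(tmp, link_path)     # atomically replaces the old symlink

                if __name__ == "__main__":
                    os.makedirs("layers/libc-2.35", exist_ok=True)
                    os.makedirs("layers/libc-2.36", exist_ok=True)
                    if not os.path.lexists("current"):
                        os.symlink("layers/libc-2.35", "current")
                    swap_layer("current", "layers/libc-2.36")  # readers of "current" never see a gap
                    print(os.readlink("current"))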

              In practice, the fact that a real world product came out with a very similar design/implementation makes me feel validated (i.e. a lot of phd work is one offs, never to see the light of day after the papers for it are published).

              • pxc 84 days ago
                > so (from my perspective), they punted the entire concept of making what I would refer to as a "layer aware linux distribution"

                Would you consider there to be any 'layer-aware Linux distributions' today, e.g., NixOS, GuixSD, rpm-ostree-based distros like Fedora CoreOS, or distri?

                > so much duplicate layers out in the world

                Have you seen this, which lets existing container systems understand a Linux package manager's packages as individual layers?

                https://github.com/pdtpartners/nix-snapshotter

                • mananaysiempre 83 days ago
                  (Not GP.)

                    NixOS can share its Nix store with child (systemd-nspawn) containers. That is, if you go all in, package everything using Nix, and then carefully ensure you don’t have differing (transitive build- or run-time) dependency versions anywhere, those dependencies will be shared to the maximum extent possible. The amount of sharing you actually get matches the effort you put into making your containers use the same dependency versions. No “layers”, but still close to what you’re getting at, I think.

                  On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)
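
                  One rough way to see how much two containers' closures actually share (a sketch, assuming the classic nix-store CLI is on the PATH; the two profile paths are hypothetical examples, substitute your own store paths):

                    import os
                    import subprocess

                    def closure(path: str) -> set:
                        """Store paths in the runtime closure of `path` (nix-store -qR)."""
                        out = subprocess.run(
                            ["nix-store", "--query", "--requisites", path],
                            check=True, capture_output=True, text=True,
                        ).stdout
                        return set(out.split())

                    def size(paths: set) -> int:
                        """Apparent size in bytes of the given store paths."""
                        total = 0
                        for p in paths:
                            for root, _dirs, files in os.walk(p):
                                for f in files:
                                    fp = os.path.join(root, f)
                                    if not os.path.islink(fp):
                                        total += os.path.getsize(fp)
                        return total

                    if __name__ == "__main__":
                        # Hypothetical example paths: root store paths of two containers.
                        a = closure("/nix/var/nix/profiles/per-container/web/system")
                        b = closure("/nix/var/nix/profiles/per-container/db/system")
                        shared = a & b
                        print(f"{len(shared)} shared paths, ~{size(shared) / 1e6:.0f} MB deduplicated")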

                  • pxc 83 days ago
                    > On the other hand, Nixpkgs (which NixOS is built on top of) doesn’t really follow a discipline of minimizing package sizes to the extent that, say, Alpine does. You fairly often find documentation and development components living together with the runtime ones, especially for less popular software. (The watchword here is “closure size”, as in the size of a package and all of its transitive runtime dependencies.)

                    Yep. I remember before Nix even had multi-output derivations! I once broke some packages trying to reduce closure sizes when that feature got added, too. :(

                    Besides continuing to split off more dev and doc outputs, it'd be cool if somehow Nixpkgs had a `pkgsForAnts` just like it has a `pkgsStatic`, where packages just disable more features and integrations. On the other hand, by the time you're really optimizing your Nix container builds it's probably well worth it to use overrides and build from source anyway, binary cache be damned.

                • compsciphd 83 days ago
                  I'll try to get back to this to give a proper response, but can't promise.
        • shizcakes 83 days ago
          I’m really confused. Solomon Hykes is typically credited as the creator of Docker. Who are you? Why is he credited if someone else created it?
          • inetknght 83 days ago
            > Why is he credited if someone else created it?

            This is the internet and just about everyone could be diagnosed with Not Invented Here syndrome. First one to get recognition for creating something that's already been created is just a popular meme.

          • bombela 82 days ago
            Solomon was the CEO of dotCloud. Sébastien the CTO. I (François-Xavier "bombela") was one of the first software engineers hired, along with Andrea, Jérôme and Louis, with Sam as our manager.

            When it became clear that we had reached the limits of our existing buggy code, I pushed hard to get to work on the replacement. After what seemed an eternity pitching Solomon, I was finally allowed to work on it.

            I wrote the design doc of Docker with the help of Andrea, Jérôme, Louis and Gabriel. I implemented the prototype in Python. To this day, three of us will still argue over who really chose the name Docker. We are very good friends.

            Not long after, I left the company. Because I was underpaid, I could barely make ends meet at the time. I had to borrow money to see a doctor. I did not mind, it's the start-up life, am I right? I happily worked 80h/week. But then I realized not everybody was underpaid. And other companies would pay me more for less work. When I asked, Solomon refused to pay me more, and after being denied three times, I quit. I never got any shares. I couldn't afford to buy the options anyway, and they had delayed the paperwork multiple times, such that I technically quit before the vesting started. I went to Google, where they showered me with cash in comparison. The morning after my departure from dotCloud, Solomon raised everybody's salary. My friends took me to dinner to celebrate my sacrifice.

            I am not privy to all the details after I left. But here is what I know. Andrea rewrote Docker in Go. It was made open source. Solomon even asked me to participate as an external contributor. For free of course. As a gesture to let me add my name to the commit history. Probably the biggest insult I ever received in life.

            dotCloud was renamed Docker. The original dotCloud business was sold to a German company for the customers.

            I believe Solomon saw the potential of Docker for all, and not merely an internal detail within a distributed system designed to orchestrate containers. My vision was extremely limited, and focused on reducing the suffering of my oncall duties.

            A side story: the German company transferred the zerorpc trademark to me; zerorpc was the open source network library powering dotCloud, and I had done a lot of work on it. Solomon refused to hand over the empty github/zerorpc group he was squatting on. He offered to grant me access but retain control. I went for github/0rpc instead. I did not have the time nor the money to spend on a lawyer.

            By this point you might think I have a vendetta against a specific individual. I can assure you that I tried hard to paint things fairly and in a flattering light.

          • icedchai 83 days ago
            He means he came up with the same concept, not that he literally created Docker.
      • sulandor 83 days ago
        > any security advantage they have is due to a lack of bugs, not a better design.

        feels like maybe there is some correlation

      • ysnp 84 days ago
        Would you say approaches like gVisor or Nabla containers provide more/enough evolution on the security front? Or is there something new on the horizon that excites you more as a prospect?
        • the_duke 84 days ago
          gVisor basically works by intercepting all Linux syscalls and emulating a good chunk of the Linux kernel in userspace code. In theory this allows lowering the overhead per VM, and more fine-grained introspection and rate limiting / balancing across VMs, because not every VM needs to run its own kernel that only interacts with the environment through hardware interfaces. Interaction happens through the Linux syscall ABI instead.

          From an isolation perspective it's not more secure than a VM, but less, because gVisor needs to implement its own security sandbox to isolate memory, networking, syscalls, etc, and still has to rely on the kernel for various things.

          It's probably more secure than containers though, because the kernel abstraction layer is separate from the actual host kernel and runs in userspace - if you trust the implementation... using a memory-safe language (Go) helps there.

          The increased introspection capability would make it easier to detect abuse and to limit available resources at a more fine-grained level, though.

          Note also that gVisor has quite a lot of overhead for syscalls, because they need to be piped through various abstraction layers.
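
          A rough way to feel that overhead yourself (a sketch, not a rigorous benchmark): time a syscall-heavy loop and run the same script once under a regular runtime (runc) and once under gVisor's runsc; the wall-clock ratio gives a crude sense of the per-syscall cost.

            import os
            import time

            def bench(n: int = 200_000) -> float:
                """Time n stat() calls; nearly all the work is syscall round-trips."""
                start = time.perf_counter()
                for _ in range(n):
                    os.stat("/")
                return time.perf_counter() - start

            if __name__ == "__main__":
                print(f"{bench():.3f}s for 200k stat() calls")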

          • compsciphd 84 days ago
            I actually wonder how much "overhead" a VM actually has. i.e. for a linux kernel that doesn't do anything (say it just boots to an init that mounts proc and every n seconds reads in/prints out /proc/meminfo), how much memory would the kernel actually be using?
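
            (Purely to illustrate that thought experiment, a sketch of such a do-nothing init; in a real minimal guest you would use a tiny shell or C init rather than ship a Python runtime, so treat this as pseudocode you happen to be able to run.)

              #!/usr/bin/env python3
              # Minimal "do nothing" PID 1: mount /proc, then report memory usage forever.
              import subprocess
              import time

              def main() -> None:
                  # Assumes a `mount` binary (e.g. busybox) exists in the guest image.
                  subprocess.run(["mount", "-t", "proc", "proc", "/proc"], check=False)
                  wanted = {"MemTotal:", "MemFree:", "MemAvailable:"}
                  while True:
                      with open("/proc/meminfo") as f:
                          for line in f:
                              if line.split()[0] in wanted:
                                  print(line.strip(), flush=True)
                      time.sleep(5)

              if __name__ == "__main__":
                  main()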

            So if processes in gvisor map to processes on the underlying kernel, I'd agree it gives one a better ability to introspect (at least in an easy manner).

            It gives me an idea that I think would be interesting (I think this has been done, but it escapes me where): to have a tool that is external to the VM (runs on the hypervisor host) that essentially has "read only" access to the kernel running in the VM to provide visibility into what's running on the machine without an agent running within the VM itself. i.e. something that knows where the process list is, and can walk it to enumerate what's running on the system.

            I can imagine the difficulties in implementing such a thing (especially on a multi-CPU VM), where even if you could snapshot the kernel memory state efficiently, it would be difficult to do it in a manner that provided a "safe/consistent" view. It might be interesting if the kernel itself could make a hypercall into the hypervisor at points of consistency (say, when it has finished making an update and is about to unlock the resource) to tell the tool when the data can be collected.

            • stacktrust 84 days ago
              https://github.com/Wenzel/pyvmidbg

                LibVMI-based debug server, implemented in Python. Building a guest aware, stealth and agentless full-system debugger.. GDB stub allows you to debug a remote process running in a VM with your favorite GDB frontend. By leveraging virtual machine introspection, the stub remains stealth and requires no modification of the guest.
              
              more: https://github.com/topics/virtual-machine-introspection
            • eru 83 days ago
              > I actually wonder how much "overhead" a VM actually has. i.e. for a linux kernel that doesn't do anything (say it just boots to an init that mounts proc and every n seconds reads in/prints out /proc/meminfo), how much memory would the kernel actually be using?

              You don't necessarily need to run a full operating system in your VM. See eg https://mirage.io/

            • ecnahc515 84 days ago
              > I actually wonder how much "overhead" a VM actually has. i.e. for a linux kernel that doesn't do anything (say it just boots to an init that mounts proc and every n seconds reads in/prints out /proc/meminfo), how much memory would the kernel actually be using?

              There's already some memory sharing available using DAX in Kata Containers at least: https://github.com/kata-containers/kata-containers/blob/main...

            • xtacy 84 days ago
              > to have a tool that is external to the VM (runs on the hypervisor host) that essentially has "read only" access to the kernel running on the VM to provide visibility into what's running on the machine without an agent running within the VM itself

              Not quite what you are after, but comes close ... you could run gdb on the kernel in this fashion and inspect, pause, step through kernel code: https://stackoverflow.com/questions/11408041/how-to-debug-th....

            • XorNot 84 days ago
              What I really want is a "magic" shell on a VM - i.e. the ability using introspection calls to launch a process on the VM which gives me stdin/stdout, and is running bash or something - but is just magically there via an out-of-band mechanism.
              • compsciphd 84 days ago
                Not really "out of band", but many VMs allow you to setup a serial console, which is sort of that, albeit with a login, but in reality, could create one without one, still have to go through hypervisor auth to access it in all cases, so perhaps good enough for your case?
                • blipvert 83 days ago
                  Indeed, easy enough to get a serial device on Xen.

                  Another possibility could be to implement a simple protocol which uses the xenstore key/value interface to pass messages between host and guest?

                • bbarnett 83 days ago
                  You can launch KVM/qemu with screen + text console, and just log in there. You can also configure KVM to have a VNC session on launch, and that ... while graphical, is another eye into the console + login.

                  (Just mentioning two ways without serial console to handle this, although serial console would be fine.)

          • fpoling 83 days ago
            Go is not memory safe even when the code has no unsafe blocks, although with typical usage and sufficient testing memory safety bugs are avoided. If one needs a truly memory-safe language, then use Rust, Java, C#, etc.
          • actionfromafar 83 days ago
            This sounds vaguely like the forgotten Linux a386 (not i386!).
        • compsciphd 84 days ago
          I've been out of the space for a bit (though interviewing again, so might get back into it). gVisor, at least as the "userspace" hypervisor, seemed to provide minimal value vs modern hypervisor systems with low-overhead / quick-boot VMs (a la Firecracker). With that said, I only looked at it years ago, so I could very well be out of date on it.

          Wasn't aware of Nabla, but they seem to be going with the unikernel approach (based on a cursory look at them). Unikernels have been "popular" (i.e. multiple attempts) in the space (mostly to basically run a single-process app without any context switches), but they create a process that is fundamentally different from what you develop and is therefore harder to debug.

          While unikernels might be useful in the high-frequency trading space (where any time savings are highly valued), I'm personally more skeptical of them in regular-world usage (and to an extent, I think history has borne this out, as it doesn't feel like any of the attempts at it has gotten real traction).

          • eyberg 84 days ago
            I think many people (including unikernel proponents themselves) vastly underestimate the amount of work that goes into writing an operating system that can run lots of existing prod workloads.

            There is a reason why Linux is over 30 years old and basically owns the server market.

            As you note, since it's not really a large existing market you basically have to bootstrap it which makes it that much harder.

            We (nanovms.com) are lucky enough to have enough customers that have helped push things forward.

            For the record I don't know of any of our customers or users that are using them for HFT purposes - something like 99% of our crowd is on public cloud with plain old webapp servers.

          • tptacek 84 days ago
            Modern gVisor uses KVM, not ptrace, for this reason.
            • compsciphd 84 days ago
              So I did a check; it would seem that gVisor with KVM mostly works on bare metal, not on existing VMs (nested virtualization).

              https://gvisor.dev/docs/architecture_guide/platforms/

              "Note that while running within a nested VM is feasible with the KVM platform, the systrap platform will often provide better performance in such a setup, due to the overhead of nested virtualization."

              I'd argue then that for most people (unless you have your own bare-metal hyperscaler farm), one would end up using gVisor without KVM, but I'm speaking from a place of ignorance here, so feel free to correct me.

      • birdiesanders 83 days ago
        My infra is this exactly. K8s-managed containers that manage qemu VMs. Every VM has its own management environment, they don’t ever see each other, and they work just the same as using virt-manager, but I get infinite flexibility in my env provisioning before I start a VM that gets placed in its isolated tenant network.
      • ignoramous 83 days ago
        > I argued that it wasn't just good for containers, but could be used to improve VM management as well (i.e. a single VM per running image

        I believe Google embarked on this path with Crostini for ChromiumOS [0], but now it seems like they're going to scale down their ambitions in favour of Android [1]. Crostini may not survive, but it looks like the underlying VMM (crosvm) might live on [2].

        > I'd argue that FreeBSD Jails / Solaris Zones (Solaris Zone/ZFS inspired my original work) really aren't any more secure than containers on linux, as they all suffer from the same fundamental problem of the entire kernel being part of one's "tcb", so any security advantage they have is due to a lack of bugs, not a better design.

        Jails (or an equivalent concept/implementation) come in handy where the Kernel/OS may want to sandbox higher privilege services (like with minijail in ChromiumOS [3]).

        [0] https://www.youtube.com/watch?v=WwrXqDERFm8&t=300 / summary: https://g.co/gemini/share/41a794b8e6ae (mirror: https://archive.is/5njY1)

        [1] https://news.ycombinator.com/item?id=40661703

        [2] https://source.android.com/docs/core/virtualization/virtuali...

        [3] https://www.chromium.org/chromium-os/developer-library/guide...

      • cryptonector 83 days ago
        > I'd argue that FreeBSD Jails / Solaris Zones [...] really aren't any more secure than containers on linux, as they all suffer from the same fundamental problem of the entire kernel being part of one's "tcb", [...]

        And also CPU branch prediction state, RAM chips, etc. The side-channels are legion.

    • topspin 84 days ago
      Isn't this discussion based on a false dichotomy? I, too, use VMs to isolate customers, and I use containers within those VMs, either with or without k8s. These tools solve different problems. Containers solve software management, whereas VMs provide a high degree of isolation.

      Container orchestration is where I see the great mistake in all of this. I consider everything running in a k8s cluster to be one "blast domain." Containers can be escaped. Faulty containers impact everyone relying on a cluster. Container orchestration is the thing I believe is "overused." It was designed to solve "hyper" scale problems, and it's being misused in far more modest use cases where VMs should prevail. I believe the existence of container orchestration and its misapplication has retarded the development of good VM tools: I dream of tools that create, deploy and manage entire VMs with the same ease as Docker, and that these tools have not matured and gained popularity because container orchestration is so easily misapplied.

      Strongly disagree about containers and dev/deployment ("NOT check"). I can no longer imagine development without containers: it would be intolerable. Container repos are a godsend for deployment.

      • rodgerd 84 days ago
        > Container orchestration is the thing I believe is "overused." It was designed to solve "hyper" scale problems, and it's being misused in far more modest use cases where VMs should prevail.

        As a relatively early corporate adopter of k8s, this is absolutely correct. There are problems where k8s is actually easier than building the equivalent capability elsewhere, but a lot of uses it's put to seem to be driven more by a desire to have kubernetes on one's resume.

        • topspin 84 days ago
          For what k8s was designed to do -- herding vast quantities of ephemeral compute resources across a global network -- it's great. That's not my problem with it. My problem is that by being widely misapplied it has stunted the development of good solutions to everything else. K8s users spend their efforts trying to coax k8s to do things it was never intended to do, and so the k8s "ecosystem" has spiraled into this duplicative, esoteric, fragile, and costly bazaar of complexity and overengineering.
          • rodgerd 81 days ago
            Indeed. One of my rules of thumb is that if it requires permanent storage, k8s is the wrong place, even if it is in-theory possible to do. Dealing with that whole side of things is so error-prone.
            • topspin 80 days ago
              I've looked at all the prevailing efforts to wed k8s and block storage and network file systems. It's all nauseating.

              On one hand you have network storage vendors (netapp, synology, ceph, whatever) baking up "drivers" aiming at a moving "CSI" target, a spec revised 10 times over seven years as the k8s world muddles its way to retrofitting state to a system that was never intended to deal with state at all. These vendor specific (!) "drivers" are doing laughable things like exec-ing iscsiadm to cobble up storage connections from the host. The only way this could be more "error-prone", as you say, is if Microsoft was selling it.

              On the other hand, you have perfectly good, battle tested kernels that have all the necessary vendor independent modules and tools to handle this stuff flawlessly, and no thought given as to how to "orchestrate" them.*

              Beyond that we have live migration**, a capability long part of every hypervisor platform (for some apparently unfathomable reason, entirely lost on k8s disciples,) yet completely forgone by container orchestration. Maybe we'll have a working CRIU sometime before I retire... but I'm not holding my breath.

              * Kata containers is the closest we've seen to something real here as far as I know, and even there the developers only considered dealing with network attached storage when users started filing issues on it, the main hang-up being that k8s wasn't capable of propagating the necessary volume metadata, until the Kata folks extended k8s to do so.

              ** Kata containers again: No support for live migration. The staggering irony is that they haven't considered this because k8s has no concept of live migration: there's no extant, standardized way to express live migration in k8s. So a platform (Kata) that purports to facilitate orchestrating VMs forgoes one of the key affordances of VMs because k8s says so (by omission). That goes directly to my point that the widespread misapplication of k8s is actively retarding other solutions.

              Sorry for the rant. Should anyone happen to read all of that, please don't mistake me for an anti-k8s type: I'm not. I'm anti- the foolishness of people trying to bend k8s into something it was never designed to be, to the exclusion of literally everything else in the computing world.

        • zarzavat 83 days ago
          > but a lot of uses it's put to seem to be driven more by a desire to have kubernetes on one's resume

          It’s no wonder people feel compelled to do this, given how many employers expect experience with k8s from applicants.

          Kubernetes is a computer worm that spreads via resumes.

      • marcosdumay 83 days ago
        > It was designed to solve "hyper" scale problems

        Even then, IMO, it makes too little sense. It would be a bit useful if unused containers wasted a lot of resources, or if you could get an unlimited amount of them from somewhere.

        But no: just creating all the containers you can and leaving them there wastes almost nothing, they are limited by the hardware you have or rent, and the things clouds can rent out are either full VMs or specialized single-application sandboxes.

        AFAIK, containers solve the "how do I run both this PHP7 and this PHP8 applications on my web server?" problem, and not much more.

      • trueismywork 83 days ago
        Which language do you develop in?
        • topspin 83 days ago
          In no particular order; Python, Go, Perl, C, Java, Ruby, TCL, PHP, some proprietary stuff, all recently (last 1-2 years) and in different versions: Java: 8, 11 and 17, for example. Deployed to multiple environments at multiple sites, except the C, which is embedded MCU work.
    • everforward 84 days ago
      > Modern "containers" were invented to make thinks more reproducible ( check ) and simplify dev and deployments ( NOT check ).

      I do strongly believe deployments of containers are easier. If you want something that parallels a raw VM, you can "docker run" the image. Things like k8s can definitely be complicated, but the parallel there is more like running a whole ESXi cluster. Having done both, there's really only a marginal difference in complexity between k8s and an ESXi cluster supporting a similar feature set.

      The dev simplification is supposed to be "stop dealing with tickets from people with weird environments", though it admittedly often doesn't apply to internal applications where devs have some control over the environment.

      > Personally FreeBSD Jails / Solaris Zones are the thing I like to dream are pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow

      I would be interested to hear how you use them. From my perspective, raw jails/zones are missing features and implementing those features on top of them ends up basically back at Docker (probably minus the virtual networking). E.g. jails need some way to get new copies of the code that runs in them, so you can either use Docker or write some custom Ansible/Chef/etc that does basically the same thing.

      Maybe I'm wrong, and there is some zen to be found in raw-er tools.

    • anonfordays 84 days ago
      >Personally FreeBSD Jails / Solaris Zones are the thing I like to dream are pretty much as secure as a VM and a perfect fit for a sane dev and ops workflow, I didn't dig too deep into this is practice, maybe I'm afraid to learn the contrary, but I hope not

      Having run both at scale, I can confirm and assure you they are not as secure as VMs and did not produce sane devops workflows. Not that Docker is much better, but it is better from the devops workflow perspective, and IMHO that's why Docker "won" and took over the industry.

      • kkfx 84 days ago
        A sane DevOps workflow is built on declarative systems like NixOS or Guix System, definitively not on a VM infra that in practice is rarely up to date, full of useless deps, on a host that is also not up to date, with the entire infra typically barely managed or manageable and with an immense attack surface...

        VMs are useful for those who live on the shoulder of someone else (i.e. *aaS) witch is ALL but insecure.

        • secondcoming 84 days ago
          I'm not sure what you're referring to here?

          Our cloud machines are largely VMs. Deployments mean building a new image and telling GCP to deploy that as machines come and go due to scaling. The software is up to date, dependencies are managed via ansible.

          Maybe you think VMs means monoliths? That doesn't have to be the case.

          • kkfx 83 days ago
            That's precisely the case: instead of owning hw, witch per machine is a kind-of monolith (even counting blades and other modular solutions), you deploy a full OS (or half-full) to run just a single service, on top of another "OS". Of course yes, this is the cloud model, and it is also the ancient and deprecated mainframe model, with much more added complexity, no unique ownership, and an enormously big attack surface.

            Various returns of experience prove that the cloud model is neither cheaper nor more reliable than owning iron; it's just fast, since you live on the shoulders of someone else. A speed you will pay for at an unknown point in time, when something happens and you have zero control over it.

            DevOps, meaning the Devs taking over the Ops without having the needed competences, is a modern recipe for failing digital ecosystems, and we witness that more and more with various "biblical outages": Roomba devices bricked due to an AWS mishap, cars of a certain vendor with a slew of RCEs, payment system outages, ... A resilient infra is not a centrally managed decentralized infra; it's a vast and diverse ecosystem interoperating with open and standard tools and protocols. Classic mail or Usenet infra is resilient; GMail backed by Alphabet infra is not.

            What if Azure collapses tomorrow? What's the impact? What's the attack surface of living on the shoulders of someone else, typically much bigger than you and often in other countries where getting even legal protections is costly and complex?

            Declarative systems on iron mean you can replicate your infra ALONE on the iron; with VMs you need much more resources, you do not even know the entire stack of your infra, and you essentially can't replicate anything. VMs/images are still made the classic '80s-style semi-manual way, with some automation written by a dev who only knows how to manage his/her own desktop a bit, and others will use it carelessly ("it's easy to destroy and re-start"). As a result we have seen production images with some unknown person's SSH authorized keys in them, because to be quick someone picked the first ready-made image from a Google search and added just a few things. We are near the level of crap of the dot-com bubble, with MUCH more complexity and weight.

            • bbarnett 83 days ago
              (note .. use 'which' not 'witch', quite different words)

              Not sure if you mentioned it, but cost and scaling is an absurd trick of AWS and others. AWS is literally 1000s, and in some usage cases even millions of times more expensive than your own hardware. Some believe that employee cost savings help here, but that's not even remotely close.

              Scaling is absurd. You can buy one server worth $10k that can handle the equivalent of thousands upon thousands of AWS instances' workload. You can buy far cheaper servers ($2k each), colo them yourself, have failover capability, and even have multi-datacentre redundancy, immensely cheaper than AWS. 1000s of times cheaper. All with more power than you'd ever, ever, ever scale at AWS.

              All that engineering to scale, all that effort to containerize, all that reliance upon AWS and their support system.. unneeded. You can still run docker locally, or VMs, or just pound it out to raw hardware.

              So on top of your "run it on bare metal" concept, there's the whole "why are you wasting time and spending money on AWS" argument. It's so insanely expensive. I cannot repeat enough how insanely expensive AWS is. I cannot repeat enough how AWS scaling is a lie, when you don't NEED to scale using local hardware. You just have so much more power.

              Now.. there is one caveat, and you touch on this. Skill. Expertise. As in, you have to actually not do Really Dumb Things, like write code that uses 1000s of times the CPU to do the same task, or write DB queries or schema that eat up endless resources. But of course, if you do those things on your own hardware, in DEV, you can see them and fix them.

              If you do those in AWS, people just shrug, and pay immense sums of money and never figure it out.

              I wonder, how many startups have failed due to AWS costs?

              • kkfx 83 days ago
                > use 'which' not 'witch', quite different words

                Thanks and sorry for my English, even if I use it for work I do not normally use it conversationally and as a result it's still very poor for me...

                Well, I'm not specifically talking about AWS, but in general living on someone else's infra is much more expensive in OPEX than what can be spared in CAPEX, and it's a deeply critical liability, especially when we start to develop against someone else's API instead of just deploying something "standard" we can always move unchanged.

                Yes, technical debt is a big issue, but it's a relative issue, because if you can't maintain your own infra you can't be safe anyway; the "initial easiness" means a big disaster sooner or later, and the later it comes, the more expensive it will be. Of course a one-person startup can't have offsite backups, geo-replication and so on on its own iron, but making the MINIMUM use of third-party services, trying to be as standard and vendor-independent as possible until you earn enough to own it, is definitely possible at any scale.

                Unfortunately it's a thing we have almost lost, since Ops essentially does not exist anymore except at a few giants, Devs have no substantial skill since they came from "quick" full-immersion bootcamps where they learned just to do repetitive things with specific tools, like modern Ford assembly-line workers able only to turn a wrench, and still most of the management fails to understand IT for what it is: not "computers" (like telescopes for astronomers) but information (like stars for astronomers). This toxic mix has allowed a very few to earn hyper-big positions, but they are starting to collapse because their commercial model is technically untenable, and we all start paying the high price.

        • nine_k 84 days ago
          VMs are useful when you don't own or rent dedicated hardware. Which is a lot of cases, especially when your load varies seriously over the day or week.

          And even if you do manage dedicated servers, it's often wise to use VMs on them to better isolate parts of the system, aka limit the blast radius.

          • kkfx 83 days ago
            Witch is a good recipe for paying much more while thinking you are smart and paying less, being tied to some third party's decisions for anything you run, having a giant attack surface and so on...

            There are countless lessons about how owning hw is cheaper than not, countless examples of "cloud nightmares", countless examples of why a system needs to be simple and securely designed from the start, not "isolated", but people refuse to learn, especially since they are just employees: living on the shoulders of someone else means less work to do, and managers typically do not know even the basics of IT well enough to understand.

    • markandrewj 83 days ago
      I wish people would stop going on about BSD jails as if they are the same thing. I would recommend at least using jails first. Most people using container technologies are well versed in BSD jails, as well as other technologies such as LXD, CRI-O, MicroVMs, and traditional virtualization technologies (KVM).

      You will encounter rough edges with any technology if you use it long enough. Container technologies require learning new skills, and this is where I personally see people often get frustrated. There is also the shift-left mentality of container environments, where you are expected to be responsible for your environment, which is difficult for some, i.e. users become responsible for more than in a traditional virtualized environment. People didn't stop using VMs, they just started using containers as well. What you should use is dependent on the workload. When you have to manage more than a single VM, and work on a larger team, the value of containers becomes more apparent. Not to mention the need to rapidly patch and update in today's environment. Often VMs don't get patched because applications aren't architected in a way that allows for updates without downtime, although it is possible. There is a mentality of 'if it's not broke, don't fix it'. There is some truth that virtualized hardware can provide bounds of separation as well, but other things like SELinux also enforce these boundaries. Not to mention containers are often running inside VMs as well.

      Using ephemeral VMs is not a new concept. The idea of 'cattle vs pets', and cloud, was built on KVM (OpenStack/AWS).

    • lkrubner 84 days ago
      I agree. VMs rely on old technologies, and are reliable in that way. By contrast, the move to Docker then necessitated additional technologies, such as Kubernetes, and Kubernetes brought an avalanche of new technologies to help manage Docker/Kubernetes. I am wary of any technology that in theory should make things simpler but in fact draws you down a path that requires you to learn a dozen new technologies. The Docker/Kubernetes path also drove up costs, especially the cost associated with the time needed to set up the devops correctly. Anything that takes time costs money. When I was at Averon the CEO insisted on absolutely perfect reliability and therefore flawless devops, so we hired a great devops guy to help us get set up, but he needed several weeks to set everything up, and his hourly rate was expensive. We could have just "pushed some code to a server" and we would have saved $40,000.

      When I consult with early stage startups, and they worry about the cost of devops, I point out that we can start simply, by pushing some code to a server, as if this was still 2001, and we can proceed slowly and incrementally from there. While Docker/Kubernetes offers infinite scalability, I warn entrepreneurs that their first concern should be keeping things simple and therefore low cost. And then the next step is to introduce VMs, and then use something like Packer to enable the VMs to be used as AMIs, and so allow the devops to develop to the point of using Terraform -- but all of that can wait till the product actually gains some traction.
    • dboreham 84 days ago
      For me it's about the ROAC property (Runs On Any Computer). I prefer working with stuff that I can run. Running software is live software, working software, loved software. Software that only works in weird places is bad, at least for me. Docker is pretty crappy in most respects, but it has the ROAC going for it.

      I would love to have a "docker-like thing" (with ROAC) that used VMs, not containers (or some other isolation tech that works). But afaik that thing does not yet exist. Yes, there are several "container tool, but we made it use VMs" projects (Firecracker and downline), but they all need weirdo special setup and won't run on my laptop or a generic DigitalOcean VM.

      • 01HNNWZ0MV43FF 84 days ago
        Yeah that's kind of a crummy tradeoff.

        Docker is "Runs on any Linux, mostly, if you have a new enough kernel" meaning it packages a big VM anyway for Windows and macOS

        VMs are "Runs on anything! ... Sorta, mostly, if you have VM acceleration" meaning you have to pick a VM software and hope the VM doesn't crash for no reason. (I have real bad luck with UTM and VirtualBox on my Macbook host for some reason.)

        All I want is everything - An APE-like program that runs on any OS, maybe has shims for slightly-old kernels, doesn't need a big installation step, and runs any useful guest OS. (i.e. Linux)

        • compsciphd 84 days ago
          Docker means your userspace program carries all its userspace dependencies with it and doesn't depend on the userspace configuration of the underlying system.

          What I argued in my paper is that systems like docker (i.e. what I created before it) improve over VMs (and even Zones/ZFS) in their ability to really run ephemeral computation: if it takes microseconds to set up the container filesystem, you can run a boatload of heterogeneous containers even if they only need to run for very short periods of time. Solaris Zones/ZFS didn't lend itself to heterogeneous environments, but simply to cloning a single homogeneous environment, while VMs, besides suffering from that problem, also (at least at the time; much improved as of late) required a reasonably long bootup time.
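
          To make the "microseconds to set up the container filesystem" point concrete, here's a minimal sketch (my illustration, not the paper's code; needs root on a Linux box with overlayfs, and the shared base rootfs path is hypothetical) that stacks a read-only base layer and a per-container writable layer with a single mount:

            import os
            import subprocess
            import tempfile
            import time

            def make_container_fs(base_dir: str) -> str:
                """Create a writable view over a shared read-only base via overlayfs."""
                work = tempfile.mkdtemp(prefix="ctr-")
                upper = os.path.join(work, "upper")    # per-container writes land here
                workdir = os.path.join(work, "work")   # overlayfs scratch space
                merged = os.path.join(work, "merged")  # the container's root filesystem
                for d in (upper, workdir, merged):
                    os.mkdir(d)
                subprocess.run(
                    ["mount", "-t", "overlay", "overlay",
                     "-o", f"lowerdir={base_dir},upperdir={upper},workdir={workdir}",
                     merged],
                    check=True,
                )
                return merged

            if __name__ == "__main__":
                start = time.perf_counter()
                root = make_container_fs("/srv/base-rootfs")  # hypothetical shared base image
                print(f"container rootfs at {root} in {(time.perf_counter() - start) * 1e3:.2f} ms")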

        • neaanopri 84 days ago
          The modern developer yearns for Java
          • smallmancontrov 84 days ago
            I had to use eclipse the other day. How the hell is it just as slow and clunky as I remember from 20 years ago? Does it exist in a pocket dimension where Moore's Law doesn't apply?
            • marcosdumay 83 days ago
              It's not as slow as it was 20 years ago.

              It's only as slow as you remember because the actual performance was so bad that you can't physiologically remember it.

            • TurningCanadian 84 days ago
              That's not Java's fault though. IntelliJ IDEA is also built on Java and runs just fine.
            • qwery 84 days ago
              I think it's pretty remarkable to see any application in continuous use for so long, especially with so few changes[0] -- Eclipse must be doing something right!

              Maintaining (if not actively improving/developing) a piece of useful software without performance degradation -- that's a win.

              Keeping that up for decades? That's exceptional.

              [0] "so few changes": I'm not commenting on the amount of work done on the project or claiming that there is no useful/visible features added or upgrades, but referring to Eclipse of today feeling like the same application as it always did, and that Eclipse hasn't had multiple alarmingly frequent "reboots", "overhauls", etc.

              [?] keeping performance constant over the last decade or two is a win, relatively speaking, anyway

              • dijit 84 days ago
                I agree; that you've pointed it out to me makes it obvious that this is not the norm, and we should celebrate this.

                I'm reminded of Casey Muratori's rant on Visual Studio; a program that largely feels like it hasn't changed much but clearly has regressed in performance massively; https://www.youtube.com/watch?v=GC-0tCy4P1U

              • password4321 84 days ago
                > without performance degradation

                Not accounting for Moore's Law, yikes. Need a comparison adjusted for "today's dollars".

          • mschuster91 84 days ago
            Java's ecosystem is just as bad. Gradle is insanely flexible but people create abominations out of it, Maven is extremely rigid so people resort to even worse abominations to get basic shit done.
          • gryfft 84 days ago
            Maybe just the JVM.
        • klooney 83 days ago
          > Runs on any Linux, mostly, if you have a new enough kernel"

          New enough kernel means CentOS 5 is right out. But it's been a decade, it'll run on anything vaguely sensible to be running today

          • 01HNNWZ0MV43FF 80 days ago
            Maybe I'm just grumpy that once you line up the support windows, it's impossible to get new software on old hardware, even though the "oomph" is there

            Maybe my next big hobby project should be emulating bleeding-edge Linux on some old 686 hardware. Like that guy who booted Ubuntu on an 8-bit AVR in a matter of mere days

      • ThreatSystems 84 days ago
        Vagrant / Packer?
        • stackskipton 84 days ago
          Wouldn't work here, they have software on each VM that cannot be reimaged. To use Packer properly, you should treat it like you do a stateless pod: just start a new one and take down the old one.
          • ThreatSystems 83 days ago
            Sure, then throw Ansible over the top for configuration/change management. Packer gives you a solid base for repeatable deployments. Their model was to ensure that data stays within the VM, which a deployed AMI made from Packer would suit quite nicely. If they need to do per-client configuration, then Ansible or even AWS SSM could fit the bill there once the EC2 instance is deployed.

            For data persistence, if they need to upgrade / replace VMs, have a secondary EBS volume mounted which solely stores persistent data for the account.

            • stackskipton 82 days ago
              That might work as well. I've found Packer + Ansible to be juice not really worth the squeeze vs base Ubuntu/Debian/Rocky + bigger Ansible playbook.
        • gavindean90 84 days ago
          With all the mind share that Terraform gets, you would think Vagrant would at least be known, but alas.
          • tptacek 84 days ago
            Somebody educate me about the problem Packer would solve for you in 2024?
            • sgarland 84 days ago
              Making machine images. AWS calls them AMIs. Whatever your platform, that's what it's there for. It's often combined with Ansible, and basically runs like this:

              1. Start a base image of Debian / Ubuntu / whatever – this is often done with Terraform.

              2. Packer types a boot command after power-on to configure whatever you'd like

              3. Packer manages the installation; with Debian and its derivatives, this is done mostly through the arcane language of preseed [0]

              4. As a last step, a pre-configured SSH password is set, then the new base VM reboots

              5. Ansible detects SSH becoming available, and takes over to do whatever you'd like (a rough sketch of this step follows the list).

              6. Shut down the VM, and create clones as desired. Manage ongoing config in a variety of ways – rolling out a new VM for any change, continuing with Ansible, shifting to Puppet, etc.

              [0]: https://wiki.debian.org/DebianInstaller/Preseed
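
              A minimal sketch of step 5 (assuming Python; the address and playbook name are placeholders): poll until the SSH port accepts connections, then hand off to Ansible.

                import socket
                import subprocess
                import time

                def wait_for_ssh(host: str, port: int = 22, timeout: float = 600.0) -> None:
                    """Block until host:port accepts TCP connections or timeout expires."""
                    deadline = time.monotonic() + timeout
                    while time.monotonic() < deadline:
                        try:
                            with socket.create_connection((host, port), timeout=5):
                                return  # SSH is up; provisioning can proceed
                        except OSError:
                            time.sleep(2)
                    raise TimeoutError(f"SSH never came up on {host}:{port}")

                if __name__ == "__main__":
                    wait_for_ssh("203.0.113.10")  # placeholder address of the new base VM
                    # Trailing comma makes the address an inline Ansible inventory.
                    subprocess.run(
                        ["ansible-playbook", "-i", "203.0.113.10,", "site.yml"],
                        check=True,
                    )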

              • pxc 84 days ago
                This is nice in its uniformity (same tool works for any distro that has an existing AMI to work with), but it's insanely slow compared to just putting a rootfs together and uploading it as an image.

                I think I'd usually rather just use whatever distro-specific tools for putting together a li'l chroot (e.g., debootstrap, pacstrap, whatever) and building a suitable rootfs in there, then finish it up with amazon-ec2-ami-tools or euca2ools or whatever and upload directly. The pace of iteration with Packer is just really painful for me.
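
                A rough sketch of that flow for a Debian-flavored rootfs (assuming debootstrap is installed and this runs as root; suite, mirror, and paths are just examples, and the final AMI upload step is left to whichever tool you prefer):

                  import subprocess

                  ROOTFS = "/tmp/rootfs"                  # example build directory
                  SUITE = "bookworm"                      # example Debian suite
                  MIRROR = "http://deb.debian.org/debian"

                  def build_rootfs() -> None:
                      """Build a minimal Debian rootfs and pack it into a tarball."""
                      subprocess.run(
                          ["debootstrap", "--variant=minbase", SUITE, ROOTFS, MIRROR],
                          check=True,
                      )
                      # Customize in place via chroot, e.g. install extra packages.
                      subprocess.run(
                          ["chroot", ROOTFS, "apt-get", "install", "-y", "openssh-server"],
                          check=True,
                      )
                      subprocess.run(
                          ["tar", "-C", ROOTFS, "-czf", "/tmp/rootfs.tar.gz", "."],
                          check=True,
                      )

                  if __name__ == "__main__":
                      build_rootfs()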

                • sgarland 83 days ago
                  I haven’t played with chroot since Gentoo (which for me, was quite a while ago), so I may be incorrect, but isn’t that approach more limited in its customization? As in, you can install some packages, but if you wanted to add other repos, configure 3rd party software, etc. you’re out of luck.
                  • pxc 83 days ago
                    Nah you can add other repos in a chroot! The only thing you can't really do afaik is test running a different kernel; for that you've got to actually boot into the system.

                    If you dual-boot multiple Linux systems you can still administer any of the ones you're not currently running via chroot at any time, and that works fine whether you've got third-party repositories or not. A chroot is also what you'd use to reinstall the bootloader on a system where Windows has nuked the MBR or the EFI vars or whatever.

                    There might be some edge cases: software that requires a physical hardware token for licensing can be very aggressive, so it might also check whether it's running in a chroot, container, or VM and refuse to play nice, or something like that. But generally you can do basically anything in a chroot that you might do in a local container, and 99% of what you might do in a local VM.

              • stargrazer 83 days ago
                I miss saltstack. I did that whole litany of steps with one tool plus preseed.
                • elevation 71 days ago
                  Saltstack is still around!
            • kasey_junk 84 days ago
              I think the thread is more about how Docker was a reaction to the Vagrant/Packer ecosystem, which was deemed overweight but was in many ways a "Docker-like thing" for VMs.
              • tptacek 84 days ago
                Oh, yeah, I'm not trying to prosecute, I've just always been Packer-curious.
            • yjftsjthsd-h 84 days ago
              What's a better way to make VM images?
    • vundercind 84 days ago
      Docker’s the best cross-distro rolling-release package manager and init system for services—staying strictly out of managing the base system, which is great—that I know of. I don’t know of anything that’s even close, really.

      All the other stuff about it is way less important to me than that part.

      • pxc 84 days ago
        This is wrong in pretty much every way I can imagine.

        Docker's not a package manager. It doesn't know what packages are, which is part of why the chunks that make up Docker containers (image layers) are so coarse. This is also part of why many Docker images are so huge: you don't know exactly the packages you need, strictly speaking, so you start from a whole OS. This is also why your Dockerfiles all invoke real package managers— Docker can't know how to install packages if it doesn't know what they are!

        It's also not cross-platform, or at least 99.999% of images you might care about aren't— they're Linux-only.

        It's also not a service manager, unless you mean docker-compose (which is not as good as systemd or any number of other process supervisors) or Docker Swarm (which has lost out to Kubernetes). (I'm not sure what you even mean by 'init system for containers' since most containers don't include an init system.)

        There actually are cross-platform package managers out there, too. Nix, Pkgsrc, Homebrew, etc. All of those I mentioned and more have rolling release repositories as well. ('Rolling release' is not a feature of package managers; there is no such thing as a 'rolling release package manager'.)

        • icedchai 83 days ago
          When I read the parent comment, I was picturing package manager and init system in quotes. Docker is the "package manager" for people who don't want to package their apps with a real package manager. It's a "service manager" and "init system" that can restart your services (containers) on boot or when they fail.
          • vundercind 83 days ago
            Right, it gives me the key functionality of those systems in a way that’s decoupled from the base OS, so I can just run some version of Debian for five years or however long it’s got security updates, and not have to worry about things like services I want to run needing newer versions of lots of things than some old Debian has. Major version updates of my distro, or trying to get back ported newer packages in, have historically been the biggest single source of “fuck, now nothing works and there goes my weekend” running home servers, to the point of making ROI on them not so great and making me dread trying to add or update services I was running. This approach means I can install and run just about any service at any recent-ish version without touching the state of the OS itself. The underlying tech of docker isn’t directly important to me, nor any of the other features it provides, in this context, just that it’s separate from the base OS and gives me a way to “manage packages” and run services in that decoupled manner.
            • icedchai 83 days ago
              I agree with all this and use it similarly. I hate when updating the OS breaks my own apps, or does something annoying like updating a dependency (like Postgres)... Docker is perfect for this.
          • pxc 83 days ago
            You're right. I read the commenter I was replying to very badly. In my later discussion with them we covered a bit better how Docker can cover some of the same uses as package managers as well as the continued vitality of package management in the era of containers. It was a much better conversation by the end than it was at the beginning, thanks to their patience and good faith engagement.
        • vundercind 84 days ago
          > This is wrong in pretty much every way I can imagine.

          Nope! It’s not wrong in any way at all!

          You’re thinking of how it’s built. I’m thinking of what it does (for me).

          I tell it a package (image) to fetch, optionally at a version. It has a very large set of well maintained up-to-date packages (images). It’s built-in, I don’t even have to configure that part, though I can have it use other sources for packages if I want to. It fetches the package. If I want it to update it, I can have it do that too. Or uninstall it. Or roll back the version. I am 100% for-sure using it as a package manager, and it does that job well.

          Then I run a service with a simple shell script (actually, I combine the fetching and running, but I'm highlighting the two separate roles it performs for me). It takes care of managing the process (the container, which is really just a very fat process for these purposes). It restarts it if it crashes, if I like. It auto-starts it when the machine reboots—all my services come back up on boot, and I've never touched systemd (which my Debian uses); Docker is my interface to that, and I didn't even have to configure it to do that part. I'm sure it's doing systemd stuff under the hood, at least to bring the docker daemon up, but I've never touched that and it's not my interface to managing my services. The docker command is. Do I see what's running with systemd or ps? No, with docker. Start, restart, or stop a service? Docker.
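
          Concretely, a "package" on my box is little more than this (sketch; the image name, port, and paths are invented):

            docker pull ghcr.io/example/someservice:1.2.3      # the "install"

            # the "service manager" part: restarts on crash, comes back after reboot
            docker run -d --name someservice \
              --restart unless-stopped \
              -p 8080:8080 \
              -v /tank/someservice:/data \
              ghcr.io/example/someservice:1.2.3

            docker ps                                          # what's running
            docker stop someservice && docker rm someservice   # the "uninstall"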

          I’ve been running hobbyist servers at home (and setting up and administrating “real” ones for work) since 2000 or so and this is the smoothest way to do it that I’ve seen, at least for the hobbyist side. Very nearly the only roles I’m using Docker to fill, in this scenario, are package manager and service manager.

          I don’t care how it works—I know how, but the details don’t matter for my use case, just the outcomes. The outcome is that I have excellent, updated, official packages for way more services than are in the Debian repos, that leave my base system entirely alone and don’t meaningfully interact with it, with config that’s highly portable to any other distro, all managed with a common interface that would also be the same on any other distro. I don’t have to give any shits about my distro, no “oh if I want to run this I have to update the whole damn distro to a new major version or else manually install some newer libraries and hope that doesn’t break anything”, I just run packages (images) from Docker, update them with Docker, and run them with Docker. Docker is my UI for everything that matters except ZFS pool management.

          > It's also not cross-platform, or at least 99.999% of images you might care about aren't— they're Linux-only.

          I specifically wrote cross-distro for this reason.

          > There actually are cross-platform package managers out there, too. Nix, Pkgsrc, Homebrew, etc.

          Docker “packages” have a broader selection and better support than any of those, as far as services/daemons go; it’s guaranteed to keep everything away from the base system and tidy for better stability; and it provides a common interface for configuring where to put files & config for easier and more-confident backup.

          I definitely use it mainly as a package manager and service manager, and find it better than any alternative for that role.

          • pxc 84 days ago
            > You’re thinking of how it’s built. I’m thinking of what it does (for me).

            I've read your reply and I hear you (now). But as far as I'm concerned package management is a little more than that. Not everything that installs or uninstalls software is a package manager-- for instance I would say that winget and Chocolatey are hardly package managers, despite their pretensions (scoop is closer). I think of package management, as an approach to managing software and as a technology, as generally characterized by things like and including: dependency tracking, completeness (packages' dependencies are themselves all packaged, recursively, all the way down), totality (installing software by any means other than the package manager is not required to have a practically useful system), minimal redundancy of dependencies common to multiple packages, collective aggregation and curation of packages, transparency (the unit the software management tool operates on, the package, tracks the versions of the software contained in it and the versions of the software contained in its dependencies), exclusivity (packaged software does not self-update; updates all come through the package manager), etc. Many of these things come in degrees, and many package managers do not have all of them to the highest degree possible. But the way Docker gets software running on your system just isn't meaningfully aligned with that paradigm, and this also impacts the way Docker can be used. I won't enumerate Docker's deviations from this archetype because it sounds like you already have plenty of relevant experience and knowledge.

            > I don’t care how it works—I know how, but the details don’t matter for my use case, just the outcomes.

            When there's a vuln in your libc or some similar common dependency, Docker can't tell you about which of your images contains it because it has no idea what glibc or liblzma are. The whole practice of generating SBOMs is about trying to recover or regenerate data that is already easily accessible in any competent package manager (and indeed, the tools that generate SBOMs for container images depend on actual package managers to get that data, which is why their support comes distro-by-distro).

            Managing Docker containers is also complicated in some ways that managing conventional packages (even in other containerized formats like Flatpak, Snap, and AppImage) isn't, in that you have to worry about bind mounts and port forwarding. How the software works leads to a radically different sort of practice. (Admittedly maybe that's still a bit distant from very broad outcomes like 'I have postgres running'.)

            > The outcome is that I have [many services] that leave my base system entirely alone and don’t meaningfully interact with it, with config that’s highly portable to any other distro, all managed with a common interface that would also be the same on any other distro.

            This is indeed a great outcome. But when you achieve it with Docker, the practice by means of which you've achieved it is not really a package management discipline but something else. And that is (sadly, to me) part of the appeal, right? Package management can be a really miserable paradigm when your packages all live in a single, shared global namespace (the paths on your filesystem, starting with /). Docker broke with that paradigm specifically to address that pain.

            But that's not the end of the story! That same excellent outcome is also achievable by better package managers than ol' yum/dnf and apt! And when you go that route, you also get back the benefits of the old regime like the ability to tell what's on your system and easily patch small pieces of it once-and-for-all. Nix and Guix are great for this and work in all the same scenarios, and can also readily generate containers from arbitrary packages for those times you need the resource management aspect of containers.

            > The outcome is that I have [...] official packages

            For me, this is not a benefit. I think the curation, integration, vetting, and patching that coordinated software distributions do is extremely valuable, and I expect the average software developer to be much worse at packaging and systems administration tasks than the average contributor to a Linux distro is. To me, this feels like a step backwards into chaos, like apps self-updating or something like that. It makes me think of all the absolutely insane things I've seen Java developers do with Maven and Gradle, or entire communities of hobbyists who depend on software whose build process is so brittle and undocumented that seemingly no one knows how to build it and Docker has become the sole supported distribution mechanism.

            > I specifically wrote cross-distro for this reason.

            My bad! Although that actually widens the field of contenders to include Guix, which is excellent, and arguably also Flatpak, which still aligns fairly well with package management as an approach despite being container-based.

            > Docker “packages” have a broader selection and better support than any of those, as far as services/daemons go

            I suppose this is an advantage of a decentralized authority-to-publish, like we also see in the AUR or many language-specific package repositories, and also of freedom from the burden of integration, since all docker image authors have to do is put together any runtime at all that runs. :-\

            > service manager

            Ok. So you're just having dockerd autostart your containers, then, no docker-compose or Docker Swarm or some other layer on top? Does that even have a notion of dependencies between services? That feels like table stakes for me for 'good service manager'.

            PS: thanks for giving a charitable and productive reply to a comment where I was way gratuitously combative about a pet peeve/hobby horse of mine for no good reason

            • vundercind 83 days ago
              Oh no, you’re fine, thanks for responding in kind, I get where you’re coming from now too. Maybe it’s clearer to label my use of it as a software manager or something like that. It does end up being my main interface for nearly everything that matters on my home server.

              Like, I’m damn near running Docker/Linux, in the pattern of gnu/Linux or (as started as a bit of a joke, but is less so with each year) systemd/Linux, as far as the key parts that I interact with and care about and that complete the OS for me.

              As a result, some docker alternatives aren’t alternatives for me—I want the consistent, fairly complete UI for the things I use it for, and the huge library of images, largely official. I can’t just use raw lxc or light VMs instead, as that gets me almost nothing of what I’m currently benefiting from.

              I haven’t run into a need to have dependent services (I take the SQLite option for anything that has it—makes backups trivial) but probably would whip up a little docker-compose for if I ever need that. In work contexts I usually just go straight for docker-compose, but with seven or eight independent services on my home server I’ve found I prefer tiny shell scripts for each one.

              [edit] oh, and I get what you mean about it not really doing things like solving software dependencies—it’s clearly not suitable as, like, a system package manager, but fills the role well enough for me when it comes to the high-level “packages” I’m intending to use directly.

              • pxc 83 days ago
                > As a result, some docker alternatives aren’t alternatives for me—I want the consistent, fairly complete UI for the things I use it for, and the huge library of images, largely official.

                This kind of uniformity of interface and vastness are among the things that made me fall in love with Linux package management as a kid, too. I can see how there's a vital similarity there that could inspire the language you used in your first comment.

                > I can’t just use raw lxc or light VMs instead, as that gets me almost nothing of what I’m currently benefiting from.

                Right, those will give you the isolation, but primarily what you want is the low commitment and uniform interface to just try something (and if it turns out to be good enough, why not leave it running like that for a few years).

                I sometimes kind of enjoy packaging work, and even at work I sometimes prefer to build a container from scratch myself when we're doing container deployments, rather than using a vendor one. In fact we've got one ECS deployment where we're using a vendor container where the version of the software that I've got running locally through a package I wrote works fine, but the vendor container at that same version mysteriously chokes on ECS, so we've reverted to the LTS release of the vendor one. Building my own Nix package for that software involved some struggle: some reading of its build tooling and startup code, some attempts to build from source manually, some experiments with wrapping and modifying prebuilt artifacts, some experiments with different runtimes, etc. But it also taught me enough about the application that I am now prepared to debug or replace that mysteriously-busted vendor container before that deployment moves to production.

                At the same time, I don't think that pain would be worth it for me for a homelab deployment of this particular software. At work, it's pretty critical stuff so having some mastery of the software feels a bit more relevant. But if I were just hoping to casually try it at home, I'd have likely moved on or just resorted to the LTS image and diminished my opinion of upstream a little without digging deeper.

                The (conceptually) lightweight service management point is well-taken. In systemd, which I generally like working with, you always have to define at least one dependency relation (e.g., between your new service and one of the built-in targets) just to get `systemctl enable` to work for it! On one level I like the explicitness of that, but on another I can understand seeing it as actually unnecessary overhead for some use cases.
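
                Even the smallest useful unit still carries that one relation in its [Install] section, otherwise `systemctl enable` has nothing to hook it onto (sketch; names and paths made up):

                  sudo tee /etc/systemd/system/someservice.service >/dev/null <<'EOF'
                  [Unit]
                  Description=Example service

                  [Service]
                  ExecStart=/usr/local/bin/someservice
                  Restart=on-failure

                  [Install]
                  # the one dependency relation that `systemctl enable` hooks into
                  WantedBy=multi-user.target
                  EOF
                  sudo systemctl daemon-reload
                  sudo systemctl enable --now someservice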

    • benreesman 84 days ago
      Namespaces and cgroups and LXC and the whole alphabet soup, the “Docker Industrial Complex” to borrow your inspired term, this stuff can make sense if you rack your own gear: you want one level of indirection.

      As I’ve said many times, putting a container on a serverless on a Xen hypervisor so you can virtualize while you virtualize? I get why The Cloud wants this, but I haven’t the foggiest idea why people sit still for it.

      As a public service announcement? If you’re paying three levels of markup to have three levels of virtual machine?

      You’ve been had.

      • Spivak 83 days ago
        You're only virtualizing once. Serverless/FaaS is just a way to run a container, and a container is just a Linux process with some knobs to let different software coexist more easily. You're still just running VMs, same as you always were, but just have a new way of putting the software you want to run on them.
    • tptacek 84 days ago
      Jails/Zones are not pretty much as secure as a VM. They're materially less secure: they leave cotenant workloads sharing a single kernel (not just the tiny slice of the kernel KVM manages). Most kernel LPEs are probably "Jail" escapes, and it's not feasible to filter them out with system call sandboxing, because LPEs occur in innocuous system calls, too.
    • dhx 83 days ago
      The article doesn't read to me to be an argument about whether sharing a kernel is better or worse (multiple virtual machines each with their own kernel versus multiple containers isolated by a single kernel).

      The article instead reads to me as an argument for isolating customers to their own customer-specific systems so there is no web server daemon, database server, file system path or other shared system used by multiple customers.

      As an aside to the article, two virtual machines, each with their own kernel, are generally forced to communicate with each other in more complex ways through network protocols, which adds complexity and increases the risk of implementation flaws and vulnerabilities. Two processes in different cgroups with a common kernel have simpler communication options available, such as reading the same file directly, UNIX domain sockets, named pipes, etc.

      • didntcheck 83 days ago
        Yep, the article just seems to be talking about single tenancy vs multi tenancy. The VMs vs containers thing seems mostly orthogonal
    • nimish 84 days ago
      Clear Containers/Kata Containers/firecracker VMs showed that there isn't really a dichotomy here. Why we aren't all using HW assisted containers is a mystery.
      • tptacek 84 days ago
        It's not at all mysterious: to run hardware-virtualized containers, you need your compute hosted on a platform that will allow KVM. That's a small, expensive, tenuously available subset of AWS, which is by far the dominant compute platform.
        • Spivak 84 days ago
          So… Lambda, Fargate, and EC2. The only thing you can't really do this with is EKS.

          Like Firecracker was made by AWS to run containers on their global scale KVM, EC2.

          • tptacek 84 days ago
            Lambda and Fargate are implementations of the idea, not a way for you yourself to do any kind of KVM container provisioning. You can't generally do this on EC2; you need special instances for it.

            For a variety of reasons, I'm pretty familiar with Firecracker.

            • Spivak 84 days ago
              What am I missing? AWS offers (virtual) hardware-backed containers as a service; I would go so far as to say that a significant number of people are running VM-backed containers.

              And I've been at a few shops where EC2 is used as the poor-man's-firecracker by building containers and then running 1(ish) per VM. AWS's architecture actively encourages this because that's by far the easiest security boundary to manipulate. The moment you start thinking about two privilege levels in the same VM you're mostly on your own.

              Probably almost everyone running production workloads, knowingly or not, believes that the security boundary is not between containers but between the VMs enclosing those containers.

              • ryapric 84 days ago
                >What am I missing?

                The parent isn't talking about e.g. EC2 as a virtualized platform, they're talking about EC2 not being a virtualization platform. With few machine-type exceptions, EC2 doesn't support nested virtualization -- you can't run e.g. KVM on EC2.

              • kasey_junk 84 days ago
                I think the argument is you need to be running Nitro (I think, it's been a while?) instances to take advantage of KVM isolation.
      • turtlebits 84 days ago
        Engineers are lazy, especially Ops. Until it's easier to get up and running and there are tangible benefits, people won't care.
    • mountainriver 84 days ago
      Docker is fantastic and VMs are fantastic.

      I honestly can’t imagine running all the services we have without containers. It would be wildly less efficient and harder to develop on.

      VMs are wonderful when you need the security

    • tomjen3 84 days ago
      If anything, Docker is underused. You should have a very good reason to make a deployment that is not Docker, or (if you really need the extra security) a VM that runs one thing only (and so is essentially a more resource-hungry version of Docker).

      If you don’t, then it becomes much harder to answer the question of what exactly is deployed on a given server and what it takes to bring it up again if it goes down hard. If you but everything in Docker files, then the answer is whatever is set in the latest docker-compose file.

    • ranger207 84 days ago
      Docker's good at packaging, and Kubernetes is good at providing a single API to do all the infra stuff like scheduling, storage, and networking. I think that if someone sat down and tried to create an idealized VM management solution that covered everything from "dev pushes changes" to "user requests website", it'd probably have a single image for each VM to run (like Docker has a single image for each container to run), and the management of VM hosts, storage, networking, and scheduling of VMs onto hosts would wind up looking a lot like k8s. You could certainly do that with VMs, but for various path-dependency reasons people do it with containers instead, and nobody's got a well-adopted system for doing the same with VMs.
      • crabbone 83 days ago
        I'm sorry, but:

        * Docker isn't good at packaging. When people talk about packaging, they usually understand it to include dependency management. For Docker to be good at packaging it should be able to create dependency graphs and allow users to re-create those graphs on their systems. Docker has no way of doing anything close to that. Aside from that, Docker suffers from the lack of reproducible builds, lack of upgrade protocols... It's not good at packaging... maybe it's better than something else, but there's a lot of room for improvement.

        * Kubernetes doesn't provide a single API to do all the infra stuff. In fact, it provides so little that it's a mystery why anyone would think that. All that stuff like "storage", "scheduling", and "networking" that you mentioned comes as add-ons (e.g. CSI, CNI) which aren't developed by Kubernetes, don't follow any particular rules, and have their own interfaces... Not only that, Kubernetes' integration with CSI/CNI is very lacking. For example, there's no protocol for upgrading these add-ons when upgrading Kubernetes. There's no generic interface these add-ons have to expose to the user in order to implement common things. It's really anarchy over there...

        There are lots of existing VM management solutions, e.g. OpenStack, vSphere -- you don't need to imagine them, they exist. They differ from Kubernetes in many ways. Very superficially, yet importantly, they don't have an easy way to automate them. For very simple tasks, Kubernetes offers a very simple solution for automation, i.e. write some short YAML file. Automating e.g. ESX comes down to using a library like govmomi (or something that wraps it, like Terraform). But in that case, Terraform only manages deployment and doesn't take care of post-deployment maintenance... and so on.

        However, the more you deal with the infra, the more you realize that the initial effort is an insignificant fraction of the overall complexity of the task you need to deal with. And that's where the management advantages of Kubernetes start to seem less appealing. I.e. you realize that you will have to write code to manage your solution, and there will be a lot of it... and a bunch of YAML files won't cut it.

        • ranger207 83 days ago
          Docker's dependency management solution is "include everything you need and specify a standard interface for the things you can't include like networking." There's no concern about "does the server I'm deploying to have the right version of libssl" because you just include the version you need. At most, you have to have "does the server I'm deploying to have the right version of Docker/other container runtime for the features my container uses" which are a much smaller subset of changes. Reproducible builds, yeah, but that's traditionally more a factor of making sure your build scripts are reproducible than the package management itself. Or to put it another way, dockerfiles are just as reproducible as .debs or .rpms. Upgrading is replacing the container with a new one

          Kubernetes is an abstraction layer that (mostly) hides the complexity of storage networking etc. Yeah the CNIs and CSIs are complex, but for the appdev it's reduced to "write a manifest for a PV and a PVC" or "write a manifest for a service and/or ingress". In my company ops has standardized that so you add a key to your values.yaml and it'll template the rest for you. Ops has to deal with setting up that stuff in the first place, which you have to do regardless, but it's better than every appdev setting up their own way to do things
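
          "Write a manifest for a PVC" really is about this much work for the appdev (sketch; the storage class name is whatever ops standardized on):

            cat > pvc.yaml <<'EOF'
            apiVersion: v1
            kind: PersistentVolumeClaim
            metadata:
              name: app-data
            spec:
              accessModes: ["ReadWriteOnce"]
              storageClassName: standard
              resources:
                requests:
                  storage: 10Gi
            EOF
            kubectl apply -f pvc.yaml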

          My company's a conglomerate of several acquisitions. I'm from a company that was heavy into k8s, and now I'm working on getting everyone else's apps that are currently just deployed to a VM into a container and onto a k8s cluster instead. I might shouldn't've said k8s was an API per se, but it is a standardized interface that covers >90% of what people want to do. It's much easier to debug everything when it's all running on top of k8s using the same k8s concepts than it is debugging why each idiosyncratic VM isn't working. Could you force every app to use the same set of features on VMs? Want a load balancer, just add a value to your config and the deployment process will add your VM to the F5? Yeah, it's possible, but we'd have to build it, or find a solution offered by a particular vendor. k8s already has that written and everyone uses it

          • crabbone 83 days ago
            This is super, super, super naive. You, essentially, just solved for the case of one. But now you need to solve for N.

            Do you seriously believe you will never be in a situation where you have to run... two containers?.. With two different images? If my experience is anything to go by, even postcard Web sites often use 3-5 containers. I just finished deploying a test of our managed Kubernetes (technically, it uses containerd, but it could be using Docker). And it has ~60 containers. And this is just the management part. I.e. no user programs are running there. It's a bunch of "operators", CNIs, CSIs etc.

            In other words: if your deployment was so easy that it could all fit into a single container -- you didn't have a dependency problem in the first place. But once you get a realistic-size deployment, you have all the same problems. If libssl doesn't implement the same version of the TLS protocol in two containers -- you are going to have a bad time. But now you've also amplified this problem because you need certificates in all containers! Oh, and what fun it is to manage certificates in containers!

            > Kubernetes is an abstraction layer that (mostly) hides the complexity of storage networking etc

            Now, be honest. You didn't really use it, did you? The complexity in eg. storage may manifest in many different ways. None of them have anything to do with Kubernetes. Here are some examples: how can multiple users access the same files concurrently? How can the same files be stored (replicated) in multiple places concurrently? What about first and second together? Should replication happen at the level of block device or filesystem? Should snapshots be incremental or full? Should user ownership be encoded into storage, or should there be an extra translation layer? Should storage allow discards when dealing with encryption? And many, many more.

            Kubernetes doesn't help you with these problems. It cannot. It's not designed to. You have all the difficult storage problems whether you have Kubernetes or not. What Kubernetes offers is a possibility for the storage vendors to expose their storage product through it. Which is nothing new. All those storage products can be exposed through some other means as well.

            In practice, storage vendors who choose to expose their products through Kubernetes usually end up exposing only a limited subset of the storage functionality that way. So not only does storage through Kubernetes not solve your problems: it adds more of them. Now you may have to work around the restrictions of Kubernetes if you want to use some unavailable feature (think, for example, of all the Ceph CLI you are missing when using Ceph volumes in Kubernetes: hundreds of commands that are suddenly unavailable to you).

            ----

            You seem like an enthusiastic person, and you probably truly believe what you write about this stuff. But you're in way over your head. You aren't really an infra developer. You kind of don't even really recognize the general patterns and problems of this field. And that's OK. You don't have to be / do that. You just happen to be a new car owner who learned how to change oil on your own, and you are trying to preach to a seasoned mechanic about the benefits and downsides of different engine designs :) Don't take it to heart. It's one of those moments where maybe years later you'll suddenly recall this conversation and feel a spike of embarrassment. Everyone has that.

            • ranger207 82 days ago
              Looking at my company's Rancher dashboard, it looks like I'm currently running about 7500 pods. Assuming 1.5 containers/pod (probably high) then I'm not running 1 container, I'm running about 11 thousand containers right now. Please don't assume I can't understand what you're saying because of any particular level of experience. Your points are just as understandable regardless.

              I'm not sure there's a real use case for running multiple versions of the same app at the same time, tbh. If the devs have a new version they're trying to push out, then first their branch has to pass automated tests before it can be merged to master, (mostly) ensuring old functionality doesn't fail. Then our deployment pipeline deploys it to staging, makes sure everything is healthy and readiness probes are returning 200, then deploys it to prod, makes sure everything comes up, and finally switches the k8s service to point to the new pod versions. If anything breaks at that point, the old pods are still around and I can swap the k8s services to point to the old deploy instantly.

              If, for example, two versions of libssl are somehow treating the same protocol version differently, then that'd be detected on staging at the latest. If the devs know they need to upgrade protocol versions from (for example) TLS 1.2 to TLS 1.3, then they'll deploy a version that runs on TLS 1.2 and 1.3, then once everything is working deploy a version that works only on TLS 1.3. Nothing actually takes production traffic until we're fully assured it's healthy. We haven't had a maintenance or upgrade outage for at least 3 years.

              Could all this be replicated on a VM platform assuming it has an appropriate API? Definitely. But k8s has all this covered already. How do I switch traffic from the old pods to the new pods? The deployment pipeline runs `kubectl apply -f ingress.yaml` and k8s patches all the load balancer configs to point to the new pods. That's the entirety of what I would need to do if it wasn't already automated.

              Certificate management is also pretty easy. Each pod pulls a cert from our PKI (Hashicorp Vault) when it starts up. If the leaf cert expires (unlikely because pods are usually replaced by a new version well before then) then the app throws an exception, the pod goes unhealthy, k8s restarts the pod, the new pod gets a new cert, and it's good for another ~year. This is completely automated by k8s.

              Cert management for the k8s nodes themselves actually does involve VMs a bit. Some of our clusters are on AWS EC2 and are set up with autoscaling groups so that if a node has too little usage it'll be downscaled, so if a cert is close to expiring then the node as a whole goes unhealthy, k8s automatically removes all pods from that node and spins up new replicas on other nodes, EC2 detects that load is low and downscales that node, and if spinning up new pods caused the other nodes to have too much utilization then EC2 will spin up new nodes with new certs and everything will be fine for another ~year. Other clusters run on on-prem VMs and we haven't completely automated that yet so those are still manual restarts.

              Every few years the root cert will expire and we'll have to restart all the pods or nodes at once. Pods are easy; just redeploy and they'll all get the new cert, or worst case I can run `kubectl delete --all pods` and the PodDisruptionBudgets will ensure there's a rolling rollout. For nodes, I'll scale up the cluster (increase min replicas in EC2 or add more nodes through the VM platform) so there's a bunch of nodes with the new root cert, then drain all the existing nodes, which will cause k8s to spin up new app pods on the new uncordoned nodes, then shut down all the old VMs or let EC2 handle it.

              You're right that k8s doesn't help with app-level storage issues like concurrent access, nor does it help with storage-level issues like backups and replication. I should've been more specific that k8s helps with how the apps connect to storage. While migrating VM-deployed apps to containers I've found a few ways they've done it: config files specifying connection strings, hardcoded strings in code, pulling values from secrets management, requiring that the VM have the fileshare mounted already, etc. In k8s there's one way to do it: the app's manifest includes a PV and a PVC. Ops handles how k8s connects to the storage from there. This isn't really a k8s advantage; you could tell all your devs to use some internal library that abstracts storage and let ops write or maintain that library too. But that really only works with one company at a time, while when we onboard an acquisition that uses k8s they've already got PVs set up so we just have to migrate those. My point in saying that k8s abstracts connecting to storage was mostly about how it's an industry standard interface specifically for connecting to storage, which helps eliminate having to figure out how each individual app connects. If security makes a firewall rule that blocks all your VMs from hitting storage then for VM-deployed apps I've got to look "ok did the devs change this config file? Did someone forget to mount the fileshare or did an update break that? Is it some third option I've never seen before?" while for our k8s-deployed apps I've got one place to start looking using kubectl.

              Another point I didn't address is that yes this does require specific app architectures. The pods have to be stateless, databases are not in k8s and certainly not running alongside the app itself in the same container or pod, concurrent file access is not generally my problem, and security's wacky firewall rules can be fun to implement when I can't say what IP a particular app has. But I think the tradeoffs are generally worth it.

              You're right I'm not the most experienced at large scale infrastructure problems outside of k8s. I've managed or helped manage a couple of small server racks and a single 6-rack datacenter before, and I work closely with the non-k8s infrastructure team at my current company, but I'm not the one deciding what we're going to do to get off of VMWare for example. What I can say though is that between my past experience and the companies we've acquired, there's a lot more variation and lack of best practices among the companies that don't use k8s compared to the ones that do. With the non-k8s companies I have to familiarize myself with the idiosyncratic way each they handle every aspect of their infrastructure; with the k8s companies I already know at least half of their infrastructure.

    • cryptonector 83 days ago
      Jails/Zones are just heavy-duty containers. They're still not VMs. Not that VMs are enough either, given all the side-channels that abound.
    • TheNewsIsHere 82 days ago
      I feel the exact same way.

      There are so many use cases that get shoved into the latest, shiniest box just because it’s new and shiny.

      A colleague of mine once suggested running a CMS we manage for customers on a serverless stack because “it would be so cheap”. When you face unexpected traffic bursts or a DDoS, it becomes very expensive, very fast. Customers don’t really want to be billed per execution during a traffic storm.

      It would also have been far outside the normal environment that CMS expects, and wouldn’t have been supported by any of our commercial, vendored dependencies.

      Our stack is so much less complicated without running everything in Docker, and perhaps ironically, about half of our stack runs in Kubernetes. The other half is “just software on VMs” we manage through typical tools like SSH and Ansible.

    • ganoushoreilly 84 days ago
      Docker is great, way overused 100%. I believe a lot of it started as "cost savings" on resource usage. Then it became the trendy thing for "scalability".

      When home enthusiasts build multi container stacks for their project website, it gets a bit much.

      • Yodel0914 83 days ago
        > When home enthusiasts build multi container stacks for their project website, it gets a bit much.

        I don't know - docker has been a godsend for running my own stuff. I can get a docker-compose file working on my laptop, then run it on my VPS with a pretty high certainty that it will work. Updating has also (to date) been incredibly smooth.

      • applied_heat 84 days ago
        Solves dependency version hell also
        • theLiminator 84 days ago
          Solves it in the same sense that it's a giant lockfile. It doesn't solve the other half where updates can horribly break your system and you run into transitive version clashes.
          • bornfreddy 84 days ago
            But at least you can revert back to the original configuration (as you can with VM, too).
          • Spivak 84 days ago
            It solves it in the sense that it empowers the devs to update their dependencies on their own time and ops can update the underlying infrastructure fearlessly. It turned a coordination problem into a non-problem.
          • ktosobcy 83 days ago
            Having run a VPS with manually maintained services, moving to Docker saved me a lot of headache... things definitely break less often (almost never), and if they do, it's quite easy to revert to the previous version...
        • sitkack 84 days ago
          It doesn't solve it, it makes it tractable so you can use the scientific method to fix problems as opposed to voodoo.
    • icelancer 84 days ago
      Same. We're still managing ESXi here at my company. Docker/K8s/etc are nowhere close to prod and probably never will be. Been very pleased with that decision.

      I will say that Docker images get one HUGE use case at our company - CUDA images with consistent environments. CUDA/pytorch/tensorflow hell is something I couldn't imagine dealing with when I was in college studying CS a few decades ago.

    • m463 84 days ago
      I've always hated the docker model of the image namespace. It's like those cloud-based routers you can buy.

      Docker actively prevents you from having a private repo. They don't want you to point away from their cloud.

      Redhat understood this and podman allows you to have a private docker infrastructure, disconnected from docker hub.

      For my personal stuff, I would like to use "FROM scratch" and build my personal containers in my own ecosystem.

      • Carrok 84 days ago
        > Docker actively prevents you from having a private repo.

        In what ways? I use private repos daily with no issues.

        • m463 84 days ago
          If you reference a container without a domain, you pull from docker.io

          With podman, you can control this with

            $HOME/.config/containers/registries.conf 
          
          or

            /etc/containers/registries.conf
          
          with docker, not possible (though you can hack mirrors)

          https://stackoverflow.com/questions/33054369/how-to-change-t...
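
          For podman, pointing unqualified image names at your own registry is a one-line config change (sketch; registry name made up, and assuming the key isn't already set):

            cat >> $HOME/.config/containers/registries.conf <<'EOF'
            unqualified-search-registries = ["registry.internal.example"]
            EOF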

          • Cyph0n 84 days ago
            Or even easier: just fully qualify all images. With Podman:

            nginx => docker.io/library/nginx

            linuxserver/plex => docker.io/linuxserver/plex

          • Carrok 84 days ago
            So.. just use a domain. This seems like a nothing burger.
            • m463 83 days ago
              Not all dockerfiles (especially multi-stage builds) are easily sanitized for this.

              think FROM python:latest or FROM ubuntu:20.04 AS build

              They've put deliberate barriers in the way of using docker commands without accessing their cloud.

      • icedchai 84 days ago
        One of the first things I did was set up my own container registry with Docker. It's not terribly difficult.
      • mountainriver 84 days ago
        Huh? In what way does Docker prevent you from having a private repo? It's a couple of clicks to get one on any cloud.
    • diego_sandoval 84 days ago
      > and Docker industrial complex that developed around solving problems created by themselves or solved decades ago.

      From my perspective, it's the complete opposite: Docker is a workaround for problems created decades ago (e.g. dynamic linking), that could have been solved in a better manner, but were not.

      • ktosobcy 83 days ago
        There are Flatpaks/AppImage/whatever, but they are Linux-only (mostly) and still lack something akin to docker-compose...
    • ktosobcy 83 days ago
      > Modern "containers" were invented to make things more reproducible ( check ) and simplify dev and deployments ( NOT check ).

      Why?

      I have my RPi4 and absolutely love docker(-compose) - deploying stuff/services on it is a breeze compared to the previous clusterf*k of relying on the system repository for apps (or fixing things when something doesn't work)... with docker compose I have nicely separated services, each with a dedicated database at the required version (yes, I ran into an issue where one service required a newer and another an older version of the database, meh)

      As for development - I do development natively but again - docker makes it easier to test various scenarios...

      • skydhash 83 days ago
        I’ve been using LXC/Incus as lightweight VMs (my home server is an old mac mini) and I think too many software is over reliant on Docker. It’s very much “ship your computer” with a bunch of odd scripts in addition to the Dockerfiles.
        • ktosobcy 83 days ago
          Could you elaborate on "ship your computer"? The majority of images are a base OS (which is as lean as possible) and then just the app...

          To that end, a full-blown VM seems even more of a "ship your computer" thing?

          Btw, isn't LXC the base for Docker as well? It looks somewhat similar to docker and podman?

    • turtlebits 84 days ago
      Honestly, it really doesn't matter whether it's VMs or Docker. The docker/container DX is so much better than VMWare/QEMU/etc. Make it easy to run workloads in VMs/Firecracker/etc and you'll see people migrate.
      • packetlost 84 days ago
        I mean, Vagrant was basically Docker before Docker. People used it. But it turns out the overhead of booting a full VM + kernel adds latency, which is undesirable for development workloads. The techniques used by Firecracker could be used, but I suspect the overhead of allocating a namespace and loading a process will always be less than even restoring from a frozen VM, so I wouldn't hold my breath on things ever swinging back in VMs' direction for developer workloads.
        • yjftsjthsd-h 84 days ago
          It would be interesting to see a microvm (kata/firecracker/etc.) version of vagrant. And open source, of course. I can't see any technical reason why it would be particularly difficult.
          • packetlost 84 days ago
            I don't think they're that valuable tbh. Outside of cases where you're running completely untrusted code or emulating a different architecture, there's no strong reason to pick any VM over one of the several container paradigms.
            • yjftsjthsd-h 84 days ago
              One more use case - which I admit is niche - is that I want a way to run multiple OSes as transparently as possible. A microVM that boots FreeBSD in less than a second and that acts almost like a native container would be excellent for certain development work.

              Edit: Actually it's not just cross-OS work in the freebsd/linux sense; it would also be nice for doing kernel dev work. Edit my linux kernel module, compile, and then spawn a VM that boots and starts running tests in seconds.
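
              For the kernel-dev half, plain QEMU direct kernel boot already gets most of the way there (sketch; the bzImage/initramfs paths are whatever your build produces):

                # boots the freshly built kernel in a couple of seconds, no disk image needed
                qemu-system-x86_64 -enable-kvm -m 512 -nographic \
                  -kernel arch/x86/boot/bzImage \
                  -initrd initramfs.cpio.gz \
                  -append "console=ttyS0 rdinit=/bin/sh"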

              • packetlost 83 days ago
                Yeah, there are definitely still cases, but they're when you specifically want/need a different kernel, which as you said are rather niche.
          • mountainriver 84 days ago
            Oh they exist! Several of them in fact, they have never picked up a ton of steam though
    • gryfft 84 days ago
      I've been meaning to do a bhyve deep dive for years, my gut feelings being much the same as yours. Would appreciate any recommended reading.
      • Gud 84 days ago
        Read the fine manual and handbook.
    • tiffanyh 84 days ago
      Are Jails/Zones/Docker even security solutions?

      I always used them as process isolation & dependency bundling.

    • analognoise 84 days ago
      What do you think of Nix/NixOS?
      • reddit_clone 84 days ago
        But that comes _after_ you have chosen VMs over containers, yes?

        If you are using VMs, I think NixOS/Guix is a good choice. Reproducible builds, immutable OS, immutable binaries, and dead-easy rollback.

        It still looks somewhat futuristic. Hopefully gets traction.

      • egberts1 84 days ago
        Nix is trying to be like macOS's DMG, but its image file is a bit more parseable.
    • cryptonector 83 days ago
      I mean, yeah, but things like rowhammer and Spectre/Meltdown, and many other side-channels are a big deal. VMs are not really enough to prevent abuse of the full panoply of side-channels known and unknown.
  • ploxiln 84 days ago
    > we operate in networks where outbound MQTT and HTTPS is simply not allowed (which is why we rely on encrypted DNS traffic for device-to-Console communication)

    HTTPS is not allowed (locked down for security!), so communication is smuggled over DNS? uhh ... I suspect that a lot of what the customer "security" departments do, doesn't really make sense ...

    • jmprspret 84 days ago
      DNS tunneling, or smuggling through DNS requests, is like a known malware C2 method. Seems really weird to (ab)use it for ""security""
      • thinkst 83 days ago
        Product builders can learn loads from malware in terms of deployment and operational ease. Malware needs to operate without any assistance in unknown environments. Nobody is allowing outbound comms deliberately for malware, so tunnel methods were developed.

        Networks have these capabilities, inherently they're part of the specs. But only malware seems to realise that and use it. We love reusing offensive techniques for defence (see our Canarytokens stuff), and DNS comms fits that perfectly. Our customers get an actual 2-minute install, not a 2-minute-and-then-wait-a-week-for-the-firewall-rules install.

        • marcosdumay 83 days ago
          The problem is that when you apply the malware lessons to your software, every anti-virus starts to work against you.
          • jmprspret 83 days ago
            That could be true, especially for those that opt for a heuristic/anomalous-application-behaviour approach. But then, you can add whitelisting and exceptions to most AV products.
        • jmprspret 83 days ago
          I didn't mean to imply that doing so for security was a bad thing. Now that I read back my comment, I see that's exactly how it sounds.

          I agree with you

          • thinkst 83 days ago
            We've got several product features that have been driven by our offensive security background... thanks for the prod, we'll blog it.
  • tptacek 84 days ago
    The cool kids have been combining containers and hardware virtualization for something like 10 years now (back to QEMU-Lite and kvmtool). Don't use containers if the abstraction gets in your way, of course, but if they work for you --- as a mechanism for packaging and shipping software and coordinating deployments --- there's no reason you need to roll all the way back to individually managed EC2 instances.

    A short survey on this stuff:

    https://fly.io/blog/sandboxing-and-workload-isolation/

    • mwcampbell 84 days ago
      Since you're here, I was just thinking about how feasible it would be to run a microVM-per-tenant setup like this on Fly. I guess it would require some automation to create a Fly app for each customer. Is this something you all have thought about?
      • tptacek 84 days ago
        Extraordinarily easy. It's a design goal of the system. I don't want to crud up the thread; this whole "container vs. VM vs. dedicated hardware" debate is dear to my heart. But feel free to drop me a line if you're interested in our take on it.
        • heeton 83 days ago
          I’m also interested in your take on it, if you wanted to publish a response publicly. Would love something like this for enterprise SaaS clients.
      • DAlperin 84 days ago
        Also to add, we already have lots of customers who use this model.
  • bobbob1921 84 days ago
    My big struggle with docker/containers vs VMs is the storage layer (on containers). I'm sure it's mostly lack of experience/knowledge on my end, but I never have a doubt or concern that my storage is persistent and clearly defined when using a VM-based workload. I cannot say the same for my docker/container-based workloads; I'm always a tad concerned about the persistence of storage (or the resource management in regard to storage). This becomes even more true as you deal with networked storage on both platforms.
    • amluto 84 days ago
      It absolutely boggles my mind that read-only mode is not the default in Docker. By default, every container has an extra, unnamed, writable volume: its own root. Typo in your volume mount? You’re writing to root, and you will lose data.

      Of course, once this is fixed and you start using read-only containers, one wonders why “container” exists as a persistent, named concept.
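
      What I want as the default is roughly this (sketch; image and paths invented), where every writable path is explicit and a typo'd mount means writes fail loudly instead of quietly landing in the container's writable layer:

        docker run -d --name app \
          --read-only \
          --tmpfs /tmp \
          -v appdata:/var/lib/app \
          ghcr.io/example/app:1.0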

      • danhor 83 days ago
          Because unless you resort to stuff like an in-RAM overlayfs (which will also result in data loss), a lot of system software assumes it can write anywhere and will bitterly complain if it can't, even if it's not "real" data, and that can be very annoying to fix. That's fine for carefully engineered containers, but the usual thrown-together stuff Docker started with gets a lot more annoying.
    • imp0cat 84 days ago
      Mount those paths that you care about to local filesystem. Otherwise, you're always one `docker system prune -a -f --volumes` from a disaster.
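
      For example, an explicit bind mount (paths and image invented), so the data lives on the host and survives any amount of pruning:

        docker run -d --name app \
          -v /srv/app-data:/var/lib/app \
          ghcr.io/example/app:1.0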
  • stacktrust 84 days ago
    A modern virtualization architecture can be found in the OSS pKVM L0 nested hypervisor for Android Virtualization Framework, which has some architectural overlap with HP/Bromium AX L0 + [Hyper-V | KVM | Xen] L1 + uXen L2 micro-VMs with copy-on-write memory.

    A Bromium demo circa 2014 was a web browser where every tab was an isolated VM, and every HTTP request was an isolated VM. Hundreds of VMs could be launched in a couple of hundred milliseconds. Firecracker has some overlap.

    > Lastly, this approach is almost certainly more expensive. Our instances sit idle for the most part and we pay EC2 a pretty penny for the privilege.

    With many near-idle server VMs running identical code for each customer, there may be an opportunity to use copy-on-memory-write VMs with fast restore of unique memory state, using the techniques employed in live migration.

    Xen/uXen/AX: https://www.platformsecuritysummit.com/2018/speaker/pratt/

    pKVM: https://www.youtube.com/watch?v=9npebeVFbFw

  • mikewarot 84 days ago
    It's nice to see the Principle Of Least Access (POLA) in practical use. Some day, we'll have operating systems that respect it as well.

    As more people wake up to the realization that we shouldn't trust code, I expect that the number of civilization wide outages will decrease.

    Working in the cloud, they're not going to be able to use my other favorite security tool, the data diode, which can positively guarantee against ingress of control while still allowing egress of reporting data.

    • nrr 84 days ago
      If you're coming by after the fact and scratching your head at what a data diode is, Wikipedia's page on the subject is a decent crib document. <https://en.wikipedia.org/wiki/Unidirectional_network>
    • fsflover 84 days ago
      > Some day, we'll have operating systems that respect it as well.

      Qubes OS has been relying on it for many years. My daily driver, can't recommend it enough.

  • fsckboy 84 days ago
    just as a meta idea, i'm mystified that systems folks find it impossible to create protected mode operating systems that are protected, and then we all engage in wasteful kluges like VMs.

    i'm not anti-VM, they're great technology, i just don't think it should be the only way to get protection. VMs are incredibly inefficient... what's that you say, they're not? ok, then why aren't they integrated into protected mode OSes so that they will actually be protected?

    • ploxiln 84 days ago
      The industry tends to do this everywhere: we have a system to contain things, we made a mess of it, now we want to contain separate instances of the systems.

      For example, in AWS or GCP, you can isolate stuff for different environments or teams with security groups and IAM policies. You can separate them with separate VPCs that can't talk to each other. In GCP you can separate them with "projects". But soon that's not enough, companies want separate AWS accounts for separate teams or environments, and they need to be grouped under a parent org account, and you can have policies that grant ability to assume roles cross-account ... then you need separate associated groups of AWS accounts for separate divisions!

      It really never ends, companies will always want to take whatever nested mess they have, and instead of cleaning it up, just nest it one level further. That's why we'll be running wasm in separate processes in separate containers in separate VMs on many-core servers (probably managed with another level of virtualization, but who can tell).

    • toast0 84 days ago
      Windows has Virtualization Based Security [1], where if your system has the right hardware and the right settings, it will use the virtualization support to get you a more protected environment. The IOMMU seems like it was designed for virtualization, but you can use it in a non-virtualized setting too, etc.

      [1] https://learn.microsoft.com/en-us/windows-hardware/design/de...

    • dale_glass 83 days ago
      Security is easier when the attack surface is limited.

      An OS provides a huge amount of functionality and offers access to vast amounts of complex shared resources. Anywhere in that there can be holes.

      A VM is conceptually simpler. We don't have to prove there's no way to get to a root exploit from the myriad services running as root but reachable by a normal application. We're concerned with things like ensuring a VM can't access a disk belonging to another, which is a far simpler problem.

    • Bognar 83 days ago
      VMs as an isolation concept at the processor level are actually quite efficient, but unfortunately we use that to run whole operating systems which impose their own inefficiency. Micro-VMs that just run a process without an OS (or with an OS shim) are possible but we don't yet have good frameworks for building and using them.
    • bigbones 84 days ago
      Because it would defeat the purpose. Turns out we don't trust the systems folks all that much
    • Veserv 84 days ago
      [flagged]
      • tptacek 84 days ago
        This Theo quote from 18 years ago gets brought up a lot. It's referring to a different era in virtualization (it practically predates KVM, and certainly widespread use of KVM). You can more or less assume he's talking about running things under VMWare.

        In the interim:

        * The Linux virtualization interface has been standardized --- everything uses the same small KVM interface

        * Security research matured and, in particular, mobile device jailbreaking have made the LPE attack surface relevant, so people have audited and fuzzed the living hell out of KVM

        * Maximalist C/C++ hypervisors have been replaced by lightweight virtualization, whose codebases are generally written in memory-safe Rust.

        At the very least, the "nearly full kernel" thing is totally false now; that "extra" kernel (the userland hypervisor) is now probably the most trustworthy component in the whole system.

        I would be surprised if even Theo stuck up for that argument today, but if he did, I think he'd probably get rinsed.

        • Veserv 84 days ago
          Are you claiming it has no security vulnerabilities? If yes, care to present a proof. If no, then please estimate how big of a bug bounty would result in a reported critical vulnerability.

          If I put up a 1 G$ bug bounty, do you think somebody would be able to claim it within a year? How about 10 M$? Please justify this in light of Google only offering 250 k$ [1] for a vulnerability that would totally compromise the security foundation of the multi-billion (trillion?) dollar Google Cloud.

          Please also justify why the number you present is adequate for securing the foundation of the multi-trillion dollar cloud industry. I will accept that element on its face if you say the cost would be 10 G$, but then I will demand basic proof such as formal proofs of correctness.

          [1] https://security.googleblog.com/2024/06/virtual-escape-real-...

          • tptacek 84 days ago
            I have no idea who you're talking to, but nobody on this thread has claimed anything has "no security vulnerabilities". If you think there isn't an implicit 7-figure bounty on KVM escapes, we are operating from premises too far apart for further discussion to be productive.

            My bigger problem though: I gave you a bunch of substantive, axiomatic arguments, and you responded to none of them. Of the three of them, which were you already aware of? How did your opinion change after learning about the other ones? You cited a 2007 Theo argument in 2024, so I'm going to have trouble with the idea that you were aware of all of them; again, I think even Theo would be correcting your original post.

            later

            You've written about the vulnerability brokers you know in other posts here; I assume we can just have a substantive, systems-based debate about this claim, without needing to cite Theo or Joanna Rutkowska or whatever.

            • Veserv 84 days ago
              You presented arguments, but did not present any substantive, quantitative effects attributed to those changes. You have presented no quantitative means of evaluating security.

              Furthermore, you have presented no empirical evidence that those changes actually result in meaningful security. No, I do not mean “better”, I mean meaningful, as in can protect against commercially-motivated hackers.

              None of the systems actually certified to protect against state actors used such a nonsensical process as imagining improvements and then just assuming things are better. Show a proof of correctness and an NSA pentest that fails to find any vulnerabilities, then we can start talking. Barring that, the explicit, above-board bug bounty provides an okay lower bound on security. You really need a more stable process, but it is at least a starting point.

              And besides that, a 7-figure number is paltry. Google Cloud brings in, what, 11 figures? The operations of a billion dollar company should not be secured to a level of only a few million dollars.

              So again, proofs of correctness and demonstrated protection against teams with tens to hundreds of millions in budget (i.e team of 5 competent offensive security specialists for 2 years, NSO group for a year, etc.). Anything less is insufficient to bet trillions of dollars of commerce and countless lives on.

              • tptacek 84 days ago
                So that's a no, then.

                Actual LOL at "an NSA pentest".

                Slightly later

                A friend points out I'm being too harsh here, and that lots of products do in fact get NSA pentests. They just never get the pentest report. We regret the error.

                • Veserv 84 days ago
                  I see, you are unfamiliar with actual high security development and certification.

                  Common Criteria SKPP [1].

                  “AVA_VLA_EXP.4.3E The NSA evaluator shall perform independent penetration testing.

                  AVA_VLA_EXP.4.4E The NSA evaluator shall determine that the TOE is resistant to penetration attacks performed by an attacker possessing a high attack potential.”

                  Which has been successfully certified against [2] for use in the F-22 and F-35.

                  Sneering dismissal of the existence of high security systems with actual empirical evidence while demanding recognition for unproven systems is ridiculous.

                  Oh, and by the way, the SKPP, like Level A1, required formal specifications and proofs. So, no, a demand to prove such systems are secure is not a winning move.

                  [1] https://www.commoncriteriaportal.org/nfs/ccpfiles/files/ppfi...

                  [2] https://www.commoncriteriaportal.org/nfs/ccpfiles/files/epfi...

                  • kasey_junk 84 days ago
                    I’m sort of curious what your actual hypothesis is now. Are you suggesting that kvm a) has the same security surface as a general-purpose OS, b) that it’s not a high-enough-value target to surface commercial vulnerabilities, or c) that more modern development techniques, such as using Rust, don’t limit that surface beyond traditional OS’s?

                    Not for nothing but it’s hard to follow what you are even arguing.

                    • Veserv 84 days ago
                      1. Commercially motivated attacks against mere mid-sized entities (100 M$-1 G$ in revenue) can empirically derive 10s of M$ from successful attacks. Adequate security must make such attacks unprofitable. This constitutes an absolute, rock-bottom minimum standard for effective enterprise security.

                      2. We have always known that such attacks would eventually become feasible to execute once the hackers matured. This was so obvious that government security standards/certifications such as the Rainbow Series codified in the 80s already considered such threats and placed thwarting them as the middle-levels of security. So, we have always known that "adequate" security against commercially motivated attackers has always demanded protection against skilled teams.

                      3. The commercial IT systems in regular use have never once, over 40 years, demonstrated the levels of security needed to protect against such commercially motivated attacks. These techniques and processes have failed continuously to achieve adequate security despite claiming to solve it every year for 40 years. At some point you need to stop listening.

                      4. There are systems that did achieve adequate security against commercially motivated attackers and even security against state actors in those 40 years. These standards were not pie-in-the-sky unreachable goals. They were "practical" if you actually cared about security.

                      5. KVM is in the commercial IT system category, being derived and developed by people who have never once deployed, developed, designed, or likely even seen a system known to have adequate security. Such systems DO NOT get the benefit of the doubt. There is no metric of evaluation, no means of evaluating if the theorized improvements achieve adequacy. In fact, you would be hard-pressed to find literally anybody who would stand up and say: "KVM is unhackable by any team with a budget of 10 M$" (budget including the average salary for the team members so you do not get a talented team doing it just to prove it can be done). Nobody will vouch for the system claiming it achieves even the bare-minimum requirements for adequate security I stated above. That it might be "better" than the Linux kernel is irrelevant; bad is also better than terrible, but it is still bad. At the end of the day they are not meaningfully different; they are all inadequate and unfit for purpose.

                      So, you can disagree on two primary positions:

                      1. 10 M$ is too high of a standard for mid-size enterprise security.

                      2. Commercial IT systems, such as KVM, achieve the 10 M$ level.

                      You could also argue that I am being reductive by defining security as the cost for a successful attack. But that is silly because it is the one metric that exactly aligns with the operational goals. Every other metric is a means of helping quantify the cost of a successful attack. To put it another way, if you had a magic genie that told you that number, you would not even bother with any other metrics; you have a direct line to what matters.

                      In summary, it is none of the above. The betterness of KVM is irrelevant except if it achieves adequacy. There is no evidence of that and until there is, KVM is not categorically distinct from any other inadequate system. So, the original point still holds, the reason these "secure VM" techniques are not applied to make a "secure OS" is that there are no "secure VM" techniques to be found in that corner of the world.

                      • kasey_junk 83 days ago
                        I can disagree on one major point. There are talented teams that cost well over 10M annually tasked, effectively full time, with finding exploits in kvm.

                        I’m still not sure I understand what that means for your argument but a kvm exploit, especially a jailbreak, would be one of the highest value exploits in the world.

                        • Veserv 83 days ago
                          And is the net result of those teams that they find, on average, fewer than 1 security vulnerability per year? That KVM has, on average, less than one security fix per year?

                          To quote the KVM escape Google Project Zero published in 2021 [1]:

                          "While we have not seen any in-the-wild exploits targeting hypervisors outside of competitions like Pwn2Own, these capabilities are clearly achievable for a well-financed adversary. I’ve spent around two months on this research, working as an individual with only remote access to an AMD system. Looking at the potential ROI on an exploit like this, it seems safe to assume that more people are working on similar issues right now and that vulnerabilities in KVM, Hyper-V, Xen or VMware will be exploited in-the-wild sooner or later."

                          A single, albeit highly capable, individual found a critical vulnerability in 2 months of work. KVM was already mature and the foundation of AWS at that time and people were already saying that it was highly secure and that it must be highly secure since it would be such a high value target, so logically it must be secure since only an incompetent would poorly secure high value targets, thus reverse logic means it must be secure. Despite that, 2 person-months to find an escape. What can we conclude? They actually are incompetent at security because they did poorly secure high value targets, and that entire train of logic is just wishful thinking.

                          Crowdstrike must have good deployment practices because it would be catastrophic if they, like, I dunno, mass pushed a broken patch and bricked millions of machines, and only an incompetent would use poor deployment practices on such a critical system, therefore they must have good deployment practices. Turns out, no, people are incompetent all the time. The criticality of systems is almost entirely divorced from those systems actually being treated critically unless you have good processes which is emphatically and empirically not the case in commercial IT software as a whole, let alone commercial IT software security.

                          That quote further illustrates how, despite how easy such an attack was to develop, no in-the-wild exploits were observed. Therefore, the presence or absence of known vulnerabilities and "implicit 7-figure bounty"s is no indication that exploits are hard to develop. The entire notion of some sort of bizarre ambient, osmotic proof of security is just wrong-headed. You need actual, direct audits, with no discovered exploits, to establish concrete evidence for a level of security. If you put a team with a budget of 10 M$ on it and they find 10 vulnerabilities, you can be fairly confident that the development processes cannot weed out vulnerabilities that require 10 M$, or possibly even 1 M$, of effort to identify. You need repeated competent teams to fail to find anything at a level of effort to establish any sense of a lower bound.

                          Actually, now that I am looking at that post, it says:

                          "Even though KVM’s kernel attack surface is significantly smaller than the one exposed by a default QEMU configuration or similar user space VMMs, a KVM vulnerability has advantages that make it very valuable for an attacker:

                          ...

                          Due to the somewhat poor security history of QEMU, new user space VMMs like crosvm or Firecracker are written in Rust, a memory safe language. Of course, there can still be non-memory safety vulnerabilities or problems due to incorrect or buggy usage of the KVM APIs, but using Rust effectively prevents the large majority of bugs that were discovered in C-based user space VMMs in the past.

                          Finally, a pure KVM exploit can work against targets that use proprietary or heavily modified user space VMMs. While the big cloud providers do not go into much detail about their virtualization stacks publicly, it is safe to assume that they do not depend on an unmodified QEMU version for their production workloads. In contrast, KVM’s smaller code base makes heavy modifications unlikely (and KVM’s contributor list points at a strong tendency to upstream such modifications when they exist)."

                          So, this post already post-dates the key technologies tptacek mentioned that supposedly made modern hypervisors so "secure", such as: "everything uses the same small KVM interface", "Maximalist C/C++ hypervisors have been replaced by lightweight virtualization, whose codebases are generally written in memory-safe Rust".

                          KVM, Rust for the VMM, and despite that, one person on the Google Project Zero team invalidated the security in 2 months. Goes to show how effective and secure it actually was after those vaunted improvements, and how my prediction that it would be easily broken despite such changes was correct, whereas tptacek got it wrong.

                          [1] https://googleprojectzero.blogspot.com/2021/06/an-epyc-escap...

                          • tptacek 83 days ago
                            I don't think you understand the argument, which is not that the systems component stack for virtualized workloads is entirely memory safe, but rather that the world Theo was describing in 2007 no longer exists. For instance, based on this comment, I'm not sure you understand what Theo was referring to when he described the second operating system implicated in that stack.
                            • Veserv 83 days ago
                              [flagged]
                              • tptacek 83 days ago
                                I've written roughly 250 words in this whole thread, following your original response to me. You've written 450 words in this single comment, which purports to summarize my argument. You're starting to sound like Vizzini. Unfortunately for you: I have built up a resistance to iocane powder, largely by actually reading the code you're talking about.

                                My experience, when simple, falsifiable, or technically specific arguments are met with walls of abstraction, is that there's not much chance a productive, well-informed debate is about to ensue.

                                I made my point about the currency of Ceiling Theo's take on virtualization. I'm happy with where it stands. I don't at this point even understand what we're arguing about, so here's where I'll bow out.

                                • Veserv 83 days ago
                                  I literally presented a direct counterexample falsifying your claim that KVM + Rust results in highly secure virtualization. Even though you have rejected even attempting to define what would constitute highly secure, by no reasonable metric is 2 person-months highly secure.

                                  All you have done is present things that you imagine improve things, and then demand I prove your imagination wrong without presenting any actual empirical evidence for your position.

                                  They rewrote a component in Rust, my imagination thinks it just has to be very secure. 2 person-months. Fat lot of good that did. New boss, same as the old boss. Just like I said.

      • elric 84 days ago
        Hah, I was going to post the same quote when I read the parent comment. Glad to see I'm not the only grump who remembers TDR quotes.

        But he's right. And with the endless stream of leaky CPUs and memory (spectre, rowhammer, etc) he's even more right now than he was 17 years ago.

        There are all kinds of things being done to mitigate multi-tenant security risks in the Confidential Computing space (with Trusted Execution Environments, Homomorphic Encryption, or even Secure Multiparty Computation), but these are all incredibly complex and largely bolted on to an insecure base.

        It's just really, *really*, hard to make something non-trivial fully secure. "It depends on your threat model" used to be a valid statement, but with everyone running all of their code on top of basically 3 platforms owned by megacorps, I'm not sure even that is true anymore.

        • tptacek 84 days ago
          Microarchitectural attacks are an even bigger problem for shared-kernel multitenant systems!
      • fsflover 84 days ago
        Show me a recent escape from VT-d and then you will have a point.
        • Veserv 84 days ago
          VT-x. You should get the name of the technology right before defending it. VT-d is the I/O virtualization technology.

          When did it become customary to defend people making claims of security instead of laughing in their face, even though history shows such claims to be an endless clown parade?

          How about you present the extraordinary evidence needed to support the extraordinary claim that there are no vulnerabilities? I will accept simple forms of proof such as a formal proof of correctness or a 10 M$ bug bounty that has never been claimed.

          • tptacek 84 days ago
            Not an especially impressive flex, but I'm not above trying to dunk on people for misspelling things either, so I'm not going to high-horse you about it (obviously i am).

            The history of KVM and hardware virtualization is not an endless clown parade.

            Find a vulnerability researcher to talk to about OpenBSD sometime, though.

            https://isopenbsdsecu.re/

            • yjftsjthsd-h 84 days ago
              > Find a vulnerability researcher to talk to about OpenBSD sometime, though.

              > https://isopenbsdsecu.re/

              Notice that at no point does anyone actually show up with a working exploit.

            • fsflover 83 days ago
              I did mean VT-d, not VT-x. This is what Qubes OS uses for virtualization; my daily driver. Last escape was in 2006 by the Qubes founder.
            • Veserv 84 days ago
              OpenBSD is not secure by any measure. That Theo happens to be right about the endless clown parade is independent of his ability to develop a secure operating system.

              I mean, jeez, even Joanna Rutkowska acknowledges the foundations are iffy enough to only justify claiming “reasonably secure” for Qubes OS.

              You are making an extraordinary claim of security which stands diametrically opposed to the consensus that things are easily hacked. You need to present extraordinary evidence to support such a claim. You can see my other reply for what I would consider minimal criteria for evidence.

              • tptacek 84 days ago
                So far all I'm seeing here are appeals to the names of people who I don't believe agree with your take. You're going to need to actually defend the argument you made.
                • Veserv 84 days ago
                  You mean your argument that it is hard to find a vulnerability, despite the fact that commercial systems in Unix kernel lineage have historically been easy to hack and have never once demonstrated high robustness?

                  You have not even established what level of security you are arguing has been achieved. This is not even moving the goalposts, this is Calvinball.

                  I contend that a major cloud service that runs trillions of dollars of commerce is at least as important as a fighter jet. The F-35 demanded an operating system certified according to the SKPP, which follows on the heels of the Orange Book Level A1. That demanded a formal specification, formal proofs, and a failed penetration test by the NSA.

                  Do you contend that KVM has reached such a standard? Or do you argue that such a standard is too high? What standard should be expected? How do you verify such a standard has been achieved? How does that trace to operational security goals?

                  The operational security goal commerce needs is for the expected value of an attack to be unprofitable. How are you verifying your axiomatic arguments are moving that needle?

                  Thinking that everything is insecure and all that matters is “better” is not even binary thinking, it is unary thinking. There is no meaningful discussion to be had until you:

                  1. Establish a measure and level of security that matches operational goals.

                  2. Demonstrate proposed mechanisms empirically achieve such goals or have a track record of achieving the desired level of quality such that the reputation may provide some coarse substitute for evidence.

                  Until that point it is: “Dude, trust me. I have been wrong every time before, but I totally got it this time.”

                  • tptacek 84 days ago
                    Do you feel like this is going well for you?
                    • Veserv 83 days ago
                      Yes. You have been unable to provide any objective or quantitative means of evaluating security and reject any attempt to do so. You even reject attempts to characterize what would constitute an objective minimum bar for adequacy.

                      "I can not evaluate my work, I demand you do not evaluate my work, and I do not even know what my goal is, but I can tell you hot or cold."

                      What you have there is not engineering, it is art and is why commercial IT software security is a joke.

  • jonathanlydall 84 days ago
    Sure, it’s an option which eliminates the possibility of certain types of errors, but it’s costing you the ability to pool computing resources as efficiently as you could have with a multi-tenant approach.

    The author did acknowledge it’s a trade off, but the economics of this trade off may or may not make sense depending on how much you need to charge your customers to remain competitive with competing offerings.

  • vin10 84 days ago
    > If you wouldn't trust running it on your host, you probably shouldn't run it in a container as well.

    - From a Docker/Moby Maintainer

  • ianpurton 83 days ago
    I've solved the same problem but used Kubernetes namespaces instead.

    Each customer gets their own namespace, each namespace is locked down in terms of networking, and I deploy Postgres into each namespace using the Postgres operator.

    I've built an operator for my app, so deploying the app into a namespace is as simple as deploying the manifest.
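
    Roughly (names are made up, and app-specific allow rules come on top of this), the per-customer lockdown looks like:

      # default-deny.yaml: block all ingress and egress for pods in the customer's namespace
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: default-deny
      spec:
        podSelector: {}
        policyTypes: ["Ingress", "Egress"]

    applied per customer with something like `kubectl create namespace customer-acme && kubectl apply -n customer-acme -f default-deny.yaml`; the app manifest and the Postgres operator's resources then go into the same namespace.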

  • jefurii 84 days ago
    Using VMs as the unit allows them to move to another provider if they need to. They could even move to something like an on-prem Oxide rack if they wanted. [Yes I know, TFA lists this as a "false benefit" i.e. something they think doesn't benefit them.]
  • smitty1e 84 days ago
    > Switching to another provider would be non-trivial, and I don’t see the VM as a real benefit in this regard. The barrier to switching is still incredibly high.

    This point is made in the context of VM bits, but that switching cost could (in theory, haven't done it myself) be mitigated using, e.g. Terraform.

    The brace-for-shock barrier at the enterprise level is going to be exfiltrating all of that valuable data. Bezos is running a Hotel California for that data: "You can checkout any time you like, but you can never leave" (easily).

    • tetha 84 days ago
      Heh. We're in the process of moving a service for a few of our larger customers over due to some variety of emergencies, let's keep it at that.

      It took us 2-3 days of hustling to get the stuff running and production ready and providing the right answers. This is the "Terraform and Ansible-Stuff" stage of a real failover. In a full infrastructure failover, I'd expect it to take us 1-2 very long days to get 80% running and then up to a week to be fully back on track and another week of shaking out strange issues. And then a week or two of low-availability from the ops-team.

      However, for 3 large customers using that product, cybersecurity and compliance said no. They said no about 5-6 weeks ago and project to have an answer somewhere within the next 1-2 months. Until then, the amount of workarounds and frustration growing around it is rather scary. I hope I can contain it to places where there is no permanent damage to the infrastructure.

      Tech isn't necessarily the hardest thing in some spaces.

  • SunlitCat 84 days ago
    VMs are awesome for what they can offer. Docker (and the like) are kind of lean VMs for a specific tool scenario.

    What I would like to see would be more app virtualization software which isolates the app from the underlying OS enough to provide a safe enough cage for the app.

    I know there are some commercial offerings out there (and a free one), but maybe someone who has opinions about them can chime in, or knows some additional ones?

    • peddling-brink 84 days ago
      That’s what containers attempt to do. But it’s not perfect. Adding a layer like gvisor helps, but again the app is still interacting with the host kernel so kernel exploits are still possible. What additional sandboxing are you thinking of?
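
      For instance, if gVisor's runsc is installed and registered as a Docker runtime, the extra layer is a one-flag change (the image is just an example):

        # Syscalls are served by gVisor's user-space kernel instead of going straight to the host.
        docker run --rm --runtime=runsc alpine dmesg
        # The dmesg output comes from gVisor's own kernel, which makes it easy to confirm the sandbox is active.
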
      • SunlitCat 84 days ago
        Maybe I am a bit naive, but in my mind it's just a simple piece of software sitting between the OS and the tool in question, which runs said tool in some kind of virtualization, passing all requests on to the OS after checking what they might want to do.

        I know that's what said tools are offering, but installing (and running) Docker on Windows feels like loading up a whole other OS inside the OS, so that even VM software looks lean compared to that!

        But I admit, that I have no real experience with docker and the like.

    • stacktrust 84 days ago
      HP business PCs ship with SureClick based on OSS uXen, https://news.ycombinator.com/item?id=41071884
      • SunlitCat 84 days ago
        Thank you for sharing, didn't know that one!
        • stacktrust 84 days ago
          It's from the original Xen team. Subsequently cloned by MS as MDAG (Defender Application Guard).
          • SunlitCat 84 days ago
            Cool! I know MDAG and actually it's a pretty neat concept, kinda.
  • er4hn 84 days ago
    One thing I wasn't able to grok from the article is orchestration of VMs. Are they using AWS to manage the VM lifecycles, restart them, etc?

    Last time I looked into this for on-prem, the solutions seemed very enterprise-focused, pay-the-big-bucks stuff. Not a lot in the OSS space. What do people use for on-prem VM orchestration that is OSS?

    • jinzo 84 days ago
      Depends on your scale, but I used oVirt and Proxmox in the past, and they were (especially oVirt) very enterprisey but OSS.
  • JohnCClarke 84 days ago
    Question: Could you get the customer isolation by running all console access through customer-specific lambdas which simply add a unique (and secret) header to all requests? Then you could run a single database with sets of tables keyed by that secret header value.

    Would give you very nearly as good isolation for much lower cost.

  • osigurdson 84 days ago
    When thinking about multi-tenancy, remember that your bank doesn't have a special VM or container, just for you.
    • dspillett 84 days ago
      No, but they do have their own VM/container(s) separate from all the other banks that use the same service, with persisted data in their own storage account with its own encryption keys, etc.

      We deal with banks in DayJob - they have separate VMs/containers for their own UAT & training environments, and when the same bank works in multiple regulatory jurisdictions they usually have the systems servicing those separated too, as if they were completely separate entities (only bringing aggregate data back together for higher-up reporting purposes).

    • 01HNNWZ0MV43FF 84 days ago
      My bank doesn't even have 2FA
      • jmnicolas 84 days ago
        Mine neither, and they use a 6-digit pincode! This is ridiculous; in comparison, my home wifi password is 60+ random chars long.
        • dspillett 83 days ago
          Mine, FirstDirect in the UK, recently dropped the password from “between 5 and 9 case-sensitive alphanumeric characters” to “exactly six digits” and claimed that this was just as secure as before…¹²

          My guess is that they were cutting support costs and wanted to reduce the number of calls from people who forgot their more complicated password. Either that, or they are trying to integrate a legacy system, don't have the resources/access to improve it, and so reduced everything else down to its level. When I raised this on one of their public-facing online presences, someone pointed out that it is no less than other online banks do, but if they are happy being just as good but no better than other banks, there is nothing for me to be loyal to should another bank come up with a juicy-looking offer.

          ----

          [1] because of course 13,759,005,982,823,100 possible combinations is no better than exactly 1,000,000 where you know most people are going to use some variant of a date of birth/marriage and makes shoulder-surfing attacks no more difficult </snark>

          [2] The only way it is really just as secure as before is if there is a significant hole elsewhere so it doesn't matter what options are available there. Going from zero security to zero security is just as secure as before, no lie!

        • leononame 84 days ago
          But they do ask you for only two digits of the pin on each try, and they probably will lock your account after three incorrect attempts. Not saying 6 digits is secure, but it's better than everyone using "password" if they have a strict policy on incorrect attempts.

          And don't they have 2FA for executing transactions?

          I'm pretty sure banks are some of the most targeted IT systems. I don't trust them blindly, but when it comes to online security, I trust that they built a system that's reasonably well secured, and in other cases, I'd get my money back, similar to credit cards.

  • sim7c00 83 days ago
    i wish nanoVMs were better. it's a cool concept, leveraging the actual VM extensions for security, but all the ones i've seen hardly get into user-mode, don't have stack protectors or other trivial security features (smap/smep) enabled, etc., making them super insecure anyway.

    maybe someday that market will boom a bit more, so we can run hypervisors with VMs in there that host single-application kind of things. like a BSD kernel that runs postgres as its init process or something. (i know that's probably oversimplified :P).

    there's a lot of room in the VM space for improvement, but pretty much all of it is impossible if you need to load an entire multi-purpose, multi-user OS into the VM...

  • Melatonic 84 days ago
    Eventually we'll get a great system for managing some form of micro VM that lots of people use and that has years of documentation and troubleshooting behind it.

    Until then, the debate between VMs and containerisation will continue.

  • solatic 83 days ago
    There's nothing in Kubernetes and containers that prevents you from running single-tenant architectures (one tenant per namespace), colocating all of a single tenant's services on the same VM, and preventing multiple customers from sharing the same VM (pod affinity and anti-affinity).

    I'm not sure why the author doesn't understand that he could have his cake and eat it too.
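
    For the anti-affinity part, a rough sketch (tenant labels and values are made up, and namespaceSelector needs a reasonably recent Kubernetes): every workload carries a tenant label and refuses to schedule onto a node already running some other tenant's pods:

      # Fragment of a pod/deployment spec for tenant "acme"
      metadata:
        labels:
          tenant: acme
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              namespaceSelector: {}        # consider pods from all namespaces
              labelSelector:
                matchExpressions:
                - key: tenant
                  operator: Exists
                - key: tenant
                  operator: NotIn
                  values: ["acme"]

    If you want harder guarantees, you can pair that with a taint per tenant node pool.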

  • Havoc 83 days ago
    So you end up with thousands of near-idle AWS instances?

    There has got to be a better middle ground. Like multi-tenant but with strong splits (each customer on their own db, etc.).

  • coppsilgold 84 days ago
    If you think about it, virtualization is just a narrowing of the application-kernel interface. In a standard setting the application has a wide kernel interface available to it, with anywhere from dozens (e.g. under seccomp) to hundreds of syscalls, a vulnerability in any one of which could result in full system compromise.

    With virtualization the attack surface is narrowed to pretty much just the virtualization interface.

    The problem with current virtualization (or more specifically, the VMMs) is that it can be cumbersome; for example, memory management is a serious annoyance. The kernel is built to hog memory for cache etc., but you don't want the guest to be doing that - since you want to overcommit memory, as guests will rarely use 100% of what is given to them (especially when the guest is just a jailed singular application), workarounds such as free page reporting and drop_caches hacks exist.

    I would expect eventually to see high-performance custom kernels for application jails - for example, gVisor[1] acts as a syscall interceptor (and can use KVM too!) and a custom kernel. Or a modified Linux kernel with the pain points patched for the guest.

    In effect, what virtualization achieves is the ability to roll back much of the advantage of having an operating system in the first place, in exchange for securely isolating the workload. But because the workload expects an underlying operating system to serve it, one has to be provided to it. So now you have a host operating system, a guest operating system, and some narrow interface between the two so it isn't a complete clown show. As you grow that interface to properly slave the guest to the host, to reduce resource consumption and gain more control, will you eventually end up reimagining the operating system, perhaps? Or come full circle to the BSD jail idea - imagine the host kernel having hooks into every guest kernel syscall; is this not a BSD jail with extra steps?

    [1] <https://gvisor.dev/>
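
    To make the memory-management point concrete, a hedged sketch (flags assume a reasonably recent QEMU, and the disk image name is made up):

      # Host side: a balloon device with free page reporting lets the guest hand unused pages back,
      # which is what makes overcommit workable.
      qemu-system-x86_64 -enable-kvm -m 4096 \
        -device virtio-balloon,free-page-reporting=on \
        -drive file=guest.qcow2,format=qcow2 -nographic

      # Guest side: the drop_caches hack, forcing the guest kernel to release page cache
      # so those pages can actually be reported as free (run as root).
      echo 3 > /proc/sys/vm/drop_caches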

  • JackSlateur 83 days ago
    Meh.

    This can be boiled down to "we use AWS' built-in security, not our own". Using EC2 instances is then nothing but a choice. You could do the exact same thing with containers (with Fargate, perhaps?): one container per tenant, no relations between containers => same thing (but cheaper).

  • udev4096 83 days ago
    > Nothing here will earn us a speaking invite to CNCF events

    This made me laugh for some reason

  • Thaxll 83 days ago
    You could use a different node pool per customer while using the same k8s control plane.
  • kkfx 84 days ago
    The more stuff you add, the more attack surface you have. Virtualized infra is a commercial need, but an IT and operations OBSCENITY, and definitely never safe in practice.
  • javier_e06 83 days ago
    Months ago I went to the movie theater. With a $20.00 USD bill in my hand, I asked the young one (yes, I am that old) for a medium popcorn. "Credit card only," he warned me. "You have to take cash," I reminded him. He directed me to the box office, where I had to purchase a $20 USD gift card which I then used to purchase the popcorn. I never used the remaining balance. Management does not trust the crew of low-wage minions with cash, and who would?

    I had my popcorn, right? What is the complaint here?

    If the network comes down, stores will have no choice but to hand out the food for free.

    I am currently not troubleshooting my solutions. I am troubleshooting the VM.