There’s a ton of talk about gVisor containers here at KubeCon in Copenhagen. So what’s the craic?
Before this week, we had two major choices for containers:
1. Namespaced containers
2. Hypervisor containers
Namespaced containers are what Docker has always done – leverage kernel namespaces to “isolate” processes. Throw in things like cgroups and union filesystems/snapshotters and we’ve got Docker containers. The problem is that, despite being called “containers”, they’re pretty crap at containing. This isn’t great from a security perspective!
Yes, it’s technically possible to lash a bunch of seccomp and SELinux stuff on top of namespaced containers to make them secure. But nobody does – it’s too complicated.
Enter hypervisor containers…
This is where every container gets its own VM. Docker supports these as well.
From a security perspective, these are waaaay more secure than namespaced containers. But they carry a pretty significant performance overhead. So you get your security at a pretty high price. I’m not a fan of these as they throw away a bunch of the main benefits of containers!
When I first heard about gVisor I got excited. I thought it might be a way to get rid of clunky hypervisor containers. Then I read the README on the GitHub repo, and I got a bit sad. I’ll tell you why in a minute.
To start a gVisor container in Docker, you just pass the `--runtime=runsc` flag to the `docker run` command (after registering runsc as a runtime with the Docker daemon). This starts the container with the runsc runtime – which is OCI compliant. Thumbs up! Each runsc container gets its own associated gVisor instance. The gVisor layer effectively sits between the container and the host kernel, and it’s a one-to-one relationship: 10 runsc containers require 10 gVisor instances. Kinda like hypervisor containers… only lighter-weight, faster, and more flexible. But sadly… less secure.
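The flag only works once the Docker daemon knows about runsc. Here is a minimal Python sketch of both halves, assuming runsc is installed at `/usr/local/bin/runsc` (adjust the path for your system). It produces the `runtimes` entry for Docker's `daemon.json` and the matching `docker run` command line. The helper names `daemon_config` and `docker_run_argv` are just illustrative.

```python
import json
import shlex

# ASSUMPTION: where the runsc binary was installed; change this to match your system.
RUNSC_PATH = "/usr/local/bin/runsc"


def daemon_config(runsc_path: str = RUNSC_PATH) -> dict:
    """Build a daemon.json fragment that registers runsc as a Docker runtime."""
    return {"runtimes": {"runsc": {"path": runsc_path}}}


def docker_run_argv(image: str, *cmd: str, runtime: str = "runsc") -> list[str]:
    """Build a `docker run` command line that selects the given OCI runtime."""
    return ["docker", "run", "--rm", f"--runtime={runtime}", image, *cmd]


if __name__ == "__main__":
    # Merge this into /etc/docker/daemon.json, then restart the Docker daemon.
    print(json.dumps(daemon_config(), indent=2))
    # Paste this into a shell (or pass the list to subprocess.run) to start a sandboxed container.
    print(shlex.join(docker_run_argv("alpine", "echo", "hello from gVisor")))
```

Once the JSON is merged into `/etc/docker/daemon.json` and the daemon has been restarted, the printed command should start an Alpine container under gVisor. Dropping `--runtime=runsc` gets you a plain namespaced container again, which makes side-by-side comparisons easy.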
And that’s what made me sad – if I need a high degree of isolation, I still have to pick hypervisor containers. It feels like gVisor sits in an awkward place between namespaced containers and hypervisor containers: gVisor containers are slower and less flexible than namespaced containers, and faster but less compatible than hypervisor containers (faster start times, but support for fewer syscalls). Not the end of the world, just not the silver bullet I’d hoped for. In fact, I now have an extra decision to make when deploying containers.
A few more things…
This is new… or is it?
[UPDATED: 7th May 2018]
I doubt very much that gVisor is a battle-hardened internal Google project that they’ve decided to open-source. My gut-feel is that it’s probably more like the relationship between Kubernetes and Borg – Kubernetes is an open-source project, built from the ground up, based on principles learned from Borg. So gVisor is something new(ish) based on something used internally at Google. This has important implications for the maturity and production-readiness of the gVisor code. So it’s early days, expect bugs, and expect it to get a lot better in the future. Probably not for production unless you wanna be featured on the news!
I’ve been pinged by a few people at Google telling me that gVisor is indeed a pure open sourcing of an internal tool. The stuff that glues it into the likes of Docker and Kubernetes is what’s new.
It’s Linux-only. Sorry Windows.
It implements around 200 of the 400+ Linux syscalls. This means it’s not suitable for every container: if your container makes syscalls that gVisor doesn’t support, then it’s not for you. Don’t get me wrong though… 200 syscalls is a lot, and no doubt includes the most popular ones. And more will be added in the future.
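One way to find out whether a workload falls into the unsupported bucket is to trace it under a normal runtime and diff the syscalls it makes against gVisor's supported list. Below is a minimal Python sketch, assuming you captured `strace -f` output for your app to a file. The `SUPPORTED` set is a hypothetical placeholder, not gVisor's real syscall table, so swap in the list for the gVisor release you're targeting.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line, with an optional
# "[pid 123] " prefix (strace -f to a terminal) or "123 " prefix (strace -f -o FILE).
_SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_trace(lines) -> Counter:
    """Count syscall names in strace output; signal, exit and 'resumed' lines are skipped."""
    counts = Counter()
    for line in lines:
        m = _SYSCALL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def unsupported(counts: Counter, supported: set) -> dict:
    """Return each traced syscall that is missing from `supported`, with its call count."""
    return {name: n for name, n in counts.items() if name not in supported}


if __name__ == "__main__":
    trace = [
        'execve("/bin/app", ["app"], 0x7ffd /* 5 vars */) = 0',
        '[pid  42] openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
        '[pid  42] read(3, "", 4096) = 0',
        'quotactl(Q_GETQUOTA, "/dev/sda1", 0, 0x7ffd) = -1',
        '--- SIGCHLD {si_signo=SIGCHLD} ---',
        '+++ exited with 0 +++',
    ]
    # HYPOTHETICAL supported set, for illustration only.
    SUPPORTED = {"execve", "openat", "read", "write", "close"}
    print(unsupported(syscalls_in_trace(trace), SUPPORTED))  # prints {'quotactl': 1}
```

A clean result only covers the code paths your trace actually exercised, so run the workload through its realistic scenarios before trusting it.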
On the topic of syscalls, if your container makes a lot of syscalls, you should expect to see a performance overhead when using gVisor. That’s because gVisor is a user-space program that emulates most of the behaviour of the Linux kernel: your container’s syscalls are handled by gVisor, which in turn makes its own syscalls to the shared host kernel on your behalf. Sorry folks, no free lunch. Your containers talk to gVisor, but think they’re talking to the host kernel.
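A crude way to see this overhead for yourself is a syscall-heavy microbenchmark, run once under the default runtime and once with `--runtime=runsc`. The sketch below is an illustration, not a rigorous benchmark. Each `os.stat()` call costs one stat-family syscall, but the timing also includes Python's interpreter overhead, so only the ratio between the two runs is meaningful.

```python
import os
import time


def ns_per_stat(iterations: int = 200_000, path: str = "/") -> float:
    """Average wall-clock nanoseconds per os.stat() call (one stat-family syscall each)."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.stat(path)  # every iteration crosses into the kernel (or gVisor's Sentry)
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"{ns_per_stat():.0f} ns per os.stat() call")
```

Saved as, say, a hypothetical `bench.py`, it can be run both ways with something like `docker run --rm -v "$PWD":/app python:3 python /app/bench.py`, adding `--runtime=runsc` for the gVisor run. Expect the gap to depend heavily on your kernel, hardware and gVisor platform, so treat any single number with suspicion.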
There’s apparently a sensible seccomp profile applied to the gVisor process – so extra security going on there. And I think I was told that gVisor might carry around a 50MB overhead.
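Because every runsc container gets its own gVisor instance, any per-instance overhead grows linearly with the number of containers. Here is a back-of-the-envelope sketch that takes the roughly 50MB figure above at face value. It's second-hand, so treat it as an assumption and measure your own.

```python
def extra_memory_mb(containers: int, per_sandbox_mb: float = 50.0) -> float:
    """Memory used by gVisor itself, on top of the workloads.

    ASSUMPTIONS: one gVisor instance per container, and a flat ~50MB per instance.
    """
    if containers < 0:
        raise ValueError("containers must be non-negative")
    return containers * per_sandbox_mb


if __name__ == "__main__":
    print(extra_memory_mb(10))  # prints 500.0
```

So the 10 runsc containers from earlier would carry roughly half a gigabyte of extra memory. That's not nothing on a densely packed node, but it's still a lot lighter than 10 VMs.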
I expect gVisor to have a bright future and possibly replace vanilla namespaced containers as the go-to choice for production containers. We’ll still have hypervisor containers for those scenarios where extra isolation is required…