Posts for: #Namespaces

Rootless containers @ devconf.cz

The video of the rootless containers talk from Devconf.cz 2019 is finally available on YouTube. The talk covers how user namespaces, fuse-overlayfs, and slirp4netns come together to run containers entirely as an unprivileged user, with no setuid helpers beyond newuidmap and newgidmap. It also discusses the challenges that remain before rootless containers reach full feature parity: cgroup resource management and overlay storage performance.

[read more]

SUID binaries from a user namespace

Additional IDs that are allocated to a user through /etc/subuid and /etc/subgid must be considered as permanently allocated and never reused for any other user. The reason is that a setuid binary created inside a user namespace can retain access to any UID that was mapped in that namespace, even after the namespace is destroyed. If the same UID range is later assigned to a different user, that new user would inherit access to files owned by the old user’s containers.
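As a rough illustration of the bookkeeping this implies, the sketch below parses text in the `name:start:count` format used by /etc/subuid and /etc/subgid and reports ranges that overlap between different owners. The names `SubIdRange`, `parse_subid`, and `find_overlaps` are hypothetical helpers invented for this example. It only inspects the current contents of the file, so it cannot detect a range that was freed and later handed to someone else; catching that kind of reuse would require keeping a history of past allocations.

```python
from typing import List, NamedTuple, Tuple


class SubIdRange(NamedTuple):
    """One line of /etc/subuid or /etc/subgid (hypothetical helper type)."""
    owner: str
    start: int
    count: int

    @property
    def end(self) -> int:
        # First ID *after* the range (exclusive upper bound).
        return self.start + self.count


def parse_subid(text: str) -> List[SubIdRange]:
    """Parse 'name:start:count' lines, skipping blank lines."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        owner, start, count = line.split(":")
        ranges.append(SubIdRange(owner, int(start), int(count)))
    return ranges


def find_overlaps(ranges: List[SubIdRange]) -> List[Tuple[SubIdRange, SubIdRange]]:
    """Return pairs of ranges that share at least one ID across different owners."""
    overlaps = []
    ordered = sorted(ranges, key=lambda r: r.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                # Sorted by start, so no later range can overlap 'a' either.
                break
            if a.owner != b.owner:
                overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    sample = "alice:100000:65536\nbob:165536:65536\ncarol:150000:1000\n"
    for a, b in find_overlaps(parse_subid(sample)):
        print(f"{a.owner} and {b.owner} share IDs")
```

In the sample data, carol's range falls inside alice's, which is exactly the situation that would let one user reach files owned by another user's containers.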

[read more]

Network namespaces for unprivileged users

A couple of weekends ago I played with libslirp and put together slirp-forwarder. The challenge with network namespaces for unprivileged users is that creating TAP or TUN devices requires privileges in the host network namespace. SliRP sidesteps this by emulating a full TCP/IP stack entirely in user space, so the helper process can forward traffic to the outside world using only normal socket operations, without needing any elevated capability.

SliRP emulates a TCP/IP stack in user space. It can be used to work around the restriction that an unprivileged user cannot create TAP/TUN devices in the host namespace. The helper program can run in the host namespace, receive packets from the network namespace where a TAP device is configured, and forward them to the outside world using unprivileged operations such as opening another connection to the destination host. Privileged operations are still not possible outside of the emulated network, because the helper program doesn’t gain any privilege beyond those of the unprivileged user running it.

[read more]

Become-root in a user namespace

I’ve cleaned up some C files I was using locally for hacking with user namespaces and uploaded them to a new repository on GitHub: https://github.com/giuseppe/become-root. The tool creates a new user namespace and maps the caller to UID 0 inside it, while also mapping additional UIDs and GIDs from the ranges allocated in /etc/subuid and /etc/subgid. This is the foundation needed for rootless containers, which require a full UID/GID mapping (not just the single-UID mapping that unshare -r provides) to correctly represent file ownership inside container images.
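To make the shape of such a mapping concrete, here is a minimal Python sketch that builds the argument list one could pass to newuidmap. It is not the become-root code (which is written in C), and the layout is an assumption chosen for illustration: the caller's own UID becomes 0 inside the namespace, and the subordinate range is placed right after it, starting at 1. The function names `build_id_map` and `newuidmap_argv` are hypothetical.

```python
from typing import List, Tuple

# Each entry is (id inside the namespace, id on the host, number of ids).
IdMap = List[Tuple[int, int, int]]


def build_id_map(own_id: int, sub_start: int, sub_count: int) -> IdMap:
    """Assumed layout: caller -> 0, subordinate range -> 1..sub_count."""
    return [
        (0, own_id, 1),             # the caller appears as root inside
        (1, sub_start, sub_count),  # extra IDs from /etc/subuid or /etc/subgid
    ]


def newuidmap_argv(pid: int, mapping: IdMap) -> List[str]:
    """Build 'newuidmap pid uid loweruid count [uid loweruid count ...]'."""
    argv = ["newuidmap", str(pid)]
    for inside, outside, count in mapping:
        argv += [str(inside), str(outside), str(count)]
    return argv


if __name__ == "__main__":
    # Example values only: UID 1000 with a subordinate range 100000:65536,
    # applied to a process with the made-up PID 1234.
    print(" ".join(newuidmap_argv(1234, build_id_map(1000, 100000, 65536))))
```

A single-entry mapping, as produced by unshare -r, would contain only the first tuple. The second tuple is what lets files owned by non-root users inside an image keep distinct owners.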

[read more]

Current status (and problems) of running Buildah as non root

Having Buildah running in a user namespace opens the possibility of building container images as a non-root user. I’ve done some work to get Buildah running inside a user container, where it can still create and modify container images without any elevated privileges on the host. This is useful for CI environments and shared systems where granting root or setuid access is not acceptable.

There are still some open issues before it fully works. The biggest one is that overlayfs cannot currently be used by a non-root user. There is some work going on, but it will require changes in the kernel and in the way extended attributes work for overlay. The alternative, the vfs storage driver, is far from ideal, but it is a good starting point to get things moving and see how far we get. (Another possibility that doesn’t require changes in the kernel would be an OSTree storage for Buildah, but that is a different story.)

[read more]

Use bubblewrap as an unprivileged user to run systemd images

bubblewrap is a sandboxing tool that allows unprivileged users to run containers. I was recently working on a way to allow unprivileged users to take advantage of bubblewrap to run regular system images that use systemd. To do so, it was necessary to modify bubblewrap to retain a controlled set of Linux capabilities inside the sandbox. Without those capabilities, systemd cannot perform the privilege-separation steps it needs at startup, even when running as UID 0 inside a user namespace.

[read more]