I’ve cleaned up some C files I was using locally for hacking with user namespaces and uploaded them to a new repository on github: https://github.com/giuseppe/become-root.
Creating an user namespace can be easily done with unshare(1) and get the current user mapped to root with unshare -r COMMAND but it doesn’t support the mapping of multiple uids/gids. For doing that it is necessary to use the suid newuidmap and newgidmap tools, that allocates multiple uids/gids to unprivileged users accordingly to the configuration files:
- /etc/subuid: for additional UIDs
- /etc/subgid: for additional GIDs
$ grep gscrivano /etc/subuid
$ become-root cat /proc/self/uid_map
0 1000 1
1 110000 65536
The uid_map file under /proc shows the mappings used by the process.
become-root doesn’t allow any customization, it statically maps the current user to the root in the user namespace and any additional uid/gid are mapped starting from 1.
One feature that might be nice to have is to allow the creation of other namespaces as part of the same unshare syscall, such as creating a mount or network namespace, but I’ve not added this feature as I am not using it, I rely on unshare(1) for more features. PR are welcome.
The project I was working on in the last weeks was moved under the github.com/containers umbrella.
With Linux 4.18 it will be possible to mount a FUSE file system in an user namespace. fuse-overlayfs is an implementation in user space of the overlay file system already present in the Linux kernel, but that can be mounted only by the root user. Union file systems were around for a long time, allowing multiple layers to be stacked on top of each other where usually the last one is the only writeable.
Overlay is an union file system widely used for mounting OCI image. Each OCI image is made up of different layers, each layer can be used by different images. A list of layers, stacked on each other gives the final image that is used by a container. The last level, that is writeable, is specific for the container. This model enables different containers to use the same image that is accessible as read-only from the lower layers of the overlay file system.
The current implementation of the overlay file system is done directly in the kernel, at a very low level, allowing non privileged users to use it directly poses some security risks. In the longer term, once the security aspect is resolved, non privileged users will probably be able to mount directly an overlay file system.
For now, given the new feature in Linux 5.18, having an implementation of the overlay union in user space will enable rootless containers to use the same storage as containers running as root.
On Fedora Rawhide, where Linux 4.18 is available, it is already possible to take a taste of it with:
podman --storage-opt overlay2.fuse_program=/usr/bin/fuse-overlayfs run ...
The previous command tells podman to mount an overlay file system using the specified FUSE helper instead of mounting it directly through the kernel.