cgroup v2 OOM group

One annoying issue with setting a memory limit for a container is that the kernel OOM killer can leave the container in an inconsistent state, with only some of its processes terminated.

[Read More]
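As a rough illustration of the knob the title refers to: cgroup v2 exposes a `memory.oom.group` file, and writing `1` to it asks the kernel to treat the cgroup as a single unit when the OOM killer fires. The sketch below assumes the cgroup directory already exists and has the memory controller enabled. The helper name `enable_oom_group` is made up for this example.

```python
from pathlib import Path


def enable_oom_group(cgroup_dir):
    """Write 1 to memory.oom.group so an OOM kill takes down the whole cgroup.

    cgroup_dir is the path of an existing cgroup v2 directory (assumption:
    the caller has write access and the memory controller is enabled there).
    """
    control = Path(cgroup_dir) / "memory.oom.group"
    control.write_text("1\n")
    # Read the value back to confirm the setting was accepted.
    return control.read_text().strip() == "1"
```

On a real system the argument would be something like a directory under `/sys/fs/cgroup`; the exact path depends on how the container runtime lays out its cgroups.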

playing with seccomp notifications in the OCI runtime

A couple of weekends ago I played with seccomp user notifications and how they can be used in the OCI container stack.

Seccomp user notifications are a powerful Linux kernel feature that delegates syscall handling to a userspace program.

[Read More]

SUID binaries from a user namespace

Additional IDs that are allocated to a user through /etc/subuid and /etc/subgid must be considered permanently allocated and never reused for any other user. Even if the container or user namespace where they were used is destroyed, it is possible to create a SUID binary that keeps access to any ID present in the user namespace. This simple C program is enough to keep access to a UID that was allocated to a user namespace: [Read More]
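This is not the C program mentioned above, just a related sanity check. Since subordinate IDs should never be shared between users, one way to audit a host is to parse the `owner:start:count` lines of /etc/subuid (or /etc/subgid) and look for ranges that overlap between different owners. The function names `parse_subid` and `find_overlaps` are invented for this sketch.

```python
def parse_subid(text):
    """Parse subuid/subgid content into (owner, start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        owner, start, count = line.split(":")
        ranges.append((owner, int(start), int(count)))
    return ranges


def find_overlaps(ranges):
    """Return (owner_a, owner_b) pairs whose ID ranges intersect."""
    overlaps = []
    ordered = sorted(ranges, key=lambda r: r[1])
    for i, (owner_a, start_a, count_a) in enumerate(ordered):
        end_a = start_a + count_a  # first ID past the end of range a
        for owner_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later range can overlap
            if owner_a != owner_b:
                overlaps.append((owner_a, owner_b))
    return overlaps
```

Feeding it the content of /etc/subuid and getting back an empty list means no two users currently share an ID. It says nothing about ranges that were handed out in the past and later removed, which is exactly the case the post warns about.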

disposable rootless sessions

It would be nice to have a way to “fork” the current session and be able to revert all the changes made, without leaving anything behind on the file system. Playing with fuse-overlayfs, a FUSE implementation of the overlay file system that is therefore usable by rootless users, I realized how easy that is to achieve, just by setting the overlay lowerdir to ‘/’ and using a temporary directory for the upperdir. [Read More]
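A minimal sketch of the mount described above, under a few assumptions: the helper `overlay_command` and the `upper`/`work`/`merged` directory names are invented here, and the function only prepares the directories and builds the `fuse-overlayfs` command line without running it. In practice the command would typically be run inside a user and mount namespace.

```python
from pathlib import Path


def overlay_command(scratch_dir, lowerdir="/"):
    """Build a fuse-overlayfs command that layers scratch_dir on top of lowerdir.

    All writes made through the merged mount point end up in the upper
    directory, so deleting scratch_dir afterwards discards them.
    """
    scratch = Path(scratch_dir)
    upper = scratch / "upper"    # receives every change made in the session
    work = scratch / "work"      # work directory passed via workdir=
    merged = scratch / "merged"  # mount point showing lowerdir plus changes
    for d in (upper, work, merged):
        d.mkdir(parents=True, exist_ok=True)
    options = f"lowerdir={lowerdir},upperdir={upper},workdir={work}"
    return ["fuse-overlayfs", "-o", options, str(merged)]
```

The returned list can be handed to `subprocess.run`. Passing a directory created with `tempfile.mkdtemp()` as `scratch_dir` matches the temporary upper directory the post describes.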