seccomp made easy

seccomp is a Linux kernel feature that restricts which syscalls a process can use.

Almost every container runs with seccomp enabled to restrict its access to syscalls.
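
To make the idea concrete, here is a minimal sketch of a classic-BPF seccomp filter that makes one syscall fail with EPERM and allows everything else. The helper names `build_deny_filter` and `install_filter` are made up for illustration. The sketch also skips the `seccomp_data.arch` check that a real filter should perform, so treat it as a toy rather than what a container runtime actually installs.

```c
#include <assert.h>
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

#define DENY_FILTER_LEN 4

/* Hypothetical helper: fill `out` with a filter that returns EPERM for
 * `syscall_nr` and allows every other syscall.
 * Simplification: no architecture check is done here. */
void build_deny_filter(unsigned int syscall_nr,
                       struct sock_filter out[DENY_FILTER_LEN])
{
    assert(out != NULL);

    const struct sock_filter prog[DENY_FILTER_LEN] = {
        /* Load the syscall number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it matches, fall through to the ERRNO return; else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, syscall_nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    memcpy(out, prog, sizeof(prog));
}

/* Hypothetical helper: install the filter on the calling thread.
 * Returns 0 on success, -1 with errno set on failure. */
int install_filter(struct sock_filter *filter, unsigned short len)
{
    struct sock_fprog prog = { .len = len, .filter = filter };

    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller would build the filter for a syscall number such as `__NR_mkdir` (from `<sys/syscall.h>`) and pass it to `install_filter(filter, DENY_FILTER_LEN)`. Once installed, the filter cannot be removed from the thread.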

[Read More]

cgroup v2 OOM group

One annoying issue with setting a memory limit for a container is that the kernel OOM killer can leave the container in an inconsistent state, with only some of its processes terminated.

[Read More]

playing with seccomp notifications in the OCI runtime

A couple of weekends ago I played with seccomp user notifications and how they can be used in the OCI container stack.

Seccomp user notifications are a powerful Linux kernel feature that delegates syscall handling to a userspace program.

[Read More]

SUID binaries from a user namespace

Additional IDs that are allocated to a user through /etc/subuid and /etc/subgid must be considered permanently allocated and never reused for any other user. Even if the container/user namespace where they are used is destroyed, it is possible to forge a SUID binary that will keep access to any ID present in the user namespace. This simple C program is enough to keep access to a UID that was allocated to a user namespace:

[Read More]