I have been working on a new feature for the prctl syscall that addresses a common security concern with container runtimes. The /proc/self/exe symlink, which points to the executable of the running process, was the key ingredient in CVE-2019-5736, a vulnerability that allowed a malicious container to overwrite the container runtime binary on the host. The workarounds deployed at the time, re-executing the runtime from a copy or from a read-only bind mount, treat the symptom rather than the cause.
Posts for: #Security
SUID binaries from a user namespace
Additional IDs allocated to a user through /etc/subuid and /etc/subgid must be treated as permanently allocated and never reused for any other user. The reason is that a setuid binary created inside a user namespace can retain access to any UID that was mapped in that namespace, even after the namespace is destroyed. If the same UID range is later assigned to a different user, that new user would inherit access to files owned by the old user’s containers.
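
As a rough illustration of the "never reuse" rule, here is a minimal Python sketch that parses text in the `name:start:count` format used by /etc/subuid and /etc/subgid, reports ranges shared by different owners, and picks the start of a fresh range above everything that was ever handed out. The names `SubidRange`, `parse_subid`, `find_overlaps` and `next_unused_start` are hypothetical helpers, not part of any existing tool, and the floor of 100000 is only an assumed default. The sketch also assumes you keep your own history of retired ranges, since the files themselves only record current allocations.

```python
from typing import List, NamedTuple, Tuple


class SubidRange(NamedTuple):
    """One `name:start:count` entry (hypothetical helper type)."""
    owner: str
    start: int
    count: int

    @property
    def end(self) -> int:
        # First ID *after* the range (exclusive upper bound).
        return self.start + self.count


def parse_subid(text: str) -> List[SubidRange]:
    """Parse subuid/subgid-style text, skipping blank lines."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        owner, start, count = line.split(":")
        ranges.append(SubidRange(owner, int(start), int(count)))
    return ranges


def find_overlaps(ranges: List[SubidRange]) -> List[Tuple[SubidRange, SubidRange]]:
    """Return pairs of ranges that share at least one ID across different owners."""
    ordered = sorted(ranges, key=lambda r: r.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later range can overlap `a`
            if a.owner != b.owner:
                overlaps.append((a, b))
    return overlaps


def next_unused_start(history: List[SubidRange], floor: int = 100000) -> int:
    """Start for a new range, above every range ever allocated.

    `history` must include ranges of deleted users as well (assumption:
    the caller keeps that record somewhere outside /etc/subuid).
    """
    return max([floor] + [r.end for r in history])


if __name__ == "__main__":
    sample = "alice:100000:65536\nbob:165536:65536\ncarol:150000:1000\n"
    current = parse_subid(sample)
    for a, b in find_overlaps(current):
        print(f"overlap: {a.owner} and {b.owner}")
    print("next start:", next_unused_start(current))
```

Allocating strictly above the highest range ever used is the simplest way to guarantee no reuse; it trades ID space for safety, which matches the reasoning in the paragraph above.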