Users have asked a few times why they see two /sys/fs/cgroup mounts
in their unprivileged container. With Podman, this happens when an
unprivileged container does not use a new network namespace.
The Limitation of Unprivileged Users
An unprivileged user, by definition, lacks certain permissions that
are available to the root user. One of these limitations is the
inability to mount a fresh /sys filesystem within a new user
namespace unless two conditions hold: a /sys filesystem is already
mounted and accessible in the current namespace, and the user
namespace owns the current network namespace.
When these conditions are not met, for example when the container
shares the host's network namespace, Podman provides the container
with a /sys filesystem by bind mounting the host's /sys.
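The rule above can be written down as a small decision function. This
is a hypothetical Python sketch, not Podman's actual code:
NamespaceState and sysfs_strategy are invented names, and the two
booleans stand for checks a real runtime would have to perform against
the kernel.

```python
from dataclasses import dataclass


@dataclass
class NamespaceState:
    # Hypothetical inputs; a real runtime would query the kernel for these.
    sysfs_visible: bool      # a /sys mount is already present and accessible
    userns_owns_netns: bool  # the user namespace owns the network namespace


def sysfs_strategy(state: NamespaceState) -> str:
    """Return how /sys is provided, following the two conditions above."""
    if state.sysfs_visible and state.userns_owns_netns:
        return "mount a fresh sysfs"
    # Either condition failing means a fresh sysfs mount is not permitted.
    return "bind mount the host /sys"


if __name__ == "__main__":
    # A rootless container sharing the host network namespace.
    shared_netns = NamespaceState(sysfs_visible=True, userns_owns_netns=False)
    print(sysfs_strategy(shared_netns))
```

For a rootless container that shares the host's network namespace,
/sys is visible but the user namespace does not own the network
namespace, so the function falls through to the bind mount.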
Cross-Namespace Bind Mounts
A bind mount that crosses two user namespaces has a side effect: the
kernel automatically 'locks' the new mount, treating it as a single
entity. As a result, the container cannot unmount /sys/fs/cgroup,
because the kernel considers it part of the locked /sys mount.
New cgroup mount
The /sys/fs/cgroup mount embedded in the bind-mounted /sys still
refers to the host environment's cgroup mount. Since the container
needs its own cgroup mount and cannot remove the embedded one, a fresh
/sys/fs/cgroup mount is created on top of the existing embedded mount.
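The interaction between locking and stacking can be illustrated with a
toy mount table. The Python sketch below is a simplified model, not how
the kernel stores mounts: Mount, MountTable and the locked flag
(standing in for the kernel's MNT_LOCKED) are invented for
illustration, and cgroup2 is assumed as the filesystem type.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Mount:
    """One entry in a toy mount table (an illustration, not kernel code)."""
    mountpoint: str
    fstype: str
    locked: bool = False  # stands in for the kernel's MNT_LOCKED flag


@dataclass
class MountTable:
    mounts: list[Mount] = field(default_factory=list)

    def mount(self, mountpoint: str, fstype: str, locked: bool = False) -> None:
        # A new mount is stacked on top; older mounts at the same path remain.
        self.mounts.append(Mount(mountpoint, fstype, locked))

    def umount(self, mountpoint: str) -> None:
        # Only the topmost mount at a given path can be detached.
        for i in range(len(self.mounts) - 1, -1, -1):
            m = self.mounts[i]
            if m.mountpoint != mountpoint:
                continue
            if m.locked:
                # umount(2) fails with EINVAL on a locked mount.
                raise OSError(errno.EINVAL, f"{mountpoint} is locked")
            del self.mounts[i]
            return
        raise OSError(errno.EINVAL, f"{mountpoint} is not a mount point")


if __name__ == "__main__":
    table = MountTable()
    # The host /sys and its cgroup submount arrive through a bind mount
    # that crosses user namespaces, so both entries are locked.
    table.mount("/sys", "sysfs", locked=True)
    table.mount("/sys/fs/cgroup", "cgroup2", locked=True)
    try:
        table.umount("/sys/fs/cgroup")
    except OSError as e:
        print("umount failed:", e.strerror)
    # The fresh cgroup mount therefore goes on top of the locked one.
    table.mount("/sys/fs/cgroup", "cgroup2")
    print([m for m in table.mounts if m.mountpoint == "/sys/fs/cgroup"])
```

Because the locked entry can never be removed, the only way to give
the container its own cgroup view is to stack a new mount, which leaves
both entries in the table.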
The consequence of this approach is that two /sys/fs/cgroup mounts
appear within the container: the locked one that came with the host's
/sys, and the fresh one mounted on top of it.
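To check this from inside a container, one can list the entries of
/proc/self/mountinfo whose mount point is /sys/fs/cgroup. The sketch
below assumes a Python interpreter is available in the container
image; the function name cgroup_mounts is made up for illustration. In
a rootless container that shares the host network namespace, it would
be expected to report two entries.

```python
from pathlib import Path


def cgroup_mounts(mountinfo: str, target: str = "/sys/fs/cgroup") -> list[str]:
    """Return the mountinfo lines whose mount point equals target."""
    matches = []
    for line in mountinfo.splitlines():
        fields = line.split()
        # The fifth field (index 4) of a mountinfo line is the mount point;
        # spaces inside paths are escaped as \040, so split() is safe.
        if len(fields) > 4 and fields[4] == target:
            matches.append(line)
    return matches


if __name__ == "__main__":
    info = Path("/proc/self/mountinfo")
    if info.exists():
        found = cgroup_mounts(info.read_text())
        print(f"{len(found)} mount(s) on /sys/fs/cgroup")
        for line in found:
            print(line)
    else:
        print("/proc/self/mountinfo not available (not running on Linux?)")
```

The second and first fields of each line are the mount ID and the
parent mount ID, so the stacked cgroup mount should list the ID of the
locked one as its parent.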