hide the current process executable file


I have been working on a new feature for the prctl() system call that addresses a common security concern with container runtimes.

On a Linux system, /proc contains many interesting files about each process. One of them is /proc/[pid]/exe, which points to the executable file that was used to launch the process. The man page for proc(5) states the following:

/proc/[pid]/exe
       Under Linux 2.2 and later, this file is a symbolic link containing the actual pathname of the executed command.  This symbolic link can be dereferenced
       normally; attempting to open it will open the executable.  You can even type /proc/[pid]/exe to run another copy of the same executable that is being run
       by process [pid].  If the pathname has been unlinked, the symbolic link will contain the string '(deleted)' appended to the original pathname.  In a
       multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
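
As a quick illustration of the symbolic link described above, here is a minimal sketch that resolves /proc/self/exe with readlink(2). The helper name read_self_exe is made up for this post; it is not part of any library.

```c
#include <stdio.h>
#include <unistd.h>

/* Resolve /proc/self/exe into buf as a NUL-terminated string.
 * Returns the length of the path, or -1 on error.
 * A result of size - 1 may mean the path was truncated. */
static ssize_t read_self_exe(char *buf, size_t size)
{
        ssize_t n;

        if (size == 0)
                return -1;
        n = readlink("/proc/self/exe", buf, size - 1);
        if (n < 0)
                return -1;
        buf[n] = '\0';  /* readlink() does not add the terminator */
        return n;
}
```

Calling it with a PATH_MAX sized buffer and printing the result should show the absolute path of the running binary, with ' (deleted)' appended if the file has been unlinked.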

This can be useful in certain situations, but it can also pose a security risk. In particular, /proc/self/exe became widely known through the CVE-2019-5736 vulnerability, which allowed attackers to escape from a container and gain access to the host!

Much has been written about this vulnerability. To learn more about the issue, I suggest this write-up: https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/.

The short version is that the attacker was able to overwrite the container runtime executable on the host by taking advantage of /proc/self/exe.

The workaround implemented in the container runtimes was either to use a read-only bind mount, or to make a copy of the runtime executable and re-exec from that copy before handing control to the container.

To solve the root problem, I've proposed a new option for prctl() called PR_HIDE_SELF_EXE. Once it is enabled, any access to /proc/self/exe fails with ENOENT, effectively preventing a process from accessing its own executable file.

It would have been better if the kernel had never allowed this kind of issue in the first place, but at this point any change to how /proc/self/exe works would be a breaking change. prctl(PR_HIDE_SELF_EXE), on the other hand, is not a breaking change: a program must opt in to use the feature, and programs that don't use it are not affected.

Once the PR_HIDE_SELF_EXE flag has been set, the process cannot clear it; the flag is reset automatically only when the process calls execve again.

Here is an example of how it would look:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

int main(void)
{
        int fd;

        /* Before the prctl() call, opening /proc/self/exe works. */
        errno = 0;
        fd = open("/proc/self/exe", O_RDONLY);
        printf("Got fd: %d (%m)\n", fd);
        close(fd);

        /* Opt in: from now on /proc/self/exe is hidden from this process. */
        prctl(PR_HIDE_SELF_EXE, 1, 0, 0, 0);

        errno = 0;
        fd = open("/proc/self/exe", O_RDONLY);
        printf("Got fd: %d (%m)\n", fd);
        return 0;
}

and then running it:

$ gcc -o hide hide.c
$ ./hide
Got fd: 3 (Success)
Got fd: -1 (No such file or directory)
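
Since PR_HIDE_SELF_EXE is only a proposal, headers without the patch will not define it, and kernels without it will reject the option. A program that wants to opt in where possible could wrap the call along these lines. This is only a sketch: the helper name try_hide_self_exe is hypothetical, and the ENOSYS value used for the "headers too old" case is my own choice, not something the kernel returns.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Try to hide /proc/self/exe for the calling process.
 * Returns 0 on success, -1 with errno set otherwise. */
static int try_hide_self_exe(void)
{
#ifdef PR_HIDE_SELF_EXE
        /* A kernel that does not know the option fails with EINVAL,
         * as prctl() does for any unrecognized option. */
        return prctl(PR_HIDE_SELF_EXE, 1, 0, 0, 0);
#else
        /* The headers predate the proposal: report "not supported".
         * ENOSYS is an arbitrary choice made by this sketch. */
        errno = ENOSYS;
        return -1;
#endif
}
```

A caller can then decide whether a failure is fatal or whether it should fall back to one of the existing workarounds.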