Hide the current process executable file

I have been working on a new feature for the prctl syscall that addresses a common security concern with container runtimes. The /proc/self/exe symlink, which points to the executable of the running process, was the key ingredient in CVE-2019-5736, a vulnerability that allowed a malicious container to overwrite the container runtime binary on the host. The workarounds deployed at the time (re-execing from a copy, or using a read-only bind mount) treat the symptom rather than the cause.

On a Linux system, /proc exposes many interesting files about a process. One of them is /proc/[pid]/exe, which points to the executable file that was used to launch the process. The man page for proc(5) states the following:

/proc/[pid]/exe
	Under Linux 2.2 and later, this file is a symbolic link containing
	the actual pathname of the executed command.  This symbolic link
	can be dereferenced normally; attempting to open it will open the
	executable.  You can even type /proc/[pid]/exe to run another copy
	of the same executable that is being run by process [pid].  If the
	pathname has been unlinked, the symbolic link will contain the
	string '(deleted)' appended to the original pathname.  In a
	multi-threaded process, the contents of this symbolic link are not
	available if the main thread has already terminated (typically by
	calling pthread_exit(3)).
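
As a minimal sketch of the behavior the man page describes, a process can resolve its own executable path with readlink(2). The helper name self_exe_path is made up for this illustration; only readlink() and the /proc/self/exe path come from the system.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: resolve /proc/self/exe into buf.
 * Returns the length of the path, or -1 on error (errno is set by readlink). */
ssize_t self_exe_path(char *buf, size_t size)
{
    if (size == 0)
        return -1;

    /* Leave room for the terminator: readlink() does not NUL-terminate. */
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

Calling it with a buffer of a few kilobytes is enough for typical paths. readlink() truncates silently if the buffer is too small, so a result of exactly size - 1 bytes should be treated as possibly incomplete.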

This can be useful in certain situations, but it can also pose a security risk. In particular, /proc/self/exe became notorious with the CVE-2019-5736 vulnerability, which allowed attackers to escape from a container and gain access to the host!

Much has been written about this vulnerability. To learn more about the issue, I suggest reading: https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/.

The short version is that the attacker was able to overwrite the container runtime executable on the host by taking advantage of /proc/self/exe.

The workaround implemented in the container runtimes was either to use a read-only bind mount, or to make a copy of the runtime executable and re-exec from that copy before handing execution over to the container.

To solve the root problem, I’ve proposed a new option for prctl() called PR_HIDE_SELF_EXE. This feature makes any access to /proc/self/exe always return ENOENT, effectively preventing a process from being able to access its own executable file.

It would have been better if the kernel had never allowed this kind of issue in the first place, but at this point any change to how /proc/self/exe works would be a breaking change. prctl(PR_HIDE_SELF_EXE), by contrast, is not a breaking change: a program must opt in to the feature, and programs that don’t use it are unaffected.

Once the PR_HIDE_SELF_EXE flag has been set, the process cannot clear it; the flag is cleared automatically the next time the process calls execve.

Here is an example of how it would look:

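#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>     /* close() */
#include <sys/prctl.h>  /* PR_HIDE_SELF_EXE is only defined by headers from a kernel carrying the proposed patch */

int main(void)
{
        int fd;

        /* Before the prctl: opening our own executable succeeds. */
        errno = 0;
        fd = open("/proc/self/exe", O_RDONLY);
        printf("Got fd: %d (%m)\n", fd);
        if (fd >= 0)
                close(fd);

        /* Hide the executable from this process until the next execve. */
        if (prctl(PR_HIDE_SELF_EXE, 1, 0, 0, 0) < 0)
                perror("prctl(PR_HIDE_SELF_EXE)");

        /* After the prctl: the same open fails with ENOENT. */
        errno = 0;
        fd = open("/proc/self/exe", O_RDONLY);
        printf("Got fd: %d (%m)\n", fd);
        if (fd >= 0)
                close(fd);
        return 0;
}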

and then running it:

$ gcc -o hide hide.c
$ ./hide
Got fd: 3 (Success)
Got fd: -1 (No such file or directory)