A few days ago, Qualys dropped CVE-2021-4034, which they have called “Pwnkit”. While Alpine itself was not directly vulnerable to this issue due to different engineering decisions made in the way musl and glibc handle SUID binaries, this is intended to be a deeper look into what went wrong to enable successful exploitation on GNU/Linux systems.

a note on blaming systemd

Before we get into this, I have seen a lot of people on Twitter blaming systemd for this vulnerability. It should be clarified that systemd has basically nothing to do with polkit, and nothing at all to do with this vulnerability. systemd and polkit are separate projects, largely maintained by different people.

We should try to be empathetic toward software maintainers, including those from systemd and polkit, so writing inflammatory posts blaming systemd or its maintainers for polkit does not really help to fix the problems that made this a useful security vulnerability.

the theory behind exploiting CVE-2021-4034

For an idea of how one might exploit CVE-2021-4034, let’s look at blasty’s “blasty vs pkexec” exploit. Take a look at the code for a few minutes, and come back here. There are multiple components to this exploit that all have to come together to make it work. A friend on IRC described it as a “rube goldberg machine” when I outlined it to him.

The first component of the exploit is the creation of a GNU iconv plugin: this is used to convert data from one character set to another. The plugin itself is the final step in the pipeline, and is used to gain the root shell.

The second component of the exploit is using execve(2) to arrange for pkexec to be run in a scenario where argc < 1. Although some POSIX rules lawyers will argue that this is a valid execution state, because the POSIX specification only says that argv[0] should be the name of the program being run, I argue that it is really a nonsensical execution state under UNIX, and that defensive programming against this scenario is ridiculous, which is why I sent a patch to the Linux kernel to remove the ability to do this.
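To make that execution state concrete, here is a minimal sketch of launching a program with an empty argument vector. The helper name run_with_empty_argv is made up for this illustration, and it is only meant to be pointed at a harmless binary; what the child actually sees depends on the kernel, since a kernel that carries a fix for this state may reject the call or substitute an argv[0] of its own.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run `path` with an empty argument vector and
 * report how the child ended.
 *
 * Returns the child's exit status, 128 + signal number if it was
 * killed by a signal, or -1 if fork/waitpid failed. A child that
 * cannot exec at all exits with 127.
 */
int run_with_empty_argv(const char *path)
{
    char *empty_argv[] = { NULL };   /* no argv[0]: we are asking for argc == 0 */
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the environment is passed through unchanged. */
        execve(path, empty_argv, environ);
        _exit(127);                  /* only reached if execve() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Calling run_with_empty_argv("/bin/true") from a small test program is a harmless way to see how a given kernel and libc react to this state.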

The third component of the exploit is the use of GLib by pkexec. GLib is a commonly used C development framework, and it contains a lot of helpful infrastructure for developers, but that framework comes at the cost of a large attack surface, which is undesirable for an SUID binary.

The final component of the exploit is the design decision of the GLIBC authors to attempt to sanitize the environment of SUID programs rather than simply ignoring known-harmful environment variables when running as an SUID program. In essence, Qualys figured out a way to bypass the sanitization entirely. When these things combine, we are able to use pkexec to pop a root shell, as I will demonstrate.

how things went wrong

Now that we have an understanding of what components are involved in the exploit, we can take a look at what happens from beginning to end. We have our helper plugin, which launches the root shell, and we have an understanding of the underlying configuration and its design flaws. How does all of this come together?

The exploit itself does not happen in blasty-vs-pkexec.c. That program just sets up the necessary preconditions for everything else to fall into place, and then runs pkexec. But it runs pkexec in a way that basically results in an execution state that could be described as a weird machine: it uses execve(2) to launch it in an execution state where there are no arguments provided, not even an argv[0].

Because pkexec is running in this weird state that it was never designed to run in, it executes as normal, except that we wind up in a situation where argv[1] is actually the beginning of the program’s environment. The first value in the environment is lol, which is a valid argument, but not a valid environment variable, since it is missing a value. If we run pkexec lol in a terminal, we get:

[kaniini@localhost ~]$ pkexec lol
Cannot run program lol: No such file or directory
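Why argv[1] ends up pointing into the environment when argc is 0 can be shown with a small simulation. On Linux, the argv pointer array, its NULL terminator, and the envp pointer array sit next to each other on the initial stack, so a scan that starts at index 1 reads envp[0] when argv is empty. The sketch below is a simplified model, not pkexec's code; program_to_run and demo_empty_argv are names invented for this example, and the layout is faked with an ordinary array.

```c
#include <stddef.h>

/*
 * Simplified model of an argument scan that assumes argc >= 1:
 * start at index 1, skip anything that looks like an option, and
 * treat the first non-option as the program to run.
 */
const char *program_to_run(int argc, char **argv)
{
    int n;

    for (n = 1; n < argc; n++) {
        if (argv[n][0] != '-')
            break;               /* first non-option argument */
    }
    /* With argc == 0 the loop never runs, so this reads argv[1]. */
    return argv[n];
}

/*
 * Fake the initial stack layout for argc == 0:
 *   slot 0: argv terminator (argv[0] == NULL)
 *   slot 1: envp[0]
 *   slot 2: envp[1]
 *   slot 3: envp terminator
 */
const char *demo_empty_argv(void)
{
    static char env0[] = "lol";
    static char env1[] = "PATH=/usr/bin";
    static char *stack[] = { NULL, env0, env1, NULL };
    char **argv = &stack[0];

    return program_to_run(0, argv);   /* yields envp[0], i.e. "lol" */
}
```

In the normal case (argc of 2, with "lol" as argv[1]) the same function returns the real argument, which is why the two situations look identical from inside the program.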

That “Cannot run program” message is interesting because it is actually generated by g_log(), and that’s where the fun begins. In initializing the GLog subsystem, there is a code path where g_utf8_validate() gets called on argv[0]. When running as a weird machine, this validation fails, because argv[0] is NULL. This results in GLib trying to convert argv[0] to UTF-8, which uses iconv, a libc function.

On GLIBC, the iconv function is provided by the GNU iconv implementation, which supports loading plugins to add additional character sets from a directory named by the GCONV_PATH environment variable. Normally, GCONV_PATH is removed from an SUID program’s environment because GLIBC sanitizes the environment of SUID programs, but Qualys figured out a way to glitch the sanitization, and so GCONV_PATH remains in the environment. As a result, we get a root shell as soon as it tries to convert argv[0] to UTF-8.

where do we go from here?

On Alpine and other musl-based systems, we do not use the GNU iconv implementation, so we are not vulnerable to blasty’s PoC. musl also makes a more robust decision: instead of trying to sanitize the environment of SUID programs, it entirely ignores variables which would lead to musl loading additional code, such as LD_PRELOAD, when running in SUID mode.
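The difference between the two approaches can be sketched in a few lines. This is not musl's code; getenv_if_trusted and loader_getenv are hypothetical names, and the only real interfaces used are getenv() and getauxval(AT_SECURE), the latter being how a Linux process can ask whether the kernel started it in secure (for example SUID) mode. The point is that nothing is ever removed from the environment: the dangerous variable is simply never consulted.

```c
#include <stdlib.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper: look up a variable that could cause extra code
 * to be loaded, but refuse to see it at all in secure mode. The
 * environment itself is left untouched.
 */
const char *getenv_if_trusted(const char *name, int secure_mode)
{
    if (secure_mode)
        return NULL;             /* behave as if the variable is unset */
    return getenv(name);
}

/*
 * Hypothetical wrapper that takes the secure-mode flag from the
 * kernel-provided auxiliary vector.
 */
const char *loader_getenv(const char *name)
{
    int secure_mode = getauxval(AT_SECURE) != 0;

    return getenv_if_trusted(name, secure_mode);
}
```

Because the check happens at the point of use, it does not matter whether some other bug has managed to put the variable back into the environment after startup.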

This means that ultimately three things need to be fixed: pkexec itself should be fixed to close the vulnerability on older kernels (which has already been done), the kernel should be fixed to disallow this weird execution state (which my patch does), and GLIBC should be fixed to ignore dangerous environment variables instead of trying to sanitize them.
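For programs that still have to run on kernels which allow an empty argv, the userspace side of the fix amounts to refusing to continue in that state. The following is a minimal sketch of such a guard, not pkexec's actual patch; argv_is_usable is a name made up for this example.

```c
#include <stddef.h>

/*
 * Hypothetical guard: returns 1 if the process was started with at
 * least an argv[0], and 0 if it is in the "no arguments at all" state.
 */
int argv_is_usable(int argc, char **argv)
{
    return argc >= 1 && argv != NULL && argv[0] != NULL;
}
```

A privileged program would call this first thing in main() and exit immediately with a failure status if it returns 0, before any library has a chance to look at argv.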