
the FSF’s relationship with firmware is harmful to free software users

The FSF has an unfortunate relationship with firmware, resulting in policies that made sense in the late 1980s, but actively harm users today, through recommending obsolescent equipment, requiring increased complexity in RYF-certified hardware designs and discouraging both good security practices and the creation of free replacement firmware. As a result of these policies, deficient hardware often winds up in the hands of those who need software freedom the most, in the name of RYF-certification.

the FSF and microcode

The FSF does not recommend the normal Linux kernel, because it allows the use of proprietary firmware with devices. Instead, they recommend Linux-libre, which disables support for proprietary firmware by ripping out the code which allows that firmware to be loaded onto devices. Libreboot, being FSF-recommended, has the same policy of disallowing firmware blobs in its source tree, despite that policy being a source of nothing but problems.

The end result is that users who deploy the FSF-recommended firmware and kernel wind up with varying degrees of broken configurations. Worse yet, the Linux-libre project removes warning messages which suggest a user may want to update their processor microcode to avoid Meltdown and Spectre security vulnerabilities.

While it is true that processor microcode is a proprietary blob, from a security and reliability point of view, there are two types of CPU: you can have a broken CPU, or a less broken CPU, and microcode updates are intended to give you a less broken CPU. This is particularly important because microcode updates fix real problems in the CPU, and Libreboot has patches which hack around problems caused by deficient microcode burned into the CPU at manufacturing time, since it’s not allowed to update the microcode at early boot time.

There is also a common misconception about the capabilities of processor microcode. Over the years, I have talked with numerous software freedom advocates about the microcode issue, and many of them believe that microcode is capable of reprogramming the processor as if it were an FPGA or something. In reality, the microcode is a series of hot patches to the instruction decode logic, which is largely part of a fixed function execution pipeline. In other words, you can’t microcode update a CPU to add or substantially change capabilities.

By discouraging (or outright inhibiting, in the case of Linux-libre) end users from exercising their freedom (a key tenet of software freedom being that the user has the agency to do whatever she wants with her computer) to update their processor microcode, the FSF pursues a policy which leaves users at risk of vulnerabilities such as Meltdown and Spectre, which were partially mitigated through a microcode update.

Purism’s Librem 5: a case study

The FSF “Respects Your Freedom” certification has a loophole so large you could drive a truck through it, called the “secondary processor exception”. It exists because the FSF knows that, generally speaking, entirely libre devices with the capabilities people want do not presently exist. Purism used this loophole to sell a phone that contained proprietary software blobs while passing it off as entirely free. The relevant text of the exception that allowed them to do this was:

However, there is one exception for secondary embedded processors. The exception applies to software delivered inside auxiliary and low-level processors and FPGAs, within which software installation is not intended after the user obtains the product. This can include, for instance, microcode inside a processor, firmware built into an I/O device, or the gate pattern of an FPGA. The software in such secondary processors does not count as product software.

Purism was able to accomplish this by making the Librem 5 have not one, but two processors: when the phone first boots, it uses a secondary CPU as a service processor, which loads all of the relevant blobs (such as those required to initialize the DDR4 memory) before starting the main CPU and shutting itself off. In this way, they could have all the blobs they needed to use, without having to worry about them being user visible from PureOS. Under the policy, that left them free and clear for certification.

The problem of course is that by hiding these blobs in the service processor, users are largely unaware of their existence, and are unable to leverage their freedom to study, reverse engineer and replace these blobs with libre firmware, a remedy that would typically be made available to them as part of the four freedoms.

This means that users of the Librem 5 phone are objectively harmed in three ways: first, they are unaware of the existence of the blobs to begin with; second, they do not have the ability to study the blobs; and third, they do not have the ability to replace the blobs. By pursuing RYF certification, Purism released a device that is objectively worse for the practical freedom of their customers.

The irony, of course, is that Purism failed to gain certification at the end of this effort, creating a device that harmed consumer freedoms, with increased complexity, just to attempt to satisfy the requirements of a certification program they ultimately failed to gain certification from.

The Novena laptop: a second case study

In 2012, Andrew “bunnie” Huang began a project to create a laptop with the most free components he could find, called the Novena open laptop. It was based on the Freescale (now NXP) i.MX 6 CPU, which has an integrated Vivante GPU and WiFi radio. Every single component in the design had data sheets freely available, and the schematic itself was published under a free license.

But because the SoC used required blobs to boot the GPU and WiFi functionality, the FSF required that these components be mechanically disabled in the product in order to receive certification, despite an ongoing effort to write replacement firmware for both components. This replacement firmware was eventually released, and people are using these chips with that free firmware today.

Had bunnie chosen to comply with the RYF certification requirements, customers who purchased the Novena laptop would have been unable to use the integrated GPU and WiFi functionality, as it would have been physically disabled on the board, despite the availability of free replacement firmware for those components. Thankfully, bunnie chose not to move forward with RYF certification, and thus the Novena laptop can be used with GPU acceleration and WiFi.

the hardware which remains

In practice, it is difficult to get anything much more freedom-respecting than the Novena laptop. From a right-to-repair perspective, the Framework laptop is very good, but it still uses proprietary firmware. It is, however, built on a modern x86 CPU, and could be a reasonable target for corebooting, especially now that the embedded controller firmware’s source code has been released under a free license.

However, because of the Intel ME, the Framework laptop will rightly never be RYF-certified. Instead, the FSF promotes buying old ThinkPads from 2009 with Libreboot pre-installed. This is a disservice to users, as a computer from 2009 is thoroughly obsolete now, and, as discussed above, Intel CPUs tend to be rather broken without their microcode updates.

My advice is to ignore the RYF certification program, as it is actively harmful to the practical adoption of free software, and just buy whatever you can afford that will run a free OS well. At this point, total blob-free computing is a fool’s errand, so there are a lot of AMD Ryzen-based machines that will give you decent performance and GPU acceleration without the need for proprietary drivers. Vendors which use coreboot for their systems and open the source code for their embedded controllers should be at the front of the line. But the FSF will never suggest this as an option, because they have chosen unattainable ideological purity over the pragmatism of recommending what the market can actually provide.


delegation of authority from the systems programming perspective

As I have been griping on Twitter lately about how much I dislike the design of modern UNIX operating systems, an interesting conversation about object capabilities came up with the author of musl-libc. This conversation made me realize that systems programmers don’t really have an understanding of object capabilities, or of how they can be used to achieve environments that are aligned with the principle of least authority.

In general, I think this is largely because we’ve failed to effectively disseminate the research output in this area to the software engineering community at large — for various reasons, people complete their distributed systems degrees and go to work in decentralized finance, as unfortunately, Coinbase pays better. An unfortunate reality is that the security properties guaranteed by Web3 platforms are built around object capabilities, by necessity – the output of a transaction, which then gets consumed for another transaction, is a form of object capability. And while Web3 is largely a planet-incinerating Ponzi scheme run by grifters, object capabilities are a useful concept for building practical security into real-world systems.

Most literature on this topic tries to describe these concepts in the framing of, say, driving a car: by default, nobody has permission to drive a given car, so the situation is compliant with the principle of least authority; meanwhile, the car’s key can interface with the ignition and allow the car to be driven. In this example, the car’s key is an object capability: it is an opaque object that can be used to acquire the right to drive the car. Afterwards, such texts usually go on to describe the various aspects of their system without actually discussing why anybody would want this.

the principle of least authority

The principle of least authority is hopefully obvious: it is the idea that a process should only have the rights that are necessary for it to complete its work. In other words, the calculator app shouldn’t have the right to turn on your camera or microphone, or snoop around in your photos. In addition, there is an expectation that the user should be able to express consent and make a conscious decision to grant rights to programs as they request them, but this isn’t strictly required: a user can delegate her consent to an agent that makes those decisions on her behalf.

In practice, modern web browsers implement the principle of least authority. It is also implemented on some desktop computers, such as those running recent versions of macOS, and it is also implemented on iOS devices. Android has also made an attempt to implement security primitives that are aligned with the principle of least authority, but it has various design flaws that mean that apps have more authority in practice than they should.

the object-capability model

The object-capability model refers to the use of object capabilities to enforce the principle of least authority: processes are spawned with a minimal set of capabilities, and then are granted additional rights through a mediating service. These rights are given to the requestor as an opaque object, which it references when it chooses to exercise those rights, in a process sometimes referred to as capability invocation.

Because the object is opaque, it can be represented in many different ways: as a digital signature (as in the various blockchain platforms), or simply as a reference to a kernel handle (such as Capsicum’s capability descriptors). Similarly, invocation can happen directly or indirectly. For example, a mediation service which responds to a request to turn on the microphone by sending a sealed file descriptor over SCM_RIGHTS would still be considered an object-capability model, as it is returning the capability to listen to the microphone in the form of a file descriptor that can be used to read the PCM audio data from the microphone.
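To make the indirect case concrete, here is a minimal sketch in C of how a client might receive such a file-descriptor capability from a hypothetical mediation service over a UNIX domain socket (error handling is abbreviated, and the function name is made up):

/* Minimal sketch: receive one file descriptor (e.g. for the microphone
 * device) as SCM_RIGHTS ancillary data on an already-connected UNIX
 * domain socket to a hypothetical mediation service. */
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

int receive_capability(int mediator_fd)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};

	if (recvmsg(mediator_fd, &msg, 0) < 0)
		return -1;

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	int capability_fd;
	memcpy(&capability_fd, CMSG_DATA(cmsg), sizeof(int));

	/* Possession of capability_fd is what grants the right to read the
	 * PCM audio data; no further permission check is involved. */
	return capability_fd;
}

Exercising the capability is then just ordinary I/O on the returned descriptor.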

Some examples of real-world object capability models include: the Xen hypervisor’s xenbus and event-channels, Sandbox.framework in macOS (built on FreeBSD’s TrustedBSD MAC framework), Laurent Bercot’s s6-sudod, the xdg-portal specification from freedesktop.org and privilege separation as originally implemented in OpenBSD.

capability forfeiture, e.g. OpenBSD pledge(2)/unveil(2)

OpenBSD 5.9 introduced the pledge(2) system call, which allows a process to voluntarily forfeit a set of capabilities, by restricting itself to pre-defined groups of syscalls. In addition, OpenBSD 6.4 introduced the unveil(2) syscall, which allows a process to voluntarily forfeit the ability to perform filesystem I/O except to a pre-defined set of paths. In the object capability vocabulary, this is referred to as forfeiture.
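A minimal sketch of what this forfeiture looks like in practice on OpenBSD (the unveiled path and the promise strings here are purely illustrative):

/* A program that only needs to read files under /var/db/example and
 * write to stdio drops everything else up front. */
#include <unistd.h>
#include <err.h>

int main(void)
{
	/* Forfeit filesystem visibility outside /var/db/example. */
	if (unveil("/var/db/example", "r") == -1)
		err(1, "unveil");
	if (unveil(NULL, NULL) == -1)	/* lock the unveil list */
		err(1, "unveil");

	/* Forfeit all syscalls except basic stdio and read-only
	 * filesystem access. */
	if (pledge("stdio rpath", NULL) == -1)
		err(1, "pledge");

	/* ... the actual work happens here, with reduced authority ... */
	return 0;
}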

The forfeiture pattern is seen in UNIX daemons as well: because you historically need root permission (or in Linux, the CAP_NET_BIND_SERVICE process capability bit) to bind to a port lower than 1024, a daemon will start as root, and then voluntarily drop itself down to an EUID which does not possess root privileges, effectively forfeiting them.
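A minimal sketch of that pattern (the port number and the unprivileged UID/GID are illustrative; a real daemon would look them up with getpwnam()):

/* Bind a privileged port while root, then permanently drop to an
 * unprivileged user. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <err.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	struct sockaddr_in sin = {
		.sin_family = AF_INET,
		.sin_port = htons(80),	/* needs root or CAP_NET_BIND_SERVICE */
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};

	if (s == -1 || bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");

	/* The privileged work is done: forfeit root.  Order matters,
	 * as setgid() must happen before setuid(). */
	if (setgid(1000) == -1 || setuid(1000) == -1)
		err(1, "drop privileges");

	/* From here on, the process cannot regain root. */
	if (listen(s, 16) == -1)
		err(1, "listen");
	/* ... accept loop elided ... */
	return 0;
}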

Because processes start with high levels of privilege and then give up those rights, this approach is not aligned with the principle of least authority, but it does have tangible security benefits in practice.

Linux process capabilities

Since Linux 2.2, there has been a feature called process capabilities; the CAP_NET_BIND_SERVICE bit mentioned above is one of the capabilities that can be set on a binary. These are basically used as an alternative to setting the SUID bit on binaries, and are totally useless outside of that use case. They have nothing to do with object capabilities, although an object capability system could be designed to facilitate many of the same things SUID binaries presently do today.
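For completeness, here is a minimal sketch of inspecting these bits at runtime, assuming libcap is installed (the setcap invocation and the binary name in the comment are illustrative):

/* Check whether CAP_NET_BIND_SERVICE is in the effective set, as it
 * would be after running "setcap cap_net_bind_service=+ep ./mydaemon"
 * on the binary instead of making it SUID root.  Build with -lcap. */
#include <sys/capability.h>
#include <stdio.h>

int main(void)
{
	cap_t caps = cap_get_proc();
	cap_flag_value_t have = CAP_CLEAR;

	if (caps == NULL)
		return 1;
	if (cap_get_flag(caps, CAP_NET_BIND_SERVICE, CAP_EFFECTIVE, &have) == 0)
		printf("CAP_NET_BIND_SERVICE: %s\n",
		       have == CAP_SET ? "present" : "absent");
	cap_free(caps);
	return 0;
}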

Further reading

Fuchsia uses a combination of filesystem namespacing and object capability mediation to restrict access to specific devices on a per-app basis. This could be achieved on Linux as well with namespaces and a mediation daemon.


glibc is still not Y2038 compliant by default

Most of my readers are probably aware of the Y2038 issue by now. If not, it refers to 03:14:07 UTC on January 19, 2038, when 32-bit time_t will overflow. The Linux kernel internally switched to 64-bit timekeeping several years ago, and Alpine made the jump to 64-bit time_t with the release of Alpine 3.13.

In the GNU/Linux world, the GNU libc started to support 64-bit time_t in version 2.34. Unfortunately for the rest of us, the approach they have used to support 64-bit time_t is technically deficient, following in the footsteps of other never-fully-completed transitions.

the right way to transition to new ABIs

In the musl C library, which Alpine uses, and in many other UNIX C library implementations, time_t is always 64-bit in new code, and compatibility stubs are provided for code requiring the old 32-bit functions. As code is rebuilt over time, it automatically becomes Y2038-compliant without any effort.

Microsoft went another step further in msvcrt: you get 64-bit time_t by default, but if you compile with the _USE_32BIT_TIME_T macro defined, you can still access the old 32-bit functions.

It should be noted that both approaches described above introduce zero friction to get the right thing.

how GNU handles transitions in GNU libc

The approach the GNU project has taken for transitioning to new ABIs is the exact opposite: you must explicitly ask for the new functionality, or you will never get it. This approach leaves open the possibility of changing the defaults in the future, but so far they have never done so.

This can be observed with the large file support extension: to handle files larger than 2GiB, you must always build your code with -D_FILE_OFFSET_BITS=64. Similarly, if you’re on a 32-bit system and you do not build your app with -D_TIME_BITS=64, it will not be built using an ABI that is Y2038-compliant.

This is the worst possible way to do this. Consider libraries: what if a dependency is built with -D_TIME_BITS=64, and another dependency is built without, and they need to exchange struct timespec or similar with each other? Well, in this case, your program will likely crash or have weird behavior, as you’re not consistently using the same struct timespec in the program you’ve compiled.
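To see the problem concretely, here is a small test program; the compile commands in the comment are illustrative and assume a 32-bit glibc target (on musl, the last result is what you get with no flags at all). A library built one way and an application built the other will disagree about these sizes, and about every structure that embeds them.

/* Prints the type sizes that determine the ABI.  Illustrative builds
 * on a 32-bit glibc target:
 *
 *   cc sizes.c                                        -> off_t 4, time_t 4
 *   cc -D_FILE_OFFSET_BITS=64 sizes.c                 -> off_t 8, time_t 4
 *   cc -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 sizes.c -> off_t 8, time_t 8
 *
 * (glibc 2.34+ only honors _TIME_BITS=64 when _FILE_OFFSET_BITS=64 is
 * also set.) */
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

int main(void)
{
	printf("sizeof(off_t)           = %zu\n", sizeof(off_t));
	printf("sizeof(time_t)          = %zu\n", sizeof(time_t));
	printf("sizeof(struct timespec) = %zu\n", sizeof(struct timespec));
	return 0;
}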

Fortunately, if you are targeting 32-bit systems, and you would like to manipulate files larger than 2GiB or have confidence that your code will continue to work in the year 2038, there are Linux distributions that are built on musl, like Alpine, which will allow you to not have to deal with the idiosyncrasies of the GNU libc.


stop defining feature-test macros in your code

If there is any change in the C world I would like to see in 2022, it would be the abolition of #define _GNU_SOURCE. In many cases, defining this macro in C code can have harmful side effects ranging from subtle breakage to miscompilation, because of how feature-test macros work.

When writing or studying code, you’ve likely encountered something like this:

#define _GNU_SOURCE
#include <string.h>

Or worse:

#include <stdlib.h>
#include <unistd.h>
#define _XOPEN_SOURCE
#include <string.h>

The #define _XOPEN_SOURCE and #define _GNU_SOURCE in those examples are defining something known as a feature-test macro, which is used to selectively expose function declarations in the headers. These macros are necessary because some standards have conflicting definitions of the same function; the conflicting versions are aliased to different symbols so they can co-exist, but only one definition of a given function may be exposed at a time, so the feature-test macros allow the user to select which definitions they want.

The correct way to use these macros is by defining them at compile time with compiler flags, e.g. -D_XOPEN_SOURCE or -std=gnu11. This ensures that the declared feature-test macros are consistently defined while compiling the project.
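For example, strptime(3) is an XSI extension whose declaration glibc only exposes when an appropriate feature-test macro is in effect. A minimal sketch of doing it the right way, with the macro supplied on the command line rather than in the source (the file and program names are illustrative):

/* Build with:  cc -D_XOPEN_SOURCE=700 -o parse parse.c
 * The source itself defines no feature-test macros, so every
 * translation unit in the project sees the same declarations. */
#include <stdio.h>
#include <time.h>

int main(void)
{
	struct tm tm = {0};

	if (strptime("2022-01-19", "%Y-%m-%d", &tm) != NULL)
		printf("parsed year: %d\n", tm.tm_year + 1900);
	return 0;
}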

As for the reason why #define _GNU_SOURCE is a thing? It’s because we have documentation which does not correctly explain the role of feature-test macros. Instead, in a given manual page, you might see language like “this function is only enabled if the _GNU_SOURCE macro is defined.”

To find out the actual way to use those macros, you would have to read feature_test_macros(7), which is usually not referenced from the individual manual pages. And while that manual page does show the incorrect examples above as bad practice, it understates how bad a practice this actually is, and the bad example is one of the first code examples you see on that page.

In conclusion, never use #define _GNU_SOURCE, always use compiler flags for this.


to secure the supply chain, you must properly fund it

Yesterday, a new 0day vulnerability dropped in Apache Log4j. It turned out to be worse than the initial analysis suggested: because of recursive nesting of substitutions, it is possible to execute remote code in any program which passes user data to Log4j for logging. Needless to say, the way this disclosure was handled was a disaster, as it was quickly discovered that many popular services were using Log4j. But how did we get here?

Like many projects, Log4j is maintained only by volunteers, and because of this, coordinating a security response is naturally more difficult: an embargo is easy to coordinate if you have a dedicated maintainer to do it. In the absence of a dedicated maintainer, you have chaos: as soon as a commit lands in git to fix a bug, the race is on, with security maintainers scurrying to reverse engineer what the bug you fixed was, which is why vulnerability embargoes can be helpful.

It turns out that, like many other software projects in the commons, Log4j does not have a dedicated maintainer, while corporations make heavy use of the project, and so, as usual, the maintainers have to beg for scraps from their peers or from the corporations that use the code. Incidentally, one of the Log4j maintainers’ GitHub Sponsors profiles is here, if you would like to contribute some money to his cause.

When corporations sponsor the maintenance of the FOSS projects they use, they are effectively buying an insurance policy that guarantees a prompt, well-coordinated response to security problems. The newly established Open Source Program Offices at these companies should ponder which is more expensive: $100k/year salary for a maintainer of a project they are heavily dependent upon, or millions in damages from data breaches when a security vulnerability causes serious customer data exposure, like this one.


open cores, ISAs, etc: what is actually open about them?

In the past few years, with the launch of RISC-V, and IBM’s OpenPOWER initiative (backed up with hardware releases such as Talos) there has been lots of talk about open hardware projects, and vendors talking about how anyone can go and make a RISC-V or OpenPOWER CPU. While there is a modicum of truth to the assertion that an upstart company could start fabricating their own RISC-V or OpenPOWER CPUs tomorrow, the reality is a lot more complex, and it basically comes down to patents.

Components of a semiconductor design

The world of semiconductors from an intellectual property point of view is a complex one, especially as the majority of semiconductor companies have become “fabless” companies, meaning that they outsource the production of their products to other companies called foundries. This is even true of the big players: for example, AMD has been a fabless company since 2009, when it spun off its foundry division into its own company called GlobalFoundries.

Semiconductors are usually designed in a hardware description language such as Verilog or VHDL. When a company wishes to make a semiconductor, it contracts out to a foundry, which provides the company with a customized Verilog or VHDL toolchain that generates the data necessary to manufacture the chip according to the foundry’s processes. When you hear about a chip being made on “the TSMC 5nm process” or the “Intel 7 process” (sidenote: the Intel 7 process is actually 10nm), this is what they are talking about.

The processes and tooling used by the foundries are protected by a combination of copyright and relevant patents, which are licensed to the fabless company as part of the production contract. However, these contracts are complicated: for example, in some cases, the IP rights for the generated silicon mask for a semiconductor may actually belong to the foundry, not the company which designed it. Other contracts might impose a vendor exclusivity agreement, where the fabless company is locked into using one, and only one foundry for their chip fabrication needs.

As should be obvious by now, there is no situation where these foundry processes and tools are open source. At best, the inputs to these tools are, and this is true for RISC-V and OpenPOWER: there are VHDL cores, such as the Microwatt OpenPOWER core and Alibaba’s XuanTie RISC-V core, which can be downloaded and, with the appropriate tooling and contracts, synthesized into ASICs that go into products. These inputs are frequently described as SIP cores, or IP cores, short for Semiconductor Intellectual Property Core, the idea being that you can license a set of cores, wire them together, and have a chip.

The value of RISC-V and OpenPOWER

As discussed above, a company looking to make a SoC or similar chip would usually license a bunch of IP cores and glue them together. For example, they might license a CPU core and memory controller from ARM, a USB and PCIe controller from Synopsys, and a GPU core from either ARM or Imagination Technologies. None of the IP cores in the above configuration are open source, the company making the SoC pays a royalty to use all of the licensed IP cores in their product. Notable vendors in this space include MediaTek and Rockchip, but there are many others.

In practice, it is possible to replace the CPU core in the above designs with one of the aforementioned RISC-V or OpenPOWER ones, and there are other IP cores that can be used from, for example, the OpenCores project to replace others. However, that may, or may not, actually reduce licensing costs, as many IP cores are licensed as bundles, and there are usually third-party patents that have to be licensed.

Patents

Ultimately, we come to the unavoidable topic, patents. Both RISC-V and OpenPOWER are described as patent-free, or patent-unencumbered, but what does that actually mean? In both cases, it means that the ISA itself is unencumbered by patents… in the case of RISC-V, the ISA itself is patent-free, and in the case of OpenPOWER, there is a very liberal patent licensing pool.

But therein lies the rub: in both cases, the patent situation only covers the ISA itself. Implementation details and vendor extensions are not covered by the promises made by both communities. In other words, SiFive and IBM still have entire portfolios they can assert against any competitor in their space. RISC-V, as noted before, does not have a multilateral patent pool, and these microarchitectural patents are not covered by the OpenPOWER patent pool, as that covers the POWER ISA only.

This means that anybody planning to produce chips which compete with SiFive or IBM would have to become a patent licensee, and these licensing costs are ultimately passed through to the companies licensing the SoC cores.

There are steps which both communities could take to improve the patent problems: for example, RISC-V could establish a patent pool, and require ecosystem participants to cross-license their patents through it, and IBM could widen the scope of the OpenPOWER patent pool to cover more than the POWER ISA itself. These steps would significantly improve the current situation, enabling truly free (as in freedom) silicon to be fabricated, through a combination of a RISC-V or OpenPOWER core and a set of supporting cores from OpenCores.


On centralized development forges

Since the launch of SourceForge in 1999, development of FOSS has started to concentrate in centralized development forges, the latest one of course being GitHub, now owned by Microsoft. While the centralization of development talent achieved by GitHub has had positive effects on software development output towards the commons, it is also a liability: GitHub is now effectively a single point of failure for the commons, since the overwhelming majority of software is developed there.

In other words, for the sake of convenience, we have largely traded our autonomy as software maintainers to GitHub, GitLab.com, Bitbucket and SourceForge, all of which are owned by corporate interests which, by definition, are aligned with profitability, not with our interests as maintainers.

It is indeed convenient to use GitHub or GitLab.com for software development: you get all the pieces you need in order to maintain software with modern workflows, but it really does come at a cost: SourceForge, for example, was caught redistributing Windows builds of projects under their care with malware.

While GitHub and the other forges besides SourceForge have not yet attempted anything similar, it does serve as a reminder that we are trusting forges not to tamper with the packages we release as maintainers. There are other liabilities too: for example, a commercial forge may unilaterally decide to kick your project off of its service, or terminate the account of a project maintainer.

In order to protect the commons from this liability, it is imperative to build a more robust ecosystem, one which is a federated ecosystem of software development forges, which are either directly run by projects themselves, or are run by communities which directly represent the interests of the maintainers which participate in them.

Building a community of islands

One of the main arguments in favor of centralization is that everyone else is already using a given service, and so you should as well. In other words, the concentrated social graph. However, it is possible to build systems which allow the social graph to be distributed across multiple instances.

Networks like the ActivityPub fediverse (what many people incorrectly call the Mastodon network), despite their flaws, demonstrate the possibility of this. To that end, ForgeFed is an adaptation of ActivityPub allowing development forges to federate (share social graph data) with other forges. With proliferation of standards like ForgeFed, it is possible to build a replacement ecosystem that is actually trustworthy and representative of the voices and needs of software maintainers.

ForgeFed is moving along, albeit slowly. There is a reference implementation called Vervis, and there is work ongoing to integrate ForgeFed into Gitea and Gitlab CE. As this work comes to fruition, forges will be able to start federating with each other.

A side-note on Radicle

A competing proposal, known as Radicle, has been making waves lately. It should be ignored: it is just the latest in “Web3” cryptocurrency grifting, the software development equivalent to NFT mania. All problems solved by Radicle have better solutions in traditional infrastructure, or in ForgeFed. For example, to use Radicle, you must download a specialized client, and then download a blockchain with that client. This is not something most developers are going to want to do in order to just send a patch to a maintainer.

Setting up my own forge with CI

Treehouse, the community I started by accident over labor day weekend, is now offering a gitea instance with CI. It is my intention that this instance become communally governed, for the benefit of participants in the Treehouse community. We have made some modifications to gitea UI to make it more tolerable, and plan to implement ForgeFed as soon as patches are available, but it is admittedly still a work in progress. Come join us in #gitea on the Treehouse discord!

I have begun moving my own projects to this gitea instance. If you’re interested in doing the same, the instance is open to anybody who wants to participate. I will probably be publishing the specific kubernetes charts to enable this setup on your own infrastructure in the next few days, as I clean them up to properly use secrets. I also plan to do a second blog outlining the setup once everything is figured out.

It is my goal that we can move from large monolithic forges to smaller community-oriented ones, which federate with each other via ForgeFed to allow seamless collaboration without answering to corporate interests. Realizing this effort is a high priority of mine for 2022, and I intend to focus as many resources as I can on it.


On CVE-2019-5021

A few years ago, it was discovered that the root account was not locked out in Alpine’s Docker images. This was not the first time this had happened: an actually exploitable instance was first fixed with a hotfix in 2015, but when the hotfix was replaced with appropriate use of /etc/securetty, the regression was inadvertently reintroduced for some configurations.

It should be noted that I said some configurations there. Although CVE-2019-5021 was issued a CVSSv3 score of 9.8, in reality I have yet to find any Alpine-based Docker image that is actually vulnerable to it. Of course, this doesn’t mean that Alpine shouldn’t have been locking out the root user on its minirootfs releases: that was a mistake, which I am glad was quickly rectified.

Lately, however, there have been a few incidents involving CVE-2019-5021 and less-than-honest actors in the security world. For example, a person named Donghyun Lee started mass-filing CVEs against Alpine-based images without actually verifying whether the images were vulnerable, which Jerry Gamblin called out on Twitter last year. Other less-than-honest actors have focused instead on attempting to use CVE-2019-5021 to sell their remediation solutions, implying a risk of vulnerability where most likely none actually exists.

So, what configurations are actually vulnerable to CVE-2019-5021? Well, you must install both the shadow and linux-pam packages in the container to have any possibility of being vulnerable to this issue. I have yet to find a single container which has installed these packages: think about it, Docker containers do not run multi-user, so there is no reason to configure PAM inside them. In essence, CVE-2019-5021 existed because the PAM configuration was not updated to align with the new Busybox configuration introduced in 2015.

And, for that matter, why is being able to escalate to root scary in a container? Well, if you are running a configuration without UID namespaces, root in the container is equivalent to root on the host: if the user can pivot outside the container filesystem, they can have full root access to the machine. Docker-in-docker setups with an Alpine-based container providing the Docker CLI, for example, would be easy to break out of if they were running PAM in a misconfigured way.

But in practice, nobody combines PAM with the Alpine Docker images, as there’s no reason to do so. Accordingly, be wary of marketing materials discussing CVE-2019-5021, in practice your configuration was most likely never vulnerable to it.


the problematic GPL “or later” clause

The GNU General Public License started life as the GNU Emacs Public License in 1987 (the linked version is from February 1988), and has been built on the principle of copyleft: the use of the copyright system to enforce software freedom through licensing. This prototype version of the GPL was used for other packages, such as GNU Bison (in 1988), and Nethack (in 1989), and was most likely written by Richard Stallman himself.

This prototype version was also referred to as the GNU General Public License in a 1988 bulletin, so we can think of it in a way as GPLv0. This version of the GPL, however, was mothballed in February 1989 with the publication of the GPLv1. One of the new features introduced in the newly rewritten GPLv1 license was the “or later” clause:

7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and “any later version”, you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation.

Section 7 of the GNU General Public License version 1

The primary motive for the version upgrade clause was quite simple: using copyright to enforce software freedom was, at the time, a new and novel concept, and there was a concern that the license might have flaws or need clarifications. Accordingly, to streamline the process, the clause was added to allow authors to consent in advance to using new versions of the GPL as an alternative. Indeed, in the January 1991 release of the GNU Bulletin, plans to release the GPLv2 were announced as an effort to clarify the license:

We will also be releasing a version 2 of the ordinary GPL. There are no real changes in its policies, but we hope to clarify points that have led to misunderstanding and sometimes unnecessary worry.

GNU Bulletin volume 1, number 10, “New library license”

After that, not much happened in the GNU project regarding licensing for a long time, until the GPLv3 drafting process in 2006. From a governance point of view, the GPLv3 drafting process was a significant accomplishment in multi-stakeholder governance, as outlined by Eben Moglen.

However, for all of the success of the GPLv3 drafting process, it must be noted that the GPL is ultimately published by the Free Software Foundation, an organization that many have questioned the long-term viability of lately. When the “or later version” clause was first introduced to the GPL, it was unthinkable that the Free Software Foundation could ever be in such a state of affairs, but now it is.

And this is ultimately the problem: what happens if the FSF shuts down, and has to liquidate? What if an intellectual property troll acquires the GNU copyright assignments, or acquires the trademark rights to the FSF name, and publishes a new GPL version? There are many possibilities to be concerned about, but developers can do two things to mitigate the damage.

First, they can stop using the “or later” clause in new GPL-licensed code. This will, effectively, limit those projects from being upgraded to new versions of the GPL, which may be published by a compromised FSF. In so doing, projects should be able to avoid relicensing discussions, as GPLv3-only code is compatible with GPLv3-or-later: the common denominator in this case is GPLv3.

Second, they can stop assigning copyright to the FSF. In the event that the FSF becomes compromised, for example by an intellectual property troll, this limits the scope of its possible war chest for malicious GPL enforcement litigation. As we have learned from the McHardy cases involving Netfilter, in a project with multiple copyright holders, GPL enforcement litigation is most effective when done as a class action. In this way, dilution of the FSF copyright assignment pool protects the commons over time from exposure to malicious litigation by a compromised FSF.


an inside look into the illicit ad industry

So, you want to work in ad tech, do you? Perhaps this will be a cautionary tale…

I have worked my entire life as a contractor. This has had advantages and disadvantages. For example, I am free to set my own schedule, and undertake engagements at my own leisure, but as a result my tax situation is more complicated. Another advantage is that sometimes, you get involved in an engagement that is truly fascinating. This is the story of such an engagement. Some details have been slightly changed, and specific names are elided.

A common theme amongst contractors in the technology industry is to band together to take on engagements which cannot be reasonably handled by a single contractor. Our story begins with such an engagement: a friend of mine ran a bespoke IT services company, which provided system administration, free software consulting and development. His company also handled the infrastructure deployment needs of customers who did not want to build their own infrastructure. I frequently worked with my friend on various consulting engagements over the years, including this one.

One day, I was chilling in IRC, when I got a PM from my friend: he had gotten an inquiry from a possible client who needed help reverse engineering a piece of obfuscated JavaScript. I said something like “sounds like fun, send it over, and I’ll see what I come up with.” The script in question was called popunder.js and did exactly what you think it does. The customer in question had started a popunder ad network, and needed help adapting this obfuscated popunder script to work with his system, which he had built using software called Revive Adserver, a fork of the last GPL version of OpenX.

I rolled my eyes and reverse engineered the script for him, allowing him to adapt it for his ad network. The adaptation was a success, and he wired me a sum that was triple my quoted hourly rate. This, admittedly, resulted in me being very curious about his business, as at the time, I was not used to making that kind of money. Actually, I’m still not.

A few weeks passed, and he approached me with a proposition: he needed somebody who could reverse engineer the JavaScript programs delivered by ad networks and figure out how the scripts worked. As he was paying considerably more than my advertised hourly rate, I agreed, and got to work reverse engineering the JavaScript programs he required. It was nearly a full time job, as these programs kept evolving.

In retrospect, he probably wasn’t doing anything with the reports I wrote on each piece of JavaScript I reverse engineered, as that wasn’t the actual point of the exercise: in reality, he wanted me to become familiar with the techniques ad networks used to detect fraud, so that we could develop countermeasures. In other words, the engagement evolved into a red-team type engagement, except that we weren’t testing the ad networks for their sake, but instead ours.

so-called “domain masking”: an explanation

Years ago, you might have browsed websites like The Pirate Bay and saw advertising for a popular game, or some sort of other advertisement that you wouldn’t have expected to see on The Pirate Bay. I assure you, brands were not knowingly targeting users on TPB: they were being duped via a category of techniques called domain masking.

This is a type of scam that black-hat ad networks do in order to launder illicit traffic into clean traffic: they will set up fake websites and apply for advertisements on those websites through a shell company. This gives them a clean advertising feed to serve ads from. The next step is to launder the traffic by serving those tags on empty pages on the website, so that you can use them with an <iframe> tag. After that, you use an ad server to rotate the various <iframe> tags, and done.

For a long time, this type of fraud went undetected, as the industry was not even aware that it was a thing; or perhaps it was aware, but didn’t care, as it let networks promise brands more and more traffic than they could otherwise fulfill. Either way, the clean networks started to talk about cracking down on domain masking, or as the black-hat networks call it, arbitrage or ROI. That meant the simple <iframe> attack outlined above was quickly shut down, and thus began a back-and-forth cold war between the black-hat networks and their shell companies on one side, and legitimate networks like Google on the other.

non-human traffic detection

At first, the heuristics deployed by the ad networks were quite simple: they just started to check document.location.host and send it along to the ad server. If an ad tag is placed on an unauthorized domain, the account the ad tag belonged to would be flagged for an audit. But in order to make sure that the URL or domain name was appropriately escaped for inclusion in an HTTP GET request, they had to call window.encodeURIComponent(). This means that the first countermeasure we developed was something similar to:

(function () {
   // Keep a reference to the browser's real encodeURIComponent...
   let o = window.encodeURIComponent;
   // ...and replace it with a version that lies about the masked domain
   // before the ad tag can report it back to the ad server.
   window.encodeURIComponent = function (x) {
     if (x === "thepiratebay.org") return o("cleansite.com");
     return o(x);
   };
})();

This countermeasure worked for a very long time with some networks, lasting several years. Google solved this attack by simply writing their own implementation of encodeURIComponent and protecting it behind a closure. Other networks tried to do things like:

var isFraud = false;

// Deleting any own toString property falls back to Function.prototype.toString,
// defeating attempts to spoof it; a genuinely native function then stringifies
// to something containing "native code", while a patched one does not.
delete window.encodeURIComponent.toString;
if (window.encodeURIComponent.toString().indexOf("native code") < 0) {
  isFraud = true;
}

This led to countermeasures like patching XMLHttpRequest itself:

(function() {
   // Save the native open() and replace it with a wrapper that rewrites
   // the reported URL (and, elsewhere, the POST payload) before handing
   // off to the real implementation.
   let x = window.XMLHttpRequest.prototype.open;
   window.XMLHttpRequest.prototype.open = function (method, url, ...rest) {
     // code which would parse and rewrite the URL or POST payload here
     let newUrl = url; // placeholder for the rewritten URL
     return x.call(this, method, newUrl, ...rest);
   };
})();

The cycle of patching on both sides is ongoing to this day. A friend of mine on Twitter referred to this tug-of-war as “core war,” which is an apt description: all of the involved actors are trying to patch each other out of being able to commit or detect subterfuge, and your browser gets slower and slower as more mitigations and countermeasures are layered on. If you’re not using an ad blocker yet, stop reading this, and install one: your browser will suddenly be a lot more performant.

enter thistle: a proxy between php-fpm and nginx

When it came to evading the automated non-human traffic detection deployed by ad networks, our game was impeccable: with each round of mitigation and countermeasure, we would only lose a few ad tags, if that. However, we kept getting caught by human review, because they would look at the referrer header and see something like http://cleansite.com/adserver/720x.php, which I mean, is totally suspect, right?

So, we decided what we needed to do was interdict the traffic, and respond with something else if appropriate, namely bare ad tags if a requesting client was already known to us. To do this, we wrote a proxy server which I named thistle, since the thistle flower is known to be the favorite of the most mischievous of faeries, and we were certainly up to mischief! The way it worked is that an <iframe> would enter at a specified URL, which would then tumble through several more URLs (blog articles), in order to ensure the referrer header always matched a real article on the fake website.

This was highly successful: we never got caught again by a manual audit, at least not for that reason.

the cycling of domains

Most advertising traffic is bought and sold using a protocol called OpenRTB, which allows so-called trading desks to buy and sell ad spots in real time, based on historical performance data. This means that, in order to keep CPM rates up, we would have to cycle in and out domains that the trading bots hadn’t seen in a while, or ever.

And that is where the operation started to break down: the fellow I was writing all this code for had the people running his fake websites applying for ad tags without using an anonymizing VPN. At some point, an auditor noticed that all of these different sites were made by people with the same IP address, even though they had different shell companies and so on, and shut the whole thing down by sharing that intelligence with the other ad networks. It was fun while it lasted, though.

a popular torrent website, greed, the FBI, and the loss of almost all of our traffic

By the time this operation started going off the rails, the overwhelming majority of our traffic was coming from a popular torrent website, which was eventually shut down by the feds; that was basically the end of the operation, as we lost almost all of our traffic.

I figured that was coming when the FBI contacted the person I was working for, asking if we knew anything about our advertising being featured on said torrent website. What ultimately resulted in the shutdown of the website, however, was quite funny: the owner of it was greedy, so the FBI offered to buy ads from him directly. He set up a new ad spot on the torrent website, and then sent bank wire instructions to the FBI agent investigating him, at which point they seized the website and shut it down.

Shortly afterward, the company went out of business, as there wasn’t enough traffic to keep the operation running anymore.

ad tech: capitalism refined to its purest?

As my friend Maia put it, “ad tech is about trying to scam the rest of ad tech as hard as possible, while trying to not get scammed too hard yourself.”

One can therefore argue that ad tech, like the crypto craze, is just capitalism refined to its purest: things with little or no actual value are bought and sold at prices far in excess of what little value actually exists. And, like crypto, ad tech is responsible for substantial carbon dioxide emissions.

In short, do everyone a favor, and use a damn ad blocker. Oh, and don’t work in ad tech. I have wilder stories to tell about that engagement, but sometimes things are better left unsaid.