to secure the supply chain, you must properly fund it

Yesterday, a new 0-day vulnerability dropped in Apache Log4j. It turned out to be worse than the initial analysis suggested: because of recursive nesting of substitutions, it is possible to execute remote code in any program which passes user data to Log4j for logging. Needless to say, the way this disclosure was handled was a disaster, as it was quickly discovered that many popular services were using Log4j. But how did we get here?

Like many projects, Log4j is maintained only by volunteers, and because of this, coordinating a security response is naturally more difficult: an embargo is easy to coordinate if you have a dedicated maintainer to run it. In the absence of a dedicated maintainer, you have chaos: as soon as a commit lands in git to fix a bug, the race is on, and security maintainers scramble to reverse engineer what the fixed bug was. This is exactly why vulnerability embargoes can be helpful.

It turns out that, like many other software projects in the commons, Log4j does not have a dedicated maintainer, even though corporations make heavy use of the project, and so, as usual, the maintainers have to beg for scraps from their peers or from the corporations that use the code. Incidentally, the GitHub Sponsors profile of one of the Log4j maintainers is here, if you would like to contribute some money to his cause.

When corporations sponsor the maintenance of the FOSS projects they use, they are effectively buying an insurance policy that guarantees a prompt, well-coordinated response to security problems. The newly established Open Source Program Offices at these companies should ponder which is more expensive: a $100k/year salary for a maintainer of a project they depend on heavily, or millions in damages from data breaches when a security vulnerability like this one causes serious customer data exposure.

open cores, ISAs, etc: what is actually open about them?

In the past few years, with the launch of RISC-V and IBM’s OpenPOWER initiative (backed up with hardware releases such as Talos), there has been a lot of talk about open hardware projects, with vendors claiming that anyone can go and make a RISC-V or OpenPOWER CPU. While there is a modicum of truth to the assertion that an upstart company could start fabricating its own RISC-V or OpenPOWER CPUs tomorrow, the reality is a lot more complex, and it basically comes down to patents.

Components of a semiconductor design

The world of semiconductors is a complex one from an intellectual property point of view, especially as the majority of semiconductor companies have become “fabless” companies, meaning that they outsource the production of their products to other companies called foundries. This is even true of the big players: AMD, for example, has been a fabless company since 2009, when it spun off its foundry division into its own company, GlobalFoundries.

Usually semiconductors are designed with a hardware description language such as Verilog or VHDL. When a company wishes to make a semiconductor, it contracts with a foundry, which provides the company with a customized Verilog or VHDL toolchain that generates the data needed to manufacture the chip on the foundry’s processes. When you hear about a chip being made on “the TSMC 5nm process” or “Intel 7 process” (sidenote: the Intel 7 process is actually 10nm), this is what they are talking about.

The processes and tooling used by the foundries are protected by a combination of copyright and relevant patents, which are licensed to the fabless company as part of the production contract. However, these contracts are complicated: for example, in some cases, the IP rights for the generated silicon mask for a semiconductor may actually belong to the foundry, not the company which designed it. Other contracts might impose a vendor exclusivity agreement, where the fabless company is locked into using one, and only one, foundry for its chip fabrication needs.

As should be obvious by now, there is no situation where these foundry processes and tools are open source. At best, the inputs to these tools are, and this is true for RISC-V and OpenPOWER: there are VHDL cores, such as the Microwatt OpenPOWER core and Alibaba’s XuanTie RISC-V core, which can be downloaded and, with the appropriate tooling and contracts, synthesized into ASICs that go into products. These inputs are frequently described as SIP cores, or IP cores, short for Semiconductor Intellectual Property Core, the idea being that you can license a set of cores, wire them together, and have a chip.

The value of RISC-V and OpenPOWER

As discussed above, a company looking to make a SoC or similar chip would usually license a bunch of IP cores and glue them together. For example, they might license a CPU core and memory controller from ARM, a USB and PCIe controller from Synopsys, and a GPU core from either ARM or Imagination Technologies. None of the IP cores in the above configuration are open source; the company making the SoC pays royalties to use all of the licensed IP cores in its product. Notable vendors in this space include MediaTek and Rockchip, but there are many others.

In practice, it is possible to replace the CPU core in the above designs with one of the aforementioned RISC-V or OpenPOWER ones, and there are other IP cores that can be used from, for example, the OpenCores project to replace others. However, that may, or may not, actually reduce licensing costs, as many IP cores are licensed as bundles, and there are usually third-party patents that have to be licensed.

Patents

Ultimately, we come to the unavoidable topic: patents. Both RISC-V and OpenPOWER are described as patent-free, or patent-unencumbered, but what does that actually mean? In both cases, it means that the ISA itself is unencumbered by patents: in the case of RISC-V, the ISA itself is patent-free, and in the case of OpenPOWER, there is a very liberal patent licensing pool.

But therein lies the rub: in both cases, the patent situation only covers the ISA itself. Implementation details and vendor extensions are not covered by the promises made by both communities. In other words, SiFive and IBM still have entire portfolios they can assert against any competitor in their space. RISC-V, as noted before, does not have a multilateral patent pool, and these microarchitectural patents are not covered by the OpenPOWER patent pool, as that covers the POWER ISA only.

This means that anybody planning to produce chips which compete with SiFive or IBM would still have to become a patent licensee of the respective company, and these licensing costs are ultimately passed through to the companies licensing the SoC cores.

There are steps which both communities could take to improve the patent problems: for example, RISC-V could establish a patent pool, and require ecosystem participants to cross-license their patents through it, and IBM could widen the scope of the OpenPOWER patent pool to cover more than the POWER ISA itself. These steps would significantly improve the current situation, enabling truly free (as in freedom) silicon to be fabricated, through a combination of a RISC-V or OpenPOWER core and a set of supporting cores from OpenCores.

On centralized development forges

Since the launch of SourceForge in 1999, development of FOSS has started to concentrate in centralized development forges, the latest one of course being GitHub, now owned by Microsoft. While the centralization of development talent achieved by GitHub has had positive effects on software development output towards the commons, it is also a liability: GitHub is now effectively a single point of failure for the commons, since the overwhelming majority of software is developed there.

In other words, for the sake of convenience, we have largely traded our autonomy as software maintainers to GitHub, GitLab.com, Bitbucket and SourceForge, all of which are owned by corporate interests which, by definition, are aligned with profitability, not with our interests as maintainers.

It is indeed convenient to use GitHub or GitLab.com for software development: you get all the pieces you need in order to maintain software with modern workflows. But it really does come at a cost: SourceForge, for example, was caught redistributing the Windows builds of projects under its care with malware added.

While GitHub and the other forges have not yet attempted anything similar, the SourceForge incident serves as a reminder that we are trusting forges not to tamper with the packages we release as maintainers. There are other liabilities too: for example, a commercial forge may unilaterally decide to kick your project off of their service, or terminate the account of a project maintainer.

In order to protect the commons from this liability, it is imperative to build a more robust, federated ecosystem of software development forges, run either directly by the projects themselves or by communities which directly represent the interests of the maintainers who participate in them.

Building a community of islands

One of the main arguments in favor of centralization is that everyone else is already using a given service, and so you should as well: in other words, the value of the concentrated social graph. However, it is possible to build systems which allow the social graph to be distributed across multiple instances.

Networks like the ActivityPub fediverse (what many people incorrectly call the Mastodon network), despite their flaws, demonstrate the possibility of this. To that end, ForgeFed is an adaptation of ActivityPub allowing development forges to federate (share social graph data) with other forges. With the proliferation of standards like ForgeFed, it is possible to build a replacement ecosystem that is actually trustworthy and representative of the voices and needs of software maintainers.

ForgeFed is moving along, albeit slowly. There is a reference implementation called Vervis, and there is work ongoing to integrate ForgeFed into Gitea and Gitlab CE. As this work comes to fruition, forges will be able to start federating with each other.

A side-note on Radicle

A competing proposal, known as Radicle, has been making waves lately. It should be ignored: it is just the latest in “Web3” cryptocurrency grifting, the software development equivalent to NFT mania. All problems solved by Radicle have better solutions in traditional infrastructure, or in ForgeFed. For example, to use Radicle, you must download a specialized client, and then download a blockchain with that client. This is not something most developers are going to want to do in order to just send a patch to a maintainer.

Setting up my own forge with CI

Treehouse, the community I started by accident over Labor Day weekend, is now offering a Gitea instance with CI. It is my intention that this instance become communally governed, for the benefit of participants in the Treehouse community. We have made some modifications to the Gitea UI to make it more tolerable, and plan to implement ForgeFed as soon as patches are available, but it is admittedly still a work in progress. Come join us in #gitea on the Treehouse Discord!

I have begun moving my own projects to this Gitea instance. If you’re interested in doing the same, the instance is open to anybody who wants to participate. I will probably be publishing the specific Kubernetes charts to enable this setup on your own infrastructure in the next few days, once I clean them up to properly use secrets. I also plan to write a second blog post outlining the setup once everything is figured out.

It is my goal that we can move from large monolithic forges to smaller community-oriented ones, which federate with each other via ForgeFed to allow seamless collaboration without answering to corporate interests. Realization of this effort is a high priority of mine for 2022, and I intend to focus as many resources on it as I can.

On CVE-2019-5021

A few years ago, it was discovered that the root account was not locked out in Alpine’s Docker images. This was not the first time that had been the case: an actually exploitable instance was first fixed with a hotfix in 2015, but when the hotfix was replaced with appropriate use of /etc/securetty, the regression was inadvertently reintroduced for some configurations.

It should be noted that I said some configurations there. Although CVE-2019-5021 was issued a CVSSv3 score of 9.8, in reality I have yet to find any Alpine-based Docker image that is actually vulnerable to it. Of course, this doesn’t mean that Alpine shouldn’t have been locking out the root user on its minirootfs releases: that was a mistake, which I am glad was quickly rectified.

Lately, however, there have been a few incidents where less-than-honest actors in the security world have made use of CVE-2019-5021. For example, a person named Donghyun Lee started mass-filing CVEs against Alpine-based images without actually verifying whether the images were vulnerable, which Jerry Gamblin called out on Twitter last year. Other less-than-honest actors have focused instead on using CVE-2019-5021 to sell their remediation solutions, implying a risk of vulnerability where most likely none actually exists.

So, what configurations are actually vulnerable to CVE-2019-5021? Well, you must install both the shadow and linux-pam packages in the container to have any possibility of being vulnerable to this issue. I have yet to find a single container which has installed these packages. Think about it: Docker containers do not run multi-user, so there is no reason to configure PAM inside them. In essence, CVE-2019-5021 was a vulnerability caused by the PAM configuration not being updated to align with the new Busybox configuration introduced in 2015.

And, for that matter, why is being able to escalate to root scary in a container? Well, if you are running a configuration without UID namespaces, root in the container is equivalent to root on the host: if the user can pivot outside the container filesystem, they can have full root access to the machine. Docker-in-docker setups with an Alpine-based container providing the Docker CLI, for example, would be easy to break out of if they were running PAM in a misconfigured way.

But in practice, nobody combines PAM with the Alpine Docker images, as there’s no reason to do so. Accordingly, be wary of marketing materials discussing CVE-2019-5021: in practice, your configuration was most likely never vulnerable to it.

the problematic GPL “or later” clause

The GNU General Public License started life as the GNU Emacs Public License in 1987 (the linked version is from February 1988), and has been built on the principle of copyleft: the use of the copyright system to enforce software freedom through licensing. This prototype version of the GPL was used for other packages, such as GNU Bison (in 1988), and Nethack (in 1989), and was most likely written by Richard Stallman himself.

This prototype version was also referred to as the GNU General Public License in a 1988 bulletin, so we can think of it, in a way, as GPLv0. This version of the GPL, however, was mothballed in February 1989 with the publication of the GPLv1. One of the new features introduced in the newly rewritten GPLv1 license was the “or later” clause:

7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and “any later version”, you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation.

Section 7 of the GNU General Public License version 1

The primary motive for the version upgrade clause was quite simple: using copyright to enforce software freedom was, at the time, a novel concept, and there was a concern that the license might have flaws or need clarification. Accordingly, to streamline the process, the clause was added to allow authors to consent in advance to using new versions of the GPL as an alternative. Indeed, in the January 1991 release of the GNU Bulletin, plans to release the GPLv2 were announced as an effort to clarify the license:

We will also be releasing a version 2 of the ordinary GPL. There are no real changes in its policies, but we hope to clarify points that have led to misunderstanding and sometimes unnecessary worry.

GNU Bulletin volume 1, number 10, “New library license”

After that, not much happened in the GNU project regarding licensing for a long time, until the GPLv3 drafting process in 2006. From a governance point of view, the GPLv3 drafting process was a significant accomplishment in multi-stakeholder governance, as outlined by Eben Moglen.

However, for all of the success of the GPLv3 drafting process, it must be noted that the GPL is ultimately published by the Free Software Foundation, an organization whose long-term viability many have lately questioned. When the “or later version” clause was first introduced to the GPL, it was unthinkable that the Free Software Foundation could ever be in such a state of affairs, but now it is.

And this is ultimately the problem: what happens if the FSF shuts down, and has to liquidate? What if an intellectual property troll acquires the GNU copyright assignments, or acquires the trademark rights to the FSF name, and publishes a new GPL version? There are many possibilities to be concerned about, but developers can do two things to mitigate the damage.

First, they can stop using the “or later” clause in new GPL-licensed code. This effectively prevents those projects from being upgraded to new versions of the GPL that might be published by a compromised FSF. In so doing, projects should be able to avoid relicensing discussions, as GPLv3-only code is compatible with GPLv3-or-later: the common denominator in this case is GPLv3.

Second, they can stop assigning copyright to the FSF. In the event that the FSF becomes compromised, for example by an intellectual property troll, this limits the scope of their possible war chest for malicious GPL enforcement litigation. As we have learned from the McHardy cases involving Netfilter, in a project with multiple copyright holders, GPL enforcement litigation is most effective when done as a class action. In this way, dilution of the FSF copyright assignment pool protects the commons over time from exposure to malicious litigation by a compromised FSF.

an inside look into the illicit ad industry

So, you want to work in ad tech, do you? Perhaps this will be a cautionary tale…

I have worked my entire life as a contractor. This has had advantages and disadvantages. For example, I am free to set my own schedule, and undertake engagements at my own leisure, but as a result my tax situation is more complicated. Another advantage is that sometimes, you get involved in an engagement that is truly fascinating. This is the story of such an engagement. Some details have been slightly changed, and specific names are elided.

A common theme amongst contractors in the technology industry is to band together to take on engagements which cannot be reasonably handled by a single contractor. Our story begins with such an engagement: a friend of mine ran a bespoke IT services company, which provided system administration, free software consulting and development. His company also handled the infrastructure deployment needs of customers who did not want to build their own infrastructure. I frequently worked with my friend on various consulting engagements over the years, including this one.

One day, I was chilling in IRC when I got a PM from my friend: he had gotten an inquiry from a possible client who needed help reverse engineering a piece of obfuscated JavaScript. I said something like “sounds like fun, send it over, and I’ll see what I come up with.” The script in question was called popunder.js and did exactly what you think it does. The customer had started a popunder ad network and needed help adapting this obfuscated popunder script to work with his system, which he had built using Revive Adserver, a fork of the last GPL version of OpenX.

I rolled my eyes and reverse engineered the script for him, allowing him to adapt it for his ad network. The adaptation was a success, and he wired me a sum that was triple my quoted hourly rate. This, admittedly, resulted in me being very curious about his business, as at the time, I was not used to making that kind of money. Actually, I’m still not.

A few weeks passed, and he approached me with a proposition: he needed somebody who could reverse engineer the JavaScript programs delivered by ad networks and figure out how the scripts worked. As he was paying considerably more than my advertised hourly rate, I agreed, and got to work reverse engineering the JavaScript programs he required. It was nearly a full time job, as these programs kept evolving.

In retrospect, he probably wasn’t doing anything with the reports I wrote on each piece of JavaScript I reverse engineered, as that wasn’t the actual point of the exercise: in reality, he wanted me to become familiar with the techniques ad networks used to detect fraud, so that we could develop countermeasures. In other words, the engagement evolved into a red-team type engagement, except that we weren’t testing the ad networks for their sake, but instead ours.

so-called “domain masking”: an explanation

Years ago, you might have browsed websites like The Pirate Bay and seen advertising for a popular game, or some other advertisement that you wouldn’t have expected to see on The Pirate Bay. I assure you, brands were not knowingly targeting users on TPB: they were being duped via a category of techniques called domain masking.

This is a type of scam that black-hat ad networks run in order to launder illicit traffic into clean traffic: they set up fake websites and apply for advertisements on those websites through a shell company. This gives them a clean advertising feed to serve ads from. The next step is to launder the traffic by serving those tags on empty pages on the fake website, so that they can be used with an <iframe> tag. After that, you use an ad server to rotate the various <iframe> tags, and you’re done.

For a long time, this type of fraud went undetected, as the industry was not even aware that it was a thing, or perhaps it was aware but didn’t care, as it allowed networks to promise brands more and more traffic than they could otherwise fulfill. Either way, the clean networks started to talk about cracking down on domain masking, or as the black-hat networks call it, arbitrage or ROI. That meant the naive <iframe> attack outlined above was quickly shut down, and thus began a back-and-forth cold war between the black-hat networks and their shell companies on one side, and legitimate networks like Google on the other.

non-human traffic detection

At first, the heuristics deployed by the ad networks were quite simple: they just started to check document.location.host and send it along to the ad server. If an ad tag was placed on an unauthorized domain, the account the ad tag belonged to would be flagged for an audit. But in order to make sure that the URL or domain name was appropriately escaped for inclusion in an HTTP GET request, they had to call window.encodeURIComponent(). This meant that the first countermeasure we developed was something similar to:

(function () {
  // Keep a reference to the original, native implementation.
  let o = window.encodeURIComponent;
  // Replace the global with a wrapper that swaps the flagged domain
  // for the "clean" shell site before the ad network's code sees it.
  window.encodeURIComponent = function (x) {
    if (x === "thepiratebay.org") return o("cleansite.com");
    return o(x);
  };
})();

This countermeasure worked for a very long time: with some networks, it lasted several years. Google solved this attack by simply writing their own implementation of encodeURIComponent and protecting it behind a closure. Other networks tried to do things like:

var isFraud = false;

// Delete any own toString property a fraudster may have planted on the
// function, so the call below goes through Function.prototype.toString.
delete window.encodeURIComponent.toString;
// A native, unpatched function stringifies with "[native code]" in its body;
// a JavaScript wrapper like the one above does not.
if (window.encodeURIComponent.toString().indexOf("native code") < 0) {
  isFraud = true;
}

This led to countermeasures like patching XMLHttpRequest itself:

(function () {
  // Keep a reference to the original open() method.
  let x = window.XMLHttpRequest.prototype.open;
  window.XMLHttpRequest.prototype.open = function (method, url, ...rest) {
    // code which would parse and rewrite the URL or POST payload here
    let newurl = url; // placeholder: the real rewrite logic is elided
    return x.call(this, method, newurl, ...rest);
  };
})();

The cycle of patching on both sides is ongoing to this day. A friend of mine on Twitter referred to this tug-of-war as “core war,” which is an apt description: all of the involved actors are trying to patch each other out of being able to commit or detect subterfuge, and your browser gets slower and slower as more mitigations and countermeasures are layered on. If you’re not using an ad blocker yet, stop reading this, and install one: your browser will suddenly be a lot more performant.

enter thistle: a proxy between php-fpm and nginx

When it came to evading the automated non-human traffic detection deployed by ad networks, our game was impeccable: with each round of mitigation and countermeasure, we would only lose a few ad tags, if that. However, we kept getting caught by human review, because they would look at the referrer header and see something like http://cleansite.com/adserver/720x.php, which, I mean, is totally suspect, right?

So, we decided what we needed to do was interdict the traffic, and respond with something else if appropriate, namely bare ad tags if a requesting client was already known to us. To do this, we wrote a proxy server which I named thistle, since the thistle flower is known to be the favorite of the most mischievous of faeries, and we were certainly up to mischief! The way it worked is that an <iframe> would enter at a specified URL, which would then tumble through several more URLs (blog articles), in order to ensure the referrer header always matched a real article on the fake website.

This was highly successful: we never got caught again by a manual audit, at least not for that reason.

the cycling of domains

Most advertising traffic is bought and sold using a protocol called OpenRTB, which allows so-called trading desks to buy and sell ad spots in real time, based on historical performance data. This means that, in order to keep CPM rates up, we would have to cycle in and out domains that the trading bots hadn’t seen in a while, or ever.

And that is where the operation started to break down: the fellow I was writing all this code for had his website people applying for ad tags without using an anonymizing VPN. At some point, an auditor noticed that all of these different sites, despite being registered to different shell companies, were made by people with the same IP address, and shut the whole thing down by sharing that intelligence with the other ad networks. It was fun while it lasted, though.

a popular torrent website, greed, the FBI, and the loss of almost all of our traffic

By the time this operation started going off the rails, the overwhelming majority of our traffic was coming from a popular torrent website. When the feds eventually shut that website down, it was basically the end of the operation, as we lost almost all of our traffic.

I figured that was coming when the FBI contacted the person I was working for, asking if we knew anything about our advertising being featured on said torrent website. What ultimately resulted in the shutdown of the website, however, was quite funny: the owner of it was greedy, so the FBI offered to buy ads from him directly. He set up a new ad spot on the torrent website, and then sent bank wire instructions to the FBI agent investigating him, at which point they seized the website and shut it down.

Shortly afterward, the company went out of business, as there wasn’t enough traffic to keep the operation running anymore.

ad tech: capitalism refined to its purest?

As my friend Maia put it, “ad tech is about trying to scam the rest of ad tech as hard as possible, while trying to not get scammed too hard yourself.”

One can therefore argue that ad tech, like the crypto craze, is just capitalism refined to its purest: things of little to no actual value are bought and sold at prices far in excess of what little value actually exists. And, like crypto, ad tech is responsible for substantial carbon dioxide emissions.

In short, do everyone a favor, and use a damn ad blocker. Oh, and don’t work in ad tech. I have wilder stories to tell about that engagement, but sometimes things are better left unsaid.

spelunking through the apk-tools dependency solver

In our previous episode, I wrote a high-level overview of apk’s differences versus traditional package managers, which many have cited as a helpful resource for understanding the behavior of apk when it does something different than a traditional package manager would. But that article didn’t go into enough depth to explain how it all actually works. This one hopefully will.

A high level view of the moving parts

Our adventure begins at the /etc/apk/world file. This file contains the basic set of constraints imposed on the system: every constraint listed here must be solvable in order for the system to be considered correct, and no transaction may be committed that is incorrect. In other words, the package management system can be proven to be in a correct state every time a constraint is added or removed with the apk add/del commands.
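For illustration only, the world file is just a newline-separated list of constraints. A minimal one might look something like the following; the package names are real Alpine packages, but the exact selection and the >= version constraint are simply examples:

alpine-base
busybox
nginx
openssh>=8.8

Running apk add nginx appends the nginx constraint to this file and solves the result, while apk del nginx removes it and solves again; in both cases the file only changes if the new constraint set is satisfiable.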

Note that I used the word transaction there: at its core, apk is a transactional package manager, though we have not fully exploited the transactional capabilities yet. A transaction is created by copying the current constraint list (db->world), manipulating it with apk_deps_add and then committing it with apk_solver_commit. The commit phase does pre-flight checks directly and returns an error if the transaction fails to pass them.

This means that removing packages works the same way: you copy the current constraint set, remove the desired constraint, and then commit the result, which either errors out or updates the installed constraint set after the transaction is committed.
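To make the copy-modify-commit flow concrete, here is a minimal, self-contained C sketch of the same pattern. The types and helpers (world_t, world_copy, world_add, solver_commit) are hypothetical stand-ins, not the real apk-tools API, whose functions and signatures differ in their details:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for apk's constraint list (db->world). */
typedef struct {
    char **constraints;
    size_t count;
} world_t;

/* Copy the current constraint set so the live one is left untouched. */
static world_t world_copy(const world_t *src) {
    world_t dst = { malloc(src->count * sizeof(char *)), src->count };
    for (size_t i = 0; i < src->count; i++)
        dst.constraints[i] = strdup(src->constraints[i]);
    return dst;
}

/* Manipulate the copy: add one constraint (in spirit, apk_deps_add). */
static void world_add(world_t *w, const char *constraint) {
    w->constraints = realloc(w->constraints, (w->count + 1) * sizeof(char *));
    w->constraints[w->count++] = strdup(constraint);
}

/* "Commit": run pre-flight checks, and only then replace the live set.
 * The real solver computes and applies a changeset at this point. */
static int solver_commit(world_t *live, const world_t *proposed) {
    if (proposed->count == 0)
        return -1;          /* pre-flight check failed: reject the transaction */
    *live = *proposed;      /* transaction accepted: new constraint set is live */
    return 0;
}

int main(void) {
    world_t live = { NULL, 0 };

    world_t txn = world_copy(&live);  /* 1. copy the current constraints */
    world_add(&txn, "nginx");         /* 2. manipulate the copy          */

    if (solver_commit(&live, &txn) == 0)  /* 3. commit, or error out     */
        printf("committed %zu constraint(s)\n", live.count);
    else
        printf("transaction rejected; live set unchanged\n");

    return 0;
}

The property worth noting is that the live constraint set is only replaced once the proposed set passes the checks; a rejected transaction leaves the system in its previous, known-good state, which mirrors the apk add/del behavior described above.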

A deeper look into the solver itself

As noted above, the primary entry point into the solver is the apk_solver_commit function, which, at the time that I am writing this, is located in the apk-tools source code at src/commit.c:679. This function does a few pre-flight checks and then calls into the solver itself, using apk_solver_solve, which generates the actual transaction to be committed. If there are errors, the generated transaction is discarded and a report is printed instead; otherwise, the generated transaction is committed using apk_solver_commit_changeset.

In essence, the code in src/commit.c can be thought of as the middle layer between the applets and the core solver. The core solver itself lives in src/solver.c and as previously noted, the main entry point is apk_solver_solve, which generates a proposed transaction to satisfy the requested constraints. This function lives at src/solver.c:1021, and is the only entry point into the solver itself.

The first thing the solver does is alphabetically sort the constraint set. If you’ve noticed that /etc/apk/world is always in alphabetical order, this is a side effect of that sorting.

Once the world constraints (the ones in /etc/apk/world) are alphabetically ordered, the next step is to figure out what package, if any, presently satisfies the constraint. This is handled by the discover_name function, which is called recursively on every constraint applicable to the system, starting with the world constraint.

The next step is to generate a fuzzy solution. This is done by walking the dependency graph again, calling the apply_constraint function. This step does basic dependency resolution, removing possible solutions which explicitly conflict. Reverse dependencies (install_if) are partially evaluated in this phase, but complex constraints (such as those involving a version constraint or multiple solutions) are not evaluated yet.

Once basic constraints are applied to the proposed updated world, the next step is to walk the dependency graph again, reconsidering the fuzzy solution generated in the step above. This step is done by the reconsider_name function, which walks over parts of the dependency graph that are still ambiguous. Finally, packages are selected to resolve these ambiguities using the select_package function. Afterwards, the final changeset is emitted by the generate_changeset function.

A deep dive into reconsider_name and select_package

As should hopefully be obvious by now, the really complicated cases are handled by the reconsider_name function. These cases include scenarios such as virtual providers, situations where more than one package satisfies the constraint set, and so on. For these scenarios, it is the responsibility of the reconsider_name function to select the most optimal package. Similarly, it is the responsibility of the select_package function to check the work done by reconsider_name and finalize the package selection if appropriate by removing the constraint from the ambiguous list.

The primary purpose of the reconsider_name function is to use discover_name and apply_constraint to move more specific constraints upwards and downwards through the dependency graph, narrowing the possible set of packages which can satisfy a given constraint, ideally to one package or less. These simplified dependency nodes are then fed into select_package to deduce the best package selection to make.

The select_package function checks each constraint, and the list of remaining candidate packages, and then picks the best package for each constraint. This is done by calling compare_providers for each possible package until the best one is found. The heuristics checked by compare_providers are, in order (a simplified sketch follows the list):

  1. The packages are checked to see if they are NULL or not. The one that isn’t NULL wins. This is mostly as a safety check.
  2. We check to see if the user is using --latest or not. If they are, then the behavior changes a little bit. The details aren’t so important, you can read the source if you really want to know. Basically, in this step, we determine how fresh a package is, in alignment with what the user’s likely opinion on freshness would be.
  3. The provider versions are compared, if applicable. Highest version wins.
  4. The package versions themselves are compared. Highest version wins.
  5. The already installed package is preferred if the version is the same (this is helpful in upgrade transactions to make them less noisy).
  6. The provider_priority field is compared. Highest priority wins. This means that provider_priority is only checked for unversioned providers.
  7. Finally, the earliest repository in /etc/apk/repositories is preferred if all else is the same.
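As a rough illustration of how an ordered tie-breaker chain like this can be structured, here is a simplified, self-contained C sketch. The provider struct and field names are hypothetical, and it models only a subset of the rules above (1, 4, 5, 6 and 7); it is not apk’s actual compare_providers:

#include <stdio.h>

/* Hypothetical, heavily simplified view of a candidate provider. */
struct provider {
    int version;     /* stand-in for a real parsed package version      */
    int installed;   /* non-zero if this package is already installed   */
    int priority;    /* provider_priority (unversioned providers only)  */
    int repo_index;  /* position in /etc/apk/repositories (0 = first)   */
};

/* Return >0 if a is the better pick, <0 if b is, 0 if tied.
 * Each rule only breaks ties left over from the rules before it. */
static int compare_providers(const struct provider *a, const struct provider *b) {
    int r;

    /* Rule 1: a non-NULL candidate always beats a NULL one. */
    if (!a || !b)
        return (a != NULL) - (b != NULL);

    /* Rule 4: higher package version wins. */
    if ((r = a->version - b->version) != 0)
        return r;

    /* Rule 5: prefer the already-installed package when versions match. */
    if ((r = a->installed - b->installed) != 0)
        return r;

    /* Rule 6: higher provider_priority wins. */
    if ((r = a->priority - b->priority) != 0)
        return r;

    /* Rule 7: the earlier repository wins (lower index is better). */
    return b->repo_index - a->repo_index;
}

int main(void) {
    struct provider a = { .version = 2, .installed = 0, .priority = 10, .repo_index = 1 };
    struct provider b = { .version = 2, .installed = 1, .priority = 0,  .repo_index = 0 };

    /* Same version, but b is already installed, so rule 5 picks b. */
    printf("%s\n", compare_providers(&a, &b) > 0 ? "a wins" : "b wins");
    return 0;
}

A caller keeps the best candidate seen so far and replaces it whenever compare_providers returns a positive value for a new candidate. The --latest handling (rule 2) and the separate provider-version comparison (rule 3) are deliberately left out of this sketch.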

Hopefully, this demystifies some of the common misconceptions around how the solver works, especially how provider_priority works. Personally, in retrospect, despite having worked on the spec and implemented it in apk-tools, I think provider_priority was a mistake, and that the preferred solution should be to always use versioned providers (e.g. provides=“foo=100”) instead. The fact that we have moved to versioning cmd: providers in this way demonstrates that provider_priority isn’t really a good design.

Next time: what is the maximum number of entries allowed in /etc/apk/repositories and why is it so low?

It’s time to boycott AWS

I woke up this morning not planning to write anything on this blog, much less anything about AWS. But then, as I was eating breakfast, I read a horrifying story in Mother Jones about how an AWS employee was treated as he did his best to cope with his wife’s terminal cancer.

In the free software community, Amazon (more specifically AWS) has been criticized for years for taking a largely exploitative position concerning FOSS projects. These conversations frequently result in proposals to use licensing as a weapon against AWS. In general, I believe that it would be difficult to target AWS with licensing, as statutory licenses must be fair, reasonable and non-discriminatory. But the issue of exploitation remains: AWS takes from the commons of FOSS projects and productizes that work, frequently without giving anything back.

They are, of course, allowed to do this, but at the same time, in doing so, they have frequently undercut the efforts of developers to monetize the labor involved in software maintenance, which leads to projects adopting licenses like SSPL and Commons Clause, which are significantly problematic for the commons.

On top of this, licensing-based attacks are unlikely to be effective against AWS anyway, because in the process of productization, they wind up significantly modifying the software. This means that it is only another step further to just completely rewrite the software, which is something they have done in the past, and will likely do again in the future.

But my issue isn’t just the exploitative relationship AWS has with the commons (which is largely specific to AWS by the way), but rather the corporate culture of AWS. When I read the story in Mother Jones this morning, I saw no reason to disbelieve it, as I have heard many similar stories in the past from AWS employees.

As participants in the technology industry, we are free to choose our suppliers. This freedom comes with a damning responsibility, however. When we choose to engage with AWS as a supplier, we are enabling and affirming the way they do business as a company. We are affirming their exploitation of the commons.

We are also affirming their exploitative practice of placing AWS employees on a “pivot” (their parlance for a Performance Improvement Plan), which involves working employees to the bone, saying they failed to meet their PIP objectives and then firing them.

The free software community must stand against both kinds of exploitation. We must stand against it by boycotting AWS until they recalibrate their relationship with the commons, and their relationship with their employees. We must also encourage the adoption and proliferation of humane, freedom-respecting technology.

don’t do clever things in configure scripts

Recently, a new version of ncurses was released and pushed to Alpine. The maintainer of ncurses in Alpine successfully built it on his machine, so he pushed it to the builders, expecting it to build fine on them. Of course, it promptly failed to build from source on the builders, because make install did not install the pkg-config .pc files to the right location.

You might think, what a weird regression, and you’d be right. After all, pkg-config files are usually just installed to $libdir/pkgconfig in any sort of autotools-based build system. Indeed, in the past, this is what ncurses did as well. However, this was clearly too robust of a solution to the problem of determining where to install the pkg-config files, and so innovation™️ happened: the new configure logic instead scrapes the debug output of pkg-config to discover its search path.

Yes, you are reading that right: it scrapes debug output to find the pkg-config search path. This kind of behavior is clever, and it should absolutely be avoided in anything designed to be portable, such as a configure script. In fact, a build system should follow established expectations: in the autotools case, it should just default to $libdir/pkgconfig and allow the end user to override that if it makes sense to. Doing anything else is going to lead to edge cases and problems, since it is more clever than the user expects it to be.

the portable way to query the search list

If you really need to query the pkg-config search list, use pkg-config --variable=pc_path pkg-config. It is an interface we all agreed to support explicitly for this use case, and it is supported by every implementation of pkg-config: freedesktop.org’s implementation, OpenBSD’s implementation, and pkgconf, the three main implementations in production use.

There is absolutely no need to scrape debug output, which is absolutely subject to change and should never be considered a stable interface in any program.

the Alpine release process

It’s almost Halloween, which means it’s almost time for an Alpine release, and all hands are on deck to make sure the process goes smoothly. But what goes into making an Alpine release? What are all the moving parts? Since we are in the process of cutting a new release series, I figured I would write about how it is actually done.

the beginning of the development cycle

The development cycle for an Alpine release is six months long. It begins immediately once the release is branched in aports.git: at that point, there is no longer a development freeze, and minor changes start flowing in.

Prior to the beginning of the development cycle, larger changes are proposed as system change proposals, an example being the change proposal introducing Rust to main for the Alpine 3.16 development cycle. The largest, most invasive proposals are coordinated by the Technical Steering Committee, while others may be coordinated by smaller teams or individual maintainers. Anybody may create a system change proposal and drive it in Alpine, regardless of whether or not they have developer rights in the project.

As these system change proposals are accepted (possibly after a few rounds of revision), the underlying steps needed to implement the change are sequenced into the overall development schedule if needed. Otherwise, they are implemented at the discretion of the contributor driving the change proposal.

soft freeze

About three weeks before release time, we set up new builders and initiate a mass rebuild of the distribution for the next release. These new builders will continue to follow changes to the edge branch until the final release is cut, at which point they will be switched to follow the branch set up for the release.

At this point, the edge branch is limited to minor, low risk changes only, unless explicitly granted an exception by the TSC. Efforts are primarily focused on making bug fix changes only, such as resolving failure-to-build-from-source (FTBFS) issues discovered during the rebuild.

release candidates

The next step before release is to do a few test releases. Release candidates are automatically produced by the builders when a developer updates the alpine-base package and tags that commit with an appropriate git tag. These candidates get uploaded to the mirror network and users begin testing them, which usually results in a few bugs being reported which get fixed prior to the final release.

If you are curious, you can read the code that is run to generate the releases yourself; it is located in the scripts folder of the aports.git repository. The main driver of the release generation process is mkimage.sh.

the final release

A few days after each release candidate is cut, the TSC (or a release engineering team delegated by the TSC) evaluates user feedback from testing the new release, and a go/no-go decision is made on making the final release. If the TSC decides the release is not ready, a new release candidate is made.

Otherwise, if the decision to release is made, then the aports.git tree is branched, the new builders are switched to following the new branch, and the final release is cut on that branch. At that point, the edge branch is reopened for unrestricted development.

Hopefully, in just a few days, we will have shipped the Alpine 3.15.0 release to the world, with very few release candidates required to do so. So far, the release process has largely gone smoothly, but only time will tell.