open cores, ISAs, etc: what is actually open about them?

In the past few years, with the launch of RISC-V and IBM’s OpenPOWER initiative (backed up with hardware releases such as Talos), there has been a lot of talk about open hardware projects, with vendors claiming that anyone can go and make a RISC-V or OpenPOWER CPU. While there is a modicum of truth to the assertion that an upstart company could start fabricating its own RISC-V or OpenPOWER CPUs tomorrow, the reality is a lot more complex, and it basically comes down to patents.

Components of a semiconductor design

The world of semiconductors is a complex one from an intellectual property point of view, especially as the majority of semiconductor companies have become “fabless” companies, meaning that they outsource the production of their products to other companies called foundries. This is even true of the big players: for example, AMD has been a fabless company since 2009, when it spun off its foundry division into its own company called GlobalFoundries.

Semiconductors are usually designed in a hardware description language such as Verilog or VHDL. When a company wishes to make a semiconductor, they contract out to a foundry, which provides the company with a customized Verilog or VHDL toolchain that generates the data needed to manufacture the design on the foundry’s process. When you hear about a chip being made on “the TSMC 5nm process” or the “Intel 7 process” (sidenote: the Intel 7 process is actually 10nm), this is what they are talking about.

The processes and tooling used by the foundries are protected by a combination of copyright and relevant patents, which are licensed to the fabless company as part of the production contract. However, these contracts are complicated: for example, in some cases, the IP rights for the generated silicon mask for a semiconductor may actually belong to the foundry, not the company which designed it. Other contracts might impose a vendor exclusivity agreement, where the fabless company is locked into using one, and only one, foundry for their chip fabrication needs.

As should be obvious by now, there is no situation where these foundry processes and tools are open source. At best, the inputs to these tools are, and this is true for RISC-V and OpenPOWER: there are HDL cores, such as the Microwatt OpenPOWER core and Alibaba’s XuanTie RISC-V core, which can be downloaded and, with the appropriate tooling and contracts, synthesized into ASICs that go into products. These inputs are frequently described as SIP cores, or IP cores, short for Semiconductor Intellectual Property core, the idea being that you can license a set of cores, wire them together, and have a chip.

The value of RISC-V and OpenPOWER

As discussed above, a company looking to make a SoC or similar chip would usually license a bunch of IP cores and glue them together. For example, they might license a CPU core and memory controller from ARM, a USB and PCIe controller from Synopsys, and a GPU core from either ARM or Imagination Technologies. None of the IP cores in the above configuration are open source; the company making the SoC pays a royalty to use all of the licensed IP cores in their product. Notable vendors in this space include MediaTek and Rockchip, but there are many others.

In practice, it is possible to replace the CPU core in the above designs with one of the aforementioned RISC-V or OpenPOWER ones, and there are other IP cores that can be used from, for example, the OpenCores project to replace others. However, that may, or may not, actually reduce licensing costs, as many IP cores are licensed as bundles, and there are usually third-party patents that have to be licensed.

Patents

Ultimately, we come to the unavoidable topic, patents. Both RISC-V and OpenPOWER are described as patent-free, or patent-unencumbered, but what does that actually mean? In both cases, it means that the ISA itself is unencumbered by patents… in the case of RISC-V, the ISA itself is patent-free, and in the case of OpenPOWER, there is a very liberal patent licensing pool.

But therein lies the rub: in both cases, the patent situation only covers the ISA itself. Implementation details and vendor extensions are not covered by the promises made by either community. In other words, SiFive and IBM still have entire patent portfolios they can assert against any competitor in their space. RISC-V does not have a multilateral patent pool at all, and these microarchitectural patents are not covered by the OpenPOWER patent pool, as that covers the POWER ISA only.

This means that anybody planning to produce chips which compete with SiFive or IBM would have to become a patent licensee of the respective vendor, and these licensing costs are ultimately passed through to the companies licensing the SoC cores.

There are steps which both communities could take to improve the patent problems: for example, RISC-V could establish a patent pool, and require ecosystem participants to cross-license their patents through it, and IBM could widen the scope of the OpenPOWER patent pool to cover more than the POWER ISA itself. These steps would significantly improve the current situation, enabling truly free (as in freedom) silicon to be fabricated, through a combination of a RISC-V or OpenPOWER core and a set of supporting cores from OpenCores.

On centralized development forges

Since the launch of SourceForge in 1999, development of FOSS has started to concentrate in centralized development forges, the latest one of course being GitHub, now owned by Microsoft. While the centralization of development talent achieved by GitHub has had positive effects on software development output towards the commons, it is also a liability: GitHub is now effectively a single point of failure for the commons, since the overwhelming majority of software is developed there.

In other words, for the sake of convenience, we have largely traded our autonomy as software maintainers to GitHub, GitLab.com, Bitbucket and SourceForge, all of which are owned by corporate interests which, by definition, are aligned with profitability, not with our interests as maintainers.

It is indeed convenient to use GitHub or GitLab.com for software development: you get all the pieces you need in order to maintain software with modern workflows. But it really does come at a cost: SourceForge, for example, was caught redistributing Windows builds of projects under their care with malware.

While GitHub and the other forges have not, as far as we know, attempted anything similar, the SourceForge incident serves as a reminder that we are trusting forges not to tamper with the packages we release as maintainers. There are other liabilities too: for example, a commercial forge may unilaterally decide to kick your project off of their service, or terminate the account of a project maintainer.

In order to protect the commons from this liability, it is imperative to build a more robust ecosystem: a federation of software development forges which are either run directly by projects themselves, or run by communities which directly represent the interests of the maintainers who participate in them.

Building a community of islands

One of the main arguments in favor of centralization is that everyone else is already using a given service, and so you should as well. In other words, the concentrated social graph. However, it is possible to build systems which allow the social graph to be distributed across multiple instances.

Networks like the ActivityPub fediverse (what many people incorrectly call the Mastodon network), despite their flaws, demonstrate the possibility of this. To that end, ForgeFed is an adaptation of ActivityPub allowing development forges to federate (share social graph data) with other forges. With the proliferation of standards like ForgeFed, it is possible to build a replacement ecosystem that is actually trustworthy and representative of the voices and needs of software maintainers.

ForgeFed is moving along, albeit slowly. There is a reference implementation called Vervis, and work is ongoing to integrate ForgeFed into Gitea and GitLab CE. As this work comes to fruition, forges will be able to start federating with each other.

A side-note on Radicle

A competing proposal, known as Radicle, has been making waves lately. It should be ignored: it is just the latest in “Web3” cryptocurrency grifting, the software development equivalent to NFT mania. All problems solved by Radicle have better solutions in traditional infrastructure, or in ForgeFed. For example, to use Radicle, you must download a specialized client, and then download a blockchain with that client. This is not something most developers are going to want to do in order to just send a patch to a maintainer.

Setting up my own forge with CI

Treehouse, the community I started by accident over Labor Day weekend, is now offering a Gitea instance with CI. It is my intention that this instance become communally governed, for the benefit of participants in the Treehouse community. We have made some modifications to the Gitea UI to make it more tolerable, and plan to implement ForgeFed as soon as patches are available, but it is admittedly still a work in progress. Come join us in #gitea on the Treehouse Discord!

I have begun moving my own projects to this Gitea instance. If you’re interested in doing the same, the instance is open to anybody who wants to participate. I will probably be publishing the specific Kubernetes charts to enable this setup on your own infrastructure in the next few days, as I clean them up to properly use secrets. I also plan to do a second blog post outlining the setup once everything is figured out.

It is my goal that we can move from large monolithic forges to smaller community-oriented ones, which federate with each other via ForgeFed to allow seamless collaboration without answering to corporate interests. Realization of this effort is a high priority of mine for 2022, and I intend to focus as many resources as I can on it.

On CVE-2019-5021

A few years ago, it was discovered that the root account was not locked out in Alpine’s Docker images. This was not the first time this had happened: an actually exploitable instance was first fixed with a hotfix in 2015, but when the hotfix was replaced with appropriate use of /etc/securetty, the regression was inadvertently reintroduced for some configurations.

It should be noted that I said some configurations there. Although CVE-2019-5021 was issued a CVSSv3 score of 9.8, in reality I have yet to find any Alpine-based Docker image that is actually vulnerable to it. Of course, this doesn’t mean that Alpine shouldn’t have been locking out the root user on its minirootfs releases: that was a mistake, which I am glad was quickly rectified.

Lately, however, there have been a few incidents involving CVE-2019-5021 and less-than-honest actors in the security world. For example, a person named Donghyun Lee started mass-filing CVEs against Alpine-based images without actually verifying whether the images were vulnerable, which Jerry Gamblin called out on Twitter last year. Other less-than-honest actors have focused instead on attempting to use CVE-2019-5021 to sell their remediation solutions, implying a risk of vulnerability where most likely none actually exists.

So, what configurations are actually vulnerable to CVE-2019-5021? You must install both the shadow and linux-pam packages in the container to have any possibility of being vulnerable to this issue. I have yet to find a single container which installs these packages: think about it, Docker containers do not run multi-user, so there is no reason to configure PAM inside them. In essence, CVE-2019-5021 existed because the PAM configuration was not updated to align with the new BusyBox configuration introduced in 2015.

And, for that matter, why is being able to escalate to root scary in a container? Well, if you are running a configuration without UID namespaces, root in the container is equivalent to root on the host: if the user can pivot outside the container filesystem, they can have full root access to the machine. Docker-in-docker setups with an Alpine-based container providing the Docker CLI, for example, would be easy to break out of if they were running PAM in a misconfigured way.

But in practice, nobody combines PAM with the Alpine Docker images, as there’s no reason to do so. Accordingly, be wary of marketing materials discussing CVE-2019-5021: in practice, your configuration was most likely never vulnerable to it.

the problematic GPL “or later” clause

The GNU General Public License started life as the GNU Emacs Public License in 1987 (the linked version is from February 1988), and has been built on the principle of copyleft: the use of the copyright system to enforce software freedom through licensing. This prototype version of the GPL was used for other packages, such as GNU Bison (in 1988), and Nethack (in 1989), and was most likely written by Richard Stallman himself.

This prototype version was also referred to as the GNU General Public License in a 1988 bulletin, so we can think of it in a way as GPLv0. This version of the GPL, however, was mothballed in February 1989 with the publication of the GPLv1. One of the new features introduced in the newly rewritten GPLv1 license was the “or later” clause:

7. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of the license which applies to it and “any later version”, you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the license, you may choose any version ever published by the Free Software Foundation.

Section 7 of the GNU General Public License version 1

The primary motive for the version upgrade clause was quite simple: using copyright to enforce software freedom was, at the time, a new and novel concept, and there was a concern that the license might have flaws or need clarification. Accordingly, to streamline the process, they added the version upgrade clause to allow authors to consent to using new versions of the GPL as an alternative. Indeed, in the January 1991 release of the GNU Bulletin, plans to release the GPLv2 were announced as an effort to clarify the license:

We will also be releasing a version 2 of the ordinary GPL. There are no real changes in its policies, but we hope to clarify points that have led to misunderstanding and sometimes unnecessary worry.

GNU Bulletin volume 1, number 10, “New library license”

After that, not much happened in the GNU project regarding licensing for a long time, until the GPLv3 drafting process in 2006. From a governance point of view, the GPLv3 drafting process was a significant accomplishment in multi-stakeholder governance, as outlined by Eben Moglen.

However, for all of the success of the GPLv3 drafting process, it must be noted that the GPL is ultimately published by the Free Software Foundation, an organization whose long-term viability many have questioned lately. When the “or later version” clause was first introduced to the GPL, it was unthinkable that the Free Software Foundation could ever be in such a state of affairs, but now it is.

And this is ultimately the problem: what happens if the FSF shuts down, and has to liquidate? What if an intellectual property troll acquires the GNU copyright assignments, or acquires the trademark rights to the FSF name, and publishes a new GPL version? There are many possibilities to be concerned about, but developers can do two things to mitigate the damage.

First, they can stop using the “or later” clause in new GPL-licensed code. This will effectively prevent those projects from being upgraded to new versions of the GPL which may be published by a compromised FSF. In so doing, projects should be able to avoid relicensing discussions, as GPLv3-only code is compatible with GPLv3-or-later: the common denominator in this case is GPLv3.

Second, they can stop assigning copyright to the FSF. In the event that the FSF becomes compromised, for example by an intellectual property troll, this limits the scope of their possible war chest for malicious GPL enforcement litigation. As we have learned from the McHardy cases involving Netfilter, in a project with multiple copyright holders, GPL enforcement litigation is most effective when done as a class action. In this way, dilution of the FSF copyright assignment pool protects the commons over time from exposure to malicious litigation by a compromised FSF.

an inside look into the illicit ad industry

So, you want to work in ad tech, do you? Perhaps this will be a cautionary tale…

I have worked my entire life as a contractor. This has had advantages and disadvantages. For example, I am free to set my own schedule, and undertake engagements at my own leisure, but as a result my tax situation is more complicated. Another advantage is that sometimes, you get involved in an engagement that is truly fascinating. This is the story of such an engagement. Some details have been slightly changed, and specific names are elided.

A common theme amongst contractors in the technology industry is to band together to take on engagements which cannot be reasonably handled by a single contractor. Our story begins with such an engagement: a friend of mine ran a bespoke IT services company, which provided system administration, free software consulting and development. His company also handled the infrastructure deployment needs of customers who did not want to build their own infrastructure. I frequently worked with my friend on various consulting engagements over the years, including this one.

One day, I was chilling in IRC, when I got a PM from my friend: he had gotten an inquiry from a possible client that needed help reverse engineering a piece of obfuscated JavaScript. I said something like “sounds like fun, send it over, and I’ll see what I come up with.” The script in question was called popunder.js and did exactly what you think it does. The customer in question had started a popunder ad network, and needed help adapting this obfuscated popunder script to work with his system, which he built using a software called Revive Adserver, a fork of the last GPL version of OpenX.

I rolled my eyes and reverse engineered the script for him, allowing him to adapt it for his ad network. The adaptation was a success, and he wired me a sum that was triple my quoted hourly rate. This, admittedly, resulted in me being very curious about his business, as at the time, I was not used to making that kind of money. Actually, I’m still not.

A few weeks passed, and he approached me with a proposition: he needed somebody who could reverse engineer the JavaScript programs delivered by ad networks and figure out how the scripts worked. As he was paying considerably more than my advertised hourly rate, I agreed, and got to work reverse engineering the JavaScript programs he required. It was nearly a full time job, as these programs kept evolving.

In retrospect, he probably wasn’t doing anything with the reports I wrote on each piece of JavaScript I reverse engineered, as that wasn’t the actual point of the exercise: in reality, he wanted me to become familiar with the techniques ad networks used to detect fraud, so that we could develop countermeasures. In other words, the engagement evolved into a red-team type engagement, except that we weren’t testing the ad networks for their sake, but instead ours.

so-called “domain masking”: an explanation

Years ago, you might have browsed websites like The Pirate Bay and seen advertising for a popular game, or some other advertisement you wouldn’t have expected to see there. I assure you, brands were not knowingly targeting users on TPB: they were being duped via a category of techniques called domain masking.

This is a type of scam that black-hat ad networks run in order to launder illicit traffic into clean traffic: they set up fake websites and apply for advertisements on those websites through a shell company. This gives them a clean advertising feed to serve ads from. The next step is to launder the traffic: the ad tags are served on empty pages of the fake website, so that they can be embedded on the illicit site with an <iframe> tag. After that, an ad server rotates the various <iframe> tags, and the job is done.

For a long time, this type of fraud went undetected, as the industry was not even aware that it was a thing; or perhaps it was aware, but didn’t care, as it let networks promise brands more traffic than they could otherwise fulfill. Either way, the clean networks started to talk about cracking down on domain masking, or as the black-hat networks call it, arbitrage or ROI. That meant the naive <iframe> attack outlined above was quickly shut down, and thus began a back-and-forth cold war between the black-hat networks and their shell companies on one side, and legitimate networks like Google on the other.

non-human traffic detection

At first, the heuristics deployed by the ad networks were quite simple: they just started to check document.location.host and send it along to the ad server. If an ad tag was placed on an unauthorized domain, the account the tag belonged to would be flagged for an audit. But in order to make sure that the URL or domain name was appropriately escaped for inclusion in an HTTP GET request, they had to call window.encodeURIComponent(). This meant that the first countermeasure we developed was something similar to:

(function () {
  // keep a reference to the real implementation behind a closure
  let o = window.encodeURIComponent;

  // replace it with a version that reports a "clean" hostname instead
  window.encodeURIComponent = function (x) {
    if (x === "thepiratebay.org") return o("cleansite.com");
    return o(x);
  };
})();

This countermeasure worked for a very long time; with some networks, it lasted several years. Google solved this attack by simply writing their own implementation of encodeURIComponent and protecting it behind a closure. Other networks tried to do things like:

var isFraud = false;

// drop any own toString the countermeasure may have added to spoof the
// "native code" marker, so Function.prototype.toString is used instead
delete window.encodeURIComponent.toString;
if (window.encodeURIComponent.toString().indexOf("native code") < 0) {
  // a plain JavaScript replacement has no "[native code]" in its source
  isFraud = true;
}

This led to countermeasures like patching XMLHttpRequest itself:

(function () {
  // wrap XMLHttpRequest.prototype.open so outgoing request URLs can be rewritten
  let x = window.XMLHttpRequest.prototype.open;
  window.XMLHttpRequest.prototype.open = function (method, url, ...args) {
    // code which would parse and rewrite the URL or POST payload here
    let newurl = url;
    return x.call(this, method, newurl, ...args);
  };
})();

The cycle of patching on both sides is ongoing to this day. A friend of mine on Twitter referred to this tug-of-war as “core war,” which is an apt description: all of the involved actors are trying to patch each other out of being able to commit or detect subterfuge, and your browser gets slower and slower as more mitigations and countermeasures are layered on. If you’re not using an ad blocker yet, stop reading this, and install one: your browser will suddenly be a lot more performant.

enter thistle: a proxy between php-fpm and nginx

When it came to evading the automated non-human traffic detection deployed by ad networks, our game was impeccable: with each round of mitigation and countermeasure, we would only lose a few ad tags, if that. However, we kept getting caught by human review, because they would look at the referrer header and see something like http://cleansite.com/adserver/720x.php, which I mean, is totally suspect, right?

So, we decided what we needed to do was interdict the traffic, and respond with something else if appropriate, namely bare ad tags if a requesting client was already known to us. To do this, we wrote a proxy server which I named thistle, since the thistle flower is known to be the favorite of the most mischievous of faeries, and we were certainly up to mischief! The way it worked was that an <iframe> would enter at a specified URL, which would then tumble through several more URLs (blog articles), in order to ensure the referrer header always matched a real article on the fake website.

This was highly successful: we never got caught again by a manual audit, at least not for that reason.

the cycling of domains

Most advertising traffic is bought and sold using a protocol called OpenRTB, which allows so-called trading desks to buy and sell ad spots in real time, based on historical performance data. This means that, in order to keep CPM rates up, we would have to cycle in and out domains that the trading bots hadn’t seen in a while, or ever.

And that is where the operation started to break down: the fellow I was writing all this code for had his website people apply for ad tags without using an anonymizing VPN. At some point, an auditor noticed that all of these different sites, despite being fronted by different shell companies, were made by people with the same IP address, and shut the whole thing down by sharing that intelligence with the other ad networks. It was fun while it lasted, though.

a popular torrent website, greed, the FBI, and the loss of almost all of our traffic

By the time this operation started going off the rails, the overwhelming majority of our traffic was coming from a popular torrent website, which was eventually shut down by the feds; that was basically the end of the operation, as we lost almost all of our traffic.

I figured that was coming when the FBI contacted the person I was working for, asking if we knew anything about our advertising being featured on said torrent website. What ultimately resulted in the shutdown of the website, however, was quite funny: the owner of it was greedy, so the FBI offered to buy ads from him directly. He set up a new ad spot on the torrent website, and then sent bank wire instructions to the FBI agent investigating him, at which point they seized the website and shut it down.

Shortly afterward, the company went out of business, as there wasn’t enough traffic to keep the operation running anymore.

ad tech: capitalism refined to its purest?

As my friend Maia put it, “ad tech is about trying to scam the rest of ad tech as hard as possible, while trying to not get scammed too hard yourself.”

One can therefore argue that ad tech, like the crypto craze, is just capitalism refined to its purest: there is nothing of actual value being bought and sold at prices that are far in excess of what little value actually exists. And, like crypto, ad tech is responsible for substantial carbon dioxide emissions.

In short, do everyone a favor, and use a damn ad blocker. Oh, and don’t work in ad tech. I have wilder stories to tell about that engagement, but sometimes things are better left unsaid.

spelunking through the apk-tools dependency solver

In our previous episode, I wrote a high-level overview of apk’s differences versus traditional package managers, which many have cited as a helpful resource for understanding the behavior of apk when it does something different than a traditional package manager would. But that article didn’t go into enough depth to explain how it all actually works. This one hopefully will.

A high level view of the moving parts

Our adventure begins at the /etc/apk/world file. This file contains the basic set of constraints imposed on the system: every constraint listed here must be solvable in order for the system to be considered correct, and no transaction may be committed that is incorrect. In other words, the package management system can be proven to be in a correct state every time a constraint is added or removed with the apk add/del commands.

Note I used the word transaction there: at its core, apk is a transactional package manager, though we have not fully exploited the transactional capabilities yet. A transaction is created by copying the current constraint list (db->world), manipulating it with apk_deps_add and then committing it with apk_solver_commit. The commitment phase does pre-flight checks directly and returns an error if the transaction fails to pass.

This means that removing packages works the same way: you copy the current constraint set, remove the desired constraint, and then commit the result, which either errors out or updates the installed constraint set after the transaction is committed.
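
To make that flow concrete, here is a minimal sketch in C. Only apk_deps_add, apk_solver_commit and db->world are real names discussed above; copy_world(), free_world() and the exact signatures are simplified assumptions, so treat this as illustrative pseudocode rather than something to paste into an applet.

/* Sketch only: copy_world() and free_world() are hypothetical helpers,
 * and the real apk-tools signatures differ in detail from what is shown. */
static int add_world_constraint(struct apk_database *db, struct apk_dependency *dep)
{
    struct apk_dependency_array *world = NULL;
    int r;

    copy_world(&world, db->world);        /* copy the current constraint set */
    apk_deps_add(&world, dep);            /* manipulate the copy */
    r = apk_solver_commit(db, 0, world);  /* pre-flight checks, then commit */

    free_world(&world);
    return r;
}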

A deeper look into the solver itself

As noted above, the primary entry point into the solver is to call the apk_solver_commit function, which at the time that I am writing this, is located in the apk-tools source code at src/commit.c:679. This function does a few pre-flight checks and then calls into the solver itself, using apk_solver_solve, which generates the actual transaction to be committed. If there are errors, the generated transaction is discarded and a report is printed instead, otherwise the generated transaction is committed using apk_solver_commit_changeset.

In essence, the code in src/commit.c can be thought of as the middle layer between the applets and the core solver. The core solver itself lives in src/solver.c and as previously noted, the main entry point is apk_solver_solve, which generates a proposed transaction to satisfy the requested constraints. This function lives at src/solver.c:1021, and is the only entry point into the solver itself.

The first thing the solver does is alphabetically sort the constraint set. If you’ve noticed that /etc/apk/world is always in alphabetical order, this is a side effect of that sorting.

Once the world constraints (the ones in /etc/apk/world) are alphabetically ordered, the next step is to figure out what package, if any, presently satisfies the constraint. This is handled by the discover_name function, which is called recursively on every constraint applicable to the system, starting with the world constraint.

The next step is to generate a fuzzy solution. This is done by walking the dependency graph again, calling the apply_constraint function. This step does basic dependency resolution, removing possible solutions which explicitly conflict. Reverse dependencies (install_if) are partially evaluated in this phase, but complex constraints (such as those involving a version constraint or multiple solutions) are not evaluated yet.

Once basic constraints are applied to the proposed updated world, the next step is to walk the dependency graph again, reconsidering the fuzzy solution generated in the step above. This step is done by the reconsider_name function, which walks over parts of the dependency graph that are still ambiguous. Finally, packages are selected to resolve these ambiguities using the select_package function. Afterwards, the final changeset is emitted by the generate_changeset function.

A deep dive into reconsider_name and select_package

As should hopefully be obvious by now, the really complicated cases are handled by the reconsider_name function. These cases include scenarios such as virtual providers, situations where more than one package satisfies the constraint set, and so on. For these scenarios, it is the responsibility of the reconsider_name function to select the most optimal package. Similarly, it is the responsibility of the select_package function to check the work done by reconsider_name and finalize the package selection if appropriate by removing the constraint from the ambiguous list.

The primary purpose of the reconsider_name function is to use discover_name and apply_constraint to move more specific constraints upwards and downwards through the dependency graph, narrowing the possible set of packages which can satisfy a given constraint, ideally to one package or less. These simplified dependency nodes are then fed into select_package to deduce the best package selection to make.

The select_package function checks each constraint and the list of remaining candidate packages, and then picks the best package for each constraint. This is done by calling compare_providers for each possible package until the best one is found. The heuristics checked by compare_providers are, in order (a simplified sketch of this cascade follows the list):

  1. The packages are checked to see if they are NULL or not. The one that isn’t NULL wins. This is mostly as a safety check.
  2. We check to see if the user is using --latest or not. If they are, then the behavior changes a little bit. The details aren’t so important, you can read the source if you really want to know. Basically, in this step, we determine how fresh a package is, in alignment with what the user’s likely opinion on freshness would be.
  3. The provider versions are compared, if applicable. Highest version wins.
  4. The package versions themselves are compared. Highest version wins.
  5. The already installed package is preferred if the version is the same (this is helpful in upgrade transactions to make them less noisy).
  6. The provider_priority field is compared. Highest priority wins. This means that provider_priority is only checked for unversioned providers.
  7. Finally, the earliest repository in /etc/apk/repositories is preferred if all else is the same.
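
To make the ordering concrete, here is a small, self-contained model of the cascade in C. The candidate struct and the prefer() function are illustrative stand-ins rather than the actual apk-tools data structures, and the --latest handling from step 2 is omitted; only the order of the checks mirrors the real compare_providers.

/* Simplified model of the comparison cascade; not the real apk-tools types. */
struct candidate {
    int exists;             /* stand-in for the NULL check in step 1 */
    int provider_version;   /* version of the matching "provides" entry, if any */
    int package_version;    /* version of the package itself */
    int installed;          /* non-zero if this package is already installed */
    int provider_priority;  /* only decisive for unversioned providers */
    int repo_index;         /* position in /etc/apk/repositories */
};

static const struct candidate *prefer(const struct candidate *a,
                                      const struct candidate *b)
{
    if (a->exists != b->exists)                         /* step 1 */
        return a->exists ? a : b;
    /* step 2 (--latest handling) omitted for brevity */
    if (a->provider_version != b->provider_version)     /* step 3 */
        return a->provider_version > b->provider_version ? a : b;
    if (a->package_version != b->package_version)       /* step 4 */
        return a->package_version > b->package_version ? a : b;
    if (a->installed != b->installed)                   /* step 5 */
        return a->installed ? a : b;
    if (a->provider_priority != b->provider_priority)   /* step 6 */
        return a->provider_priority > b->provider_priority ? a : b;
    return a->repo_index <= b->repo_index ? a : b;      /* step 7 */
}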

Hopefully, this demystifies some of the common misconceptions around how the solver works, especially how provider_priority works. Personally, I think in retrospect, despite working on the spec and implementing it in apk-tools, that provider_priority was a mistake, and the preferred solution should be to always use versioned providers (e.g. provides="foo=100") instead. The fact that we have moved to versioning cmd: providers in this way demonstrates that provider_priority isn’t really a good design.

Next time: what is the maximum number of entries allowed in /etc/apk/repositories and why is it so low?

It’s time to boycott AWS

I woke up this morning not planning to write anything on this blog, much less anything about AWS. But then, as I was eating breakfast, I read a horrifying story in Mother Jones about how an AWS employee was treated as he did his best to cope with his wife’s terminal cancer.

In the free software community, Amazon (more specifically AWS) has been criticized for years for taking a largely exploitative position concerning FOSS projects. These conversations frequently result in proposals to use licensing as a weapon against AWS. In general, I believe that it would be difficult to target AWS with licensing, as statutory licenses must be fair, reasonable and non-discriminatory. But the issue of exploitation remains: AWS takes from the commons of FOSS projects and productizes that work, frequently without giving anything back.

They are, of course, allowed to do this, but at the same time, in doing so, they have frequently undercut the efforts of developers to monetize the labor involved in software maintenance, which leads to projects adopting licenses like SSPL and Commons Clause, which are significantly problematic for the commons.

On top of this, licensing-based attacks are unlikely to be effective against AWS anyway, because in the process of productization, they wind up significantly modifying the software. This means that it is only another step further to just completely rewrite the software, which is something they have done in the past, and will likely do again in the future.

But my issue isn’t just the exploitative relationship AWS has with the commons (which is largely specific to AWS by the way), but rather the corporate culture of AWS. When I read the story in Mother Jones this morning, I saw no reason to disbelieve it, as I have heard many similar stories in the past from AWS employees.

As participants in the technology industry, we are free to choose our suppliers. This freedom comes with a damning responsibility, however. When we choose to engage with AWS as a supplier, we are enabling and affirming the way they do business as a company. We are affirming their exploitation of the commons.

We are also affirming their exploitative practice of placing AWS employees on a “pivot” (their parlance for a Performance Improvement Plan), which involves working employees to the bone, saying they failed to meet their PIP objectives and then firing them.

The free software community must stand against both kinds of exploitation. We must stand against it by boycotting AWS until they recalibrate their relationship with the commons, and their relationship with their employees. We must also encourage the adoption and proliferation of humane, freedom-respecting technology.

don’t do clever things in configure scripts

Recently, a new version of ncurses was released and pushed to Alpine. The maintainer of ncurses in Alpine successfully built it on his machine, so he pushed it to the builders, expecting it to build fine on them. Of course, it promptly failed to build from source on the builders, because make install did not install the pkg-config .pc files to the right location.

You might think, what a weird regression, and you’d be right. After all, pkg-config files are usually just installed to $libdir/pkgconfig in any sort of autotools-based build system. Indeed, in the past, this is what ncurses did as well. However, this was clearly too robust of a solution to the problem of determining where to install the pkg-config files, and so innovation™️ happened: the new configure script instead tries to discover the pkg-config search path by scraping pkg-config’s debug output.

Yes, you are reading that right: scraping debug output. This kind of behavior is clever, and it should absolutely be avoided in anything designed to be portable, such as a configure script. In fact, a build system should follow established expectations: in the autotools case, it should just default to $libdir/pkgconfig and allow the end user to override that if it makes sense to. Doing anything else is going to lead to edge cases and problems, since it is more clever than the user expects it to be.

the portable way to query the search list

If you really need to query the pkg-config search list, use pkg-config --variable=pc_path pkg-config. It is supported by every implementation of pkg-config, and it is an interface we all agreed to support explicitly for this use case. This method is supported by freedesktop.org’s implementation, OpenBSD’s implementation and pkgconf, which are the three main implementations in production use.
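
For completeness, the same stable interface can also be queried from a program rather than a shell script. A minimal sketch in C, assuming a POSIX system with pkg-config on the PATH:

#include <stdio.h>

int main(void)
{
    /* query the agreed-upon interface instead of scraping debug output */
    FILE *f = popen("pkg-config --variable=pc_path pkg-config", "r");
    char path[4096];

    if (f == NULL)
        return 1;
    if (fgets(path, sizeof path, f) != NULL)
        fputs(path, stdout);  /* a colon-separated list of directories */
    return pclose(f) == 0 ? 0 : 1;
}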

There is absolutely no need to scrape debug output, which is absolutely subject to change and should never be considered a stable interface in any program.

the Alpine release process

It’s almost Halloween, which means it’s almost time for an Alpine release, and all hands are on deck to make sure the process goes smoothly. But what goes into making an Alpine release? What are all the moving parts? Since we are in the process of cutting a new release series, I figured I would write about how it is actually done.

the beginning of the development cycle

The development cycle for an Alpine release is six months long: it begins as soon as the previous release is branched in aports.git. At that point, the development freeze ends, and minor changes start flowing in.

Prior to the beginning of the development cycle, larger changes are proposed as system change proposals, an example being the change proposal introducing Rust to main for the Alpine 3.16 development cycle. The largest, most invasive proposals are coordinated by the Technical Steering Committee, while others may be coordinated by smaller teams or individual maintainers. Anybody may create a system change proposal and drive it in Alpine, regardless of whether or not they have developer rights in the project.

As these system change proposals are accepted (possibly after a few rounds of revision), the underlying steps needed to implement the change are sequenced into the overall development schedule if needed. Otherwise, they are implemented at the discretion of the contributor driving the change proposal.

soft freeze

About three weeks before release time, we set up new builders and initiate a mass rebuild of the distribution for the next release. These new builders will continue to follow changes to the edge branch until the final release is cut, at which point they will be switched to follow the branch set up for the release.

At this point, the edge branch is limited to minor, low risk changes only, unless explicitly granted an exception by the TSC. Efforts are primarily focused on making bug fix changes only, such as resolving failure-to-build-from-source (FTBFS) issues discovered during the rebuild.

release candidates

The next step before release is to do a few test releases. Release candidates are automatically produced by the builders when a developer updates the alpine-base package and tags that commit with an appropriate git tag. These candidates get uploaded to the mirror network and users begin testing them, which usually results in a few bugs being reported which get fixed prior to the final release.

If you are curious, you can read the code that is run to generate the releases yourself, it is located in the aports.git repository in the scripts folder. The main driver of the release generation process is mkimage.sh.

the final release

A few days after each release candidate is cut, the TSC (or a release engineering team delegated by the TSC) evaluates user feedback from testing the new release, and a go/no-go decision is made on making the final release. If the TSC decides the release is not ready, a new release candidate is made.

Otherwise, if the decision to release is made, then the aports.git tree is branched, the new builders are switched to following the new branch, and the final release is cut on that branch. At that point, the edge branch is reopened for unrestricted development.

Hopefully, in just a few days, we will have shipped the Alpine 3.15.0 release to the world, with very few release candidates required to do so. So far, the release process has largely gone smoothly, but only time will tell.

Trustworthy computing in 2021

Normally, when you hear the phrase “trusted computing,” you think about schemes designed to create roots of trust for companies, rather than the end user. For example, Microsoft’s Palladium project during the Longhorn development cycle of Windows is a classically cited example of trusted computing used as a basis to enforce Digital Restrictions Management against the end user.

However, for companies and software maintainers, or really anybody who is processing sensitive data, maintaining a secure chain of trust is paramount, and that root of trust is always the hardware. In the past, this was not so difficult: we had very simple computers, usually with some sort of x86 CPU and a BIOS, which was designed to be just enough to get DOS up and running on a system. This combination resulted in something trivial to audit and for the most part everything was fine.

More advanced systems of the day, like the Macintosh and UNIX workstations such as those sold by Sun and IBM used implementations of IEEE-1275, also known as Open Firmware. Unlike the BIOS used in the PC, Open Firmware was written atop a small Forth interpreter, which allowed for a lot more flexibility in handling system boot. Intel, noting the features that were enabled by Open Firmware, ultimately decided to create their own competitor called the Extensible Firmware Interface, which was launched with the Itanium.

Intel’s EFI evolved into an architecture-neutral variant known as the Unified Extensible Firmware Interface, frequently referred to as UEFI. For the most part, UEFI won out over Open Firmware: the only vendor still supporting Open Firmware is IBM, and only as a legacy compatibility option for their POWER machines. Arguably, the demise of Open Firmware had more to do with industry standardization on x86 than with the technical quality of UEFI, however.

So these days the most common architecture is x86 with UEFI firmware. Although many firmwares out there are complex, this in and of itself isn’t impossible to audit: most firmware is built on top of TianoCore. However, it isn’t ideal, and is not even the largest problem with modern hardware.

Low-level hardware initialization

Most people when asked how a computer boots, would say that UEFI is the first thing that the computer runs, and then that boots into the operating system by way of a boot loader. And, for the most part, due to magic, this is a reasonable assumption for the layperson. But it isn’t true at all.

In reality, most machines have either a dedicated service processor, or a special execution mode that they begin execution in. Regardless of whether a dedicated service processor (like the AMD PSP, older Intel ME, various ARM SoCs, POWER, etc.) or a special execution mode (newer Intel ME), system boot starts by executing code burned into a mask rom, which is part of the CPU circuitry itself.

Generally the mask rom code is designed to bring up just enough of the system to allow transfer of execution to a platform-provided payload. In other words, the mask rom typically brings up the processor’s core complex, and then jumps into platform-specific firmware in NOR flash, which then gets you into UEFI or Open Firmware or whatever your device is running that is user-facing.

Some mask roms initialize more, others less. As they are immutable, they cannot be tampered with on a targeted basis. However, once the main core complex is up, sometimes the service processor (or equivalent) sticks around and is still alive. In situations where the service processor remains operational, there is the possibility that it can be used as a backdoor. Accordingly, the behavior of the service processor must be carefully considered when evaluating the trustworthiness of a system.

One can ask a few simple questions to evaluate the trustworthiness of a system design, assuming that the worst case scenario is assumed for any question where the answer is unknown. These questions are:

  • How does the system boot? Does it begin executing code at a hardwired address or is there a service processor?
  • If there is a service processor, what is the initialization process that the service processor does? Is the mask rom and intermediate firmware auditable? Has it already been audited by a trusted party?
  • What components of the low level init process are stored in NOR flash or similar? What components are immutable?
  • What other functions does the service processor perform? Can they be disabled? Can the service processor be instructed to turn off?

System firmware

The next point of contention, of course, is the system firmware itself. On most systems today, this is an implementation of UEFI, either Aptio or InsydeH2O. Both are derived from the open source TianoCore EDK codebase.

In most cases, these firmwares are too complicated for an end user to audit. However, some machines support coreboot, which can be used to replace the proprietary UEFI with a system firmware of your choosing, including one built on TianoCore.

From a practical perspective, the main point of consideration at the firmware level is whether the trust store can be modified. UEFI mandates the inclusion of Microsoft’s signing key by default, but if you can uninstall their key and install your own, it is possible to gain some trustworthiness from the implementation, assuming it is not backdoored. This should be considered a minimum requirement for gaining some level of trust in the system firmware, but ultimately if you cannot audit the firmware, then you should not extend high amounts of trust to it.

Resource isolation

A good system design will attempt to isolate resources using IOMMUs. This is because external devices, such as those on the PCIe bus, should not be trusted with unrestricted access to system memory, as they can potentially be backdoored.

It is sometimes possible to use virtualization technology to create barriers between PCIe devices and the main OS. Qubes OS for example uses the Xen hypervisor and dedicated VMs to isolate specific pieces of hardware and their drivers.

Additionally, with appropriate use of IOMMUs, system stability is improved, as badly behaving hardware and drivers cannot crash the system.

A reasonably secure system

Based on the discussion above, we can conclude some properties of what a secure system would look like. Not all of the systems evaluated later in this post will have all of these properties. But we have a framework nonetheless: the more of these properties a system has, the more trustworthy it is:

  • The system should have a hardware initialization routine that is as simple as possible.
  • The service processor, if any, should be restricted to hardware initialization and tear down and should not perform any other functionality.
  • The system firmware should be freely available and reproducible from source.
  • The system firmware must allow the end user to control any signing keys enrolled into the trust store.
  • The system should use IOMMUs to mediate I/O between the main CPU and external hardware devices like PCIe cards and so on.

How do systems stack up in the real world?

Using the framework above, let’s look at a few of the systems I own and see how trustworthy they actually are. The results may surprise you. These are systems that anybody can purchase from reputable vendors, without having to do any sort of hardware modifications themselves. Some examples are intentionally silly, in that while they are secure, you wouldn’t actually want to use them for getting work done today due to obsolescence.

Compaq DeskPro 486/33m

The DeskPro is an Intel 80486DX system running at 33MHz. It has 16MB of RAM, and I haven’t gotten around to unpacking it yet. But it’s reasonably secure, even when turned on.

As described in the 80486 programmer’s manual, the 80486 is hardwired to start execution from 0xFFFFFFF0. As long as there is a ROM connected to the chip in such a way that the 0xFFFFFFF0 address can be read, the system will boot whatever is there. This jumps into a BIOS, and then from there, into its operating system. We can audit the system BIOS if desired, or, if we have a CPLD programmer, replace it entirely with our own implementation, since it’s socketed on the system board.

There is no service processor, and booting from any device other than the hard disk can be restricted with a password. Accordingly, any practical attack against this machine would require disassembly of it, for example, to replace the hard disk.

However, this machine does not use IOMMUs, as it predates IOMMUs, and it is too slow to use Xen to provide equivalent functionality. Overall it scores 3 out of 5 points on the framework above: simple initialization routine, no service controller, no trust store to worry about.

Where you can get one: eBay, local PC recycler, that sort of thing.

Dell Inspiron 5515 (AMD Ryzen 5700U)

This machine is my new workhorse for x86 tasks, since my previous x86 machine had a significant failure of the system board. Whenever I am doing x86-specific Alpine development, it is generally on this machine. But how does it stack up?

Unfortunately, it stacks up rather badly. Like modern Intel machines, system initialization is controlled by a service processor, the AMD Platform Security Processor. Worse yet, unlike Intel, the PSP firmware is distributed as a single signed image, and cannot have unwanted modules removed from it.

The system uses InsydeH2O for its UEFI implementation, which is closed source. It does allow Microsoft’s signing keys to be removed from the trust store. And while IOMMU functionality is available, it is available to virtualized guests only.

So, overall, it scores only 1 out of 5 possible points for trustworthiness. It should not surprise you to learn that I don’t do much sensitive computing on this device, instead using it for compiling only.

Where you can get one: basically any electronics store you want.

IBM/Lenovo ThinkPad W500

This machine used to be my primary computer, quite a while ago, and ThinkPads are known for being able to take quite a beating. It is also the first computer I tried coreboot on. These days, you can use Libreboot to install a deblobbed version of coreboot on the W500. And, since it is based on the Core2 Quad CPU, it does not have the Intel Management Engine service processor.

But, of course, the Core2 Quad is too slow for day to day work on an operating system where you have to compile lots of things. However, if you don’t have to compile lots of things, it might be a reasonably priced option.

When you use this machine with a coreboot distribution like Libreboot, it scores 4 out of 5 on the trustworthiness score, the highest of all x86 devices evaluated. Otherwise, with the normal Lenovo BIOS, it scores 3 out of 5, as the main differentiator is the availability of a reproducible firmware image: there is no Intel ME to worry about, and the UEFI BIOS allows removal of all preloaded signing keys.

However, on an old ThinkPad, Libreboot introduces modern features that are not available in the Lenovo BIOS; for example, you can build a firmware that fully supports the latest UEFI specification by using the TianoCore payload.

Where you can get it: eBay, PC recyclers. The maintainer of Libreboot sells refurbished ThinkPads on her website with Libreboot pre-installed. Although her pricing is higher than a PC recycler, you are paying not only for a refurbished ThinkPad, but also to support the Libreboot project, hence the pricing premium.

Raptor Computing Systems Blackbird (POWER9 Sforza)

A while ago, somebody sent me a Blackbird system they built after growing tired of the #talos community. The vendor promises that the system is built entirely on user-controlled firmware. How does it measure up?

Firmware wise, it’s true: you can compile every piece of firmware yourself, and instructions are provided to do so. However, the OpenPOWER firmware initialization process is quite complicated. This is offset by the fact that you have all of the source code, of course.

There is a service processor, specifically the BMC. It runs the OpenBMC firmware, and is potentially a network-connected element. However, you can compile the firmware that runs on it yourself.

Overall, I give the Blackbird 5 out of 5 points; however, it is expensive to buy directly from Raptor, with a complete system usually running in the neighborhood of $3000-4000. There are also still a lot of bugs with PPC64LE Linux.

Where you can get it: eBay sometimes, the Raptor Computing Systems website.

Apple MacBook Air M1

Last year, Apple announced machines based on their own ARM CPU design, the Apple M1. Why am I bringing this up, since I am a free software developer, and Apple usually wants to destroy software freedom? Great question: the answer is basically that Apple’s M1 devices are designed in such a way that they have the potential to be trustworthy, performant and, unlike the Blackbird, reasonably affordable. However, this is still a matter of potential: the Asahi Linux project, while making fast progress, has not yet arrived at production-quality support for this hardware. So how does it measure up?

Looking at the Asahi docs for system boot, there are three stages of system boot: SecureROM, and the two iBoot stages. The job of SecureROM is to initialize and load just enough to get the first iBoot stage running, while the first iBoot stage’s job is only to get the second iBoot stage running. The second iBoot stage then starts whatever kernel is passed to it, as long as it matches the enrolled hash for secure boot, which is user-controllable. This means that the second iBoot stage can chainload into GRUB or similar to boot Linux. Notably, there is no PKI involved in the secure boot process, it is strictly based on hashes.

This means that the system initialization is as simple as possible, leaving the majority of work to the second stage bootloader. There are no keys to manage, which means no trust store. The end user may trust whatever kernel hash she wishes.

But what about the Secure Enclave? Does it act as a service processor? No, it doesn’t: it remains offline until it is explicitly started by MacOS. And on the M1, everything is gated behind an IOMMU.

Therefore, the M1 actually gets 4 out of 5, making it roughly as trustworthy as the Libreboot ThinkPad, and slightly less trustworthy than the Blackbird. But unlike those devices, the performance is good, and the cost is reasonable. However… it’s not quite ready for Linux users yet. That leaves the Libreboot machines as providing the best balance between usability and trustworthiness today, even though the performance is quite slow by comparison to more modern computers. If you’re excited by these developments, you should follow the Asahi Linux project and perhaps donate to marcan’s Patreon.

Where to get it: basically any electronics store

SolidRun Honeycomb (NXP LX2160A, 16x Cortex-A72)

My main aarch64 workhorse at the moment is the SolidRun Honeycomb. I picked one up last year, and got Alpine running on it. Like the Blackbird, all firmware that can be flashed to the board is open source. SolidRun provides a build of U-Boot or a build of TianoCore to use on the board. In general, they do a good job of enabling you to build your own firmware: the process is reasonably documented, with the only binary blob being the DDR PHY training data.

However, mainline Linux support is only starting to mature: networking support just landed in full with Linux 5.14, for example. There are also bugs with the PCIe controller. And at $750 for the motherboard and CPU module, it is expensive to get started, but not nearly as expensive as something like Blackbird.

If you’re willing to put up with the PCIe bugs, however, it is a good starting point for a fully open system. In that regard, Honeycomb does get 5 out of 5 points, just like the Blackbird system.

Where to get it: SolidRun’s website.

Conclusions

While we have largely been in the dark for modern user-trustworthy computers, things are finally starting to look up. While Apple is a problematic company, for many reasons, they are at least producing computers which, once Linux is fully functional on them, are basically trustworthy, at a sufficiently low price point versus other platforms like the Blackbird. Similarly, Libreboot seems to be back up and running and will hopefully soon be targeting more modern hardware.