Trustworthy computing in 2021

Normally, when you hear the phrase “trusted computing,” you think about schemes designed to create roots of trust for companies, rather than the end user. For example, Microsoft’s Palladium project during the Longhorn development cycle of Windows is a classically cited example of trusted computing used as a basis to enforce Digital Restrictions Management against the end user.

However, for companies and software maintainers, or really anybody who is processing sensitive data, maintaining a secure chain of trust is paramount, and that root of trust is always the hardware. In the past, this was not so difficult: we had very simple computers, usually with some sort of x86 CPU and a BIOS, which was designed to be just enough to get DOS up and running on a system. This combination resulted in something trivial to audit and for the most part everything was fine.

More advanced systems of the day, like the Macintosh and UNIX workstations such as those sold by Sun and IBM used implementations of IEEE-1275, also known as Open Firmware. Unlike the BIOS used in the PC, Open Firmware was written atop a small Forth interpreter, which allowed for a lot more flexibility in handling system boot. Intel, noting the features that were enabled by Open Firmware, ultimately decided to create their own competitor called the Extensible Firmware Interface, which was launched with the Itanium.

Intel’s EFI evolved into an architecture-neutral variant known as the Unified Extensible Firmware Interface, frequently referred to as UEFI. For the most part, UEFI won against Open Firmware: the only vendor still supporting Open Firmware is IBM, and only as a legacy compatibility option for their POWER machines. Arguably, however, the demise of Open Firmware had more to do with industry standardization on x86 than with the technical quality of UEFI.

So these days the most common architecture is x86 with UEFI firmware. Although many firmwares out there are complex, this in and of itself isn’t impossible to audit: most firmware is built on top of TianoCore. However, it isn’t ideal, and is not even the largest problem with modern hardware.

Low-level hardware initialization

Most people, when asked how a computer boots, would say that UEFI is the first thing that the computer runs, and that it then boots into the operating system by way of a boot loader. And, for the most part, due to magic, this is a reasonable assumption for the layperson. But it isn’t true at all.

In reality, most machines have either a dedicated service processor or a special execution mode in which they begin execution. Regardless of whether the machine uses a dedicated service processor (like the AMD PSP, older Intel ME, various ARM SoCs, POWER, etc.) or a special execution mode (newer Intel ME), system boot starts by executing code burned into a mask rom, which is part of the CPU circuitry itself.

Generally the mask rom code is designed to bring up just enough of the system to allow transfer of execution to a platform-provided payload. In other words, the mask rom typically brings up the processor’s core complex, and then jumps into platform-specific firmware in NOR flash, which then gets you into UEFI or Open Firmware or whatever your device is running that is user-facing.

Some mask roms initialize more, others less. As they are immutable, they cannot be tampered with on a targeted basis. However, once the main core complex is up, sometimes the service processor (or equivalent) sticks around and is still alive. In situations where the service processor remains operational, there is the possibility that it can be used as a backdoor. Accordingly, the behavior of the service processor must be carefully considered when evaluating the trustworthiness of a system.

One can ask a few simple questions to evaluate the trustworthiness of a system design, assuming the worst case for any question where the answer is unknown. These questions are:

  • How does the system boot? Does it begin executing code at a hardwired address or is there a service processor?
  • If there is a service processor, what initialization does it perform? Are the mask rom and intermediate firmware auditable? Have they already been audited by a trusted party?
  • What components of the low level init process are stored in NOR flash or similar? What components are immutable?
  • What other functions does the service processor perform? Can they be disabled? Can the service processor be instructed to turn off?

System firmware

The next point of contention, of course, is the system firmware itself. On most systems today, this is an implementation of UEFI, either Aptio or InsydeH2O. Both are derived from the open source TianoCore EDK codebase.

In most cases, these firmwares are too complicated for an end user to audit. However, some machines support coreboot, which can be used to replace the proprietary UEFI with a system firmware of your choosing, including one built on TianoCore.

From a practical perspective, the main point of consideration at the firmware level is whether the trust store can be modified. UEFI mandates the inclusion of Microsoft’s signing key by default, but if you can uninstall their key and install your own, it is possible to gain some trustworthiness from the implementation, assuming it is not backdoored. This should be considered a minimum requirement for gaining some level of trust in the system firmware, but ultimately if you cannot audit the firmware, then you should not extend high amounts of trust to it.

Resource isolation

A good system design will attempt to isolate resources using IOMMUs. This is because external devices, such as those on the PCIe bus, should not be trusted with unrestricted access to system memory, as they can potentially be backdoored.

It is sometimes possible to use virtualization technology to create barriers between PCIe devices and the main OS. Qubes OS for example uses the Xen hypervisor and dedicated VMs to isolate specific pieces of hardware and their drivers.

Additionally, with appropriate use of IOMMUs, system stability is improved, as badly behaving hardware and drivers cannot crash the system.

A reasonably secure system

Based on the discussion above, we can derive some properties of what a secure system would look like. Not all systems evaluated later in this blog will have all of these properties. Nonetheless, they give us a framework in which each additional property indicates a higher level of trustworthiness:

  • The system should have a hardware initialization routine that is as simple as possible.
  • The service processor, if any, should be restricted to hardware initialization and tear down and should not perform any other functionality.
  • The system firmware should be freely available and reproducible from source.
  • The system firmware must allow the end user to control any signing keys enrolled into the trust store.
  • The system should use IOMMUs to mediate I/O between the main CPU and external hardware devices like PCIe cards and so on.
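The checklist can be turned into a simple tally, which is how the scores out of 5 later in this post are arrived at. This is only a sketch: the struct and field names are invented for illustration, and counting a machine with no trust store at all as satisfying the signing-key property is an assumption based on how the DeskPro is scored.

```c
#include <stdbool.h>

/* One flag per property in the list above. All names are illustrative. */
struct trust_checklist {
    bool simple_hw_init;         /* initialization routine is as simple as possible */
    bool service_cpu_restricted; /* no service processor, or init/teardown only */
    bool firmware_reproducible;  /* firmware is available and reproducible from source */
    bool user_controls_keys;     /* user controls the trust store (assumed: or none exists) */
    bool iommu_mediated_io;      /* IOMMUs sit between the CPU and external devices */
};

/* Count how many of the five properties hold (each bool adds 0 or 1). */
static int
trust_score(const struct trust_checklist *c)
{
    return c->simple_hw_init + c->service_cpu_restricted +
           c->firmware_reproducible + c->user_controls_keys +
           c->iommu_mediated_io;
}
```

For example, a machine with a simple initialization routine, no service processor and no trust store to worry about, but without open firmware or an IOMMU, would score 3.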

How do systems stack up in the real world?

Using the framework above, let’s look at a few of the systems I own and see how trustworthy they actually are. The results may surprise you. These are systems that anybody can purchase, without having to do any sort of hardware modifications themselves, from reputable vendors. Some examples are intentionally silly, in that while they are secure, you wouldn’t actually want to use them today for getting work done due to obsolescence.

Compaq DeskPro 486/33m

The DeskPro is an Intel 80486DX system running at 33 MHz. It has 16MB of RAM, and I haven’t gotten around to unpacking it yet. But, it’s reasonably secure, even when turned on.

As described in the 80486 programmer’s manual, the 80486 is hardwired to start execution from 0xFFFFFFF0. As long as there is a ROM connected to the chip in such a way that the 0xFFFFFFF0 address can be read, the system will boot whatever is there. This jumps into a BIOS, and then from there, into its operating system. We can audit the system BIOS if desired, or, if we have a CPLD programmer, replace it entirely with our own implementation, since it’s socketed on the system board.

There is no service processor, and booting from any device other than the hard disk can be restricted with a password. Accordingly, any practical attack against this machine would require disassembly of it, for example, to replace the hard disk.

However, this machine does not use IOMMUs, as it predates IOMMUs, and it is too slow to use Xen to provide equivalent functionality. Overall it scores 3 out of 5 points on the framework above: simple initialization routine, no service processor, no trust store to worry about.

Where you can get one: eBay, local PC recycler, that sort of thing.

Dell Inspiron 5515 (AMD Ryzen 5700U)

This machine is my new workhorse for x86 tasks, since my previous x86 machine had a significant failure of the system board. Whenever I am doing x86-specific Alpine development, it is generally on this machine. But how does it stack up?

Unfortunately, it stacks up rather badly. Like modern Intel machines, system initialization is controlled by a service processor, the AMD Platform Security Processor. Worse yet, unlike Intel, the PSP firmware is distributed as a single signed image, and cannot have unwanted modules removed from it.

The system uses InsydeH2O for its UEFI implementation, which is closed source. It does allow Microsoft’s signing keys to be removed from the trust store. And while IOMMU functionality is available, it is available to virtualized guests only.

So, overall, it scores only 1 out of 5 possible points for trustworthiness. It should not surprise you to learn that I don’t do much sensitive computing on this device, instead using it for compiling only.

Where you can get one: basically any electronics store you want.

IBM/Lenovo ThinkPad W500

This machine used to be my primary computer, quite a while ago, and ThinkPads are known for being able to take quite a beating. It is also the first computer I tried coreboot on. These days, you can use Libreboot to install a deblobbed version of coreboot on the W500. And, since it is based on the Core2 Quad CPU, it does not have the Intel Management Engine service processor.

But, of course, the Core2 Quad is too slow for day to day work on an operating system where you have to compile lots of things. However, if you don’t have to compile lots of things, it might be a reasonably priced option.

When you use this machine with a coreboot distribution like Libreboot, it scores 4 out of 5 on the trustworthiness score, the highest of all x86 devices evaluated. Otherwise, with the normal Lenovo BIOS, it scores 3 out of 5, as the main differentiator is the availability of a reproducible firmware image: there is no Intel ME to worry about, and the UEFI BIOS allows removal of all preloaded signing keys.

However, if you use an old ThinkPad, using Libreboot introduces modern features that are not available in the Lenovo BIOS. For example, you can build a firmware that fully supports the latest UEFI specification by using the TianoCore payload.

Where you can get it: eBay, PC recyclers. The maintainer of Libreboot sells refurbished ThinkPads on her website with Libreboot pre-installed. Although her pricing is higher than a PC recycler, you are paying not only for a refurbished ThinkPad, but also to support the Libreboot project, hence the pricing premium.

Raptor Computing Systems Blackbird (POWER9 Sforza)

A while ago, somebody sent me a Blackbird system they built after growing tired of the #talos community. The vendor promises that the system is built entirely on user-controlled firmware. How does it measure up?

Firmware wise, it’s true: you can compile every piece of firmware yourself, and instructions are provided to do so. However, the OpenPOWER firmware initialization process is quite complicated. This is offset by the fact that you have all of the source code, of course.

There is a service processor, specifically the BMC. It runs the OpenBMC firmware, and is potentially a network-connected element. However, you can compile the firmware that runs on it yourself.

Overall, I give the Blackbird 5 out of 5 points. However, it is expensive to buy directly from Raptor: a complete system usually runs in the neighborhood of $3000-4000. There are also still a lot of bugs with PPC64LE Linux.

Where you can get it: eBay sometimes, the Raptor Computing Systems website.

Apple MacBook Air M1

Last year, Apple announced machines based on their own ARM CPU design, the Apple M1 CPU. Why am I bringing this up, since I am a free software developer, and Apple is usually wanting to destroy software freedom? Great question: the answer basically is that Apple’s M1 devices are designed in such a way that they have the potential to be trustworthy, performant and, unlike Blackbird, reasonably affordable. However, this is still a matter of potential: the Asahi Linux project, while making fast progress, has not yet arrived at production-quality support for this hardware. So how does it measure up?

Looking at the Asahi docs for system boot, there are three stages of system boot: SecureROM, and the two iBoot stages. The job of SecureROM is to initialize and load just enough to get the first iBoot stage running, while the first iBoot stage’s job is only to get the second iBoot stage running. The second iBoot stage then starts whatever kernel is passed to it, as long as it matches the enrolled hash for secure boot, which is user-controllable. This means that the second iBoot stage can chainload into GRUB or similar to boot Linux. Notably, there is no PKI involved in the secure boot process, it is strictly based on hashes.

This means that the system initialization is as simple as possible, leaving the majority of work to the second stage bootloader. There are no keys to manage, which means no trust store. The end user may trust whatever kernel hash she wishes.

But what about the Secure Enclave? Does it act as a service processor? No, it doesn’t: it remains offline until it is explicitly started by macOS. And on the M1, everything is gated behind an IOMMU.

Therefore, the M1 actually gets 4 out of 5, making it roughly as trustworthy as the Libreboot ThinkPad, and slightly less trustworthy than the Blackbird. But unlike those devices, the performance is good, and the cost is reasonable. However… it’s not quite ready for Linux users yet. That leaves the Libreboot machines as providing the best balance between usability and trustworthiness today, even though the performance is quite slow by comparison to more modern computers. If you’re excited by these developments, you should follow the Asahi Linux project and perhaps donate to marcan’s Patreon.

Where to get it: basically any electronics store.

SolidRun Honeycomb (NXP LX2160A, 16x Cortex-A72)

My main aarch64 workhorse at the moment is the SolidRun Honeycomb. I picked one up last year, and got Alpine running on it. Like the Blackbird, all firmware that can be flashed to the board is open source. SolidRun provides a build of u-boot or a build of TianoCore to use on the board. In general, they do a good job of enabling you to build your own firmware: the process is reasonably documented, and the only binary blob is the DDR PHY training data.

However, mainline Linux support is only starting to mature: networking support just landed in full with Linux 5.14, for example. There are also bugs with the PCIe controller. And at $750 for the motherboard and CPU module, it is expensive to get started, but not nearly as expensive as something like Blackbird.

If you’re willing to put up with the PCIe bugs, however, it is a good starting point for a fully open system. In that regard, Honeycomb does get 5 out of 5 points, just like the Blackbird system.

Where to get it: SolidRun’s website.

Conclusions

While we have largely been in the dark for modern user-trustworthy computers, things are finally starting to look up. While Apple is a problematic company, for many reasons, they are at least producing computers which, once Linux is fully functional on them, are basically trustworthy, and at a considerably lower price point than other platforms like Blackbird. Similarly, Libreboot seems to be back up and running and will hopefully soon be targeting more modern hardware.

Bits related to Alpine Security Initiatives in September

The past month has been quite busy as we prepare to wrap up major security-related initiatives for the Alpine 3.15 release. Some progress has been made on long-term initiatives as well.

OpenSSL 3 migration

As I noted in my last status update, we began the process to migrate the distribution to using OpenSSL 3. As a result of this, we have found and mitigated a few interesting bugs, for example, ca-certificates-bundle was being pulled into the base system as a side effect rather than intentionally, despite apk-tools explicitly needing it for validating the TLS certificates used by Fastly for our CDN.

Migrating to OpenSSL 3 has not been without its share of difficulties, however, as I noted in a blog post earlier in the month discussing some of them. I hope to be able to fully finish the OpenSSL 3 migration during the Alpine 3.16 development cycle as the last remaining packages such as mariadb and php make releases which support the new API. One other major issue needing to be addressed is updating wpa_supplicant and hostap to use the new OpenSSL APIs, but WPA requires the use of RC4 which has been moved to the legacy provider, so this will require programmatic loading of the legacy OpenSSL provider. Accordingly, we moved them back to OpenSSL 1.1 for now, until upstream releases an update to address these problems.

OpenSSL 3 also introduces some unintended regressions. Specifically, a bug was reported against apk-tools where using apk --allow-untrusted would result in a crash. After some debugging work, I was able to reduce the issue to a simple reproducer: the EVP_md_null digest family was never updated to be compatible with the new OpenSSL 3 provider APIs, and so attempting to use it results in a crash, as the prerequisite function pointers never get set up on the EVP_MD_CTX context. This means that apk-tools is still using OpenSSL 1.1 for now, despite otherwise working with OpenSSL 3.

Coordinating the OpenSSL 3 migration consumed a lot of my time in September, for example, I spent a few days investigating OpenSSL 3 regressions on CPUs which claim to be Pentium-compatible but actually lack support for the lock cmpxchg8b instruction, and CPUs which claim to be Pentium 3-compatible, but lack the CMOV instructions. This investigation was quite frustrating, but also interesting, as the Vortex86DX3+ looks like a nice x86 CPU that is not vulnerable to Meltdown or Spectre due to not supporting speculation.
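A claimed CPU family is evidently not a reliable guide to which instructions exist, so checking the CPUID feature bits directly is the safer test. The following is a minimal sketch, not what OpenSSL itself does: it assumes a GCC or Clang toolchain that provides <cpuid.h>, and the helper name cpu_has_feature and the FEATURE_* macros are made up for illustration.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 1 reports feature flags in EDX:
 * bit 8 is CX8 (cmpxchg8b), bit 15 is CMOV. */
#define FEATURE_CX8  (1u << 8)
#define FEATURE_CMOV (1u << 15)

/* Returns true only if every bit in edx_mask is reported by CPUID leaf 1.
 * Returns false if the leaf is unavailable or this is not an x86 build. */
static bool
cpu_has_feature(unsigned int edx_mask)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return (edx & edx_mask) == edx_mask;
#else
    (void) edx_mask;
    return false;
#endif
}
```

A build or runtime check could then call cpu_has_feature(FEATURE_CX8) or cpu_has_feature(FEATURE_CMOV) before selecting a code path that depends on those instructions.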

Rust in Alpine main for 3.16

As I noted in my previous blog about the OpenSSL 3 transition, we had to migrate Ansible from main to community due to Rust presently being in the community repository. This is largely because main has a general policy that is similar to other distributions: once we cut a new release series, we generally don’t do upgrades of packages in main, instead preferring to backport security and reliability fixes, unless we are certain they won’t cause regressions. Normally, this is a good thing, as new versions of software frequently bring ABI/API changes that are not backwards compatible, and upstream developers sometimes forget to update their SONAME versions to reflect those changes.

Distributions traditionally have to provide a maintenance lifecycle which is several years long without breaking their users’ environments, and so tend to be conservative in what post-release updates are made. Alpine takes a more “hybrid” approach and thus has more flexibility, but we still prefer to err on the side of caution. In the case of Rust, this meant that we wanted a working relationship that allowed us to have a high level of confidence in the Rust toolchains we were delivering to our users.

After almost a year of ongoing discussion, and a Rust MCP, we have come up with a plan in concert with Rust upstream which allows us to deliver production-quality Rust toolchains for Alpine users, and keep them continuously updated in Alpine releases. I expect the Alpine TSC to approve this for Alpine 3.16 and later. And yes, for other languages we are willing to offer a similar proposal, if there is a proven track record of maintaining backward compatibility and portability. Please feel free to reach out.

The more interesting development is that this allows for using components written in Rust for the base system in the future. While we do not plan to start evaluating these options until the 3.18 release cycle at the earliest, this does open a path to enabling rustls and hyper to replace OpenSSL and libfetch in apk-tools at some point, which could potentially be an interesting development. It also opens a path for components of apk-tools to eventually be written in Rust as well.

Distroless images

Another project I started recently is Witchery, a build system for generating distroless images using Alpine components. This allows a user to easily build a distroless image for their application by leveraging apk-tools to do the work. Distroless images are interesting from a security perspective as they contain fewer moving parts: in most cases, a distroless image built with Witchery will only contain musl, some data files in /etc and your application. By avoiding other components like busybox and apk-tools, the attack surface of an image is reduced, as there is nothing available outside your application payload for an attacker to use.

There is still a lot of work to do on Witchery, as it is presently in the proof of concept stage, and I plan on doing a second blog post about it soon. I believe that there is intrinsic value to deploying applications built against musl from a security point of view over glibc, as there is much more hardening in musl.

Acknowledgement

My activities relating to Alpine security work are presently sponsored by Google and the Linux Foundation. Without their support, I would not be able to work on security full time in Alpine, so thanks!

you can’t stop the (corporate) music

I’ve frequently said that marketing departments are the most damaging appendage of any modern corporation. However, there is one example of this which really proves the point: corporate songs, and more recently, corporate music videos. These Lovecraftian horrors are usually created in order to raise employee morale, typically at the cost of hundreds of thousands of dollars and thousands of man-hours being wasted on meetings to compose the song by committee. But don’t take my word for it: here are some examples.

HP’s “Power Shift”

With a corporate song like this, it’s no surprise that PA-RISC went nowhere.

Let’s say you’re a middle manager at Hewlett-Packard in 1991 leading the PA-RISC workstation team. Would you wake up one day and say “I know! What we need is to produce a rap video to show why PA-RISC is cool”? No, you probably wouldn’t. But that’s what somebody at HP did. The only thing this song makes me want to do is not buy a PA-RISC workstation. The lyrics likely haunt the hired composer to this day. Just look at the hook:

You want power,
you want speed,
the 700 series,
is what you need!

PA-RISC has set the pace,
Hewlett-Packard now leads the race!

Brocade One: A Hole in None

This music video is so bad that Broadcom acquired Brocade to put it out of its misery a year later.

Your company is tanking because Cisco and Juniper released new products, such as the Cisco Nexus and Juniper MX router, that were far better than the Brocade NetIron MLXe router. What do you do? Make a better router? Nah, that’s too obvious. Instead, make a rap video talking about how your management tools are better! (I can speak from experience that Brocade’s VCS solution didn’t actually work reliably, but who cares about facts.)

PriceWaterhouseCoopers: talking about taxes, state and local

If you ever wondered if accountants could write hard hitting jams: the answer is no.

At least this one sounds like they made it in-house: the synth lead sounds like it is a Casio home synthesizer, and the people singing it sound like they probably worked there. Outside of the completely blasé lyrics, this one is surprisingly tolerable, but one still has to wonder if it was a good use of corporate resources to produce. Most likely not.

The Fujitsu Corporate Anthem

Fujitsu proves that this isn’t just limited to US companies.

As far as corporate songs go, this one is actually quite alright. Fujitsu went all out on their propaganda exercise, hiring Mitsuko Miyake to sing their corporate power ballad, backed by a big band orchestra. If you hear it from them, Fujitsu exists to bring us all into a corporate utopia, powered by Fujitsu. Terrifying stuff, honestly.

The Gazprom Song: the Soviet Corporate Song

Remember when Gazprom had a gas leak sealed by detonating an atomic bomb?

A Russian friend of mine sent me this when I noted I was looking for examples of corporate propaganda. This song is about Gazprom, the Russian state gas company. Amongst other things, it claims to be a national savior in this song. I have no idea if that’s true or not, but they once had a gas leak sealed by getting the Soviet military to detonate an atomic bomb, so that seems pretty close.

Please stop making these songs

While I appreciate material where the jokes write themselves, these songs represent the worst of corporatism. Spend the money buying employees something they would actually appreciate, like a gift card or something instead of making these eldritch horrors. Dammit, I still have the PWC one stuck in my head. Gaaaaaaah!

Monitoring for process completion in 2021

A historical defect in the ifupdown suite has been the lack of proper supervision of processes run by the system in order to bring up and down interfaces. Specifically, it is possible in historical ifupdown for a process to hang forever, at which point the system will fail to finish configuring interfaces. As interface configuration is part of the boot process, this means that the boot process can potentially hang forever and fail to complete. Accordingly, we have introduced correct supervision of processes run by ifupdown-ng in the upcoming version 0.12, with a 5 minute timeout.

Because ifupdown-ng is intended to be portable, we had to implement two versions of the process completion monitoring routine. The portable version is a busy loop, which sleeps for 50 milliseconds between iterations, and the non-portable version uses Linux process descriptors, a feature introduced in Linux 5.3. On earlier kernels, ifupdown-ng will downgrade to using the portable implementation. There are also a couple of other ways that one can monitor for process completion using notifications, but they were not appropriate for the ifupdown-ng design.

Busy-waiting with waitpid(2)

The portable version, as previously noted, uses a busy loop which sleeps for short durations of time. A naive version of a routine which does this would look something like:

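#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* return true if process exited successfully, false in any other case */
bool
monitor_with_timeout(pid_t child_pid, int timeout_sec)
{
    int status;
    int ticks = 0;

    /* each tick is one 100ms sleep, so there are 10 ticks per second */
    while (ticks < timeout_sec * 10)
    {
        /* waitpid returns the child PID once the child has exited */
        if (waitpid(child_pid, &status, WNOHANG) == child_pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;

        /* sleep 100ms */
        usleep(100000);
        ticks++;
    }

    /* timeout exceeded, kill the child process and error */
    kill(child_pid, SIGKILL);

    /* block until the killed child is reaped, so no zombie is left behind */
    waitpid(child_pid, &status, 0);
    return false;
}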

This approach, however, has some performance drawbacks. If the process has not already completed by the time that monitoring of it has begun, then you will be delayed at least 100ms. In the case of ifupdown-ng, almost all processes are very short-lived, so this is not a major issue. However, we can do better by tightening the event loop to 50ms ticks. Another optimization is to split the sleep part into two steps, allowing for the initial call to waitpid to have better chances of reaping the completed process:

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* return true if process exited successfully, false in any other case */
bool
monitor_with_timeout(pid_t child_pid, int timeout_sec)
{
    int status;
    int ticks = 0;

    /* each tick is 50ms in total, so there are 20 ticks per second */
    while (ticks < timeout_sec * 20)
    {
        /* sleep 50usec to allow the child process to complete */
        usleep(50);

        /* waitpid returns the child PID once the child has exited */
        if (waitpid(child_pid, &status, WNOHANG) == child_pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;

        /* sleep 49.95ms */
        usleep(49950);
        ticks++;
    }

    /* timeout exceeded, kill the child process and error */
    kill(child_pid, SIGKILL);

    /* block until the killed child is reaped, so no zombie is left behind */
    waitpid(child_pid, &status, 0);
    return false;
}

This works fairly well in practice: there is no performance regression on the ifupdown-ng test suite with this implementation.

The self-pipe trick

In the early 90s, Daniel J. Bernstein described the self-pipe trick, which allows process completion notifications to be delivered via a pollable file descriptor. It is portable to any POSIX-compliant system, and can be used with poll or whatever you wish to use. It works by installing a signal handler against SIGCHLD that writes to a descriptor obtained with pipe(2). The downside of this approach is that you have to write quite a bit of code, and you have to track which pipe FD is associated with which PID. It also wastes a file descriptor per process, since you have a file descriptor for both sides of the pipe.
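To give a feel for how much code is involved, here is a minimal sketch of the self-pipe trick. It is not taken from ifupdown-ng: the function names are hypothetical, and for brevity it uses a single global pipe and monitors one child at a time instead of tracking a pipe per PID.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Global state: the signal handler has to know where to write. */
static int selfpipe_fds[2] = { -1, -1 };

static void
sigchld_handler(int signo)
{
    int saved_errno = errno;

    (void) signo;
    /* write(2) is async-signal-safe; a full pipe (EAGAIN) is harmless here */
    if (write(selfpipe_fds[1], "x", 1) < 0) { /* nothing to do */ }
    errno = saved_errno;
}

/* Call once, before forking any child, so that no SIGCHLD is missed. */
static bool
selfpipe_setup(void)
{
    struct sigaction sa;

    if (pipe(selfpipe_fds) < 0)
        return false;

    for (int i = 0; i < 2; i++)
    {
        fcntl(selfpipe_fds[i], F_SETFL, O_NONBLOCK);
        fcntl(selfpipe_fds[i], F_SETFD, FD_CLOEXEC);
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sa.sa_flags = SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL) == 0;
}

/* return true if process exited successfully, false in any other case */
static bool
selfpipe_monitor(pid_t child_pid, int timeout_sec)
{
    struct pollfd pfd = { .fd = selfpipe_fds[0], .events = POLLIN };
    char buf[64];
    int status;

    for (;;)
    {
        /* check first: the child may already have exited */
        if (waitpid(child_pid, &status, WNOHANG) == child_pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;

        int ready = poll(&pfd, 1, timeout_sec * 1000);

        /* SIGCHLD itself interrupts poll(); retry and re-check the child.
         * A production version would also shrink the remaining timeout. */
        if (ready < 0 && errno == EINTR)
            continue;
        if (ready < 1)
            break;

        /* drain the notification bytes, then loop to re-check the child */
        while (read(selfpipe_fds[0], buf, sizeof buf) > 0)
            ;
    }

    /* timeout or error: kill the child and block until it is reaped */
    kill(child_pid, SIGKILL);
    waitpid(child_pid, &status, 0);
    return false;
}
```

A caller would run selfpipe_setup() once at startup, fork and exec the child, and then pass the returned PID to selfpipe_monitor().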

Linux’s signalfd

What if we could turn delivery of signals into a pollable file descriptor? This is precisely what Linux’s signalfd does. The basic idea here is to open a signalfd, associate SIGCHLD with it, and then do the waitpid(2) call when SIGCHLD is received at the signalfd. The downside of this approach is similar to that of the self-pipe trick: you have to keep global state in order to accomplish it, as there can only be a single SIGCHLD handler.
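A minimal sketch of the signalfd approach might look like the following. The function names are hypothetical and this is not code from ifupdown-ng. The global state in this version is the process-wide signal mask: SIGCHLD has to be blocked for the whole process, otherwise it is delivered normally instead of through the descriptor.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block SIGCHLD and return a signalfd for it. Call before forking. */
static int
sigchld_fd_open(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);

    /* the signal must be blocked, or it bypasses the signalfd */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}

/* return true if process exited successfully, false in any other case */
static bool
signalfd_monitor(int sfd, pid_t child_pid, int timeout_sec)
{
    struct pollfd pfd = { .fd = sfd, .events = POLLIN };
    struct signalfd_siginfo si;
    int status;

    for (;;)
    {
        /* SIGCHLD is not queued per child, so always ask waitpid directly */
        if (waitpid(child_pid, &status, WNOHANG) == child_pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;

        /* fewer than one ready FD means timeout or error */
        if (poll(&pfd, 1, timeout_sec * 1000) < 1)
            break;

        /* consume the pending notification, then re-check the child */
        while (read(sfd, &si, sizeof si) == (ssize_t) sizeof si)
            ;
    }

    /* timeout exceeded: kill the child and block until it is reaped */
    kill(child_pid, SIGKILL);
    waitpid(child_pid, &status, 0);
    return false;
}
```

Because the descriptor reports SIGCHLD for any child, the loop re-checks the specific PID with waitpid(2) after every wakeup rather than trusting the notification alone.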

Process descriptors

FreeBSD introduced support for process descriptors in 2010 as part of the Capsicum framework. A process descriptor is an opaque handle to a specific process in the kernel. This is helpful as it avoids race conditions involving the recycling of PIDs. And since they are kernel handles, they can be waited on with kqueue like other kernel objects, by using EVFILT_PROCDESC.

There have been a few attempts to introduce process descriptors to Linux over the years. The attempt which finally succeeded was Christian Brauner’s pidfd API, completely landing in Linux 5.4, although parts of it were functional in prior releases. Like FreeBSD’s process descriptors, a pidfd is an opaque reference to a specific struct task_struct in the kernel, and is also pollable, making it quite suitable for notification monitoring.

A problem with using the pidfd API, however, is that it is not presently implemented in either glibc or musl, which means that applications will need to provide stub implementations of the API themselves for now. This issue with having to write our own stub aside, the solution is quite elegant:

#define _GNU_SOURCE
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__linux__) && defined(__NR_pidfd_open)

/* local stub, since libc does not provide a pidfd_open wrapper yet */
static inline int
local_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int) syscall(__NR_pidfd_open, pid, flags);
}

/* return true if process exited successfully, false in any other case */
bool
monitor_with_timeout(pid_t child_pid, int timeout_sec)
{
    int status;
    int pidfd = local_pidfd_open(child_pid, 0);
    if (pidfd < 0)
        return false;

    /* a pidfd becomes readable once the process has exited */
    struct pollfd pfd = {
        .fd = pidfd,
        .events = POLLIN,
    };

    /* poll(2) returns the number of ready FDs, if it is less than
     * one, it means our process has timed out.
     */
    if (poll(&pfd, 1, timeout_sec * 1000) < 1)
    {
        close(pidfd);
        kill(child_pid, SIGKILL);

        /* block until the killed child is reaped */
        waitpid(child_pid, &status, 0);
        return false;
    }

    /* if poll did return a ready FD, process completed. */
    waitpid(child_pid, &status, WNOHANG);
    close(pidfd);

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

#endif

It will be interesting to see process supervisors (and other programs which perform short-lived supervision) adopt these new APIs. As for me, I will probably prepare patches to include pidfd_open and the other syscalls in musl as soon as possible.

The long-term consequences of maintainers’ actions

OpenSSL 3 has entered Alpine, and we have been switching software to use it over the past week.  While OpenSSL 1.1 is not going anywhere any time soon, it will eventually leave the distribution, once it no longer has any dependents.  I mostly bring this up because it highlights a few examples of maintainers not thinking about the big picture.  Let me explain.

First, the good news: in distribution-wide rebuilds, we already know that the overwhelming majority of packages in Alpine build just fine with OpenSSL 3, when individually built against it.  Roughly 85% of main builds just fine with OpenSSL 3, and 89% of community builds with it.  The rebuild effort is off to a good start.

Major upgrades to OpenSSL are not without their fallout, however.  In many cases, we cannot upgrade packages to use OpenSSL 3 because they have dependencies which themselves cannot yet be built with OpenSSL 3.  So, that 15% of main ultimately translates to 30-40% of main once you take into account dependencies like curl, which builds just fine with OpenSSL 3, but has hundreds of dependents, some of which don’t.

A major example of this is mariadb.  It has been known that OpenSSL 3 was on the horizon for over 4 years now, and that the OpenSSL 3 release would remove support for the classical OpenSSL programming approach of touching random internals.  However, they are just now beginning to update their OpenSSL support to use the modern APIs.  Because of this, we wound up having to downgrade dozens of packages which would otherwise have supported OpenSSL 3 just fine, because the maintainers of those packages did their part and followed the OpenSSL deprecation warnings as they showed up in OpenSSL releases.  MariaDB is a highly profitable company, who do business with the overwhelming majority of the Fortune 500 companies.  But yet, when OpenSSL 3 releases started to be cut, they weren’t ready, and despite having years of warning they’re still not, which accordingly limits what packages can get the OpenSSL 3 upgrade as a result.

Another casualty will be Ansible: we have already moved it to community.  You are probably wondering why Ansible, a software package which does not use OpenSSL at all, would be a casualty, so please let me explain.  Ansible uses paramiko for its SSH client, which is a great SSH library for Python, and is a totally solid decision to make.  However, paramiko uses cryptography for its cryptographic functions, again a totally solid decision to make: cryptography is a great library for developers to use.

For distributions, however, the story is different: cryptography moved to using Rust, because they wanted to leverage all of the static analysis capabilities built into the language.  This, too, is a reasonable decision, from a development perspective.  From the ecosystem perspective, however, it is problematic, as the Rust ecosystem is still rapidly evolving, and so we cannot support a single branch of the Rust compiler for an entire two-year lifecycle, which means Rust lives in community.  Our solution, historically, has been to hold cryptography at the latest version that did not require Rust to build.  However, that version is not compatible with OpenSSL 3, and so it will eventually need to be upgraded to a new version which is.  And so, since cryptography has to move to community, so do paramiko and Ansible.

The ideology of moving fast and breaking things, while tolerated in the technology industry, does not translate to success in the world at large.  Outside of technology, the world prefers stability: the reason why banks still buy mainframes, and still use z/OS, is because the technology works and can be depended upon.  Similarly, the engine controller in cars, and medical devices like pacemakers and insulin pumps, are running on C.  They don’t run on C because C is a good language with all the latest features; they run on C because the risks and mitigations for issues in C programs are well-understood and documented as part of MISRA C.

Distributions exist to provide a similar set of stability and reliability guarantees.  If we cannot provide a long-term support lifecycle for a piece of technology your software depends on, then we are forced to provide a shorter support lifecycle for your software as well.  For some, that is fine, but I think many will be disappointed to see that they haven’t fully gotten OpenSSL 3, or that Ansible has had to be moved to community.

Efficient service isolation on Alpine with VRFs

Over the weekend, a reader of my blog contacted me basically asking about firewalls.  Firewalls themselves are boring in my opinion, so let’s talk about something Alpine can do that, as far as I know, no other distribution can easily do out of the box yet: service isolation using the base networking stack itself instead of netfilter.

A note on netfilter

Linux comes with a powerful network filtering framework, called netfilter. In most cases, netfilter is performant enough to be used for packet filtering, and the newer nftables project provides a mostly user friendly interface to the system, by comparison to the older iptables and ip6tables commands.

However, when trying to isolate individual services, a firewall is sometimes more complicated to use, especially when it concerns services which connect outbound. Netfilter does provide hooks that allow for matching via cgroup or PID, but these are complicated to use and carry a significant performance penalty.

seccomp and network namespaces

Two other frequently cited options are to use seccomp and network namespaces for this task. Seccomp is a natural solution to consider, but again has significant overhead, since all syscalls must be audited by the attached seccomp filter. Although seccomp filters are BPF programs and may be JIT compiled, the performance cost isn’t zero.

Similarly, one may find network namespaces to be a solution here. And indeed, network namespaces are a very powerful tool. But because of the flexibility afforded, network namespaces also require a lot of effort to set up. Importantly though, network namespaces allow for the use of an alternate routing table, one that can be restricted to say, the management LAN.

Introducing VRFs

Any network engineer with experience will surely be aware of VRFs. The VRF name is an acronym which stands for virtual routing and forwarding. On a router, these are interfaces that, when packets are forwarded to them, use an alternative routing table for finding the next destination. In that way, they are similar to Linux’s network namespaces, but are a lot simpler.

Thanks to the work of Cumulus Networks, Linux gained support for VRF interfaces in Linux 4.3. And since Alpine 3.13, we have supported managing VRFs and binding services to them, primarily for the purpose of low-cost service isolation. Let’s look at an example.

Setting up the VRF

On our example server, we will have a management LAN of 10.20.30.0/24. A gateway will exist at 10.20.30.1 as expected. The server itself will have an IP of 10.20.30.40. We will use a single VRF, in conjunction with the system’s default route table.

Installing the needed tools

By default, Alpine comes with Busybox’s iproute2 implementation. While good for basic networking use cases, it is recommended to install the real iproute2 for production servers. To use VRFs, you will need to install the real iproute2, using apk add iproute2-minimal, which will cause the corresponding ifupdown-ng modules to be installed as well.

Configuring /etc/network/interfaces

We will assume the server’s ethernet port is the venerable eth0 interface in Alpine. First, we will want to set up the interface itself and its default route. If you’ve used the Alpine installer, this part should already be done, but we will include the configuration snippet for those following along.

auto eth0
iface eth0
    address 10.20.30.40/24
    gateway 10.20.30.1

The next step is to configure a VRF. In this case, we want to limit the network to just the management LAN, 10.20.30.0/24. At the moment, ifupdown-ng does not support configuring interface-specific routes out of the box, but it’s coming in the next version. Accordingly, we will use iproute2 directly with a post-up directive.

auto vrf-management
iface vrf-management
    requires eth0
    vrf-table 1
    pre-up ip -4 rule add pref 32765 table local
    pre-up ip -6 rule add pref 32765 table local
    pre-up ip -4 rule del pref 0
    pre-up ip -6 rule del pref 0
    pre-up ip -4 rule add pref 2000 l3mdev unreachable
    post-up ip route add 10.20.30.0/24 dev eth0 table 1

This does four things: first, it creates the management VRF, vrf-management, using the second kernel route table (each network namespace may have up to 4,096 routing tables).  It also asserts that the eth0 interface must be present and configured before the VRF is configured.  Next, it removes the default route lookup rules and moves them so that the VRFs will be checked first.  Finally, it adds a route defining that the management LAN can be accessed through eth0. This allows egress packets to make their way back to clients on the management LAN.

In future versions of ifupdown-ng, the routing rule setup will be handled automatically.

Verifying the VRF works as expected

Once a VRF is configured, you can use the ip vrf exec command to run a program in the specified VRF context. In our case, the management VRF lacks a default route, so we should be able to observe a failure trying to ping hosts outside the management VRF, using ip vrf exec vrf-management ping 8.8.8.8 for example:

localhost:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=121 time=1.287 ms
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.287/1.287/1.287 ms
localhost:~# ip vrf exec vrf-management ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable

Success!  Now we know that using ip vrf exec, we can launch services with an alternate routing table.
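
Programs can also opt into a VRF themselves instead of being launched through ip vrf exec: the kernel’s VRF documentation describes binding a socket to the VRF device with SO_BINDTODEVICE.  Here is a minimal sketch of that, where bind_socket_to_vrf is a hypothetical helper name and VRF_NAME_MAX is a local stand-in for the kernel’s interface name limit.  Depending on the kernel version, the call may require elevated privileges.

```c
#define _GNU_SOURCE

#include <string.h>
#include <sys/socket.h>

/* Local stand-in for IFNAMSIZ: Linux interface names, including the
 * terminating NUL, fit in 16 bytes. */
enum { VRF_NAME_MAX = 16 };

/* Bind an already-created socket to a VRF device such as
 * "vrf-management", so that its traffic uses that VRF's routing
 * table.  Returns 0 on success and -1 on failure. */
int
bind_socket_to_vrf(int sockfd, const char *vrf_name)
{
    size_t len = strlen(vrf_name);

    /* Reject names the kernel could not possibly know about. */
    if (len == 0 || len >= VRF_NAME_MAX)
        return -1;

    return setsockopt(sockfd, SOL_SOCKET, SO_BINDTODEVICE,
                      vrf_name, (socklen_t) (len + 1));
}
```

After a successful call, the socket is used with connect(2) or bind(2) as usual, and a destination outside the management LAN should fail in the same way the ping above does.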

Integration with OpenRC

Alpine presently uses OpenRC as its service manager, although we plan to switch to s6-rc in the future.  Our branch of OpenRC has support for using VRFs with the declarative units.  For any of those units, you just add vrf=vrf-management to the appropriate /etc/conf.d file, for example /etc/conf.d/sshd for the SSH daemon.

For services which have not been converted to use the declarative format, you will need to patch them to use ip vrf exec by hand.  In most cases, all you should need to do is use ${RC_VRF_EXEC} in the appropriate place.

As for performance, this is much more efficient than depending on netfilter, although the setup process is not as clean as I’d like it to be.

Introducing witchery: tools for building distroless images with Alpine

As I noted in my last blog, I have been working on a set of tools which enable the building of so-called “distroless” images based on Alpine.  These tools have now evolved to a point where they are usable for testing in lab environments, thus I am happy to announce the witchery project.

For the uninitiated, a “distroless” image is one which contains only the application and its dependencies.  This has some desirable qualities: since the image is only the application and its immediate dependencies, there is less attack surface to worry about.  For example, a simple hello-world application built with witchery clocks in at 619kB, while that same hello-world application deployed on alpine:3.14 clocks in at 5.6MB.  There are also drawbacks: a distroless image typically does not include a package manager, so there is generally no ability to add new packages to a distroless image.

As for why it’s called witchery: we are using Alpine’s package manager in new ways to perform truly deep magic.  The basic idea behind witchery is that you use it to stuff your application into an .apk file, and then use apk to install only that .apk and its dependencies into a rootfs: no alpine-base, no apk-tools, no busybox (though witchery allows you to install those things if you want them).

Deploying an example application with witchery

For those who want to see the source code without commentary, you can find the Dockerfile for this example on the witchery GitHub repo.  For everyone else, I am going to try to break down what each part is doing, so that you can hopefully understand how it all fits together. We will be looking at the Dockerfile in the hello-world example.

The first thing the reader will likely notice is that Docker images built with witchery are done in three stages.  First, you build the application itself, then you use witchery to build what will become the final image, and finally, you copy that image over to a blank filesystem.

FROM alpine:3.14 AS build
WORKDIR /root
COPY . .
RUN apk add --no-cache build-base && gcc -o hello-world hello-world.c

The first stage to build the application is hopefully self explanatory, and is aptly named build.  We fetch the alpine:3.14 image from Dockerhub, then install a compiler (build-base) and finally use gcc to build the application.

The second stage has a few steps to it, which I will split up so that it’s easier to follow along.

FROM kaniini/witchery:latest AS witchery

First, we fetch the kaniini/witchery:latest image, and name it witchery.  This image contains alpine-sdk, which is needed to make packages, and the witchery tools which drive the alpine-sdk tools, such as abuild.

RUN adduser -D builder && addgroup builder abuild
USER builder
WORKDIR /home/builder

Anybody who is familiar with abuild will tell you that it cannot be used as root.  Accordingly, we create a user for running abuild, and add it to the abuild group.  We then tell Docker that we want to run commands as this new user, and do so from its home directory.

COPY --from=build /root/hello-world .
RUN mkdir -p payloadfs/app && mv hello-world payloadfs/app/hello-world
RUN abuild-keygen -na && fakeroot witchery-buildapk -n payload payloadfs/ payloadout/

The next step is to package our application.  The first step in doing so involves copying the application from our build stage.  We ultimately want the application to wind up in /app/hello-world, so we make a directory for the package filesystem, then move the application into place.  Finally, we generate a signing key for the package, and then generate a signed .apk for the application named payload.

At this point, we have a signed .apk package containing our application, but how do we actually build the image?  Well, just as we drove abuild with witchery-buildapk to build the .apk package and sign it, we will have apk build the image for us.  But first, we need to switch back to being root:

USER root
WORKDIR /root

Now that we are root again, we can generate the image.  But first, we need to add the signing key we generated in the earlier step to apk’s trusted keys.  To do that, we simply copy it from the builder user’s home directory.

RUN cp /home/builder/.abuild/*.pub /etc/apk/keys

And finally, we build the image.  Witchery contains a helper tool, witchery-compose, which makes doing this with apk really easy.

RUN witchery-compose -p ~builder/payloadout/payload*.apk -k /etc/apk/keys -X http://dl-cdn.alpinelinux.org/alpine/v3.14/main /root/outimg/

In this case, we want witchery-compose to grab the application package from ~builder/payloadout/payload*.apk.  We use a wildcard there because we don’t know the full filename of the generated package.  There are options that can be passed to witchery-buildapk to allow you to control all parts of the .apk package’s filename, so you don’t necessarily have to do this.  We also want witchery-compose to use the system’s trusted keys for validating signatures, and we want to pull dependencies from an Alpine mirror.

Once witchery-compose finishes, you will have a full image in /root/outimg.  The final step is to copy that to a new blank image.

FROM scratch
CMD ["/app/hello-world"]
COPY --from=witchery /root/outimg/ .

And that’s all there is to it!

Things left to do

There are still a lot of things left to do.  For example, we might want to implement layers that users can build from when deploying their apps, like one containing s6.  We also don’t have a great answer for applications written in things like Python yet; so far this only works well for programs that are compiled in the traditional sense.

But it’s a starting point nonetheless.  I’ll be writing more about witchery over the coming months as the tools evolve into something even more powerful.  This is only the beginning.

Bits relating to Alpine security initiatives in August

As always, the primary focus of my work in Alpine is related to security, either through non-maintainer updates to address CVEs, new initiatives for hardening Alpine, maintenance of critical security-related packages or working with other projects to improve our workflows with better information sharing.  Here are some updates on that, which are slightly delayed because of the long weekend.

sudo deprecation

One of the key things we discussed in the last update was our plan to deprecate sudo, by moving it to community.  sudo exists in a similar situation as firejail: it allows for some interesting use cases, but the security track record is not very good.  Additionally, the maintenance lifecycle for a release branch of sudo is very short, which makes it difficult to provide long-term support for any given version.

As such, the security team proposed to the Technical Steering Committee that we should deprecate sudo and move to an alternative implementation such as doas.  This required some work, namely, doas needed to gain support for configuration directories.  I wrote a patch for doas which provides support for configuration directories, and last week, pushed a doas package which includes this patch with some migration scripts.

At this point, basically everything which depended on sudo for technical reasons has been moved over to using doas.  We are just waiting for the cloud-init maintainer to finish testing their support for doas.  Once that is done, sudo will be moved to community.

OpenSSL 3.0

OpenSSL 3.0 was released today.  It is my intention to migrate Alpine to using it where possible.  As OpenSSL 3.0 will require a major rebuild, after talking with Timo, we will be coordinating this migration plan with the Technical Steering Committee.  Switching to OpenSSL 3.0 should not be as invasive as the OpenSSL 1.0 to 1.1 migration, as they did not change the APIs that much, and will give us the benefit of finally being free of that damn GPL-incompatible OpenSSL license, as OpenSSL 3 was relicensed to use the Apache 2.0 license.

I have already done some test rebuilds which covered much of the aports package set, and did not see much fallout so far.  Even packages which use the more low-level APIs, such as those in libcrypto, compiled without any major problems.

A nice effect of the license change is that we should be able to drop dependencies on less audited TLS libraries, like GNU TLS, as many programs are licensed under GPL and therefore not compatible with the original OpenSSL license.

Reproducible Builds

We are starting to take steps towards reproducible packages.  The main blocker on that issue was determining what to do about storing the build metadata, so that a build environment can be recreated precisely.  To that end, I have a patch to abuild which records all of the details exactly.  A rebuilder can then simply install the pinned packages with apk add --virtual.

We will need some way to archive historically built packages for the verification process.  Right now, the archive only ships current packages for each branch.  I am thinking about building something with ZFS or similar which snapshots the archive on a daily basis, but suggestions are welcome if anyone knows of better approaches.

Once these two things are addressed, we need to also add support for attempting rebuilds to the rebuilderd project.  In general, we should be able to model our support based on the support implemented for Arch Linux.

I am expecting to make significant progress on getting the .BUILDINFO file support merged into abuild and support for rebuilderd over the next month.  kpcyrd has been quite helpful in showing us how Arch has tackled reproducibility, and we have applied some lessons from that already to Alpine.

If you’re interested in this project, feel free to join #alpine-reproducible on irc.oftc.net.

secfixes-tracker

I am working on overhauling the JSON-LD documents which are presently generated by the secfixes-tracker application, so that they are more aligned with what the UVI vocabulary will look like.  At the same time, the UVI group have largely endorsed the use of Google’s new OSV format for use cases that do not require linked data.

Accordingly, I am writing a Python library which translates UVI to OSV and vice versa.  This is possible to do without many issues because UVI is intended to be a superset of OSV.

However, we need to request two mime-types, one for OSV documents and one for UVI JSON-LD documents.  In the meantime, the secfixes tracker will support the .osv+json extension for querying our security tracker in the OSV format.

Anybody with experience requesting mime-types from IANA is encouraged to provide advice on how to do it most efficiently.

Best practices for Alpine installations

A couple of weeks ago, a kerfuffle happened where Adoptium planned to ship builds of OpenJDK with a third-party glibc package for Alpine.  Mixing libc implementations on any system is a bad idea and has security implications that are not necessarily obvious.

As one option intended to discourage the practice of mixing musl and glibc on the same system, I proposed installing a conflict with the glibc package as part of the musl packaging.  We asked the Technical Steering Committee for guidance on this plan, and ultimately the TSC decided to solve this with documentation.

Therefore, I plan to work with the docs team to document practices to avoid (such as mixing libc implementations and release branches) to ensure Alpine systems remain secure and reliable.

Distroless, for Alpine

Google’s Distroless project provides tooling to allow users to build containers that include only the runtime dependencies to support an application.  This has some nice security advantages, because images have fewer components available to attack.  There has been some interest in building the same thing for Alpine, so that users can take advantage of the musl C library, while also having the security advantages of distroless.

It turns out that apk is capable of easily building a tool like this.  I already have a proof of concept, and I plan on expanding that into a more fully featured tool over the next week.  I also plan to do a deep dive into how that tool works once the initial version is released.

Acknowledgement

My activities relating to Alpine security work are presently sponsored by Google and the Linux Foundation. Without their support, I would not be able to work on security full time in Alpine, so thanks!

I drove 1700 miles for a Blåhaj last weekend and it was worth it

My grandmother has Alzheimer’s and has recently had to move into an assisted living facility. You’ve probably seen bits and pieces outlining my frustration with that process on Twitter over the past year or so. Anyway, I try to visit her once or twice a month, as time permits.

But what does that have to do with blåhaj, and what is a blåhaj, anyway? To answer your question literally, blåhaj is Swedish for “blue shark.” But to be more precise, it’s a popular shark stuffed animal design produced by IKEA. As a stuffed animal sommelier, I’ve been wanting one for a hot minute.

Anyway, visiting grandmother was on the way to the St. Louis IKEA. So, I figured, I would visit her, then buy a blåhaj, and go home, right? Well, it was not meant to be: after visiting with my grandmother, we went to the IKEA in St. Louis, and they had sold out. This had happened a few times before, but this time I decided I wasn’t going to take no for an answer. A blåhaj was to be acquired at any cost.

This led us to continue our journey onto Chicago, as the IKEA website indicated they had 20 or so in stock. 20 blåhaj for the entire Chicagoland area? We figured that, indeed, odds were good that a blåhaj could be acquired. Unfortunately, it still wasn’t meant to be: by the time we got to Chicago, the website indicated zero stock.

So we kept going, onward to Minneapolis. At the very least, we could see one of the great monuments to laissez-faire capitalism, the Mall of America, and the historic George Floyd Square, which was frankly more my speed and also quite moving. But again, our attempt to get a blåhaj was unsuccessful – the IKEA at the Mall of America was out of stock.

Our search for blåhaj wasn’t finished yet: there were two options, Winnipeg had nearly 100 in stock, and the Kansas City location had 53. A decision had to be made. We looked at the border crossing requirements for entering Canada and found out that if you present your CDC vaccination card, you could enter Canada without any problems. So, we flipped a coin: do we go six hours north, or six hours south?

Ultimately, we decided to go to the Kansas City location, as we wanted to start heading back towards home. It turns out that Kansas City is only about six hours away from Minneapolis, so we were able to make it to the Kansas City IKEA about an hour before it closed. Finally, a success: a blåhaj was acquired. And that’s when my truck started pissing oil, but that’s a story for another day.

Does blåhaj live up to or exceed my expectations?

Absolutely! As far as stuffed animals go, blåhaj is quite premium, and available for a bargain at only 20 dollars for the large one. The quality is quite comparable to high-end plush brands like Jellycat and Aurora. It is also very soft, unlike some of the other IKEA stuffed animals.

Some people asked about sleeping with a blåhaj. Jellycat stuffed animals, for example, are explicitly designed for spooning, which is essential for a side sleeper. The blåhaj is definitely not, but due to its softness you can use it as a body pillow in various ways, such as to support your head or back.

The shell of the blåhaj is made out of a soft micro plush material very similar to the material used on second generation Jellycat bashful designs (which is different than the yarn-like material used on the first generation bashfuls). All stitching is done using inside seams, so the construction is quite robust. It should last for years, even with constant abuse.

All in all, a fun trip, for a fun blåhaj, though maybe I wouldn’t drive 1700 miles round trip again for one.

How networks of consent can fix social platforms

Social platforms are powerful tools which allow a user to communicate with their friends and family. They also allow for activists to organize and manage political movements. Unfortunately, they also allow for users to harass other users and the mitigations available for that harassment are generally lacking.

By implementing networks of consent using the techniques presented, centralized, federated and distributed social networking platforms alike can build effective mitigations against harassment. Not all techniques will be required, depending on the design and implementation of a given platform.

What does consent mean here?

In this case, consent does not have any special technical meaning. It means the same thing as it does in real life: you (or your agent) are allowing or disallowing an action concerning you in some way.

As computers are incapable of inferring whether consent is given, the user records a statement affirming their consent if granted. Otherwise, the lack of a consent statement must be taken to mean that consent was not granted for an action. How this affirmation is recorded is platform specific.

In technical terms, we refer to these affirmations of consent as an object capability. Many things in this world are already built on object capabilities, for example Mach’s port system and cryptographic assets are forms of object capabilities.

How object capabilities can be used

In a monolithic system, you don’t really need real object capabilities, as the access grants can simply be recorded in the backend, and enforced transparently.

In a federated or distributed system, there are a few techniques that can be used to represent and invoke object capabilities. For example, an object capability might be represented by a key pair. In this case, the capability is invoked by signing the request with that key. Alternatively, capability URLs are another popular option, popularized by the Second Life Grid.

In a distributed system, simply having an opaque pointer to a given set of rights (and demonstrating possession of it) is sufficient, as the actor invoking the capability will invoke it directly with all relevant participants. This works because all participants are able to directly verify the validity of the capability as they witnessed its issuance to begin with.

However, in a federated system, you also need a way to provide proof that the invocation of a capability was accepted. This is usually implemented in the form of a signed proof statement created by the participant which issued the capability to begin with. Other more exotic schemes exist, but for the purpose of explaining everything this should suffice.

Building networks of consent with object capabilities

Now that we understand the basic concepts behind object capabilities, we can use them to model what a social network built from the ground up with a consent-oriented design would look like.

It should be noted that the user may configure her user agent to automatically consent to any incoming action, but this is an implementation detail. The presence of a consent framework at the network level does not imply the requirement for a user to manage whether consent is granted; it just allows for the necessary decision points to exist.

An example on a monolithic network

Let’s say that Alice wants to reply to her friend Bob’s post on Tooter, an imaginary proprietary social network that is focused on microblogging. In a proprietary network, Alice composes her reply, and then sends it to the network. The network then asks Bob’s user agent to approve or disapprove the reply. Bob’s user agent can choose to automatically accept the reply because Alice and Bob are already friends.

Now, let’s say that Karen98734762153 wants to reply to Bob’s post as well. Karen98734762153 has no prior relationship with Bob, but because Bob’s user agent is asked to make the decision, it can present the message to Bob to accept or reject. As Karen98734762153 wants to suggest the use of apple-flavored horse paste as a possible prophylactic for COVID, Bob chooses to reject the post, and Karen98734762153 is not granted any new privileges.
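
As a rough illustration of the check a monolithic backend could make before accepting a reply automatically, here is a small sketch of a default-deny consent lookup. Everything in it (the consent_grant structure, the consent_is_granted function and the action names) is hypothetical and not taken from any real platform; a real backend would keep these records in its database.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of actions that require consent. */
enum consent_action {
    CONSENT_REPLY,
    CONSENT_MENTION,
};

/* One recorded statement: grantor allows grantee to perform action. */
struct consent_grant {
    const char *grantor;
    const char *grantee;
    enum consent_action action;
};

/* Return true only if a matching consent statement was recorded.
 * The absence of a statement means consent was not granted. */
bool
consent_is_granted(const struct consent_grant *grants, size_t n_grants,
                   const char *grantor, const char *grantee,
                   enum consent_action action)
{
    for (size_t i = 0; i < n_grants; i++) {
        if (grants[i].action == action &&
            strcmp(grants[i].grantor, grantor) == 0 &&
            strcmp(grants[i].grantee, grantee) == 0)
            return true;
    }

    return false;
}
```

With a single recorded grant from Bob to Alice for replies, Alice’s reply passes the check, while Karen98734762153’s reply does not and would have to be presented to Bob for a decision. Grants are directional, so Bob’s grant to Alice says nothing about what Bob may do to Alice.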

The same example in a distributed system

On a proprietary network, all of this can be implemented transparently to the end user. But can it be implemented in a distributed system? In this case, we assume a simple peer to peer network like Scuttlebutt. How would this work there?

As noted before, we can use object capabilities here. In this case, both Alice and Karen98734762153 would send their replies to Bob. Alice would reference her pre-existing relationship, and Karen98734762153 would not reference anything. Bob would commit Alice’s reply to the Scuttlebutt ledger and distribute that commitment to the pub-servers he is subscribed to, and ignore Karen98734762153’s reply.

The same example in a federated system

As seen, in a distributed system with a distributed ledger, where all objects are signed, this approach can work. Federated systems are a lot trickier to get right in regards to trust relationships, but it can be done here too. In this case, we introduce proof objects to demonstrate the acceptance of a capability. We will refer to these proofs as endorsements.

To this end, both Alice and Karen98734762153 send their replies to Bob, like before. Bob’s user agent then makes the decision to accept or reject the replies. In this example, Bob would add the reply to his local copy of replies, and then at a minimum send an endorsement back to Alice. Either Alice, Bob or both would then distribute that endorsement to a list of interested subscribers, who could verify the validity of the endorsement.

While other instances may choose to accept replies without an endorsement, they can also choose to reject them, or to give endorsed replies special status in their user interface. As there is not a unified consensus mechanism in federated networks, that is all that can be done. But it’s still pretty good.

The application of these ideas to other scenarios is left as an exercise for the reader.