<rss xmlns:source="http://source.scripting.com/" version="2.0">
  <channel>
    <title>Ariadne&#39;s Space</title>
    <link>https://ariadne.space/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Sat, 28 Mar 2026 08:44:47 -0700</lastBuildDate>
    <item>
      <title>Why I am looking for Jellycat alternatives now</title>
      <link>https://ariadne.space/2026/03/28/why-i-am-looking-for.html</link>
      <pubDate>Sat, 28 Mar 2026 08:44:47 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2026/03/28/why-i-am-looking-for.html</guid>
      <description>&lt;p&gt;Many readers of my blog will note that I &lt;a href=&#34;https://ariadne.space/2022/02/11/how-to-refresh-older-stuffed.html&#34;&gt;used to be quite enthusiastic about Jellycat stuffed animals&lt;/a&gt;, especially their Bashful series, but I haven&amp;rsquo;t talked much about them lately.&lt;/p&gt;
&lt;p&gt;Outside of being well-designed, they also supported small business: the overwhelming majority of my collection has been purchased from independently-run stores, including one run by a close friend of mine in downtown Seattle. Unfortunately in the past few years, Jellycat have pursued a “brand elevation strategy” (their words, not mine), incrementally dropping small businesses from their network in favor of large chains and directing people to their website instead.&lt;/p&gt;
&lt;p&gt;In 2025, this included dropping hundreds of independent shops; &lt;a href=&#34;https://www.briscoepr.co.uk/what-the-jellycat-stockist-scandal-teaches-us-about-brand-loyalty/&#34;&gt;around 100 of these were in the UK alone&lt;/a&gt;, with no option for appeal, explicitly as part of this strategy. Many of these retailers had supported the brand for decades, and described the move as abrupt and poorly communicated.&lt;/p&gt;
&lt;p&gt;This is a baffling change in direction. Jellycat’s popularity was built in independent shops, which makes this strategy a good way to burn that goodwill.&lt;/p&gt;
&lt;p&gt;Frustratingly, this new direction is now becoming somewhat problematic for me, as it turns out they also make one of the best tactile tools I’ve found.&lt;/p&gt;
&lt;p&gt;In my case, I use one of the larger Jellycat Bashful Bunny variants (what they call the “Really Big” size) as a lap object during the day. It provides consistent tactile feedback, something to hold onto, and, when needed, doubles as a small pillow. The end result is that it helps with focus and anxiety during long stretches of work.&lt;/p&gt;
&lt;h2 id=&#34;what-i-am-looking-for-in-a-stuffed-animal&#34;&gt;What I am looking for in a stuffed animal&lt;/h2&gt;
&lt;p&gt;It needs to be large enough to sit comfortably in the lap and double as a small pillow, without becoming cumbersome. Beyond that, the details matter more than one might expect: it needs to be soft without being irritating, usable for long stretches, and have a shape that can be held onto or adjusted easily.&lt;/p&gt;
&lt;p&gt;Most alternatives fail one of these constraints. They are too small and effectively disappear, too large and unwieldy, or made from materials that are either unpleasant or distracting over time. Even small differences matter here; a Bunnies by the Bay bunny that was just slightly smaller was already enough to be annoying.&lt;/p&gt;
&lt;p&gt;This would be a minor annoyance if it were purely theoretical, but it is not: my current daily-use bunny is approaching end-of-life after sustained use. It has held up well, but it is not realistically replaceable with another one without undermining the earlier decision to stop supporting Jellycat.&lt;/p&gt;
&lt;p&gt;So this is both an explanation and a request: if anyone has found alternatives that meet roughly the same criteria, I would be interested in hearing about them.&lt;/p&gt;
&lt;p&gt;This is, apparently, a harder problem than it should be.&lt;/p&gt;
</description>
      <source:markdown>Many readers of my blog will note that I [used to be quite enthusiastic about Jellycat stuffed animals](https://ariadne.space/2022/02/11/how-to-refresh-older-stuffed.html), especially their Bashful series, but I haven&#39;t talked much about them lately.

Outside of being well-designed, they also supported small business: the overwhelming majority of my collection has been purchased from independently-run stores, including one run by a close friend of mine in downtown Seattle. Unfortunately in the past few years, Jellycat have pursued a “brand elevation strategy” (their words, not mine), incrementally dropping small businesses from their network in favor of large chains and directing people to their website instead.

In 2025, this included dropping hundreds of independent shops; [around 100 of these were in the UK alone](https://www.briscoepr.co.uk/what-the-jellycat-stockist-scandal-teaches-us-about-brand-loyalty/), with no option for appeal, explicitly as part of this strategy. Many of these retailers had supported the brand for decades, and described the move as abrupt and poorly communicated.

This is a baffling change in direction. Jellycat’s popularity was built in independent shops, which makes this strategy a good way to burn that goodwill.

Frustratingly, this new direction is now becoming somewhat problematic for me, as it turns out they also make one of the best tactile tools I’ve found.

In my case, I use one of the larger Jellycat Bashful Bunny variants (what they call the “Really Big” size) as a lap object during the day. It provides consistent tactile feedback, something to hold onto, and, when needed, doubles as a small pillow. The end result is that it helps with focus and anxiety during long stretches of work.

## What I am looking for in a stuffed animal

It needs to be large enough to sit comfortably in the lap and double as a small pillow, without becoming cumbersome. Beyond that, the details matter more than one might expect: it needs to be soft without being irritating, usable for long stretches, and have a shape that can be held onto or adjusted easily.

Most alternatives fail one of these constraints. They are too small and effectively disappear, too large and unwieldy, or made from materials that are either unpleasant or distracting over time. Even small differences matter here; a Bunnies by the Bay bunny that was just slightly smaller was already enough to be annoying.

This would be a minor annoyance if it were purely theoretical, but it is not: my current daily-use bunny is approaching end-of-life after sustained use. It has held up well, but it is not realistically replaceable with another one without undermining the earlier decision to stop supporting Jellycat.

So this is both an explanation and a request: if anyone has found alternatives that meet roughly the same criteria, I would be interested in hearing about them.

This is, apparently, a harder problem than it should be.
</source:markdown>
    </item>
    
    <item>
      <title>Why leaders often disappoint us</title>
      <link>https://ariadne.space/2026/01/22/why-leaders-often-disappoint-us.html</link>
      <pubDate>Thu, 22 Jan 2026 14:54:41 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2026/01/22/why-leaders-often-disappoint-us.html</guid>
      <description>&lt;p&gt;There&amp;rsquo;s an old saying about not meeting your heroes.  In practice, leaders tend to confirm this over time.  This is true across domains, and it&amp;rsquo;s rarely a single gaffe that does it.  The interesting question is why the disappointment usually takes the same shape.&lt;/p&gt;
&lt;p&gt;Disappointment does not always show up in the form of a bad conversation.  Often there isn&amp;rsquo;t any conversation at all, at least not in the way people imagine one.  As space disappears, interaction collapses into reaction.  Responses come faster, positions are stated rather than tested, and dialogue gives way to declaration.  At a certain distance, leadership becomes parasocial by default, taking the form of broadcast.  There is nothing to push back on, only things to react to.  By the time the gaffe happens, the system has already collapsed.&lt;/p&gt;
&lt;h2 id=&#34;the-accumulation-of-influence&#34;&gt;The accumulation of influence&lt;/h2&gt;
&lt;p&gt;Much of the time, leadership emerges through accumulated influence: someone does useful or visible work and attention gathers.  Over time, that influence carries more weight, and interaction quietly changes.  Exchange becomes presentation as influence becomes legible, and once influence is legible, it becomes performative by default.  At that point, silence loses neutrality.&lt;/p&gt;
&lt;p&gt;None of this requires bad intent.  The same pattern shows up in very different kinds of people, which makes individual explanations less convincing.  When silence carries cost, behavior tends to shift in predictable ways.  Reaction becomes safer than response, not because people are reckless, but because the underlying structure rewards it.&lt;/p&gt;
&lt;p&gt;A way to understand this shift is through ego development.  Early on, most people rely on external feedback to regulate their sense of self.  Attention and reinforcement help stabilize an emergent identity.  This is normal, temporary, and usually invisible, and under supportive conditions that reliance softens, allowing ongoing ego development to become more internally anchored.  When those conditions are absent, the process can stall.&lt;/p&gt;
&lt;h2 id=&#34;the-pressure-of-performance&#34;&gt;The pressure of performance&lt;/h2&gt;
&lt;p&gt;Performance pressure changes those conditions in reliable yet subtle ways.  Once visibility is continuous, there&amp;rsquo;s less room to disengage without consequence.  Attention has to be managed, silence has to be explained.  Over time, the space that ongoing ego development depends on gets crowded out by the need to remain legible, responsive, and present.&lt;/p&gt;
&lt;p&gt;As the space for integration disappears, the ego adapts by turning outward again.  Identity is maintained through response and visibility rather than reflection.  Over time, performance stops feeling optional and becomes the only mode that consistently receives feedback, recognition, and relief from pressure.&lt;/p&gt;
&lt;p&gt;Over time, this adaptation compresses.  Everything has to be answered, nothing gets to sit.  Reactionary performance closes off options until the ego is boxed in, with no clean way to pause or step back.  Listening to feedback becomes difficult, refusal stops reading as information and starts registering as pressure.&lt;/p&gt;
&lt;p&gt;Eventually the corner closes.  With no room left to pause or revise, the ego can&amp;rsquo;t maintain coherence through performance alone.  The result isn&amp;rsquo;t always silence; sometimes it&amp;rsquo;s escalation, overreach, or the need to say something definitive.  You may be familiar with the old &amp;ldquo;escalate to de-escalate&amp;rdquo; saying.  What collapses isn&amp;rsquo;t belief, but the capacity to remain integrated under pressure.&lt;/p&gt;
&lt;h2 id=&#34;the-collapse-of-integration&#34;&gt;The collapse of integration&lt;/h2&gt;
&lt;p&gt;Once integration collapses, the pressure doesn&amp;rsquo;t disappear.  It just looks for a new outlet.&lt;/p&gt;
&lt;p&gt;While the responses vary, the shape remains familiar.  Escalation replaces reflection, withdrawal masquerades as clarity.  Performance hardens into something more transactional.  What these paths share is an attempt to regain psychological safety without reopening space.  Coherence is rebuilt outwardly, even as inward integration remains unavailable.&lt;/p&gt;
&lt;p&gt;Another path out of collapse is grift.  Performance keeps going, but it changes shape.  Attention becomes stabilizing rather than incidental.  Some people who end up here may also have narcissistic pathology, but grift itself isn’t cleanly reducible to that: it emerges when identity can only be held together through external return.&lt;/p&gt;
&lt;p&gt;Sometimes people don&amp;rsquo;t collapse this way.  Usually it&amp;rsquo;s because the environment gives them the opportunity to pause: silence isn&amp;rsquo;t punished, and saying no doesn&amp;rsquo;t threaten belonging.  Influence is shared rather than concentrated.  Under those conditions, performance loosens its grip, and integration doesn’t have to be outsourced to constant response.&lt;/p&gt;
&lt;p&gt;The reason leaders disappoint us so often isn’t hard to find once you know what to look for.  Influence changes the shape of interaction, and performance gradually replaces integration as the primary stabilizer.  Over time, the space required for reflection and listening erodes.  What follows isn’t corruption so much as compression, and then collapse.  The disappointment comes from mistaking these outcomes for personal failure, when they’re often the trace left by a structure that no longer supports integration.&lt;/p&gt;
</description>
      <source:markdown>There&#39;s an old saying about not meeting your heroes.  In practice, leaders tend to confirm this over time.  This is true across domains, and it&#39;s rarely a single gaffe that does it.  The interesting question is why the disappointment usually takes the same shape.

Disappointment does not always show up in the form of a bad conversation.  Often there isn&#39;t any conversation at all, at least not in the way people imagine one.  As space disappears, interaction collapses into reaction.  Responses come faster, positions are stated rather than tested, and dialogue gives way to declaration.  At a certain distance, leadership becomes parasocial by default, taking the form of broadcast.  There is nothing to push back on, only things to react to.  By the time the gaffe happens, the system has already collapsed.

## The accumulation of influence

Much of the time, leadership emerges through accumulated influence: someone does useful or visible work and attention gathers.  Over time, that influence carries more weight, and interaction quietly changes.  Exchange becomes presentation as influence becomes legible, and once influence is legible, it becomes performative by default.  At that point, silence loses neutrality.

None of this requires bad intent.  The same pattern shows up in very different kinds of people, which makes individual explanations less convincing.  When silence carries cost, behavior tends to shift in predictable ways.  Reaction becomes safer than response, not because people are reckless, but because the underlying structure rewards it.

A way to understand this shift is through ego development.  Early on, most people rely on external feedback to regulate their sense of self.  Attention and reinforcement help stabilize an emergent identity.  This is normal, temporary, and usually invisible, and under supportive conditions that reliance softens, allowing ongoing ego development to become more internally anchored.  When those conditions are absent, the process can stall.

## The pressure of performance

Performance pressure changes those conditions in reliable yet subtle ways.  Once visibility is continuous, there&#39;s less room to disengage without consequence.  Attention has to be managed, silence has to be explained.  Over time, the space that ongoing ego development depends on gets crowded out by the need to remain legible, responsive, and present.

As the space for integration disappears, the ego adapts by turning outward again.  Identity is maintained through response and visibility rather than reflection.  Over time, performance stops feeling optional and becomes the only mode that consistently receives feedback, recognition, and relief from pressure.

Over time, this adaptation compresses.  Everything has to be answered, nothing gets to sit.  Reactionary performance closes off options until the ego is boxed in, with no clean way to pause or step back.  Listening to feedback becomes difficult, refusal stops reading as information and starts registering as pressure.

Eventually the corner closes.  With no room left to pause or revise, the ego can&#39;t maintain coherence through performance alone.  The result isn&#39;t always silence; sometimes it&#39;s escalation, overreach, or the need to say something definitive.  You may be familiar with the old &#34;escalate to de-escalate&#34; saying.  What collapses isn&#39;t belief, but the capacity to remain integrated under pressure.

## The collapse of integration

Once integration collapses, the pressure doesn&#39;t disappear.  It just looks for a new outlet.

While the responses vary, the shape remains familiar.  Escalation replaces reflection, withdrawal masquerades as clarity.  Performance hardens into something more transactional.  What these paths share is an attempt to regain psychological safety without reopening space.  Coherence is rebuilt outwardly, even as inward integration remains unavailable.

Another path out of collapse is grift.  Performance keeps going, but it changes shape.  Attention becomes stabilizing rather than incidental.  Some people who end up here may also have narcissistic pathology, but grift itself isn’t cleanly reducible to that: it emerges when identity can only be held together through external return.

Sometimes people don&#39;t collapse this way.  Usually it&#39;s because the environment gives them the opportunity to pause: silence isn&#39;t punished, and saying no doesn&#39;t threaten belonging.  Influence is shared rather than concentrated.  Under those conditions, performance loosens its grip, and integration doesn’t have to be outsourced to constant response.

The reason leaders disappoint us so often isn’t hard to find once you know what to look for.  Influence changes the shape of interaction, and performance gradually replaces integration as the primary stabilizer.  Over time, the space required for reflection and listening erodes.  What follows isn’t corruption so much as compression, and then collapse.  The disappointment comes from mistaking these outcomes for personal failure, when they’re often the trace left by a structure that no longer supports integration.
</source:markdown>
    </item>
    
    <item>
      <title>vm.overcommit_memory=2 is always the right setting for servers</title>
      <link>https://ariadne.space/2025/12/16/vmovercommitmemory-is-always-the-right.html</link>
      <pubDate>Tue, 16 Dec 2025 17:23:19 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2025/12/16/vmovercommitmemory-is-always-the-right.html</guid>
      <description>&lt;p&gt;The Linux kernel has a feature where you can tune the behavior of memory allocations: the &lt;code&gt;vm.overcommit_memory&lt;/code&gt; sysctl.  When overcommit is enabled (sadly, this is the default), the kernel will typically return a mapping when brk(2) or mmap(2) is called to increase a program&amp;rsquo;s heap size, regardless of whether or not memory is available.  Sounds good, right?&lt;/p&gt;
&lt;p&gt;Not really.  While overcommit is convenient for application developers, it fundamentally changes the contract of memory allocation: a successful allocation no longer represents an atomic acquisition of a real resource.  Instead, the returned mapping serves as a &lt;em&gt;deferred promise&lt;/em&gt;, which will only be fulfilled by the page fault handler if and when the memory is first accessed.  This is an important distinction, as it means overcommit effectively replaces a fail-fast transactional allocation model with a best-effort one where failures are only caught after the fact rather than at the point of allocation.&lt;/p&gt;
&lt;p&gt;To understand how this deferral works in practice, let&amp;rsquo;s consider what happens when a program calls malloc(3) to get a new memory allocation.  At a high level, the allocator calls brk(2) or mmap(2) to request additional virtual address space from the kernel, which is represented by virtual memory area objects, also known as VMAs.&lt;/p&gt;
&lt;p&gt;On a system where overcommit is disabled, the kernel ensures that enough backing memory is available to satisfy the request before allowing the allocation to succeed.  In contrast, when overcommit is enabled, the kernel simply allocates a VMA object without guaranteeing that backing memory is available: the mapping succeeds immediately, even though it is not known whether the request can ultimately be satisfied.&lt;/p&gt;
&lt;p&gt;The decoupling of success from backing memory availability makes allocation failures impossible to handle correctly.  Programs have no other option but to assume the allocation has succeeded before the kernel has actually determined whether the request can be fulfilled.  Disabling overcommit solves this problem by restoring admission control at allocation time, ensuring that allocations either fail immediately or succeed with a guarantee of backing memory.&lt;/p&gt;
&lt;h2 id=&#34;failure-locality-is-important-for-debugging&#34;&gt;Failure locality is important for debugging&lt;/h2&gt;
&lt;p&gt;When allocations fail fast, they are dramatically easier to debug, as the failure is synchronous with the request.  When a program crashes due to an allocation failure, the entire context of that allocation is preserved: the requested allocation size, the subsystem making the allocation and the underlying operation that required it are already known.&lt;/p&gt;
&lt;p&gt;With overcommit, this locality is lost by design.  Allocations appear to succeed and the program proceeds under the assumption that the memory is available.  When the allocation is eventually accessed, the kernel typically responds by invoking the OOM killer and terminating the process outright.  From the program&amp;rsquo;s perspective, there is no allocation failure to handle, only a &lt;code&gt;SIGKILL&lt;/code&gt;.  From the operator&amp;rsquo;s perspective, there is no stack trace pointing to the failure.  There are only post-mortem logs which often fail to paint a clear picture of what happened.&lt;/p&gt;
&lt;p&gt;Would you rather debug a crash at the allocation site or reconstruct an outage caused by an asynchronous OOM kill?  Overcommit doesn&amp;rsquo;t make allocation failure recoverable.  It makes it unreportable.&lt;/p&gt;
&lt;h2 id=&#34;dishonorable-mention-redis&#34;&gt;Dishonorable mention: Redis&lt;/h2&gt;
&lt;p&gt;So why am I writing about this, anyway?  The cost of overcommit isn&amp;rsquo;t just technical, it also represents bad engineering culture: shifting responsibility for correctness away from application developers and onto the kernel.  As an example, when you start Redis with overcommit disabled, it prints a scary warning that you should re-enable it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;WARNING Memory overcommit must be enabled!
Without it, a background save or replication may fail under low memory condition.
Being disabled, it can also cause failures without low memory condition, see &lt;a href=&#34;https://github.com/jemalloc/jemalloc/issues/1328&#34;&gt;https://github.com/jemalloc/jemalloc/issues/1328&lt;/a&gt;.
To fix this issue add &amp;lsquo;vm.overcommit_memory = 1&amp;rsquo; to /etc/sysctl.conf and then reboot or run the command &amp;lsquo;sysctl vm.overcommit_memory=1&amp;rsquo; for this to take effect.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No.  Code that requires overcommit to function correctly is &lt;em&gt;failing to handle memory allocation errors correctly&lt;/em&gt;.  The answer is &lt;em&gt;not&lt;/em&gt; to print a warning that overcommit is disabled, but rather to surface low memory conditions explicitly so the system administrator can understand and resolve them.&lt;/p&gt;
</description>
      <source:markdown>The Linux kernel has a feature where you can tune the behavior of memory allocations: the `vm.overcommit_memory` sysctl.  When overcommit is enabled (sadly, this is the default), the kernel will typically return a mapping when brk(2) or mmap(2) is called to increase a program&#39;s heap size, regardless of whether or not memory is available.  Sounds good, right?

Not really.  While overcommit is convenient for application developers, it fundamentally changes the contract of memory allocation: a successful allocation no longer represents an atomic acquisition of a real resource.  Instead, the returned mapping serves as a *deferred promise*, which will only be fulfilled by the page fault handler if and when the memory is first accessed.  This is an important distinction, as it means overcommit effectively replaces a fail-fast transactional allocation model with a best-effort one where failures are only caught after the fact rather than at the point of allocation.

To understand how this deferral works in practice, let&#39;s consider what happens when a program calls malloc(3) to get a new memory allocation.  At a high level, the allocator calls brk(2) or mmap(2) to request additional virtual address space from the kernel, which is represented by virtual memory area objects, also known as VMAs.

On a system where overcommit is disabled, the kernel ensures that enough backing memory is available to satisfy the request before allowing the allocation to succeed.  In contrast, when overcommit is enabled, the kernel simply allocates a VMA object without guaranteeing that backing memory is available: the mapping succeeds immediately, even though it is not known whether the request can ultimately be satisfied.

The decoupling of success from backing memory availability makes allocation failures impossible to handle correctly.  Programs have no other option but to assume the allocation has succeeded before the kernel has actually determined whether the request can be fulfilled.  Disabling overcommit solves this problem by restoring admission control at allocation time, ensuring that allocations either fail immediately or succeed with a guarantee of backing memory.
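The difference between the two modes is visible from userspace.  As a minimal sketch (assuming a Linux system; the mode semantics are documented in proc(5)), the current policy and the kernel's commit accounting can be inspected directly:

```sh
# Current policy: 0 = heuristic overcommit (the default), 1 = always
# overcommit, 2 = strict accounting (never overcommit).
cat /proc/sys/vm/overcommit_memory

# Under mode 2, an allocation is admitted only while Committed_AS would
# stay below CommitLimit (swap plus overcommit_ratio% of RAM by default).
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```

Watching `Committed_AS` approach `CommitLimit` under load is often enough to explain why a given allocation was refused.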

## Failure locality is important for debugging

When allocations fail fast, they are dramatically easier to debug, as the failure is synchronous with the request.  When a program crashes due to an allocation failure, the entire context of that allocation is preserved: the requested allocation size, the subsystem making the allocation and the underlying operation that required it are already known.

With overcommit, this locality is lost by design.  Allocations appear to succeed and the program proceeds under the assumption that the memory is available.  When the allocation is eventually accessed, the kernel typically responds by invoking the OOM killer and terminating the process outright.  From the program&#39;s perspective, there is no allocation failure to handle, only a `SIGKILL`.  From the operator&#39;s perspective, there is no stack trace pointing to the failure.  There are only post-mortem logs which often fail to paint a clear picture of what happened.

Would you rather debug a crash at the allocation site or reconstruct an outage caused by an asynchronous OOM kill?  Overcommit doesn&#39;t make allocation failure recoverable.  It makes it unreportable.

## Dishonorable mention: Redis

So why am I writing about this, anyway?  The cost of overcommit isn&#39;t just technical, it also represents bad engineering culture: shifting responsibility for correctness away from application developers and onto the kernel.  As an example, when you start Redis with overcommit disabled, it prints a scary warning that you should re-enable it:

&gt; WARNING Memory overcommit must be enabled!
&gt; Without it, a background save or replication may fail under low memory condition.
&gt; Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328.
&gt; To fix this issue add &#39;vm.overcommit_memory = 1&#39; to /etc/sysctl.conf and then reboot or run the command &#39;sysctl vm.overcommit_memory=1&#39; for this to take effect.

No.  Code that requires overcommit to function correctly is *failing to handle memory allocation errors correctly*.  The answer is *not* to print a warning that overcommit is disabled, but rather to surface low memory conditions explicitly so the system administrator can understand and resolve them.
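To make the recommendation in the title concrete, here is a sketch of a persistent configuration; the file name and ratio below are assumptions to be tuned per deployment.  `vm.overcommit_ratio` matters because under strict accounting, `CommitLimit` defaults to swap plus only 50% of RAM, which is far too small for a swapless server:

```ini
# /etc/sysctl.d/99-overcommit.conf  (hypothetical file name; any
# sysctl.d fragment works, applied with `sysctl --system`)

# Strict accounting: allocations fail at brk()/mmap() time rather than
# being deferred to the OOM killer.
vm.overcommit_memory = 2

# CommitLimit = swap + (overcommit_ratio% of RAM).  The default of 50
# strands half of RAM on swapless machines, so raise it accordingly.
vm.overcommit_ratio = 100
```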
</source:markdown>
    </item>
    
    <item>
      <title>Rethinking sudo with object capabilities</title>
      <link>https://ariadne.space/2025/12/12/rethinking-sudo-with-object-capabilities.html</link>
      <pubDate>Fri, 12 Dec 2025 06:36:06 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2025/12/12/rethinking-sudo-with-object-capabilities.html</guid>
      <description>&lt;p&gt;I hate &lt;code&gt;sudo&lt;/code&gt; with a passion.  It represents everything I find offensive about the modern Unix security model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;like &lt;code&gt;su&lt;/code&gt;, it must be a SUID binary to work&lt;/li&gt;
&lt;li&gt;it is monolithic: everything &lt;code&gt;sudo&lt;/code&gt; does runs as &lt;code&gt;root&lt;/code&gt;, there is no privilege separation&lt;/li&gt;
&lt;li&gt;it uses a non-declarative and non-hierarchical configuration format leading to forests of complex access-control policies and user errors due to lack of concision&lt;/li&gt;
&lt;li&gt;it supports plugins to extend the policy engine which run directly in the privileged SUID process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I could go on, but hopefully you get the point.  Alpine moved to &lt;code&gt;doas&lt;/code&gt; as the default privilege escalation tool several years ago, in Alpine 3.15, because of the large attack surface that &lt;code&gt;sudo&lt;/code&gt; brings due to its design.&lt;/p&gt;
&lt;p&gt;Systems built around identity-based access control tend to rely on ambient authority: policy is centralized and errors in the policy configuration or bugs in the policy engine can allow attackers to make full use of that ambient authority.  In the case of a SUID binary like &lt;code&gt;doas&lt;/code&gt; or &lt;code&gt;sudo&lt;/code&gt;, that means an attacker can obtain root access in the event of a bug or misconfiguration.&lt;/p&gt;
&lt;p&gt;What if there was a better way?  Instead of thinking about privilege escalation as becoming root for a moment, what if it meant being handed a narrowly scoped capability, one with just enough authority to perform a specific action and nothing more?  Enter the &lt;a href=&#34;https://en.wikipedia.org/wiki/Object-capability_model&#34;&gt;object-capability model&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In an object-capability system, there is no global decision point that asks who you are and what you might be allowed to do. Authority is explicit and local: a program can only perform an action if it has been given the capability to do so.  This makes privilege boundaries visible, composable, and far easier to reason about, shifting privilege escalation from a question of identity to a question of possession.&lt;/p&gt;
&lt;p&gt;Inspired by the object-capability model, I&amp;rsquo;ve been working on a project named &lt;a href=&#34;https://github.com/kaniini/capsudo&#34;&gt;capsudo&lt;/a&gt;.  Instead of treating privilege escalation as a temporary change of identity, capsudo reframes it as a mediated interaction with a service called &lt;code&gt;capsudod&lt;/code&gt; that holds specific authority, which may range from full root privileges to a narrowly scoped set of capabilities depending on how it is deployed.&lt;/p&gt;
&lt;h2 id=&#34;delegating-root-privilege-with-object-capabilities&#34;&gt;Delegating root privilege with object capabilities&lt;/h2&gt;
&lt;p&gt;What does that look like in practice?  First, let&amp;rsquo;s consider a system service which needs to perform a few privileged operations, such as mounting and unmounting filesystems, and how capsudo can be used to provide capabilities to that service.  With capsudo, we have a few different options.  We could, for example, grant generic &lt;code&gt;mount&lt;/code&gt; and &lt;code&gt;umount&lt;/code&gt; capabilities, or alternatively we could grant constrained &lt;code&gt;mount&lt;/code&gt; and &lt;code&gt;umount&lt;/code&gt; capabilities to specific device nodes instead.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s take a look at what generic &lt;code&gt;mount&lt;/code&gt; and &lt;code&gt;umount&lt;/code&gt; capabilities would look like, as they provide a good baseline for understanding how constrained capabilities work.  To begin with, consider a volume management service running under the &lt;code&gt;mountd&lt;/code&gt; user.  By running a few instances of the &lt;code&gt;capsudod&lt;/code&gt; daemon, we will grant capabilities to the &lt;code&gt;mountd&lt;/code&gt; user, which it can then invoke using &lt;code&gt;capsudo&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount -- mount &amp;amp;
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/umount -- umount &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You might notice that the capabilities above have had commands bound to them.  This is an important feature of capsudo which I will elaborate on in a moment.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say that the user has plugged in a USB stick and wants it to be mounted to &lt;code&gt;/media/usb&lt;/code&gt;.  To do this with capsudo, the volume manager simply makes use of the &lt;code&gt;/run/user/mountd/cap/mount&lt;/code&gt; capability which has been delegated to it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;mountd$ capsudo -s /run/user/mountd/cap/mount -- /dev/sdb1 /media/usb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What is going on here?  When &lt;code&gt;capsudod&lt;/code&gt; was started, it bound the capability it provides to the &lt;code&gt;mount&lt;/code&gt; command by setting the executable it will run.  This means that the &lt;code&gt;/run/user/mountd/cap/mount&lt;/code&gt; capability cannot run any other command besides &lt;code&gt;mount&lt;/code&gt;.  This delegation is still suboptimal, however, because the bare command name is resolved through &lt;code&gt;PATH&lt;/code&gt;.  Let&amp;rsquo;s fix it by stopping the &lt;code&gt;capsudod&lt;/code&gt; process and restarting it with an absolute path:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount -- /usr/sbin/mount &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now when the capability is invoked, &lt;code&gt;capsudod&lt;/code&gt; will run &lt;code&gt;/usr/sbin/mount&lt;/code&gt; directly rather than resolving the command through the &lt;code&gt;PATH&lt;/code&gt; environment variable it was spawned with.&lt;/p&gt;
&lt;p&gt;We can build on this by creating a specific capability for mounting the device node, and another for unmounting the specific mount point:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount-dev-sdb1 -- /usr/sbin/mount /dev/sdb1 &amp;amp;
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/umount-media-usb -- /usr/sbin/umount /media/usb &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;These would then be invoked as one would expect:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;mountd$ capsudo -s /run/user/mountd/cap/mount-dev-sdb1 -- /media/usb
mountd$ capsudo -s /run/user/mountd/cap/umount-media-usb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;So, in essence, a &lt;em&gt;capability&lt;/em&gt; is represented as a Unix socket, an optional argv list, and optionally a set of mandatory environment variables, creating a very composable interface for delegating authority.&lt;/p&gt;
&lt;h2 id=&#34;non-root-delegations-or-service-accounts-meet-service-capabilities&#34;&gt;Non-root delegations, or service accounts meet service capabilities&lt;/h2&gt;
&lt;p&gt;Now let&amp;rsquo;s talk about a scenario where traditionally root privilege is not required: service accounts.&lt;/p&gt;
&lt;p&gt;Suppose we have a web application deployment system where developers are allowed to update files in a specific directory and restart a service, but otherwise shouldn&amp;rsquo;t have administrative access to the system. Traditionally, this might still be implemented using &lt;code&gt;sudo&lt;/code&gt;, despite the fact that no global privileges are actually needed.&lt;/p&gt;
&lt;p&gt;With capsudo, we can instead run the &lt;code&gt;capsudod&lt;/code&gt; daemon under a dedicated service account which only owns the resources it is meant to manage.&lt;/p&gt;
&lt;p&gt;Assume a deployment service running under the &lt;code&gt;www-deployment&lt;/code&gt; user, which owns &lt;code&gt;/srv/www/app&lt;/code&gt; and is allowed to reload the uWSGI service via a delegated capability.  We can start &lt;code&gt;capsudod&lt;/code&gt; instances under that user directly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;root# capsudod -o www-deployment:www-deployment &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -s /run/user/www-deployment/cap/service-uwsgi &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -- /usr/sbin/rc-service uwsgi &amp;amp;
www-deployment$ capsudod -o www-deployment:www-developers &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -s /run/user/www-deployment/cap/update-site -- /usr/bin/rsync -a &amp;amp;
www-deployment$ capsudod -o www-deployment:www-developers &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -s /run/user/www-deployment/cap/reload-site -- &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   /usr/bin/capsudo -s /run/user/www-deployment/cap/service-uwsgi &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;      -- reload &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A developer in the &lt;code&gt;www-developers&lt;/code&gt; group might then invoke these capabilities:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;dev$ capsudo -s /run/user/www-deployment/cap/update-site &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  -- ./build/ /srv/www/app/
dev$ capsudo -s /run/user/www-deployment/cap/reload-site
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;unpacking-the-delegations&#34;&gt;Unpacking the delegations&lt;/h3&gt;
&lt;p&gt;There is a lot going on here, so let&amp;rsquo;s walk through it step by step.&lt;/p&gt;
&lt;p&gt;First, the system administrator delegates a small amount of authority to the &lt;code&gt;www-deployment&lt;/code&gt; service account.  This is done by running a &lt;code&gt;capsudod&lt;/code&gt; instance that is able to manage the uWSGI service:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;root# capsudod -o www-deployment:www-deployment &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  -s /run/user/www-deployment/cap/service-uwsgi &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  -- /usr/sbin/rc-service uwsgi &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This capability is owned by the &lt;code&gt;www-deployment&lt;/code&gt; user and allows exactly one operation: invoking the system&amp;rsquo;s service manager to act on the &lt;code&gt;uwsgi&lt;/code&gt; service.  No other services can be touched, and no other commands can be executed through this capability.&lt;/p&gt;
&lt;p&gt;Second, the &lt;code&gt;www-deployment&lt;/code&gt; account uses that authority to construct more narrowly scoped capabilities for others. Two additional &lt;code&gt;capsudod&lt;/code&gt; instances are started under the &lt;code&gt;www-deployment&lt;/code&gt; account, but with ownership granted to the &lt;code&gt;www-developers&lt;/code&gt; group:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;www-deployment$ capsudod -o www-deployment:www-developers &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -s /run/user/www-deployment/cap/update-site -- /usr/bin/rsync -a &amp;amp;

www-deployment$ capsudod -o www-deployment:www-developers &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   -s /run/user/www-deployment/cap/reload-site -- &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;   /usr/bin/capsudo -s /run/user/www-deployment/cap/service-uwsgi -- reload &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The first of these allows developers to update the application files using &lt;code&gt;rsync&lt;/code&gt;, but only through the &lt;code&gt;www-deployment&lt;/code&gt; account&amp;rsquo;s existing filesystem permissions.  The second is more interesting: it does not directly reload the service.  Instead, it &lt;em&gt;delegates a constrained use&lt;/em&gt; of the previously granted uWSGI capability.&lt;/p&gt;
&lt;p&gt;When a developer invokes &lt;code&gt;reload-site&lt;/code&gt;, they are not calling &lt;code&gt;rc-service&lt;/code&gt; themselves, and they are not interacting with the system service manager directly. They are invoking a capability that is itself &lt;em&gt;built on top of another capability&lt;/em&gt;, with additional constraints applied.&lt;/p&gt;
&lt;p&gt;The important property here is that authority only ever moves downward in scope.  It is possible to further delegate a subset of the authority granted by a capability, but not more.  This kind of authority layering is a natural fit for the object-capability model, but it is awkward and fragile to express with identity-based access control.&lt;/p&gt;
&lt;p&gt;Identity-based access control asks who should be allowed to act.  Object-capability systems ask where authority should live and how it should flow.  &lt;code&gt;capsudo&lt;/code&gt; today is an exploration of what happens when you take the second question seriously, treating privilege escalation as explicit delegation and opening the door to further refinements like passing concrete resources, such as pre-opened file descriptors, instead of whole identities.&lt;/p&gt;
</description>
      <source:markdown>I hate `sudo` with a passion.  It represents everything I find offensive about the modern Unix security model:

* like `su`, it must be a SUID binary to work
* it is monolithic: everything `sudo` does runs as `root`, there is no privilege separation
* it uses a non-declarative and non-hierarchical configuration format leading to forests of complex access-control policies and user errors due to lack of concision
* it supports plugins to extend the policy engine which run directly in the privileged SUID process

I could go on, but hopefully you get the point.  Alpine moved to `doas` as the default privilege escalation tool several years ago, in Alpine 3.15, because of the large attack surface that `sudo` brings due to its design.

Systems built around identity-based access control tend to rely on ambient authority: policy is centralized and errors in the policy configuration or bugs in the policy engine can allow attackers to make full use of that ambient authority.  In the case of a SUID binary like `doas` or `sudo`, that means an attacker can obtain root access in the event of a bug or misconfiguration.
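
To make the ambient-authority hazard concrete, here is a hypothetical `sudoers` rule of the kind this model encourages.  It looks narrowly scoped, but because the delegated program runs with all of root&#39;s ambient authority, a single shell escape widens it back to everything:

```sh
# Hypothetical sudoers entry: "deploy may only page through one log file."
#     deploy ALL=(root) NOPASSWD: /usr/bin/less /var/log/app.log
# But less runs as root, with root's full ambient authority:
deploy$ sudo /usr/bin/less /var/log/app.log
# ...and typing "!sh" at the less prompt spawns a root shell.
```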

What if there was a better way?  Instead of thinking about privilege escalation as becoming root for a moment, what if it meant being handed a narrowly scoped capability, one with just enough authority to perform a specific action and nothing more?  Enter the [object-capability model](https://en.wikipedia.org/wiki/Object-capability_model).

In an object-capability system, there is no global decision point that asks who you are and what you might be allowed to do. Authority is explicit and local: a program can only perform an action if it has been given the capability to do so.  This makes privilege boundaries visible, composable, and far easier to reason about, shifting privilege escalation from a question of identity to a question of possession.

Inspired by the object-capability model, I&#39;ve been working on a project named [capsudo][capsudo].  Instead of treating privilege escalation as a temporary change of identity, capsudo reframes it as a mediated interaction with a service called `capsudod` that holds specific authority, which may range from full root privileges to a narrowly scoped set of capabilities depending on how it is deployed.

   [capsudo]: https://github.com/kaniini/capsudo

## Delegating root privilege with object capabilities

What does that look like in practice?  First, let&#39;s consider a system service which needs to perform a few privileged operations, such as mounting and unmounting filesystems, and how capsudo can be used to provide capabilities to that service.  With capsudo, we have a few different options.  We could, for example, grant generic `mount` and `umount` capabilities, or alternatively we could grant constrained `mount` and `umount` capabilities to specific device nodes instead.

First, let&#39;s take a look at what generic `mount` and `umount` capabilities would look like, as they provide a good baseline for understanding how constrained capabilities work.  To begin with, consider a volume management service running under the `mountd` user.  By running a few instances of the `capsudod` daemon, we will grant capabilities to the `mountd` user, which it can then invoke using `capsudo`.

```sh
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount -- mount &amp;
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/umount -- umount &amp;
```
You might notice that the capabilities above have had commands bound to them.  This is an important feature of capsudo which I will elaborate on in a moment.

Let&#39;s say that the user has plugged in a USB stick and wants it to be mounted to `/media/usb`.  To do this with capsudo, the volume manager simply makes use of the `/run/user/mountd/cap/mount` capability which has been delegated to it:

```sh
mountd$ capsudo -s /run/user/mountd/cap/mount -- /dev/sdb1 /media/usb
```
What is going on here?  When `capsudod` was started, it bound the capability it provides to the `mount` command by setting the executable it will run.  This means that the `/run/user/mountd/cap/mount` capability cannot run any other command besides `mount`.  This delegation is still suboptimal, however, because the bare command name is resolved through `PATH`.  Let&#39;s fix it by stopping the `capsudod` process and restarting it with an absolute path:

```sh
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount -- /usr/sbin/mount &amp;
```
Now when the capability is invoked, `capsudod` will run `/usr/sbin/mount` directly rather than resolving the command through the `PATH` environment variable it was spawned with.
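
To see why resolving a bare command name through `PATH` is risky, consider what the earlier delegation would have done if `capsudod` had inherited an unfortunate environment.  This is a hypothetical illustration of ordinary `PATH` lookup semantics, not capsudo-specific behavior beyond what is described above:

```sh
# Hypothetical: capsudod started from a shell whose PATH puts a
# writable directory ahead of /usr/sbin...
root# PATH=/opt/tools/bin:/usr/sbin:/usr/bin capsudod -o mountd:mountd \
   -s /run/user/mountd/cap/mount -- mount &amp;
# ...now the bare name "mount" resolves to /opt/tools/bin/mount if
# such a file exists.  Binding /usr/sbin/mount removes the ambiguity.
```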

We can build on this by creating a specific capability for mounting the device node, and another for unmounting the specific mount point:

```sh
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/mount-dev-sdb1 -- /usr/sbin/mount /dev/sdb1 &amp;
root# capsudod -o mountd:mountd -s /run/user/mountd/cap/umount-media-usb -- /usr/sbin/umount /media/usb &amp;
```
These would then be invoked as one would expect:

```sh
mountd$ capsudo -s /run/user/mountd/cap/mount-dev-sdb1 -- /media/usb
mountd$ capsudo -s /run/user/mountd/cap/umount-media-usb
```
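Spelling out what `capsudod` executes in each case may help.  Assuming, as the invocations above suggest, that caller-supplied arguments are appended to the argv bound at delegation time:

```sh
# capability        bound argv                  + caller argv = executed command
# mount-dev-sdb1    /usr/sbin/mount /dev/sdb1   + /media/usb  = /usr/sbin/mount /dev/sdb1 /media/usb
# umount-media-usb  /usr/sbin/umount /media/usb + (nothing)   = /usr/sbin/umount /media/usb
```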
So, in essence, a *capability* is represented as a Unix socket, an optional argv list, and optionally a set of mandatory environment variables, creating a very composable interface for delegating authority.

## Non-root delegations, or service accounts meet service capabilities

Now let&#39;s talk about a scenario where traditionally root privilege is not required: service accounts.

Suppose we have a web application deployment system where developers are allowed to update files in a specific directory and restart a service, but otherwise shouldn&#39;t have administrative access to the system. Traditionally, this might still be implemented using `sudo`, despite the fact that no global privileges are actually needed.

With capsudo, we can instead run the `capsudod` daemon under a dedicated service account which only owns the resources it is meant to manage.

Assume a deployment service running under the `www-deployment` user, which owns `/srv/www/app` and is allowed to reload the uWSGI service via a delegated capability.  We can start `capsudod` instances under that user directly:

```sh
root# capsudod -o www-deployment:www-deployment \
   -s /run/user/www-deployment/cap/service-uwsgi \
   -- /usr/sbin/rc-service uwsgi &amp;
www-deployment$ capsudod -o www-deployment:www-developers \
   -s /run/user/www-deployment/cap/update-site -- /usr/bin/rsync -a &amp;
www-deployment$ capsudod -o www-deployment:www-developers \
   -s /run/user/www-deployment/cap/reload-site -- \
   /usr/bin/capsudo -s /run/user/www-deployment/cap/service-uwsgi \
      -- reload &amp;
```
A developer in the `www-developers` group might then invoke these capabilities:

```sh
dev$ capsudo -s /run/user/www-deployment/cap/update-site \
  -- ./build/ /srv/www/app/
dev$ capsudo -s /run/user/www-deployment/cap/reload-site
```
### Unpacking the delegations

There is a lot going on here, so let&#39;s walk through it step by step.

First, the system administrator delegates a small amount of authority to the `www-deployment` service account.  This is done by running a `capsudod` instance that is able to manage the uWSGI service:

```sh
root# capsudod -o www-deployment:www-deployment \
  -s /run/user/www-deployment/cap/service-uwsgi \
  -- /usr/sbin/rc-service uwsgi &amp;
```
This capability is owned by the `www-deployment` user and allows exactly one operation: invoking the system&#39;s service manager to act on the `uwsgi` service.  No other services can be touched, and no other commands can be executed through this capability.

Second, the `www-deployment` account uses that authority to construct more narrowly scoped capabilities for others. Two additional `capsudod` instances are started under the `www-deployment` account, but with ownership granted to the `www-developers` group:

```sh
www-deployment$ capsudod -o www-deployment:www-developers \
   -s /run/user/www-deployment/cap/update-site -- /usr/bin/rsync -a &amp;

www-deployment$ capsudod -o www-deployment:www-developers \
   -s /run/user/www-deployment/cap/reload-site -- \
   /usr/bin/capsudo -s /run/user/www-deployment/cap/service-uwsgi -- reload &amp;
```
The first of these allows developers to update the application files using `rsync`, but only through the `www-deployment` account&#39;s existing filesystem permissions.  The second is more interesting: it does not directly reload the service.  Instead, it *delegates a constrained use* of the previously granted uWSGI capability.

When a developer invokes `reload-site`, they are not calling `rc-service` themselves, and they are not interacting with the system service manager directly. They are invoking a capability that is itself *built on top of another capability*, with additional constraints applied.

The important property here is that authority only ever moves downward in scope.  It is possible to further delegate a subset of the authority granted by a capability, but not more.  This kind of authority layering is a natural fit for the object-capability model, but it is awkward and fragile to express with identity-based access control.
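
As a sketch of how far this layering can go, nothing stops a developer from attenuating further, assuming unprivileged users may run `capsudod` as the model suggests.  The following is hypothetical (the socket path and `ci-bots` group are invented for illustration), but it uses only the mechanics shown above: a member of `www-developers` wraps the `reload-site` capability in a new one that a CI account can invoke:

```sh
# Hypothetical: dev re-delegates reload-site to the ci-bots group.
# The new capability carries only dev's authority to use reload-site,
# so the delegation can only narrow, never widen.
dev$ capsudod -o dev:ci-bots -s /run/user/dev/cap/ci-reload -- \
   /usr/bin/capsudo -s /run/user/www-deployment/cap/reload-site &amp;
```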

Identity-based access control asks who should be allowed to act.  Object-capability systems ask where authority should live and how it should flow.  `capsudo` today is an exploration of what happens when you take the second question seriously, treating privilege escalation as explicit delegation and opening the door to further refinements like passing concrete resources, such as pre-opened file descriptors, instead of whole identities.
</source:markdown>
    </item>
    
    <item>
      <title>I want you to understand</title>
      <link>https://ariadne.space/2025/12/02/i-want-you-to-understand.html</link>
      <pubDate>Tue, 02 Dec 2025 20:22:23 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2025/12/02/i-want-you-to-understand.html</guid>
      <description>&lt;p&gt;I want you to &lt;a href=&#34;https://aphyr.com/posts/397-i-want-you-to-understand-chicago&#34;&gt;understand&lt;/a&gt; what it is like to be transgender during this time.&lt;/p&gt;
&lt;p&gt;I want you to understand the threat to doctor-patient confidentiality.  In June, the Department of Justice &lt;a href=&#34;https://www.healthlawadvisor.com/doj-subpoena-seeks-health-information-of-hospital-patients-receiving-gender-affirming-care-will-judge-grant-motion-to-quash&#34;&gt;began targeting clinics and health systems which provide treatment for gender dysphoria&lt;/a&gt; with subpoenas requesting personally identifying information about patients.  While these subpoenas currently target clinics which provide services to minors, it is clear that they are testing the waters for expanding their inquiry to adult patients.  Although compliance with these subpoenas is likely illegal as disclosure of these records would violate HIPAA, I worry that I will be included on a &lt;a href=&#34;https://www.annefrank.org/en/timeline/147/all-jews-need-to-register/&#34;&gt;list of transgender individuals&lt;/a&gt; and targeted for discrimination as a result.&lt;/p&gt;
&lt;p&gt;I want you to understand the threat to medical care for trans people more broadly.  Like with the subpoenas, these efforts are starting &lt;a href=&#34;https://www.npr.org/sections/shots-health-news/2025/10/30/nx-s1-5588655/transgender-trump-medicare-medicaid-gender-affirming-care&#34;&gt;with trans children&lt;/a&gt;.  Although I am privileged to have private health insurance through my employer, private insurers often use Medicare coverage determination criteria as a baseline for their policies.  I worry that I could be denied access to medically necessary health care in the future.&lt;/p&gt;
&lt;p&gt;I want you to understand that one does not simply quit taking hormones.  Abruptly stopping HRT can leave the body in a hormonal state that may never fully return to baseline and can potentially reverse some of the desired effects.  This outcome is often distressing, and loss of access to medical care may lead many to self-manage their HRT.  Due to confidentiality concerns, such self-managed treatment will likely not be monitored with lab work.  Managing hormone therapy &lt;a href=&#34;https://academic.oup.com/jcem/article/102/11/3869/4157558&#34;&gt;without proper medical supervision can be dangerous&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I want you to understand what it is like to travel as a transgender US citizen.  As a result of Trump&amp;rsquo;s Executive Order 14168, it is no longer possible for transgender people &lt;a href=&#34;https://travel.state.gov/content/travel/en/passports/passport-help/sex-marker.html&#34;&gt;to obtain a US passport that correctly reflects their gender presentation&lt;/a&gt;.  Traveling with identity documents that do not match your gender presentation can be dangerous abroad.  In some cases you can even be &lt;a href=&#34;https://www.travelguard.com/travel-resources/travel-safety/lgbtq-travel-safety/advice-for-transgender-and-non-binary-travelers&#34;&gt;denied entry or even deported&lt;/a&gt;.  Such policies &lt;a href=&#34;https://www.migrationpolicy.org/article/x-marker-trans-nonbinary-travelers&#34;&gt;discourage trans people from traveling&lt;/a&gt; due to fear of discrimination.&lt;/p&gt;
&lt;p&gt;I want you to understand what it is like to be a transgender worker.  A report from The Williams Institute at UCLA School of Law shows that &lt;a href=&#34;https://williamsinstitute.law.ucla.edu/publications/transgender-workplace-discrim/&#34;&gt;over 80% of transgender employees in the US have experienced discrimination or harassment at work&lt;/a&gt; at some point.  Contrary to some optimistic portrayals during Pride Month, this is actually getting worse: the Movement Advancement Project 2025 NORC survey reports a &lt;a href=&#34;https://www.mapresearch.org/policy-and-issue-analysis/2025-norc-survey-report&#34;&gt;significant uptick in discrimination and harassment complaints&lt;/a&gt;.  If that wasn&amp;rsquo;t enough, Lambda Legal also reports a &lt;a href=&#34;https://lambdalegal.org/newsroom/us_20250626_record-breaking-surge-in-help-desk-requests-related-to-anti-lgbtq-discrimination-post-trump&#34;&gt;surge in the volume of requests submitted to their help desk&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I want you to understand what it is like to be a transgender entrepreneur.  Based on a report from PitchBook, only &lt;a href=&#34;https://nvca.org/wp-content/uploads/2025/10/Q3-2025-PitchBook-NVCA-Venture-Monitor.pdf&#34;&gt;0.8% of venture capital funding went to female-founded companies in 2025&lt;/a&gt;, the lowest since 2015.  While we do not yet have data for LGBTQ founders in 2025, StartOut estimated that only &lt;a href=&#34;https://startout.org/wp-content/uploads/2023/09/2023-State-of-LGBTQ-Entrepreneurship-Report.pdf&#34;&gt;0.5% of companies which raised venture capital from 2000-2022&lt;/a&gt; were founded by LGBTQ founders.  These numbers plainly highlight ongoing social inequities.&lt;/p&gt;
&lt;p&gt;I want you to understand what it is like to be a transgender leader in open source.  While the open source community has made progress toward inclusion, a study by the Linux Foundation &lt;a href=&#34;https://8112310.fs1.hubspotusercontent-na1.net/hubfs/8112310/LF%20Research/2021%20DEI%20Survey%20-%20Report.pdf&#34;&gt;observes that people identifying as women, non-binary, LGBTQ+ or disabled were three times more likely&lt;/a&gt; to report threats.  Another study found that simply having a Code of Conduct did not make projects safer.  Without meaningful enforcement, &lt;a href=&#34;https://arxiv.org/pdf/2409.04511&#34;&gt;participants continued to experience harassment&lt;/a&gt;.  Even meaningful enforcement isn&amp;rsquo;t enough.  For example, after rejecting Xlibre in Alpine due to their reactionary background, a notable alt-right Linux podcaster made a video targeting me, focusing on my transgender identity rather than the technical merits.&lt;/p&gt;
&lt;p&gt;I need you to understand that while things are dire for trans people right now, we can fight back and win.  At the same time, we must confront these realities: human decency demands it.  Support politicians who fight anti-trans policies.  Donate to law firms like &lt;a href=&#34;https://lambdalegal.org&#34;&gt;Lambda Legal&lt;/a&gt;.  If you are a business owner, hire trans people: we have &lt;a href=&#34;https://en.wikipedia.org/wiki/Sophie_Wilson&#34;&gt;been driving innovation since time immemorial&lt;/a&gt;.  If you are an investor, invest in trans founders: the same StartOut report that shows that only 0.5% of funded companies were founded by LGBTQ founders also observed that those founders created more jobs with less funding than their peers.&lt;/p&gt;
</description>
      <source:markdown>I want you to [understand][u-1] what it is like to be transgender during this time.

   [u-1]: https://aphyr.com/posts/397-i-want-you-to-understand-chicago

I want you to understand the threat to doctor-patient confidentiality.  In June, the Department of Justice [began targeting clinics and health systems which provide treatment for gender dysphoria][u-2] with subpoenas requesting personally identifying information about patients.  While these subpoenas currently target clinics which provide services to minors, it is clear that they are testing the waters for expanding their inquiry to adult patients.  Although compliance with these subpoenas is likely illegal as disclosure of these records would violate HIPAA, I worry that I will be included on a [list of transgender individuals][u-3] and targeted for discrimination as a result.

   [u-2]: https://www.healthlawadvisor.com/doj-subpoena-seeks-health-information-of-hospital-patients-receiving-gender-affirming-care-will-judge-grant-motion-to-quash
   [u-3]: https://www.annefrank.org/en/timeline/147/all-jews-need-to-register/

I want you to understand the threat to medical care for trans people more broadly.  Like with the subpoenas, these efforts are starting [with trans children][u-4].  Although I am privileged to have private health insurance through my employer, private insurers often use Medicare coverage determination criteria as a baseline for their policies.  I worry that I could be denied access to medically necessary health care in the future.

   [u-4]: https://www.npr.org/sections/shots-health-news/2025/10/30/nx-s1-5588655/transgender-trump-medicare-medicaid-gender-affirming-care

I want you to understand that one does not simply quit taking hormones.  Abruptly stopping HRT can leave the body in a hormonal state that may never fully return to baseline and can potentially reverse some of the desired effects.  This outcome is often distressing, and loss of access to medical care may lead many to self-manage their HRT.  Due to confidentiality concerns, such self-managed treatment will likely not be monitored with lab work.  Managing hormone therapy [without proper medical supervision can be dangerous][u-5].

   [u-5]: https://academic.oup.com/jcem/article/102/11/3869/4157558

I want you to understand what it is like to travel as a transgender US citizen.  As a result of Trump&#39;s Executive Order 14168, it is no longer possible for transgender people [to obtain a US passport that correctly reflects their gender presentation][u-6].  Traveling with identity documents that do not match your gender presentation can be dangerous abroad.  In some cases you can even be [denied entry or even deported][u-7].  Such policies [discourage trans people from traveling][u-8] due to fear of discrimination.

   [u-6]: https://travel.state.gov/content/travel/en/passports/passport-help/sex-marker.html
   [u-7]: https://www.travelguard.com/travel-resources/travel-safety/lgbtq-travel-safety/advice-for-transgender-and-non-binary-travelers
   [u-8]: https://www.migrationpolicy.org/article/x-marker-trans-nonbinary-travelers

I want you to understand what it is like to be a transgender worker.  A report from The Williams Institute at UCLA School of Law shows that [over 80% of transgender employees in the US have experienced discrimination or harassment at work][u-11] at some point.  Contrary to some optimistic portrayals during Pride Month, this is actually getting worse: the Movement Advancement Project 2025 NORC survey reports a [significant uptick in discrimination and harassment complaints][u-12].  If that wasn&#39;t enough, Lambda Legal also reports a [surge in the volume of requests submitted to their help desk][u-13].

   [u-11]: https://williamsinstitute.law.ucla.edu/publications/transgender-workplace-discrim/
   [u-12]: https://www.mapresearch.org/policy-and-issue-analysis/2025-norc-survey-report
   [u-13]: https://lambdalegal.org/newsroom/us_20250626_record-breaking-surge-in-help-desk-requests-related-to-anti-lgbtq-discrimination-post-trump

I want you to understand what it is like to be a transgender entrepreneur.  Based on a report from Pitchbook, only [0.8% of venture capital funding went to female-founded companies in 2025][u-9], the lowest since 2015.  While we do not yet have data for LGBTQ founders in 2025, StartOut estimated that only [0.5% of companies which raised venture capital from 2000-2022][u-10] were founded by LGBTQ founders.  These numbers plainly highlight ongoing social inequities.

   [u-9]: https://nvca.org/wp-content/uploads/2025/10/Q3-2025-PitchBook-NVCA-Venture-Monitor.pdf
   [u-10]: https://startout.org/wp-content/uploads/2023/09/2023-State-of-LGBTQ-Entrepreneurship-Report.pdf

I want you to understand what it is like to be a transgender leader in open source.  While the open source community has made progress toward inclusion, a study by the Linux Foundation [observes that people identifying as women, non-binary, LGBTQ+ or disabled were three times more likely][u-14] to report threats.  Another study found that simply having a Code of Conduct did not make projects safer.  Without meaningful enforcement, [participants continued to experience harassment][u-15].  Even meaningful enforcement isn&#39;t enough.  For example, after Xlibre was rejected in Alpine due to its maintainers&#39; reactionary background, a notable alt-right Linux podcaster made a video targeting me, focusing on my transgender identity rather than the technical merits of the decision.

   [u-14]: https://8112310.fs1.hubspotusercontent-na1.net/hubfs/8112310/LF%20Research/2021%20DEI%20Survey%20-%20Report.pdf
   [u-15]: https://arxiv.org/pdf/2409.04511

I need you to understand that while things are dire for trans people right now, we can fight back and win.  At the same time, we must confront these realities: human decency demands it.  Support politicians who fight anti-trans policies.  Donate to law firms like [Lambda Legal][u-16].  If you are a business owner, hire trans people: we have [been driving innovation since time immemorial][u-17].  If you are an investor, invest in trans founders: the same StartOut report that shows that only 0.5% of funded companies were founded by LGBTQ founders also observed that those founders created more jobs with less funding than their peers.

   [u-16]: https://lambdalegal.org
   [u-17]: https://en.wikipedia.org/wiki/Sophie_Wilson
</source:markdown>
    </item>
    
    <item>
      <title>Two weeks of wayback</title>
      <link>https://ariadne.space/2025/07/07/two-weeks-of-wayback.html</link>
      <pubDate>Mon, 07 Jul 2025 21:28:10 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2025/07/07/two-weeks-of-wayback.html</guid>
      <description>&lt;p&gt;A poorly kept secret is that the X11 graphics stack is under-maintained as resources shift towards the maintenance of Wayland&amp;rsquo;s graphics stack instead.  To some extent, technical steering committees in major distributions have been watching this situation develop for the past few years with increasing concern, as limited maintenance becomes a security risk: bugs accumulate and already burdened distribution security teams have to carry the security maintenance load in the absence of new releases.&lt;/p&gt;
&lt;p&gt;In Alpine, we have been discussing the sunset of the standalone X.org server implementation for several years for these reasons, aiming to come up with a strategy that allows us to keep supporting X11-based desktop environments in a world without the X.org server.  Recently, a group of neofascist reactionaries announced a fork of the X.org server which, amongst other things, &lt;a href=&#34;https://github.com/X11Libre/xserver/pull/56&#34;&gt;has introduced new security bugs into the X server they forked from X.org&lt;/a&gt;.  This brought Alpine to a new crossroads in the general discussion we&amp;rsquo;ve been having about X11.  While Alpine has rejected this fork on the grounds that collaborating with neofascist reactionaries is fundamentally incompatible with our values, the overarching problem of X11 under-maintenance still persists, and is unlikely to change any time soon, leading us to begin directly looking for a solution.&lt;/p&gt;
&lt;h2 id=&#34;enter-wayback-just-enough-wayland-to-make-xwayland-work&#34;&gt;Enter Wayback: just enough Wayland to make Xwayland work&lt;/h2&gt;
&lt;p&gt;For the past year or so, the main idea circulating around the Alpine community to solve the X11 maintenance problem has been the creation of a stub Wayland compositor that can sit in front of Xwayland and act as a full X server.  Given the timing and desire to put the X11 maintenance issue to bed entirely, I decided to write a quick and dirty proof of concept over a weekend, sharing it on Mastodon and BlueSky:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&#34;https://ariadne.space/uploads/2025/a0f851df74c8f574.png&#34; width=&#34;600&#34; height=&#34;337&#34; alt=&#34;The first proof of concept of Wayback running on a 720p virtual output.  xterm shows the wayland plumbing underneath via wayland-info.&#34;&gt;
&lt;/center&gt;
&lt;p&gt;Since then, a lot has happened: we have been slowly putting the foundational pieces together to build a replacement X stack around Xwayland, and enough is now there for people with simple setups to use Wayback as their daily X11 implementation, as long as they don&amp;rsquo;t mind bugs.&lt;/p&gt;
&lt;h2 id=&#34;towards-the-first-wayback-release&#34;&gt;Towards the first Wayback release&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s a lot still left to do before we can confidently say that Wayback is ready for distributions to switch to.  This work is across the stack: Wayback still needs to expose surfaces that Xwayland can use; Xwayland needs to implement a few new features, such as cursor warping; and some X extensions inside Xwayland itself need to be properly plumbed (such as Xinerama being able to make use of the Wayland output layout data).&lt;/p&gt;
&lt;p&gt;Longer term goals aside, we are at most a few weeks away from the first alpha-quality release of Wayback.  The main focus of this release is to get to a point where enough is working that users with basic setups and requirements can be reasonably served by Wayback in place of the X.org server, to allow for further testing.  It&amp;rsquo;s already at the point where I am daily driving it:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&#34;https://ariadne.space/uploads/2025/2025-07-07-211610-3840x2160-scrot.png&#34; width=&#34;600&#34; height=&#34;337&#34; alt=&#34;Wayback running Window Maker on bare metal, as well as several X applications, including Firefox editing my blog.&#34;&gt;
&lt;/center&gt;
&lt;p&gt;Of course, even with the first release coming soon, the project remains in an experimental state, and the release will itself be experimental; still, we&amp;rsquo;re making real progress towards a sustainable solution for the X11 problem.  Come join us in IRC (irc.libera.chat #wayback) or Matrix (#wayback:catircservices.org)!  Unlike other projects, we are focused on building real solutions rather than fascism.&lt;/p&gt;
</description>
      <source:markdown>A poorly kept secret is that the X11 graphics stack is under-maintained as resources shift towards the maintenance of Wayland&#39;s graphics stack instead.  To some extent, technical steering committees in major distributions have been watching this situation develop for the past few years with increasing concern, as limited maintenance becomes a security risk: bugs accumulate and already burdened distribution security teams have to carry the security maintenance load in the absence of new releases.

In Alpine, we have been discussing the sunset of the standalone X.org server implementation for several years for these reasons, aiming to come up with a strategy that allows us to keep supporting X11-based desktop environments in a world without the X.org server.  Recently, a group of neofascist reactionaries announced a fork of the X.org server which, amongst other things, [has introduced new security bugs into the X server they forked from X.org](https://github.com/X11Libre/xserver/pull/56).  This brought Alpine to a new crossroads in the general discussion we&#39;ve been having about X11.  While Alpine has rejected this fork on the grounds that collaborating with neofascist reactionaries is fundamentally incompatible with our values, the overarching problem of X11 under-maintenance still persists, and is unlikely to change any time soon, leading us to begin directly looking for a solution.

## Enter Wayback: just enough Wayland to make Xwayland work

For the past year or so, the main idea circulating around the Alpine community to solve the X11 maintenance problem has been the creation of a stub Wayland compositor that can sit in front of Xwayland and act as a full X server.  Given the timing and desire to put the X11 maintenance issue to bed entirely, I decided to write a quick and dirty proof of concept over a weekend, sharing it on Mastodon and BlueSky:
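
Concretely, the layering can be sketched as three processes.  The binary names below are illustrative assumptions, not the actual Wayback command-line interface; the point is that a minimal Wayland compositor hosts a rootful Xwayland, which ordinary X11 clients then treat as a full X server:

```shell
# Hypothetical sketch of the layering; binary names are illustrative
# assumptions, not the real Wayback CLI.  Run each in its own terminal.
wayback-compositor    # 1. start the stub Wayland compositor (name hypothetical)
Xwayland :0           # 2. rootful Xwayland, backed by the stub compositor
DISPLAY=:0 xterm      # 3. ordinary X11 clients connect as usual
```

In a real release, session management would wire these pieces together for you.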

&lt;center&gt;
&lt;img src=&#34;https://ariadne.space/uploads/2025/a0f851df74c8f574.png&#34; width=&#34;600&#34; height=&#34;337&#34; alt=&#34;The first proof of concept of Wayback running on a 720p virtual output.  xterm shows the wayland plumbing underneath via wayland-info.&#34;&gt;
&lt;/center&gt;

Since then, a lot has happened: we have been slowly putting the foundational pieces together to build a replacement X stack around Xwayland, and enough is now there for people with simple setups to use Wayback as their daily X11 implementation, as long as they don&#39;t mind bugs.

## Towards the first Wayback release

There&#39;s a lot still left to do before we can confidently say that Wayback is ready for distributions to switch to.  This work is across the stack: Wayback still needs to expose surfaces that Xwayland can use; Xwayland needs to implement a few new features, such as cursor warping; and some X extensions inside Xwayland itself need to be properly plumbed (such as Xinerama being able to make use of the Wayland output layout data).

Longer term goals aside, we are at most a few weeks away from the first alpha-quality release of Wayback.  The main focus of this release is to get to a point where enough is working that users with basic setups and requirements can be reasonably served by Wayback in place of the X.org server, to allow for further testing.  It&#39;s already at the point where I am daily driving it:

&lt;center&gt;
&lt;img src=&#34;https://ariadne.space/uploads/2025/2025-07-07-211610-3840x2160-scrot.png&#34; width=&#34;600&#34; height=&#34;337&#34; alt=&#34;Wayback running Window Maker on bare metal, as well as several X applications, including Firefox editing my blog.&#34;&gt;
&lt;/center&gt;

Of course, even with the first release coming soon, the project remains in an experimental state, and the release will itself be experimental; still, we&#39;re making real progress towards a sustainable solution for the X11 problem.  Come join us in IRC (irc.libera.chat #wayback) or Matrix (#wayback:catircservices.org)!  Unlike other projects, we are focused on building real solutions rather than fascism.
</source:markdown>
    </item>
    
    <item>
      <title>C SBOMs, and how pkgconf can solve this problem</title>
      <link>https://ariadne.space/2025/02/08/c-sboms-and-how-pkgconf.html</link>
      <pubDate>Sat, 08 Feb 2025 20:45:37 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2025/02/08/c-sboms-and-how-pkgconf.html</guid>
      <description>&lt;p&gt;I recently attended FOSDEM, and saw a talk in the &lt;a href=&#34;https://fosdem.org/2025/schedule/event/fosdem-2025-4846-struggles-with-making-sboms-for-c-apps/&#34;&gt;SBOM devroom about a software engineer&amp;rsquo;s attempts to build an SBOM for a C project&lt;/a&gt;.
There are a number of reasons why the C ecosystem is difficult to reflect in SBOMs, but the largest problem is that the C ecosystem is fractured across a handful of build systems: GNU Autotools, CMake and Meson are the primary build systems used by projects, but there are hundreds of others in the long tail.&lt;/p&gt;
&lt;p&gt;A key thing that these build systems have in common is that they can integrate with pkg-config, which is a database that describes available build dependencies and their use.
This database, naturally, is of significant relevance to SBOM generation, because it already has most of the relevant information needed to generate an SBOM.&lt;/p&gt;
&lt;p&gt;pkgconf has a &lt;a href=&#34;https://github.com/pkgconf/pkgconf/blob/master/cli/bomtool/main.c&#34;&gt;bomtool utility which is intended to generate SBOMs using pkg-config data&lt;/a&gt;.
But how can this be leveraged in practice?&lt;/p&gt;
&lt;p&gt;To show how bomtool can be used to generate SBOMs, let&amp;rsquo;s make a simple project using Meson.  First, we need a simple program:&lt;/p&gt;
&lt;h4 id=&#34;mainc&#34;&gt;&lt;code&gt;main.c&lt;/code&gt;&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;glib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;main&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; argc, &lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;argv[])
{
        g_print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;hello world&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;);
        &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can compile this program by hand:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;~/bomtool-example $ gcc -o main main.c `pkg-config --cflags --libs glib-2.0`
~/bomtool-example $ ./main
hello world
&lt;/code&gt;&lt;/pre&gt;&lt;h4 id=&#34;mesonbuild&#34;&gt;&lt;code&gt;meson.build&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;Now we can generate a Meson project to build this program:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-meson&#34; data-lang=&#34;meson&#34;&gt;project(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bomtool-example&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;c&amp;#39;&lt;/span&gt;)

glib &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; dependency(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;glib-2.0&amp;#39;&lt;/span&gt;)

exe &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; executable(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bomtool-example&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;main.c&amp;#39;&lt;/span&gt;, dependencies: [glib])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At that point, you can run &lt;code&gt;meson setup _build &amp;amp;&amp;amp; cd _build &amp;amp;&amp;amp; ninja&lt;/code&gt; and get a &lt;code&gt;bomtool-example&lt;/code&gt; binary that effectively matches the one built by hand earlier.&lt;/p&gt;
&lt;p&gt;How do we turn that into an SBOM though?  Well, we need to extend the Meson build script to generate a pkg-config module, which we can do by adding:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-meson&#34; data-lang=&#34;meson&#34;&gt;pkg &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; import(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;pkgconfig&amp;#39;&lt;/span&gt;)
pcfile &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; pkg.generate(name: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bomtool-example&amp;#39;&lt;/span&gt;, filebase: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bomtool-example&amp;#39;&lt;/span&gt;, description: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;bomtool example&amp;#39;&lt;/span&gt;, version: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;0.0.1&amp;#39;&lt;/span&gt;, requires: [glib])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This causes the following &lt;code&gt;.pc&lt;/code&gt; file to be generated:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-pc&#34; data-lang=&#34;pc&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;prefix&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;/usr/local&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;includedir&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;prefix&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;/include&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;Name&lt;/span&gt;: bomtool-example
&lt;span style=&#34;color:#f92672&#34;&gt;Description&lt;/span&gt;: bomtool example
&lt;span style=&#34;color:#f92672&#34;&gt;Version&lt;/span&gt;: 0.0.1
&lt;span style=&#34;color:#f92672&#34;&gt;Requires&lt;/span&gt;: glib-2.0
&lt;span style=&#34;color:#f92672&#34;&gt;Cflags&lt;/span&gt;: -I&lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;includedir&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now we can use bomtool to generate an SBOM:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;~/bomtool-example/build $ bomtool ./meson-private/bomtool-example.pc &amp;gt; bomtool-example.spdx.txt
~/bomtool-example/build $ tail -n &lt;span style=&#34;color:#ae81ff&#34;&gt;10&lt;/span&gt; bomtool-example.spdx.txt 
Relationship: SPDXRef-Package-glib-2.0C642.82.4 DEPENDENCY_OF SPDXRef-Package-bomtool-exampleC640.0.1


Relationship: SPDXRef-Package-glib-2.0C642.82.4 DEPENDS_ON SPDXRef-Package-libpcre2-8C6410.43
Relationship: SPDXRef-Package-libpcre2-8C6410.43 DEV_DEPENDENCY_OF SPDXRef-Package-glib-2.0C642.82.4


Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-bomtool-exampleC640.0.1
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-glib-2.0C642.82.4
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-libpcre2-8C6410.43
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;What is left to do?  A few things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Build systems like CMake and Meson should become aware of bomtool and leverage it automatically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;pkg-config module authors should add SPDX license expressions to their pkg-config modules; support for this was added in pkgconf 1.9, so it is now a reasonably stable feature.  This will improve the quality of the SBOMs generated by bomtool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for more useful output formats, such as the new SPDX 3 JSON-LD format and CycloneDX.  Tools exist today which allow for translation between these formats, however, so it is not a pressing need.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is pretty clear, at least to me, that the pkg-config ecosystem has a large role to play in the future of C SBOMs, as the necessary information about dependencies and other relationships is richly expressed at this layer.
But at the same time, bomtool is still new, and .pc files are still being updated to reflect their projects&#39; license data.&lt;/p&gt;
</description>
      <source:markdown>I recently attended FOSDEM, and saw a talk in the [SBOM devroom about a software engineer&#39;s attempts to build an SBOM for a C project](https://fosdem.org/2025/schedule/event/fosdem-2025-4846-struggles-with-making-sboms-for-c-apps/).
There are a number of reasons why the C ecosystem is difficult to reflect in SBOMs, but the largest problem is that the C ecosystem is fractured across a handful of build systems: GNU Autotools, CMake and Meson are the primary build systems used by projects, but there are hundreds of others in the long tail.

A key thing that these build systems have in common is that they can integrate with pkg-config, which is a database that describes available build dependencies and their use.
This database, naturally, is of significant relevance to SBOM generation, because it already has most of the relevant information needed to generate an SBOM.
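
This relationship data can be inspected directly with the pkg-config CLI; the exact output depends on what is installed on your system, so none is shown here:

```shell
$ pkg-config --modversion glib-2.0
$ pkg-config --print-requires glib-2.0
$ pkg-config --print-requires-private glib-2.0
```

The `--print-requires-private` query surfaces private dependencies (such as libpcre2-8 on typical glib builds), which is exactly the kind of relationship data an SBOM needs.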

pkgconf has a [bomtool utility which is intended to generate SBOMs using pkg-config data](https://github.com/pkgconf/pkgconf/blob/master/cli/bomtool/main.c).
But how can this be leveraged in practice?

To show how bomtool can be used to generate SBOMs, let&#39;s make a simple project using Meson.  First, we need a simple program:

#### `main.c`

```c
#include &lt;glib.h&gt;

int main(int argc, const char *argv[])
{
        g_print(&#34;hello world\n&#34;);
        return 0;
}
```
Now we can compile this program by hand:

```
~/bomtool-example $ gcc -o main main.c `pkg-config --cflags --libs glib-2.0`
~/bomtool-example $ ./main
hello world
```
#### `meson.build`

Now we can generate a Meson project to build this program:

```meson
project(&#39;bomtool-example&#39;, &#39;c&#39;)

glib = dependency(&#39;glib-2.0&#39;)

exe = executable(&#39;bomtool-example&#39;, &#39;main.c&#39;, dependencies: [glib])
```
At that point, you can run `meson setup _build &amp;&amp; cd _build &amp;&amp; ninja` and get a `bomtool-example` binary that effectively matches the one built by hand earlier.

How do we turn that into an SBOM though?  Well, we need to extend the Meson build script to generate a pkg-config module, which we can do by adding:

```meson
pkg = import(&#39;pkgconfig&#39;)
pcfile = pkg.generate(name: &#39;bomtool-example&#39;, filebase: &#39;bomtool-example&#39;, description: &#39;bomtool example&#39;, version: &#39;0.0.1&#39;, requires: [glib])
```
This causes the following `.pc` file to be generated:

```pc
prefix=/usr/local
includedir=${prefix}/include

Name: bomtool-example
Description: bomtool example
Version: 0.0.1
Requires: glib-2.0
Cflags: -I${includedir}
```
Now we can use bomtool to generate an SBOM:

```shell
~/bomtool-example/build $ bomtool ./meson-private/bomtool-example.pc &gt; bomtool-example.spdx.txt
~/bomtool-example/build $ tail -n 10 bomtool-example.spdx.txt 
Relationship: SPDXRef-Package-glib-2.0C642.82.4 DEPENDENCY_OF SPDXRef-Package-bomtool-exampleC640.0.1


Relationship: SPDXRef-Package-glib-2.0C642.82.4 DEPENDS_ON SPDXRef-Package-libpcre2-8C6410.43
Relationship: SPDXRef-Package-libpcre2-8C6410.43 DEV_DEPENDENCY_OF SPDXRef-Package-glib-2.0C642.82.4


Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-bomtool-exampleC640.0.1
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-glib-2.0C642.82.4
Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-libpcre2-8C6410.43
```
What is left to do?  A few things:

* Build systems like CMake and Meson should become aware of bomtool and leverage it automatically.

* pkg-config module authors should add SPDX license expressions to their pkg-config modules; support for this was added in pkgconf 1.9, so it is now a reasonably stable feature.  This will improve the quality of the SBOMs generated by bomtool.

* Support for more useful output formats, such as the new SPDX 3 JSON-LD format and CycloneDX.  Tools exist today which allow for translation between these formats, however, so it is not a pressing need.
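
To illustrate the second item: assuming the field is spelled `License:` (my assumption; check the pkgconf documentation), the example module from earlier would gain a single line carrying an SPDX license expression:

```pc
# License field name assumed from the pkgconf 1.9 SPDX support
prefix=/usr/local
includedir=${prefix}/include

Name: bomtool-example
Description: bomtool example
Version: 0.0.1
License: MIT
Requires: glib-2.0
Cflags: -I${includedir}
```

With license data present in the .pc files, bomtool can propagate it into the generated SBOM rather than leaving those fields empty.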

It is pretty clear, at least to me, that the pkg-config ecosystem has a large role to play in the future of C SBOMs, as the necessary information about dependencies and other relationships is richly expressed at this layer.
But at the same time, bomtool is still new, and .pc files are still being updated to reflect their projects&#39; license data.
</source:markdown>
    </item>
    
    <item>
      <title>The XZ Utils backdoor is a symptom of a larger problem</title>
      <link>https://ariadne.space/2024/04/01/the-xz-utils-backdoor-is.html</link>
      <pubDate>Mon, 01 Apr 2024 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2024/04/02/the-xz-utils-backdoor-is.html</guid>
      <description>&lt;p&gt;On March 29th, Andres Freund &lt;a href=&#34;https://www.openwall.com/lists/oss-security/2024/03/29/4&#34;&gt;dropped a bombshell on the oss-security mailing list&lt;/a&gt;: recent XZ Utils source code tarball releases made by Jia Tan were released with a backdoor.
Thankfully, for multiple reasons, &lt;a href=&#34;https://alpinelinux.org/posts/XZ-backdoor-CVE-2024-3094.html&#34;&gt;Alpine was not impacted by this backdoor&lt;/a&gt;, despite the recent source code tarball releases being published in Alpine &lt;code&gt;edge&lt;/code&gt;.
But what lessons do we need to learn from this incident?&lt;/p&gt;
&lt;h2 id=&#34;the-software-supply-chain-is-not-real&#34;&gt;The software &amp;ldquo;supply chain&amp;rdquo; is not real&lt;/h2&gt;
&lt;p&gt;As a community of hackers, we have built an exhaustive commons of free software released under various free licenses such as the GPL and the Apache 2.0 license.
Software packages in this commons have taken over the corporate world, because adopting them enabled more rapid innovation by allowing developers to focus more on the business logic of their applications, rather than low-level details.
This has been overall a good thing for society: from the open commons we have spawned a whole world of applications which have become the foundational bedrock of modern society.
It can certainly be argued that the invention of FOSS licensing models has been as revolutionary for the digital economy as the steam engine was for industry.&lt;/p&gt;
&lt;p&gt;There is one problem, however &amp;ndash; when we take software from the commons, we are like raccoons digging through a dumpster to find something useful.
There is no &amp;ldquo;supply chain&amp;rdquo; in reality, &lt;em&gt;but&lt;/em&gt; there is an effort by corporations which consume software from the commons to pretend there is one in order to shift the obligations related to ingesting third-party code away from themselves and to the original authors and maintainers of the code they are using.&lt;/p&gt;
&lt;p&gt;For there to be a &amp;ldquo;supply chain&amp;rdquo;, there must be a supplier, which in turn requires a contractual relationship between two parties.
With software licensed under FOSS licensing terms, a consumer receives a non-exclusive license to make use of the software however they wish (in accordance with the license requirements, of course), but non-exclusive licenses cannot and do not imply a contractual supplier-consumer relationship.&lt;/p&gt;
&lt;p&gt;With that said, many of the proposals made by people working to improve security of the software &amp;ldquo;supply chain&amp;rdquo; have practical and valuable uses for protecting the integrity of the commons, and are worthy of further examination.&lt;/p&gt;
&lt;h2 id=&#34;junk-drawer-libraries-are-valuable-targets&#34;&gt;&amp;ldquo;Junk drawer&amp;rdquo; libraries are valuable targets&lt;/h2&gt;
&lt;p&gt;CVE-2024-3094 happened for a simple reason: &lt;a href=&#34;https://bugs.debian.org/778913&#34;&gt;distributions patching OpenSSH to support systemd&amp;rsquo;s readiness notifications&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Frequently, authors looking to add systemd readiness notifications to their software tend to just look for the &lt;code&gt;systemd&lt;/code&gt; pkg-config package, and use its &lt;code&gt;CFLAGS&lt;/code&gt; and &lt;code&gt;LIBS&lt;/code&gt;.
This results in the software linking to &lt;code&gt;libsystemd&lt;/code&gt;.
What does &lt;em&gt;that&lt;/em&gt; have to do with anything, after all we are talking about a backdoor in &lt;code&gt;liblzma&lt;/code&gt;, not &lt;code&gt;libsystemd&lt;/code&gt;?
Simple: although &lt;code&gt;sd_notify()&lt;/code&gt; does not make use of any functionality in &lt;code&gt;liblzma&lt;/code&gt;, because &lt;code&gt;libsystemd&lt;/code&gt; has a &lt;code&gt;DT_NEEDED&lt;/code&gt; entry against &lt;code&gt;liblzma&lt;/code&gt;, it will pull in &lt;code&gt;liblzma&lt;/code&gt; as a shared object dependency.
Once that happens, the &lt;em&gt;constructor&lt;/em&gt; functions in &lt;code&gt;liblzma&lt;/code&gt; will run, as it is being loaded due to being in the dependency graph.&lt;/p&gt;
&lt;p&gt;What can be done about this?
A simple solution would be to start to split up the various &lt;code&gt;libsystemd&lt;/code&gt; routines into smaller packages.
This would allow for these packages to link against &lt;code&gt;libsystemd-daemon&lt;/code&gt; or similar instead, which would presumably not link against &lt;code&gt;liblzma&lt;/code&gt;, as it is unnecessary for readiness notifications.
The &lt;code&gt;systemd&lt;/code&gt; pkg-config package could be kept around as a metapackage pulling in the other libraries as a migration path.&lt;/p&gt;
&lt;p&gt;I call libraries which are large amalgamations of unrelated routines &amp;ldquo;junk drawer&amp;rdquo; libraries because they are basically the programming equivalent to a junk drawer: routines and dependencies accumulate over years and suddenly you have a mess of programs which depend on this library but only use some small portion of the library.
As these unnecessary dependencies accumulate, these &amp;ldquo;junk drawer&amp;rdquo; libraries become valuable points of interest when scouting for projects to compromise.
I would recommend auditing any of the other dependencies of systemd for possible backdoors for this reason.
There are a number of other libraries which could have been targeted in this way as well, which are also in the libsystemd dependency graph, such as PCRE.&lt;/p&gt;
&lt;h2 id=&#34;be-kind-to-software-maintainers&#34;&gt;Be kind to software maintainers&lt;/h2&gt;
&lt;p&gt;Although I am not certain that this lesson is particularly applicable to the xz-utils situation, since the actor who implemented the backdoor most likely made use of sockpuppet personas to advocate for his becoming a maintainer, the mental health of software maintainers is important.&lt;/p&gt;
&lt;p&gt;Directly what this means is that if you see somebody harassing a maintainer with specific demands, you should not join in on the thread.
Let the maintainer deal with it publicly, and reach out privately if you are concerned about the situation.
Otherwise, even if you are concerned about burnout or the maintainer overworking, you may wind up advocating for a threat actor to become a maintainer of something.&lt;/p&gt;
</description>
      <source:markdown>
On March 29th, Andres Freund [dropped a bombshell on the oss-security mailing list][0]: recent XZ Utils source code tarball releases made by Jia Tan were released with a backdoor.
Thankfully, for multiple reasons, [Alpine was not impacted by this backdoor][1], despite the recent source code tarball releases being published in Alpine `edge`.
But what lessons do we need to learn from this incident?

  [0]: https://www.openwall.com/lists/oss-security/2024/03/29/4
  [1]: https://alpinelinux.org/posts/XZ-backdoor-CVE-2024-3094.html

## The software &#34;supply chain&#34; is not real

As a community of hackers, we have built an exhaustive commons of free software released under various free licenses such as the GPL and the Apache 2.0 license.
Software packages in this commons have taken over the corporate world, because it enabled more rapid innovation by allowing developers to focus more on the business logic of their applications, rather than low-level details.
This has been overall a good thing for society: from the open commons we have spawned a whole world of applications which have become the foundational bedrock of modern society.
It can certainly be argued that the invention of FOSS licensing models has been as revolutionary for the digital economy as the steam engine was for industry.

There is one problem, however -- when we take software from the commons, we are like raccoons digging through a dumpster to find something useful.
There is no &#34;supply chain&#34; in reality, *but* there is an effort by corporations which consume software from the commons to pretend there is one in order to shift the obligations related to ingesting third-party code away from themselves and to the original authors and maintainers of the code they are using.

For there to be a &#34;supply chain&#34;, there must be a supplier, which in return requires a contractual relationship between two parties.
With software licensed under FOSS licensing terms, a consumer receives a non-exclusive license to make use of the software however they wish (in accordance with the license requirements, of course), but non-exclusive licenses cannot and do not imply a contractual supplier-consumer relationship.

With that said, many of the proposals made by people working to improve security of the software &#34;supply chain&#34; have practical and valuable uses for protecting the integrity of the commons, and are worthy of further examination.

## &#34;Junk drawer&#34; libraries are valuable targets

CVE-2024-3094 happened for a simple reason: [distributions patching OpenSSH to support systemd&#39;s readiness notifications][2].

  [2]: https://bugs.debian.org/778913

Frequently, authors looking to add systemd readiness notifications to their software tend to just look for the `systemd` pkg-config package, and use its `CFLAGS` and `LIBS`.
This results in the software linking to `libsystemd`.
What does *that* have to do with anything?  After all, we are talking about a backdoor in `liblzma`, not `libsystemd`.
Simple: although `sd_notify()` does not make use of any functionality in `liblzma`, because `libsystemd` has a `DT_NEEDED` entry against `liblzma`, the dynamic linker will pull in `liblzma` as a shared object dependency.
Once that happens, the *constructor* functions in `liblzma` will run, as it is being loaded due to being in the dependency graph.

What can be done about this?
A simple solution would be to start to split up the various `libsystemd` routines into smaller packages.
This would allow for these packages to link against `libsystemd-daemon` or similar instead, which would presumably not link against `liblzma`, as it is unnecessary for readiness notifications.
The `systemd` pkg-config package could be kept around as a metapackage pulling in the other libraries as a migration path.
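
As a sketch of what that split might look like, here is a hypothetical pair of pkg-config files (the names, version, and fields are illustrative, not actual systemd artifacts):

```
# libsystemd-daemon.pc -- hypothetical split-out readiness library
Name: libsystemd-daemon
Description: sd_notify() and related readiness-notification routines
Version: 255
Libs: -lsystemd-daemon

# systemd.pc -- hypothetical metapackage kept as a migration path
# (shipped as a separate file; shown together here for brevity)
Name: systemd
Description: pulls in the split-out libsystemd libraries
Version: 255
Requires: libsystemd-daemon
```

A consumer that only needs readiness notifications would then query `libsystemd-daemon` directly and avoid the rest of the dependency graph.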

I call libraries which are large amalgamations of unrelated routines &#34;junk drawer&#34; libraries because they are the programming equivalent of a junk drawer: routines and dependencies accumulate over the years, and suddenly you have a mess of programs which depend on the library but each use only a small portion of it.
As these unnecessary dependencies accumulate, &#34;junk drawer&#34; libraries become valuable points of interest when scouting for projects to compromise.
For this reason, I would recommend auditing the other dependencies of systemd for possible backdoors as well.
A number of other libraries in the libsystemd dependency graph, such as PCRE, could have been targeted in the same way.

## Be kind to software maintainers

Although I am not certain this lesson applies cleanly to the xz-utils situation, given that the actor who implemented the backdoor most likely used sockpuppet personas to lobby for his own promotion to maintainer, the mental health of software maintainers is important.

In practical terms, this means that if you see somebody harassing a maintainer with specific demands, you should not join in on the thread.
Let the maintainer deal with it publicly, and reach out privately if you are concerned about the situation.
Otherwise, even if you are genuinely worried about burnout or overwork, you may wind up advocating for a threat actor to become a maintainer.
</source:markdown>
    </item>
    
    <item>
      <title>Most breaches actually begin in corp</title>
      <link>https://ariadne.space/2023/12/06/most-breaches-actually-begin-in.html</link>
      <pubDate>Wed, 06 Dec 2023 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2023/12/07/most-breaches-actually-begin-in.html</guid>
      <description>&lt;p&gt;Readers of my blog will note that while I believe Rust is an excellent
tool for developers to leverage when building software, that there is
a disconnect between the developers leveraging Rust features to improve
their software and many of the advocates who talk about the language,
which I believe is counterproductive when it comes to Rust advocacy.&lt;/p&gt;
&lt;p&gt;For example, I see &lt;a href=&#34;https://www.linkedin.com/feed/update/urn:li:activity:7138201685847453697/&#34;&gt;takes like these&lt;/a&gt; frequently, which generally
advocate that if &lt;em&gt;only&lt;/em&gt; we adopted memory safe languages, we would solve
all security problems in computing forever:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If it&amp;rsquo;s estimated that writing in a memory safe language prevented
750 vulnerabilities (in just one codebase!) and IBM calculated [1]
the average cost of a data breach is $4.45 million, that&amp;rsquo;s over
$3.3 &lt;em&gt;billion&lt;/em&gt; saved by moving to memory safety.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong: it sure would be nice to change to a memory safe
language and save $3.3 billion in losses, but in reality it&amp;rsquo;s far
more complicated than that.&lt;/p&gt;
&lt;p&gt;Every year, Verizon&amp;rsquo;s security group releases a &lt;a href=&#34;https://www.verizon.com/business/resources/Tbcb/reports/2023-data-breach-investigations-report-dbir.pdf&#34;&gt;Data Breach
Investigations Report&lt;/a&gt;.  These reports are &lt;em&gt;fascinating&lt;/em&gt;
to read, and I highly recommend giving them a read if you&amp;rsquo;re
interested about the past year&amp;rsquo;s notable data breaches and
how they actually happened.&lt;/p&gt;
&lt;p&gt;What we learn from these reports is that, in general:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Over 70% of data breaches actually involve a human element
instead of a software vulnerability, for example a phishing
attack or a misconfiguration of a service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Almost 50% of data breaches actually involve compromised
credentials, such as leaked OAuth tokens which did not
expire.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Roughly 15% of data breaches have phishing as their root cause.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Only 5% of data breaches actually come from exploitation of
a software vulnerability.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Don&amp;rsquo;t get me wrong &amp;ndash; software vulnerabilities are bad and should be
fixed in an expedient manner.  However, to circle back to the prior
example I quoted: if we consider each data breach to carry a price
tag of $4.45 million, and we are talking about 750 security incidents
in practice, then only about 38 of those incidents (5%) would have
the potential to have memory safety as their root cause, a much
smaller price tag of $169.1 million attributable to memory safety.&lt;/p&gt;
&lt;p&gt;The point is not that we shouldn&amp;rsquo;t refactor, or even rewrite software,
to improve its memory safety.  But we should be honest about why we are
doing it.  While memory safety &lt;em&gt;is&lt;/em&gt; important, the real benefit in
doing this refactoring work is to improve the &lt;em&gt;clarity&lt;/em&gt; of the underlying
software&amp;rsquo;s technical design: technical constraints can be enforced using
Rust&amp;rsquo;s trait system, for example &amp;ndash; a form of behavioral modeling.&lt;/p&gt;
&lt;p&gt;By leveraging features such as traits to enforce behavioral correctness
of the code you are writing, you wind up having a much better
vulnerability posture &lt;em&gt;overall&lt;/em&gt;, not just in the area of memory safety.
This is the reason why refactoring software to use code written in Rust
and other modern languages with these features is advantageous.&lt;/p&gt;
&lt;p&gt;This is a far more interesting story than the talking points about
memory safety I hear.  At this point, with features such as &lt;code&gt;FORTIFY&lt;/code&gt;
and AddressSanitizer, it is possible to address memory safety
defects without having to go to such lengths to refactor pre-existing
code.&lt;/p&gt;
&lt;p&gt;Features like ASan do not even have to carry significant runtime
performance penalties.  To illustrate my point, Justine Tunney proposed
building a modified version of Alpine with ASan enabled in 2021 using
a production-tuned variant of &lt;a href=&#34;https://github.com/jart/cosmopolitan/blob/master/libc/intrin/asan.c&#34;&gt;her ASan runtime included in her
Cosmopolitan libc project&lt;/a&gt;.  It was estimated that enabling
ASan in conjunction with this variant of her ASan runtime would only
result in a 3 to 5% performance reduction over code that did not have
ASan enabled.  Adopting this work would have immediately derisked the
use of memory unsafe code in all packages as they would be built with
ASan by default.&lt;/p&gt;
&lt;p&gt;And, of course, even with the borrow checker, and traits, and type
enforcement, and the other code verification features provided by the
Rust compiler, you still have &lt;code&gt;unsafe{}&lt;/code&gt; blocks, and the Rust compiler
provides support for ASan as a mitigation for these blocks.  So you
&lt;em&gt;still&lt;/em&gt; really need ASan even in a memory safe world, because even when
you build such a thing with perfect memory safe abstractions over a
memory unsafe world, you really are still building on top of a memory
unsafe world.&lt;/p&gt;
&lt;p&gt;The point here isn&amp;rsquo;t that these abstractions are meaningless.  They do
provide significant harm reduction when working with otherwise memory
unsafe interfaces, but even the most perfect abstraction is still, by
its very nature of being an abstraction, leaky.  Instead, we should
recognize &lt;em&gt;why&lt;/em&gt; Rust improves memory safety, and how the techniques
which improve memory safety can also be used to enforce elements of
the underlying software&amp;rsquo;s design at compile time.  This is a much
better story than the handwaving I usually see about memory safety
from advocates.&lt;/p&gt;
</description>
      <source:markdown>
Readers of my blog will note that while I believe Rust is an excellent
tool for developers to leverage when building software, there is
a disconnect between the developers leveraging Rust features to improve
their software and many of the advocates who talk about the language,
which I believe is counterproductive when it comes to Rust advocacy.

For example, I see [takes like these][linkedin] frequently, which generally
advocate that if *only* we adopted memory safe languages, we would solve
all security problems in computing forever:

&gt; If it&#39;s estimated that writing in a memory safe language prevented
&gt; 750 vulnerabilities (in just one codebase!) and IBM calculated [1]
&gt; the average cost of a data breach is $4.45 million, that&#39;s over
&gt; $3.3 *billion* saved by moving to memory safety. 

   [linkedin]: https://www.linkedin.com/feed/update/urn:li:activity:7138201685847453697/

Don&#39;t get me wrong: it sure would be nice to change to a memory safe
language and save $3.3 billion in losses, but in reality it&#39;s far
more complicated than that.

Every year, Verizon&#39;s security group releases a [Data Breach
Investigations Report][dbir].  These reports are *fascinating*
to read, and I highly recommend giving them a read if you&#39;re
interested about the past year&#39;s notable data breaches and
how they actually happened.

   [dbir]: https://www.verizon.com/business/resources/Tbcb/reports/2023-data-breach-investigations-report-dbir.pdf

What we learn from these reports is that, in general:

 * Over 70% of data breaches actually involve a human element
   instead of a software vulnerability, for example a phishing
   attack or a misconfiguration of a service.

 * Almost 50% of data breaches actually involve compromised
   credentials, such as leaked OAuth tokens which did not
   expire.

 * Roughly 15% of data breaches have phishing as their root cause.

 * Only 5% of data breaches actually come from exploitation of
   a software vulnerability.

Don&#39;t get me wrong -- software vulnerabilities are bad and should be
fixed in an expedient manner.  However, to circle back to the prior
example I quoted: if we consider each data breach to carry a price
tag of $4.45 million, and we are talking about 750 security incidents
in practice, then only about 38 of those incidents (5%) would have
the potential to have memory safety as their root cause, a much
smaller price tag of $169.1 million attributable to memory safety.

The point is not that we shouldn&#39;t refactor, or even rewrite software,
to improve its memory safety.  But we should be honest about why we are
doing it.  While memory safety *is* important, the real benefit in
doing this refactoring work is to improve the *clarity* of the underlying
software&#39;s technical design: technical constraints can be enforced using
Rust&#39;s trait system, for example -- a form of behavioral modeling.

By leveraging features such as traits to enforce behavioral correctness
of the code you are writing, you wind up having a much better
vulnerability posture *overall*, not just in the area of memory safety.
This is the reason why refactoring software to use code written in Rust
and other modern languages with these features is advantageous.

This is a far more interesting story than the talking points about
memory safety I hear.  At this point, with features such as `FORTIFY`
and AddressSanitizer, it is possible to address memory safety
defects without having to go to such lengths to refactor pre-existing
code.

Features like ASan do not even have to carry significant runtime
performance penalties.  To illustrate my point, Justine Tunney proposed
building a modified version of Alpine with ASan enabled in 2021 using
a production-tuned variant of [her ASan runtime included in her
Cosmopolitan libc project][cosmo-asan].  It was estimated that enabling
ASan in conjunction with this variant of her ASan runtime would only
result in a 3 to 5% performance reduction over code that did not have
ASan enabled.  Adopting this work would have immediately derisked the
use of memory unsafe code in all packages as they would be built with
ASan by default.

   [cosmo-asan]: https://github.com/jart/cosmopolitan/blob/master/libc/intrin/asan.c

And, of course, even with the borrow checker, and traits, and type
enforcement, and the other code verification features provided by the
Rust compiler, you still have `unsafe{}` blocks, and the Rust compiler
provides support for ASan as a mitigation for these blocks.  So you
*still* really need ASan even in a memory safe world, because even when
you build such a thing with perfect memory safe abstractions over a
memory unsafe world, you really are still building on top of a memory
unsafe world.

The point here isn&#39;t that these abstractions are meaningless.  They do
provide significant harm reduction when working with otherwise memory
unsafe interfaces, but even the most perfect abstraction is still, by
its very nature of being an abstraction, leaky.  Instead, we should
recognize *why* Rust improves memory safety, and how the techniques
which improve memory safety can also be used to enforce elements of
the underlying software&#39;s design at compile time.  This is a much
better story than the handwaving I usually see about memory safety
from advocates.
</source:markdown>
    </item>
    
    <item>
      <title>Writing portable ARM64 assembly</title>
      <link>https://ariadne.space/2023/04/12/writing-portable-arm-assembly.html</link>
      <pubDate>Wed, 12 Apr 2023 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2023/04/13/writing-portable-arm-assembly.html</guid>
      <description>&lt;p&gt;An unfortunate side effect of the rising popularity of Apple&amp;rsquo;s ARM-based
computers is an increase in unportable assembly code which targets the
64-bit ARM ISA.  This is because developers are writing these bits of
assembly code to speed up their programs when run on Apple&amp;rsquo;s ARM-based
computers, without considering the other 64-bit ARM devices out there,
such as SBCs and servers running Linux or BSD.&lt;/p&gt;
&lt;p&gt;The good news is that it is very easy to write assembly which targets
Apple&amp;rsquo;s computers as well as the other 64-bit ARM devices running
operating systems other than Darwin.  It just requires being aware of
a few differences between the Mach-O and ELF ABIs, as well as knowing
what Apple-specific syntax extensions to avoid.  By following the
guidance in this blog, you will be able to write assembly code which
is portable between Apple&amp;rsquo;s toolchain, the official ARM assembly
toolchain, and the GNU toolchain.&lt;/p&gt;
&lt;h2 id=&#34;differences-between-the-elf-and-mach-o-abis&#34;&gt;Differences between the ELF and Mach-O ABIs&lt;/h2&gt;
&lt;p&gt;Modern UNIX systems, including Linux-based systems, largely use the
&lt;a href=&#34;https://en.wikipedia.org/wiki/Executable_and_Linkable_Format&#34;&gt;ELF binary format&lt;/a&gt;.  Apple uses &lt;a href=&#34;https://en.wikipedia.org/wiki/Mach-O&#34;&gt;Mach-O&lt;/a&gt; in Darwin
instead for historical reasons.  This is not a requirement imposed by
their use of Mach; indeed, OSFMK, the kernel that Darwin,
MkLinux and OSF/1 are all based on, supports ELF binaries just fine.
Apple just decided to use the Mach-O format instead.&lt;/p&gt;
&lt;p&gt;When it comes to writing assembly (or, really, just linking code
in general) targeting Darwin, the main difference to be aware of is
that all symbols are prefixed with a single underscore.  For example,
if you have a function that would be declared in C like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;extern&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;unmask&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;payload, &lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;mask, size_t len);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On Darwin, the function in your assembly code must be defined as &lt;code&gt;_unmask&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The other major difference is that ELF defines distinct symbol types,
for example &lt;code&gt;STT_FUNC&lt;/code&gt; and &lt;code&gt;STT_OBJECT&lt;/code&gt;.  There is no equivalent
in Mach-O, and thus the &lt;code&gt;.type&lt;/code&gt; directive that you would use when writing
assembly for ELF targets is not supported.&lt;/p&gt;
&lt;h3 id=&#34;a-brief-note-on-platform-abis&#34;&gt;A brief note on Platform ABIs&lt;/h3&gt;
&lt;p&gt;You will also need to be aware of minor differences between the Darwin
ABI and other platform ABIs.  A notable example is that the &lt;code&gt;x18&lt;/code&gt;
register is reserved by the Darwin ABI and is explicitly zeroed on
context switches in some cases.  This register is also reserved on
Android, but not on GNU/Linux or Alpine.&lt;/p&gt;
&lt;h2 id=&#34;apple-specific-vector-mnemonics&#34;&gt;Apple-specific vector mnemonics&lt;/h2&gt;
&lt;p&gt;The other main thing to watch out for is Apple&amp;rsquo;s custom mnemonics for
NEON.  In order to make writing NEON code less cumbersome, Apple
introduced a set of mnemonics that allow simplification of specifying
NEON instructions.  For example, if you are targeting Apple devices
only, you might write an exclusive-or NEON instruction like so:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-asm&#34; data-lang=&#34;asm&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;eor.16b&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;v2&lt;/span&gt;, &lt;span style=&#34;color:#66d9ef&#34;&gt;v2&lt;/span&gt;, &lt;span style=&#34;color:#66d9ef&#34;&gt;v0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is an Apple-specific extension to the ARM assembly syntax.  The
&lt;a href=&#34;https://developer.arm.com/documentation/dui0802/b/A64-SIMD-Vector-Instructions/EOR--vector-&#34;&gt;official ARM assembly manual&lt;/a&gt; specifies that the memory layout
must be specified for each register:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-asm&#34; data-lang=&#34;asm&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;eor&lt;/span&gt;     &lt;span style=&#34;color:#66d9ef&#34;&gt;v2.16b&lt;/span&gt;, &lt;span style=&#34;color:#66d9ef&#34;&gt;v2.16b&lt;/span&gt;, &lt;span style=&#34;color:#66d9ef&#34;&gt;v0.16b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;abstracting-the-abi-details-with-some-macros&#34;&gt;Abstracting the ABI details with some macros&lt;/h2&gt;
&lt;p&gt;The good news is that the ABI details can easily be abstracted with a
few macros.  As for using NEON functions, the answer is simple: stick to
what the ARM manual says to do, rather than using Apple&amp;rsquo;s mnemonics.&lt;/p&gt;
&lt;p&gt;There are two macros that you need.  These can be placed in a header
file somewhere if wanted.&lt;/p&gt;
&lt;p&gt;The first macro allows you to deal with the underscore requirement of the
Darwin ABI:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#ifdef __APPLE__
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# define PROC_NAME(__proc) _ ## __proc
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#else
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# define PROC_NAME(__proc) __proc
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#endif
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The second macro is optional, but it allows you to define the correct
ELF symbol types outside of Apple&amp;rsquo;s toolchain:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#ifdef __clang__
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# define TYPE(__proc, __typ)
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#else
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# define TYPE(__proc, __typ) .type __proc, __typ
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#endif
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then you just write your assembly as normal, but using these macros:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-asm&#34; data-lang=&#34;asm&#34;&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;.global&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;PROC_NAME&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;unmask&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;.align&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;TYPE&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;unmask&lt;/span&gt;, &lt;span style=&#34;color:#960050;background-color:#1e0010&#34;&gt;@&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;function&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;PROC_NAME&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;unmask&lt;/span&gt;):
   &lt;span style=&#34;color:#a6e22e&#34;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;And that&amp;rsquo;s all there is to it.  As long as you follow these guidelines,
you will have assembly which is portable to any UNIX-like environment on
64-bit ARM.&lt;/p&gt;
</description>
      <source:markdown>
An unfortunate side effect of the rising popularity of Apple&#39;s ARM-based
computers is an increase in unportable assembly code which targets the
64-bit ARM ISA.  This is because developers are writing these bits of
assembly code to speed up their programs when run on Apple&#39;s ARM-based
computers, without considering the other 64-bit ARM devices out there,
such as SBCs and servers running Linux or BSD.

The good news is that it is very easy to write assembly which targets
Apple&#39;s computers as well as the other 64-bit ARM devices running
operating systems other than Darwin.  It just requires being aware of
a few differences between the Mach-O and ELF ABIs, as well as knowing
what Apple-specific syntax extensions to avoid.  By following the
guidance in this blog, you will be able to write assembly code which
is portable between Apple&#39;s toolchain, the official ARM assembly
toolchain, and the GNU toolchain.

## Differences between the ELF and Mach-O ABIs

Modern UNIX systems, including Linux-based systems, largely use the
[ELF binary format][elf].  Apple uses [Mach-O][mach-o] in Darwin
instead for historical reasons.  This is not a requirement imposed by
their use of Mach; indeed, OSFMK, the kernel that Darwin,
MkLinux and OSF/1 are all based on, supports ELF binaries just fine.
Apple just decided to use the Mach-O format instead.

   [elf]: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
   [mach-o]: https://en.wikipedia.org/wiki/Mach-O

When it comes to writing assembly (or, really, just linking code
in general) targeting Darwin, the main difference to be aware of is
that all symbols are prefixed with a single underscore.  For example,
if you have a function that would be declared in C like:

```c
extern void unmask(const char *payload, const char *mask, size_t len);
```
On Darwin, the function in your assembly code must be defined as `_unmask`.

The other major difference is that ELF defines distinct symbol types,
for example `STT_FUNC` and `STT_OBJECT`.  There is no equivalent
in Mach-O, and thus the `.type` directive that you would use when writing
assembly for ELF targets is not supported.

### A brief note on Platform ABIs

You will also need to be aware of minor differences between the Darwin
ABI and other platform ABIs.  A notable example is that the `x18`
register is reserved by the Darwin ABI and is explicitly zeroed on
context switches in some cases.  This register is also reserved on
Android, but not on GNU/Linux or Alpine.

## Apple-specific vector mnemonics

The other main thing to watch out for is Apple&#39;s custom mnemonics for
NEON.  In order to make writing NEON code less cumbersome, Apple
introduced a set of mnemonics that allow simplification of specifying
NEON instructions.  For example, if you are targeting Apple devices
only, you might write an exclusive-or NEON instruction like so:

```asm
eor.16b v2, v2, v0
```
This is an Apple-specific extension to the ARM assembly syntax.  The
[official ARM assembly manual][armasm] specifies that the memory layout
must be specified for each register:

```asm
eor     v2.16b, v2.16b, v0.16b
```
   [armasm]: https://developer.arm.com/documentation/dui0802/b/A64-SIMD-Vector-Instructions/EOR--vector-

## Abstracting the ABI details with some macros

The good news is that the ABI details can easily be abstracted with a
few macros.  As for using NEON functions, the answer is simple: stick to
what the ARM manual says to do, rather than using Apple&#39;s mnemonics.

There are two macros that you need.  These can be placed in a header
file somewhere if wanted.

The first macro allows you to deal with the underscore requirement of the
Darwin ABI:

```c
#ifdef __APPLE__
# define PROC_NAME(__proc) _ ## __proc
#else
# define PROC_NAME(__proc) __proc
#endif
```
The second macro is optional, but it allows you to define the correct
ELF symbol types outside of Apple&#39;s toolchain:

```c
#ifdef __clang__
# define TYPE(__proc, __typ)
#else
# define TYPE(__proc, __typ) .type __proc, __typ
#endif
```
Then you just write your assembly as normal, but using these macros:

```asm
.global PROC_NAME(unmask)
.align 2
TYPE(unmask, @function)
PROC_NAME(unmask):
   ...
```
And that&#39;s all there is to it.  As long as you follow these guidelines,
you will have assembly which is portable to any UNIX-like environment on
64-bit ARM.
</source:markdown>
    </item>
    
    <item>
      <title>Help migrate a community from Discord to something else</title>
      <link>https://ariadne.space/2023/03/07/help-migrate-a-community-from.html</link>
      <pubDate>Tue, 07 Mar 2023 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2023/03/08/help-migrate-a-community-from.html</guid>
      <description>&lt;p&gt;During the height of the pandemic, I set up a community using Discord.
Since then, it has evolved into being one of the most active (yet tight-knit) technical
communities on Discord: members ranging from all around the world and from all sorts of
technical and social backgrounds participate in conversations every day on a variety of
topics.&lt;/p&gt;
&lt;h2 id=&#34;why-leave-discord&#34;&gt;Why leave Discord?&lt;/h2&gt;
&lt;p&gt;The current situation sounds pretty good, right?
Well, as Richard Stallman warned, &lt;a href=&#34;https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html&#34;&gt;proprietary services masquerading as software&lt;/a&gt;
do not necessarily act on behalf of the user.&lt;/p&gt;
&lt;p&gt;In this specific case, despite paying money to Discord for its services, there have been
many instances where it has been transparently obvious to me and the rest of our team
that Discord is not really acting in the interest of our community.
Some examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Discord has banned the accounts of several community members over the past 18 months.
When pressed on the issue, they usually have no viable explanation for why they took
that specific action.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Discord has rolled out automated moderation features which were configured with very
aggressive defaults, and enabled by default.
These features have also had bugs, and when pressing support on those issues, our
mileage has varied.
We have been able to mostly disable the &amp;ldquo;auto-mod&amp;rdquo; features that were obnoxiously
intrusive, however.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Discord&amp;rsquo;s leadership team has speculated on introducing functionality that is not
aligned with the interests of our community, such as NFT support.
They later rolled this speculation back after too many users complained.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On March 27th, Discord plans to roll out a &lt;a href=&#34;https://discord.com/privacy&#34;&gt;new Privacy Policy&lt;/a&gt; which,
among other things, grants them the right to record video calls without consent.
It is very likely that they plan to do this in order to enforce Content ID type
restrictions on the content being shared in Discord-using communities.&lt;/p&gt;
&lt;p&gt;Although the community I started does not frequently share content which would run
afoul of these issues, Content ID systems can be easily fooled into reacting to
content which does not violate any copyright, such as ambient noises.
On top of this, the community in question engages in a lot of activism on various
topics intersectional to the world we exist in.
Discord being able to step into and monitor video calls in our community is
therefore entirely unacceptable from a security point of view.&lt;/p&gt;
&lt;p&gt;Should we have chosen Discord for this community given its security needs?
Most likely not, but at the time that this community was established, the
current reality was not envisioned.
Had we intended to build a community explicitly for the activities that it
has chosen to engage in, we would likely have avoided Discord.
But at the same time, the Discord UX is likely responsible for some of the
success of the community: it is simple and largely optimized for the
multi-device reality we now live in.&lt;/p&gt;
&lt;h2 id=&#34;matrix&#34;&gt;Matrix?&lt;/h2&gt;
&lt;p&gt;During the height of the pandemic, FOSDEM was held virtually on Matrix-based
infrastructure.
The combination of multimedia experiences and text-based chat looked to be
marginally competitive with the formula that Discord provides communities.
For this reason, Matrix is at the top of a short list of alternatives we are
considering.&lt;/p&gt;
&lt;p&gt;But we have questions and concerns.
This is where a helpful advocate from the Matrix community who is familiar
with the Trust &amp;amp; Safety aspects of the protocol would be welcome.
I would even be willing to pay a reasonable consultancy fee for real answers
to these concerns.&lt;/p&gt;
&lt;p&gt;The main concern we have is one of safety.
Much like with the fediverse, there are homeservers run by persons known to
be a security threat to our community.
We need a robust solution for keeping those homeservers defederated from the
rooms hosted on our own homeserver.&lt;/p&gt;
&lt;p&gt;We are told that Synapse has an integration with Mjolnir to allow for this,
but we would prefer to use Dendrite.
Does Dendrite offer a similar integration with Mjolnir?
Also, with a bot making real-time policy decisions on what room invitations
and joins are allowed, is it possible to have physical redundancy for
Mjolnir?&lt;/p&gt;
&lt;p&gt;Building on that question, availability in general is another major area
of concern.
Is it possible to have multiple instances of Dendrite running at once in
a geo-distributed fashion?
It is okay to assume that we would be using Spanner or similar to manage
the database replication to support that.&lt;/p&gt;
&lt;p&gt;The next major concern is CSAM.
When we launched our Mastodon instance, we had an incident where a user
uploaded CSAM to our instance.
We have heard that it is possible to convince homeservers to blindly
cache CSAM from other homeservers.
What mitigations exist for this issue?&lt;/p&gt;
&lt;p&gt;Finally, the last remaining question is how to integrate this into our
infrastructure.
For reference, we use Kubernetes to manage our services, with Traefik
acting both as Ingress and as a Service Mesh.
Previously we used Knative to manage some services such as Mastodon Web.
Is there any advantage to using Knative to manage a Matrix homeserver?
(We assume there is not.)&lt;/p&gt;
&lt;h2 id=&#34;something-else&#34;&gt;Something else?&lt;/h2&gt;
&lt;p&gt;We are also open to using something other than Matrix.
But it needs to be something that we can manage ourselves at the
infrastructure level.
We have already been burned by Discord; we&amp;rsquo;re not interested in being
burned by another service.
It also needs to provide an end-to-end user experience similar to
Discord&amp;rsquo;s.
From what I can find, Matrix is the only project out there that is
able to meet those requirements.&lt;/p&gt;
&lt;p&gt;But I would be happy to hear about alternatives which have done
similarly well at getting over the hump, where network effect is no
longer a serious concern.&lt;/p&gt;
&lt;p&gt;Reach me at &lt;a href=&#34;mailto:ariadne@dereferenced.org&#34;&gt;ariadne@dereferenced.org&lt;/a&gt; if you have answers to any of
the above questions.  Thanks!&lt;/p&gt;
</description>
      <source:markdown>
During the height of the pandemic, I set up a community using Discord.
Since then, it has evolved into being one of the most active (yet tight-knit) technical
communities on Discord: members from all around the world and from all sorts of
technical and social backgrounds participate in conversations every day on a variety of
topics.

## Why leave Discord?

The current situation sounds pretty good, right?
Well, as Richard Stallman warned, [proprietary services masquerading as software][gnu-saass]
do not necessarily act on behalf of the user.

   [gnu-saass]: https://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html

In this specific case, despite paying money to Discord for its services, there have been
many instances where it has been transparently obvious to me and the rest of our team
that Discord is not really acting in the interest of our community.
Some examples:

* Discord has banned the accounts of several community members over the past 18 months.
  When pressed on the issue, they usually have no viable explanation for why they took
  that specific action.

* Discord has rolled out automated moderation features which were configured with very
  aggressive defaults, and enabled by default.
  These features have also had bugs, and when pressing support on those issues, our
  mileage has varied.
  We have been able to mostly disable the &#34;auto-mod&#34; features that were obnoxiously
  intrusive, however.

* Discord&#39;s leadership team has speculated on introducing functionality that is not
  aligned with the interests of our community, such as NFT support.
  They later rolled this speculation back after too many users complained.

On March 27th, Discord plans to roll out a [new Privacy Policy][new-pp] which,
among other things, grants them the right to record video calls without consent.
It is very likely that they plan to do this in order to enforce Content ID type
restrictions on the content being shared in Discord-using communities.

Although the community I started does not frequently share content which would run
afoul of these issues, Content ID systems can be easily fooled into reacting to
content which does not violate any copyright, such as ambient noises.
On top of this, the community in question engages in a lot of activism on various
topics intersectional to the world we exist in.
Discord being able to step into and monitor video calls in our community is
therefore entirely unacceptable from a security point of view.

   [new-pp]: https://discord.com/privacy

Should we have chosen Discord for this community given its security needs?
Most likely not, but at the time that this community was established, the
current reality was not envisioned.
Had we intended to build a community explicitly for the activities that it
has chosen to engage in, we would likely have avoided Discord.
But at the same time, the Discord UX is likely responsible for some of the
success of the community: it is simple and largely optimized for the
multi-device reality we now live in.

## Matrix?

During the height of the pandemic, FOSDEM was held virtually on Matrix-based
infrastructure.
The combination of multimedia experiences and text-based chat looked to be
marginally competitive with the formula that Discord provides communities.
For this reason, Matrix is at the top of a short list of alternatives we are
considering.

But we have questions and concerns.
This is where a helpful advocate from the Matrix community who is familiar
with the Trust &amp; Safety aspects of the protocol would be welcome.
I would even be willing to pay a reasonable consultancy fee for real answers
to these concerns.

The main concern we have is one of safety.
Much like with the fediverse, there are homeservers run by persons known to
be a security threat to our community.
We need a robust solution for keeping those homeservers defederated from the
rooms hosted on our own homeserver.

We are told that Synapse has an integration with Mjolnir to allow for this,
but we would prefer to use Dendrite.
Does Dendrite offer a similar integration with Mjolnir?
Also, with a bot making real-time policy decisions on what room invitations
and joins are allowed, is it possible to have physical redundancy for
Mjolnir?

Building on that question, availability in general is another major area
of concern.
Is it possible to have multiple instances of Dendrite running at once in
a geo-distributed fashion?
It is okay to assume that we would be using Spanner or similar to manage
the database replication to support that.

The next major concern is CSAM.
When we launched our Mastodon instance, we had an incident where a user
uploaded CSAM to our instance.
We have heard that it is possible to convince homeservers to blindly
cache CSAM from other homeservers.
What mitigations exist for this issue?

Finally, the last remaining question is how to integrate this into our
infrastructure.
For reference, we use Kubernetes to manage our services, with Traefik
acting both as Ingress and as a Service Mesh.
Previously we used Knative to manage some services such as Mastodon Web.
Is there any advantage to using Knative to manage a Matrix homeserver?
(We assume there is not.)

## Something else?

We are also open to using something other than Matrix.
But it needs to be something that we can manage ourselves at the
infrastructure level.
We have already been burned by Discord; we&#39;re not interested in being
burned by another service.
It also needs to provide an end-to-end user experience similar to
Discord&#39;s.
From what I can find, Matrix is the only project out there that is
able to meet those requirements.

But I would be happy to hear about alternatives which have done
similarly well at getting over the hump, where network effect is no
longer a serious concern.

Reach me at &lt;ariadne@dereferenced.org&gt; if you have answers to any of
the above questions.  Thanks!
</source:markdown>
    </item>
    
    <item>
      <title>pkgconf, CVE-2023-24056 and disinformation</title>
      <link>https://ariadne.space/2023/01/23/pkgconf-cve-and-disinformation.html</link>
      <pubDate>Mon, 23 Jan 2023 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2023/01/24/pkgconf-cve-and-disinformation.html</guid>
      <description>&lt;p&gt;Readers will have noticed that two maintenance releases of pkgconf were cut over the weekend,
1.9.4 and 1.8.1 respectively, to address &lt;a href=&#34;https://nvd.nist.gov/vuln/detail/CVE-2023-24056&#34;&gt;CVE-2023-24056&lt;/a&gt;, a pkg-config specific variation
of the now-classic &amp;ldquo;&lt;a href=&#34;https://en.wikipedia.org/wiki/Billion_laughs_attack&#34;&gt;billion laughs attack&lt;/a&gt;&amp;rdquo;.  While fixing software defects is important,
a lot went wrong with how this CVE was reported and the motivations behind its disclosure, and
for my own catharsis, I want to talk about this.&lt;/p&gt;
&lt;h2 id=&#34;the-origin-of-pkgconf&#34;&gt;The origin of &lt;code&gt;pkgconf&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;To hopefully explain why I am so bothered by all of this, let&amp;rsquo;s first understand the history of
pkgconf: a project I began noodling on in March 2011.&lt;/p&gt;
&lt;p&gt;2011 was a particularly rough year for me.  In January, my father was diagnosed with pancreatic
cancer, and declined to disclose this to anyone.  When I came back to Oklahoma to visit my
parents in early March, I walked into my dad&amp;rsquo;s house and found him jaundiced.  I drove him to
the emergency room, and was informed that he only had a few months to live due to the pancreatic
cancer he allowed to progress to stage 4.  This was &lt;em&gt;shocking&lt;/em&gt; to me, especially considering I
was 23 at the time.  The stress of it led to me breaking up with my boyfriend at the time.&lt;/p&gt;
&lt;p&gt;I did the only thing I could do given the situation: spent as much time with him as possible.
The hospital had installed Wi-Fi earlier that year, so I was able to take my computer and work
on my projects while I spent time with him.  This worked out well, because it gave us a common
ground of subjects to talk about: my dad was the person who originally pushed me into getting
involved with software engineering as a profession in the first place.  While he himself never
worked as a software engineer, he developed a number of small utilities and demo programs for
MS-DOS.  Later, he became heavily interested in BSD, and then Slackware.&lt;/p&gt;
&lt;p&gt;During this time period, pkg-config 0.26 was released, which required either a complicated
bootstrap procedure to satisfy the glib2 requirements by hand, or a pre-existing copy of
pkg-config to exist.  Alpine was impacted by this bootstrap problem, and we ultimately decided
to hold back pkg-config on the 0.25 version because the bootstrapping problem was too complex
to solve for the pending release.&lt;/p&gt;
&lt;p&gt;At the same time, I was looking for something, &lt;em&gt;anything&lt;/em&gt; to work on that would serve as a
distraction and conversation piece.  This created an opportunity: I could work on a replacement
pkg-config implementation that did not have the bootstrap requirement that the freedesktop
implementation required.  I began working on pkgconf, specifically the .pc file parsing and
dependency graph walking code, while my dad was in the hospital.  He found talking about it
&lt;em&gt;fascinating&lt;/em&gt;, and so we discussed the various aspects of implementing a parser, and walking
dependency graphs in C.  In a limited way, it was a project we collaborated on, in that I would
write code, tell him about it, and he&amp;rsquo;d point out ways my assumptions probably didn&amp;rsquo;t hold
true.&lt;/p&gt;
&lt;p&gt;After my father passed away in early April, I quit working on it for a while,
until a few friends of mine decided to pick it up and experiment with it in
Gentoo and FreeBSD.  Sadly, he didn&amp;rsquo;t get to see the first viable release,
or to see pkgconf integrated into Linux distributions.&lt;/p&gt;
&lt;h2 id=&#34;maintaining-a-production-quality-build-tool-at-scale&#34;&gt;Maintaining a production-quality build tool at scale&lt;/h2&gt;
&lt;p&gt;These days, pkgconf is basically everywhere.  It is the default pkg-config implementation in
every mainstream Linux distribution except Ubuntu.  It is used heavily in embedded Linux
development and in plenty of other scenarios.  My distfiles server, &lt;code&gt;distfiles.dereferenced.org&lt;/code&gt;,
logs dozens of pkgconf downloads every second of the day.&lt;/p&gt;
&lt;p&gt;The success of pkgconf is not without its problems though.  There are aspects of the software
which, given what I know today, I would probably implement substantially differently.  The
technical debt is real.  I&amp;rsquo;ve been working, however, as time permits, to improve these problems
in the &lt;code&gt;pkgconf-1.9.x&lt;/code&gt; release series.&lt;/p&gt;
&lt;p&gt;But when pkgconf does something which is unexpected, and breaks a user&amp;rsquo;s build&amp;hellip; those
interactions are rarely fun.  Many times, the user with the issue shows up on the issue
tracker, or worse, my personal inbox in a bad mood, which results in a triage experience
that is suboptimal for everyone involved.  Thankfully, this doesn&amp;rsquo;t happen so much
anymore, as we have worked hard to balance compatibility and developer-friendly output
from the tool.&lt;/p&gt;
&lt;p&gt;But as smooth as things are these days, maintaining a production build tool imposes a lot
of burden that you cannot begin to expect until you&amp;rsquo;ve done it before.  It is not enough
to simply tell a user that the framework he is using is doing things wrong, for example,
underspecifying its dependencies.  You must consider &amp;ldquo;self-service&amp;rdquo; features: ones which
allow the user to diagnose the issues in his build and correct them himself.  By doing
so, you provide the user with a good experience, and keep support requests from annoyed
users much lower.  All of this has to be designed and implemented in production build
tools.&lt;/p&gt;
&lt;h2 id=&#34;the-appearance-of-competition&#34;&gt;The appearance of &amp;ldquo;competition&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The past weekend has been a wild ride for me.  I recently moved to Seattle, and have
been getting settled in.  A few people brought &lt;a href=&#34;https://nullprogram.com/blog/2023/01/18/&#34;&gt;u-config: a new, lean pkg-config clone&lt;/a&gt;
to my attention.  At first, I shrugged it off, and mostly would have continued to do
so.  An implementation of pkg-config on Windows would be good for me, personally, as
I do not develop pkgconf on Windows, and different people who contribute to the
maintenance of pkgconf&amp;rsquo;s Windows support have different goals.  This has led to some
significant fragmentation of pkgconf on the Windows side, with different tools bundling
it supporting specific aspects of the pkg-config format in different ways.&lt;/p&gt;
&lt;p&gt;I have a number of social and technical observations about u-config.  Some good,
some not so good.  To start off with the social aspects: I don&amp;rsquo;t particularly
appreciate the level of aggression directed toward pkgconf.  While that alone would
not normally be a turn-off for me (one has to have a reasonably thick skin when
being a FOSS maintainer), casually dropping the &amp;ldquo;billion laughs&amp;rdquo; 0day with a snide
comment about how we should use ASan (we do) when developing pkgconf was too much,
and the bug itself (a mistake in accounting for available buffer space during variable
expansion) was overstated.&lt;/p&gt;
&lt;p&gt;There are a lot of good things about u-config.  By focusing on only the minimally
required functionality, the author was able to write an excellent tool which has
the potential to someday be a replacement for pkgconf.  I am open to talking about
such a deprecation, even.&lt;/p&gt;
&lt;p&gt;However, after the initial blogpost (which contained disinformation about both
freedesktop pkg-config &lt;em&gt;and&lt;/em&gt; pkgconf), there was additional disinformation from
another person who is enthusiastic about the u-config project.  Notably, he
submitted a patch, which amongst other things, could be misinterpreted by readers
to conclude that &lt;code&gt;pkgconf&lt;/code&gt; does not consider &lt;code&gt;/usr/include&lt;/code&gt; as a system include
path.  When configured correctly, it definitely does.  For example, on Alpine Linux:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pestilence:~$ pkgconf --dump-personality
Triplet: default
DefaultSearchPaths: /usr/lib/pkgconfig /usr/share/pkgconfig
SystemIncludePaths: /usr/include
SystemLibraryPaths: /usr/lib 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But this &lt;a href=&#34;https://github.com/skeeto/u-config/commit/c069c94d77e1381cf7d67b8283601c5e79a91534#diff-c1f8e1880984a1a513fbb1c1191ea62910de9f1656c89f30d41609fb7317080bR1563&#34;&gt;particular disinformation was merged by the author of the software&lt;/a&gt;, without
regard for checking the comment for disinformation, despite how absurd it would be
if it were true.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update (28 January 2023):&lt;/em&gt; Since the initial publication of this blog, the comment
introduced in the above patch has been corrected to reflect a specific edge case
relating to &lt;code&gt;-I/usr/include&lt;/code&gt; versus &lt;code&gt;-I /usr/include&lt;/code&gt;.  I believe the discrepancy
in the handling of both fragments to be a bug, one which was not reported to me,
but rather discussed only in the source code comment.  The contributor of the patch
in question to u-config, in particular, has pointed to the fact that they later changed
the source code comment to clarify the issue, as part of an attempt to deflect from
the point of this blog: discussing how the u-config author and contributors have
chosen to engage in bad faith with other pkg-config implementations (especially
pkgconf) from the beginning of their project.  While I plan to fix the non-reported
discrepancy in the next pkgconf release, I will note that the u-config authors have
so far &lt;a href=&#34;https://github.com/skeeto/u-config/blob/7b5d32f/u-config.c#L1679-L1686&#34;&gt;chosen to not handle this edge case&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;pkg-config-implementations-do-specific-things-for-a-reason&#34;&gt;&lt;code&gt;pkg-config&lt;/code&gt; implementations do specific things for a reason&lt;/h2&gt;
&lt;p&gt;In the UNIX environment, the behavior of the system toolchain is static and
must be well-defined.  Tools which act adjacently to the system C toolchain
must behave in ways which are aware of how the C toolchain is configured
to behave.  This is why &lt;code&gt;pkgconf&lt;/code&gt; checks several different environment
variables to learn about how the system toolchain has been configured, and
what deviations, if any, have been configured via the environment.&lt;/p&gt;
&lt;p&gt;A frequent pattern in UNIX pkg-config files is to write things like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;prefix=/usr
includedir=${prefix}/include
libdir=${prefix}/lib
Package: whatever
Version: 0
Cflags: -I${includedir}
Libs: -L${libdir} -lwhatever
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On Windows, &lt;code&gt;pkg-config&lt;/code&gt; implementations have &lt;code&gt;--define-prefix&lt;/code&gt;, which is
used to override the &lt;code&gt;${prefix}&lt;/code&gt; variable for this reason.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;pkg-config&lt;/code&gt; is not aware of &lt;code&gt;/usr/include&lt;/code&gt; being a &lt;em&gt;system&lt;/em&gt; include path,
then a disaster can happen when querying for multiple dependencies at the same
time.  Consider this other pkg-config file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;prefix=/usr
includedir=${prefix}/include/OtherLib
libdir=${prefix}/lib
Package: OtherLib
Version: 0
Cflags: -I${includedir}
Libs: -L${libdir} -lother
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&amp;rsquo;s say that &lt;code&gt;OtherLib&lt;/code&gt; has a &lt;code&gt;/usr/include/OtherLib/math.h&lt;/code&gt; file which
uses &lt;code&gt;#include_next&lt;/code&gt; to enhance the &lt;code&gt;math.h&lt;/code&gt; header.  A real-world example of
a library which does this is &lt;code&gt;libbsd&lt;/code&gt;.  Well, if you query pkg-config with
&lt;code&gt;pkg-config --cflags --libs whatever OtherLib&lt;/code&gt;, then you will get:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pestilence:~$ pkgconf --with-path=examples/ --personality=examples/broken.personality whatever OtherLib
-I/usr/include -I/usr/include/OtherLib -lwhatever -lother
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This means that &lt;code&gt;/usr/include/math.h&lt;/code&gt; will be preferred over &lt;code&gt;/usr/include/OtherLib/math.h&lt;/code&gt;,
and your build will fail.&lt;/p&gt;
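&lt;p&gt;For contrast, when &lt;code&gt;/usr/include&lt;/code&gt; is correctly registered as a system
include path, pkgconf filters the redundant fragment out of the same query; the
output would look something along these lines (a sketch, since the exact flags
depend on the configured personality):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pestilence:~$ pkgconf --with-path=examples/ whatever OtherLib
-I/usr/include/OtherLib -lwhatever -lother
&lt;/code&gt;&lt;/pre&gt;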
&lt;p&gt;So this type of filtering, and the other types of filtering that pkgconf does, are very important
in the UNIX environment.  The author of u-config will unfortunately have to learn these things
one by one as users come to him with bug reports.&lt;/p&gt;
&lt;p&gt;There is probably an alternate reality where u-config and pkgconf work together to deprecate pkgconf,
and someday I hope that will be the reality here.  But until the disinformation and putdowns are
addressed, it will unfortunately be impossible to collaborate.&lt;/p&gt;
&lt;p&gt;Anyway, if you got through all of this, thanks for reading, I guess.&lt;/p&gt;
</description>
      <source:markdown>
Readers will have noticed that two maintenance releases of pkgconf were cut over the weekend,
1.9.4 and 1.8.1 respectively, to address [CVE-2023-24056][cve], a pkg-config specific variation
of the now-classic &#34;[billion laughs attack][bla]&#34;.  While fixing software defects is important,
a lot went wrong with how this CVE was reported and the motivations behind its disclosure, and
for my own catharsis, I want to talk about this.

  [cve]: https://nvd.nist.gov/vuln/detail/CVE-2023-24056
  [bla]: https://en.wikipedia.org/wiki/Billion_laughs_attack

## The origin of `pkgconf`

To hopefully explain why I am so bothered by all of this, let&#39;s first understand the history of
pkgconf: a project I began noodling on in March 2011.

2011 was a particularly rough year for me.  In January, my father was diagnosed with pancreatic
cancer, and declined to disclose this to anyone.  When I came back to Oklahoma to visit my
parents in early March, I walked into my dad&#39;s house and found him jaundiced.  I drove him to
the emergency room, and was informed that he only had a few months to live due to the pancreatic
cancer he allowed to progress to stage 4.  This was *shocking* to me, especially considering I
was 23 at the time.  The stress of it led to me breaking up with my boyfriend at the time.

I did the only thing I could do given the situation: spent as much time with him as possible.
The hospital had installed Wi-Fi earlier that year, so I was able to take my computer and work
on my projects while I spent time with him.  This worked out well, because it gave us a common
ground of subjects to talk about: my dad was the person who originally pushed me into getting
involved with software engineering as a profession in the first place.  While he himself never
worked as a software engineer, he developed a number of small utilities and demo programs for
MS-DOS.  Later, he became heavily interested in BSD, and then Slackware.

During this time period, pkg-config 0.26 was released, which required either a complicated
bootstrap procedure to satisfy the glib2 requirements by hand, or a pre-existing copy of
pkg-config to exist.  Alpine was impacted by this bootstrap problem, and we ultimately decided
to hold back pkg-config on the 0.25 version because the bootstrapping problem was too complex
to solve for the pending release.

At the same time, I was looking for something, *anything* to work on that would serve as a
distraction and conversation piece.  This created an opportunity: I could work on a replacement
pkg-config implementation that did not have the bootstrap requirement that the freedesktop
implementation required.  I began working on pkgconf, specifically the .pc file parsing and
dependency graph walking code, while my dad was in the hospital.  He found talking about it
*fascinating*, and so we discussed the various aspects of implementing a parser, and walking
dependency graphs in C.  In a limited way, it was a project we collaborated on, in that I would
write code, tell him about it, and he&#39;d point out ways my assumptions probably didn&#39;t hold
true.

After my father passed away in early April, I quit working on it for a while,
until a few friends of mine decided to pick it up and experiment with it in
Gentoo and FreeBSD.  Sadly, he didn&#39;t get to see the first viable release,
or to see pkgconf integrated into Linux distributions.

## Maintaining a production-quality build tool at scale

These days, pkgconf is basically everywhere.  It is the default pkg-config implementation in
every mainstream Linux distribution except Ubuntu.  It is used heavily in embedded Linux
development and in plenty of other scenarios.  My distfiles server, `distfiles.dereferenced.org`,
logs dozens of pkgconf downloads every second of the day.

The success of pkgconf is not without its problems though.  There are aspects of the software
which, given what I know today, I would probably implement substantially differently.  The
technical debt is real.  I&#39;ve been working, however, as time permits, to improve these problems
in the `pkgconf-1.9.x` release series.

But when pkgconf does something which is unexpected, and breaks a user&#39;s build... those
interactions are rarely fun.  Many times, the user with the issue shows up on the issue
tracker, or worse, my personal inbox in a bad mood, which results in a triage experience
that is suboptimal for everyone involved.  Thankfully, this doesn&#39;t happen so much
anymore, as we have worked hard to balance compatibility and developer-friendly output
from the tool.

But as smooth as things are these days, maintaining a production build tool imposes a lot
of burden that you cannot begin to expect until you&#39;ve done it before.  It is not enough
to simply tell a user that the framework he is using is doing things wrong, for example,
underspecifying its dependencies.  You must consider &#34;self-service&#34; features: ones which
allow the user to diagnose the issues in his build and correct them himself.  By doing
so, you provide the user with a good experience, and keep support requests from annoyed
users much lower.  All of this has to be designed and implemented in production build
tools.

## The appearance of &#34;competition&#34;

The past weekend has been a wild ride for me.  I recently moved to Seattle, and have
been getting settled in.  A few people brought [u-config: a new, lean pkg-config clone][ucblog]
to my attention.  At first, I shrugged it off, and mostly would have continued to do
so.  An implementation of pkg-config on Windows would be good for me, personally, as
I do not develop pkgconf on Windows, and different people who contribute to the
maintenance of pkgconf&#39;s Windows support have different goals.  This has led to some
significant fragmentation of pkgconf on the Windows side, with different tools bundling
it supporting specific aspects of the pkg-config format in different ways.

   [ucblog]: https://nullprogram.com/blog/2023/01/18/

I have a number of social and technical observations about u-config.  Some good,
some not so good.  To start off with the social aspects: I don&#39;t particularly
appreciate the level of aggression directed toward pkgconf.  While that alone would
not normally be a turn-off for me (one has to have a reasonably thick skin when
being a FOSS maintainer), casually dropping the &#34;billion laughs&#34; 0day with a snide
comment about how we should use ASan (we do) when developing pkgconf was too much,
and the bug itself (a mistake in accounting for available buffer space during variable
expansion) was overstated.

There are a lot of good things about u-config.  By focusing on only the minimally
required functionality, the author was able to write an excellent tool which has
the potential to someday be a replacement for pkgconf.  I am open to talking about
such a deprecation, even.

However, after the initial blogpost (which contained disinformation about both
freedesktop pkg-config *and* pkgconf), there was additional disinformation from
another person who is enthusiastic about the u-config project.  Notably, he
submitted a patch which, amongst other things, could be misinterpreted by readers
to conclude that `pkgconf` does not consider `/usr/include` as a system include
path.  When configured correctly, it definitely does.  For example, on Alpine Linux:

    pestilence:~$ pkgconf --dump-personality
    Triplet: default
    DefaultSearchPaths: /usr/lib/pkgconfig /usr/share/pkgconfig
    SystemIncludePaths: /usr/include
    SystemLibraryPaths: /usr/lib 

But this [particular disinformation was merged by the author of the software][uc-disinfo]
without any attempt to verify the claim, despite how absurd it would be
if it were true.

  [uc-disinfo]: https://github.com/skeeto/u-config/commit/c069c94d77e1381cf7d67b8283601c5e79a91534#diff-c1f8e1880984a1a513fbb1c1191ea62910de9f1656c89f30d41609fb7317080bR1563

*Update (28 January 2023):* Since the initial publication of this blog, the comment
introduced in the above patch has been corrected to reflect a specific edge case
relating to `-I/usr/include` versus `-I /usr/include`.  I believe the discrepancy
in the handling of both fragments to be a bug, one which was not reported to me,
but rather discussed only in the source code comment.  The contributor of the patch
in question to u-config, in particular, has pointed to the fact that they later changed
the source code comment to clarify the issue, as part of an attempt to deflect from
the point of this blog: discussing how the u-config author and contributors have
chosen to engage in bad faith with other pkg-config implementations (especially
pkgconf) from the beginning of their project.  While I plan to fix the non-reported
discrepancy in the next pkgconf release, I will note that the u-config authors have
so far [chosen to not handle this edge case][uc-comment-2].

  [uc-comment-2]: https://github.com/skeeto/u-config/blob/7b5d32f/u-config.c#L1679-L1686

## `pkg-config` implementations do specific things for a reason

In the UNIX environment, the behavior of the system toolchain is static and
must be well-defined.  Tools which act adjacently to the system C toolchain
must behave in ways which are aware of how the C toolchain is configured
to behave.  This is why `pkgconf` checks several different environment
variables to learn about how the system toolchain has been configured, and
what deviations, if any, have been configured via the environment.

A frequent pattern in UNIX pkg-config files is to write things like:

    prefix=/usr
    includedir=${prefix}/include
    libdir=${prefix}/lib
    Package: whatever
    Version: 0
    Cflags: -I${includedir}
    Libs: -L${libdir} -lwhatever

On Windows, `pkg-config` implementations have `--define-prefix`, which is
used to override the `${prefix}` variable for this reason.
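
Overriding `${prefix}` is sufficient because the other variables are defined in
terms of it and are expanded recursively; a minimal sketch of that expansion
(illustrative only, not any implementation's actual code, and the paths are
made up):

```python
import re

# Sketch of pkg-config-style ${var} expansion: variables may
# reference other variables, so expansion recurses.
def expand(value: str, variables: dict[str, str]) -> str:
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: expand(variables[m.group(1)], variables),
                  value)

# Redefining just "prefix" (as --define-prefix does) relocates every
# derived path.
variables = {"prefix": "C:/deps", "includedir": "${prefix}/include"}
print(expand("-I${includedir}", variables))
# → -IC:/deps/include
```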

If `pkg-config` is not aware of `/usr/include` being a *system* include path,
then a disaster can happen when querying for multiple dependencies at the same
time.  Consider this other pkg-config file:

    prefix=/usr
    includedir=${prefix}/include/OtherLib
    libdir=${prefix}/lib
    Package: OtherLib
    Version: 0
    Cflags: -I${includedir}
    Libs: -L${libdir} -lother

Now let&#39;s say that `OtherLib` has a `/usr/include/OtherLib/math.h` file which
uses `#include_next` to enhance the `math.h` header.  A real-world example of
a library which does this is `libbsd`.  Well, if you query pkg-config with
`pkg-config --cflags --libs whatever OtherLib`, then you will get:

    pestilence:~$ pkgconf --with-path=examples/ --personality=examples/broken.personality whatever OtherLib
    -I/usr/include -I/usr/include/OtherLib -lwhatever -lother

This means that `/usr/include/math.h` will be preferred over `/usr/include/OtherLib/math.h`,
and your build will fail.
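
This failure mode is exactly what system include path filtering prevents: when
the implementation knows `/usr/include` is a system path, the redundant
`-I/usr/include` fragment is dropped, so `-I/usr/include/OtherLib` is searched
first and the `#include_next` chain resolves correctly.  A rough sketch of such
filtering (not pkgconf's actual code):

```python
# Hypothetical sketch: drop -I fragments pointing at configured
# system include paths, which the compiler already searches last.
SYSTEM_INCLUDE_PATHS = {"/usr/include"}  # e.g. from the personality

def filter_cflags(fragments: list[str]) -> list[str]:
    kept = []
    for fragment in fragments:
        path = fragment[2:].strip() if fragment.startswith("-I") else None
        if path in SYSTEM_INCLUDE_PATHS:
            continue  # redundant: already a default search path
        kept.append(fragment)
    return kept

print(" ".join(filter_cflags(
    ["-I/usr/include", "-I/usr/include/OtherLib", "-lwhatever"])))
# → -I/usr/include/OtherLib -lwhatever
```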

So this type of filtering, and the other types of filtering that pkgconf does, are very important
in the UNIX environment.  The author of u-config will unfortunately have to learn these things
one by one as users come to him with bug reports.

There is probably an alternate reality where u-config and pkgconf work together to deprecate pkgconf,
and someday I hope that will be the reality here.  But until the disinformation and putdowns are
addressed, it will unfortunately be impossible to collaborate.

Anyway, if you got through all of this, thanks for reading, I guess.
</source:markdown>
    </item>
    
    <item>
      <title>Building fair webs of trust by leveraging the OCAP model</title>
      <link>https://ariadne.space/2022/12/02/building-fair-webs-of-trust.html</link>
      <pubDate>Fri, 02 Dec 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/12/03/building-fair-webs-of-trust.html</guid>
      <description>&lt;p&gt;Since the beginning of the Internet, determining the trustworthiness
of participants and published information has been a significant
point of contention.
Many systems have been proposed to solve these underlying concerns,
usually pertaining to specific niches and communities, but these
pre-existing solutions are nebulous at best.
How can we build infrastructure for truly democratic Webs of Trust?&lt;/p&gt;
&lt;h2 id=&#34;fairness-in-reputation-based-systems&#34;&gt;Fairness in reputation-based systems&lt;/h2&gt;
&lt;p&gt;When considering the design of a reputation-based system, &lt;em&gt;fairness&lt;/em&gt;
must be paramount, but what is &lt;em&gt;fairness&lt;/em&gt; in this context?
A reputation-based system can be considered &lt;em&gt;fair&lt;/em&gt; if it appropriately
balances the concerns of the data publisher, the data subject, and
the data consumer.
Regulatory frameworks such as the GDPR attempt to provide guidance
concerning how this balance can be accomplished in the general sense
of building internet services, but these frameworks are large and
complicated, and as such make it difficult to provide a definition
which is adequate for a reputation-based trust system.&lt;/p&gt;
&lt;p&gt;To understand how these concerns must be balanced, we must understand
the underlying risks for each participant in a reputation-based system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;data subject&lt;/strong&gt; is at risk of harm to their professional
reputation due to annotations they did not consent to, and mistakes
in those annotations.
This is a problem which has already captured regulatory ire, as I
will explain later.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;data publisher&lt;/strong&gt; is at risk of being sued for defamation due to
the annotations they publish.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;data consumer&lt;/strong&gt; is at risk of being misled by inaccurate
annotations they consume.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A &lt;em&gt;fair&lt;/em&gt; reputation-based system must attempt to provide an adequate
balance between these concerns through active harm reduction in its
design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The harm to the &lt;strong&gt;data subject&lt;/strong&gt; from misleading annotations can be
reduced by blinding the identity of the data subject.&lt;/li&gt;
&lt;li&gt;The harm to the &lt;strong&gt;data publisher&lt;/strong&gt; from misleading annotations can
also be reduced by blinding the identity of the data subject.&lt;/li&gt;
&lt;li&gt;The harm to the &lt;strong&gt;data consumer&lt;/strong&gt; from misleading annotations can be
reduced by allowing them to consume annotations from multiple sources.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;shinigami-eyes-or-how-designing-for-fairness-can-be-difficult&#34;&gt;Shinigami Eyes, or how designing for fairness can be difficult&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://shinigami-eyes.github.io/&#34;&gt;Shinigami Eyes&lt;/a&gt; browser extension was designed to help people
establish trust in various web resources using a reputation-based system.
In general, the author attempted to make thoughtful choices to ensure
the system was reasonably fair in its design.
However, the system has &lt;a href=&#34;https://eyereaper.evelyn.moe/&#34;&gt;a number of flaws, both technical and social&lt;/a&gt;,
which highlight how building systems of trust requires a detailed
understanding concerning how the underlying primitives interact and
the consequences of those interactions.&lt;/p&gt;
&lt;h3 id=&#34;shinigami-eyes-and-blinding&#34;&gt;Shinigami Eyes and Blinding&lt;/h3&gt;
&lt;p&gt;As already noted, a &lt;em&gt;fair&lt;/em&gt; reputation-based system must blind the identity
of the data subject to protect both the data subject and data publisher.
The approach used by Shinigami Eyes was to use a bloom filter constructed
with a 32-bit &lt;a href=&#34;http://www.isthe.com/chongo/tech/comp/fnv/index.html&#34;&gt;&lt;code&gt;FNV-1a&lt;/code&gt; hash&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The FNV family of hashes is non-cryptographic and scales up to 1024 bits.
The original FNV-1 works by multiplying the current hash value by the
designated FNV prime, then XORing in the current byte&amp;rsquo;s value.
The FNV-1a variant, which is the one Shinigami Eyes uses, swaps the
multiplication and XOR steps.&lt;/p&gt;
&lt;p&gt;The use of a bloom filter is an acceptable blinding method, assuming that
the underlying hash provides sufficient resolution, such as a 256-bit
or 512-bit hash.
Presumably, due to the constraints of having to run as a JavaScript extension,
the weak 32-bit &lt;code&gt;FNV-1a&lt;/code&gt; hash was used instead.
Because of this, while the reputation lists used by Shinigami Eyes were
acceptably blinded, there was an extremely &lt;a href=&#34;https://twitter.com/x0s1jpnq2sk2&#34;&gt;high risk of false positives
caused by hash collisions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Concerns about the technical implementation of the Shinigami Eyes extension
led Datatilsynet, the Norwegian GDPR regulatory agency, to &lt;a href=&#34;https://www.datatilsynet.no/en/news/2021/varsler-forbud-mot-nettleserutvidelsen-shinigami-eyes-i-norge/&#34;&gt;ban the extension&lt;/a&gt;
at the end of 2021, and development of the extension appears to have
ended as a result of their initial inquiry.&lt;/p&gt;
&lt;h2 id=&#34;can-we-build-systems-like-shinigami-eyes-more-robustly&#34;&gt;Can we build systems like Shinigami Eyes more robustly?&lt;/h2&gt;
&lt;p&gt;The main reason Shinigami Eyes gained the attention of Datatilsynet was
the centralized nature of the data processing.
Can we build a system which avoids centralized data processing and promotes
democratic participation?
Yes, it is quite easy, but like most things, the challenge will be delivering
a good user experience.&lt;/p&gt;
&lt;h3 id=&#34;leveraging-the-ocap-model-to-build-a-robust-solution&#34;&gt;Leveraging the OCAP model to build a robust solution&lt;/h3&gt;
&lt;p&gt;The largest problem in building this system is ensuring that the published
reputation data is reliably blinded.
To this end, I propose that feeds be a simple dataset containing a set of
blinded hashes and annotations.
The physical representation of the dataset does not matter, though keeping
it as simple as possible will expand the number of places where the data
can be consumed.&lt;/p&gt;
&lt;p&gt;In the Object Capability model, we can think of the physical feed as an
&lt;em&gt;object&lt;/em&gt;, and a blinding key as a &lt;em&gt;capability&lt;/em&gt; to access that object in a
useful way.
You have to have both in order for either to be useful.&lt;/p&gt;
&lt;p&gt;A participant can publish multiple copies of their feed, with different
blinding keys for each friend they wish to share it with, or they can
choose to publish a single key and share the same key with every friend,
or even the public at large.
Users can then choose which feeds they want to use when making trust
decisions from the collection of feeds and blinding keys they have been
given.&lt;/p&gt;
&lt;p&gt;By comparison to Shinigami Eyes, this better satisfies the conditions for
&lt;em&gt;fairness&lt;/em&gt;: there is effectively no risk of a false positive, the contents of the
reputation lists remain private, and publishers can choose to consent to
data sharing requests however they wish.&lt;/p&gt;
&lt;h3 id=&#34;choosing-a-reasonable-set-of-primitives&#34;&gt;Choosing a reasonable set of primitives&lt;/h3&gt;
&lt;p&gt;To build such a system, I would probably personally choose to use
&lt;code&gt;HMAC-SHA3-256&lt;/code&gt; as the blinding primitive.
This provides a good balance between collision protection,
cryptographic strength, and hash resolution.
A scheme which provides less than 256 bits of hash resolution should
be avoided due to the risk of collisions.&lt;/p&gt;
&lt;p&gt;I would distribute the feeds as CSV files.
This would give users the most flexibility in managing feeds: they
could distribute different feeds with different meanings, and include
extended data alongside the blinded hash as a form of annotation.&lt;/p&gt;
&lt;p&gt;On the client side, I would calculate sets of blinded hashes for each
possible subset of the URI, all the way to the parent domain.
By doing so, it would be possible for feeds to match against a large
number of child URIs instead of having to list them all manually.&lt;/p&gt;
&lt;p&gt;Implementations should store the learned hashes in a &lt;a href=&#34;https://en.wikipedia.org/wiki/Radix_tree&#34;&gt;radix trie&lt;/a&gt;.
This allows the hash lookups to be done in constant time, as well
as allowing for automatic bucketing, which can be helpful for
implementing quorum requirements.&lt;/p&gt;
&lt;h2 id=&#34;things-we-can-build-with-this&#34;&gt;Things we can build with this&lt;/h2&gt;
&lt;p&gt;The use of friend-to-friend reputation-based systems can be powerful.
They provide accountability (as you know who you are getting your
data from) and collaboration (your friends can consume your data in
exchange).&lt;/p&gt;
&lt;p&gt;They can be used in the way Shinigami Eyes was used: to allow interested
parties to identify resources they should trust or distrust, but they can
also be used to enable collaborative blocking amongst friends and system
administrators.&lt;/p&gt;
&lt;p&gt;They can also be used to determine if e-mail domains or URLs inside e-mails
are actually trustworthy.
The possibilities are truly endless.&lt;/p&gt;
</description>
      <source:markdown>
Since the beginning of the Internet, determining the trustworthiness
of participants and published information has been a significant
point of contention.
Many systems have been proposed to solve these underlying concerns,
usually pertaining to specific niches and communities, but these
pre-existing solutions are nebulous at best.
How can we build infrastructure for truly democratic Webs of Trust?

## Fairness in reputation-based systems

When considering the design of a reputation-based system, *fairness*
must be paramount, but what is *fairness* in this context?
A reputation-based system can be considered *fair* if it appropriately
balances the concerns of the data publisher, the data subject, and
the data consumer.
Regulatory frameworks such as the GDPR attempt to provide guidance
concerning how this balance can be accomplished in the general sense
of building internet services, but these frameworks are large and
complicated, and as such make it difficult to provide a definition
which is adequate for a reputation-based trust system.

To understand how these concerns must be balanced, we must understand
the underlying risks for each participant in a reputation-based system:

- The **data subject** is at risk of harm to their professional
  reputation due to annotations they did not consent to, and mistakes
  in those annotations.
  This is a problem which has already captured regulatory ire, as I
  will explain later.
- The **data publisher** is at risk of being sued for defamation due to
  the annotations they publish.
- The **data consumer** is at risk of being misled by inaccurate
  annotations they consume.

A *fair* reputation-based system must attempt to provide an adequate
balance between these concerns through active harm reduction in its
design:

- The harm to the **data subject** from misleading annotations can be
  reduced by blinding the identity of the data subject.
- The harm to the **data publisher** from misleading annotations can
  also be reduced by blinding the identity of the data subject.
- The harm to the **data consumer** from misleading annotations can be
  reduced by allowing them to consume annotations from multiple sources.

## Shinigami Eyes, or how designing for fairness can be difficult

The [Shinigami Eyes][se] browser extension was designed to help people
establish trust in various web resources using a reputation-based system.
In general, the author attempted to make thoughtful choices to ensure
the system was reasonably fair in its design.
However, the system has [a number of flaws, both technical and social][er],
which highlight how building systems of trust requires a detailed
understanding concerning how the underlying primitives interact and
the consequences of those interactions.

   [se]: https://shinigami-eyes.github.io/
   [er]: https://eyereaper.evelyn.moe/

### Shinigami Eyes and Blinding

As already noted, a *fair* reputation-based system must blind the identity
of the data subject to protect both the data subject and data publisher.
The approach used by Shinigami Eyes was to use a bloom filter constructed
with a 32-bit [`FNV-1a` hash][fnv].

   [fnv]: http://www.isthe.com/chongo/tech/comp/fnv/index.html

The FNV family of hashes is non-cryptographic and scales up to 1024 bits.
The original FNV-1 works by multiplying the current hash value by the
designated FNV prime, then XORing in the current byte&#39;s value.
The FNV-1a variant, which is the one Shinigami Eyes uses, swaps the
multiplication and XOR steps.
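
For reference, a minimal sketch of the 32-bit FNV-1a used by the extension
(illustrative, not the extension's actual code):

```python
# 32-bit FNV-1a: XOR in each input byte, then multiply by the FNV
# prime, truncating to 32 bits after each step.
FNV32_PRIME = 0x01000193
FNV32_OFFSET_BASIS = 0x811C9DC5

def fnv1a_32(data: bytes) -> int:
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h

print(hex(fnv1a_32(b"a")))
# → 0xe40c292c
```

With only 32 bits of output, the birthday bound means collisions between
unrelated inputs become likely after only tens of thousands of entries.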

The use of a bloom filter is an acceptable blinding method, assuming that
the underlying hash provides sufficient resolution, such as a 256-bit
or 512-bit hash.
Presumably, due to the constraints of having to run as a JavaScript extension,
the weak 32-bit `FNV-1a` hash was used instead.
Because of this, while the reputation lists used by Shinigami Eyes were
acceptably blinded, there was an extremely [high risk of false positives
caused by hash collisions][collided-account].

   [collided-account]: https://twitter.com/x0s1jpnq2sk2

Concerns about the technical implementation of the Shinigami Eyes extension
led Datatilsynet, the Norwegian GDPR regulatory agency, to [ban the extension][se-ban]
at the end of 2021, and development of the extension appears to have
ended as a result of their initial inquiry.

   [se-ban]: https://www.datatilsynet.no/en/news/2021/varsler-forbud-mot-nettleserutvidelsen-shinigami-eyes-i-norge/

## Can we build systems like Shinigami Eyes more robustly?

The main reason Shinigami Eyes gained the attention of Datatilsynet was
the centralized nature of the data processing.
Can we build a system which avoids centralized data processing and promotes
democratic participation?
Yes, it is quite easy, but like most things, the challenge will be delivering
a good user experience.

### Leveraging the OCAP model to build a robust solution

The largest problem in building this system is ensuring that the published
reputation data is reliably blinded.
To this end, I propose that feeds be a simple dataset containing a set of
blinded hashes and annotations.
The physical representation of the dataset does not matter, though keeping
it as simple as possible will expand the number of places where the data
can be consumed.

In the Object Capability model, we can think of the physical feed as an
*object*, and a blinding key as a *capability* to access that object in a
useful way.
You have to have both in order for either to be useful.

A participant can publish multiple copies of their feed, with different
blinding keys for each friend they wish to share it with, or they can
choose to publish a single key and share the same key with every friend,
or even the public at large.
Users can then choose which feeds they want to use when making trust
decisions from the collection of feeds and blinding keys they have been
given.

By comparison to Shinigami Eyes, this better satisfies the conditions for
*fairness*: there is effectively no risk of a false positive, the contents of the
reputation lists remain private, and publishers can choose to consent to
data sharing requests however they wish.

### Choosing a reasonable set of primitives

To build such a system, I would probably personally choose to use
`HMAC-SHA3-256` as the blinding primitive.
This provides a good balance between collision protection,
cryptographic strength, and hash resolution.
A scheme which provides less than 256 bits of hash resolution should
be avoided due to the risk of collisions.
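
A blinding step along these lines could be sketched as follows (the key and
URI here are illustrative assumptions, not part of any specification):

```python
import hashlib
import hmac

def blind(feed_key: bytes, subject_uri: str) -> str:
    """Blind a subject identifier with HMAC-SHA3-256: only holders
    of feed_key can recompute the hash and test feed membership."""
    return hmac.new(feed_key, subject_uri.encode("utf-8"),
                    hashlib.sha3_256).hexdigest()

key = b"per-feed secret shared only with trusted friends"  # illustrative
print(blind(key, "https://example.com/some/resource"))
```

Publishing the same annotations under several different keys is then just a
matter of re-running the blinding step once per recipient.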

I would distribute the feeds as CSV files.
This would give users the most flexibility in managing feeds: they
could distribute different feeds with different meanings, and include
extended data alongside the blinded hash as a form of annotation.

On the client side, I would calculate sets of blinded hashes for each 
possible subset of the URI, all the way to the parent domain.
By doing so, it would be possible for feeds to match against a large
number of child URIs instead of having to list them all manually.
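
One plausible reading of that client-side step, sketched below (the exact
prefix rules are my assumption, not a specification):

```python
from urllib.parse import urlsplit

def uri_prefixes(uri: str) -> list[str]:
    """Enumerate the ancestor prefixes of a URI down to the bare
    domain, so a feed entry for a parent also matches its children."""
    parts = urlsplit(uri)
    base = f"{parts.scheme}://{parts.netloc}"
    prefixes = [base]
    path = ""
    for segment in filter(None, parts.path.split("/")):
        path += "/" + segment
        prefixes.append(base + path)
    return prefixes

print(uri_prefixes("https://example.com/a/b"))
# → ['https://example.com', 'https://example.com/a',
#    'https://example.com/a/b']
```

The client would then blind each prefix with the feed's key and look every
one of them up in the feed.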

Implementations should store the learned hashes in a [radix trie][rt].
This allows the hash lookups to be done in constant time, as well
as allowing for automatic bucketing, which can be helpful for
implementing quorum requirements.

   [rt]: https://en.wikipedia.org/wiki/Radix_tree
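
As a sketch of that lookup structure, even a plain (uncompressed) trie over
the hex digits of the blinded hash shows the bucketing behavior; a production
radix trie would additionally compress shared prefixes.  This code is
illustrative only, and the hash value is made up:

```python
# Minimal uncompressed trie keyed by hex digits of the blinded hash.
# Lookup walks one node per digit, so for fixed-length hashes the
# cost is constant; shared prefixes naturally bucket entries.
def trie_insert(root: dict, key: str, value: str) -> None:
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node["$value"] = value

def trie_get(root: dict, key: str):
    node = root
    for ch in key:
        node = node.get(ch)
        if node is None:
            return None
    return node.get("$value")

index = {}
trie_insert(index, "3f2a9c", "distrust")  # hypothetical blinded hash
print(trie_get(index, "3f2a9c"))
# → distrust
```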

## Things we can build with this

The use of friend-to-friend reputation-based systems can be powerful.
They provide accountability (as you know who you are getting your
data from) and collaboration (your friends can consume your data in
exchange).

They can be used in the way Shinigami Eyes was used: to allow interested
parties to identify resources they should trust or distrust, but they can
also be used to enable collaborative blocking amongst friends and system
administrators.

They can also be used to determine if e-mail domains or URLs inside e-mails
are actually trustworthy.
The possibilities are truly endless.
</source:markdown>
    </item>
    
    <item>
      <title>Twitter&#39;s demise is ActivityPub&#39;s future</title>
      <link>https://ariadne.space/2022/11/11/twitters-demise-is-activitypubs-future.html</link>
      <pubDate>Fri, 11 Nov 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/11/12/twitters-demise-is-activitypubs-future.html</guid>
      <description>&lt;p&gt;Earlier today, I deleted all of my tweets and left Twitter forever.  While I
plan on leaving a nightlight thread for a while, I will eventually close my
account, assuming Elon doesn&amp;rsquo;t do it for me.&lt;/p&gt;
&lt;p&gt;The past week has been an emotional rollercoaster for me as I have watched
everything play out.&lt;/p&gt;
&lt;p&gt;I was one of the original fediverse users when Indymedia UK stood up the
&lt;code&gt;indy.im&lt;/code&gt; StatusNet instance at the end of 2010.
After some time, Evan Prodromou got bored with the StatusNet code base
and started Pump instead, with the network losing the largest instance
at that time, &lt;code&gt;identi.ca&lt;/code&gt;.
With the network fragmented as a result of that switch, I got bored of
it and started using Twitter instead.&lt;/p&gt;
&lt;p&gt;Eventually StatusNet was forked by Matt Lee and a few other FSF staffers and
became GNU Social.
I was not really around during this time, but it was around that time that
GamerGate happened, which created a network where half of the users were
Indymedia contributors and the other half were the initial seeds of the
alt-right.&lt;/p&gt;
&lt;p&gt;While I was not heavily involved from a development perspective in the early
days of what we now call the fediverse, this began to change in late 2016 when
Eugen Rochko started Mastodon.
I was an early adopter of Mastodon, deploying Mastodon 0.6 on Heroku, using the
&lt;code&gt;mastodon.dereferenced.org&lt;/code&gt; domain for my account.
But running Mastodon on Heroku (and later Scalingo) was expensive.  I did not
want to manage a Rails application by hand, and I hadn&amp;rsquo;t started using Docker or
Kubernetes yet.&lt;/p&gt;
&lt;p&gt;In early 2018, a developer pseudonymously known as lain began adding ActivityPub
federation support to Pleroma, and he convinced me to try it out as an alternative
to running Mastodon.
I found Pleroma and developing with Elixir to be exciting and fresh, compared to
other technology I was working with at the time.  I felt empowered to start doing
serious hacking on ActivityPub as a result of writing patches to Pleroma and
sending them to lain.&lt;/p&gt;
&lt;p&gt;After a while, I became a Pleroma developer with commit rights.
I felt like we could use the same strategy I used to promote Alpine to promote Pleroma:
build a coalition of willing influencers to demonstrate the value proposition of
self-hosted social networks for user freedom, and so I started working on building
a group around it.
Because I was showing it to friends I already had, Pleroma grew into being a
project where many of the contributors were from queer and marginalized backgrounds
similar to mine.
Everything was going fine.  As a team, we built a lot of features that are still
innovative in this space, such as MRF and building the LitePub profile of ActivityPub,
which shifted the protocol from being a Content &lt;em&gt;Distribution&lt;/em&gt; protocol to being a
Content &lt;em&gt;Advertisement&lt;/em&gt; protocol.&lt;/p&gt;
&lt;p&gt;Towards the end of 2019, it started going to shit.  By that time, I was running a
public instance, and the database kept having index corruption issues on a daily
basis.
Around the same time, the Soapbox project was launched, and they decided to use
Pleroma as their backend.  This led to a lot of friction inside the project, because
the Soapbox author had a tendency to share &lt;a href=&#34;https://blog.alexgleason.me/trans/&#34;&gt;his ideological positions&lt;/a&gt;
inside the project space as part of his anti-trans activism.
I wound up leaving Pleroma toward the middle of 2020 because of the scalability
issues in the database with Pleroma 2.0 and the lack of any effort to maintain a
welcoming space for everyone.&lt;/p&gt;
&lt;p&gt;I decided to take a break from the fediverse because of that decision, because I
felt a break was warranted.  I decided to try Twitter in earnest during that time,
but to be honest, I&amp;rsquo;ve never found using Twitter to be enjoyable in the same way
as I found the fediverse to be enjoyable.&lt;/p&gt;
&lt;p&gt;As I said a few weeks ago, I think that &lt;a href=&#34;https://ariadne.space/2022/10/27/the-internet-is-broken-due-to-structural-injustice/&#34;&gt;commercial microblogging&lt;/a&gt; has been
an absolute disaster for our society.  Relationships on Twitter are parasocial
and transactional, which leads to poisonous behavior, while relationships in the
fediverse are largely grounded and mutual.&lt;/p&gt;
&lt;p&gt;In April of this year, Elon Musk announced his intention to buy Twitter.  Based
on the experience of watching a &lt;a href=&#34;https://ariadne.space/2021/05/20/the-whole-freenode-kerfluffle/&#34;&gt;rich fanatic purchase and then ruin something he
deeply cared about&lt;/a&gt; and my experience of being a Tesla owner, I thought it
would be relevant to set up an &lt;a href=&#34;https://social.treehouse.systems/&#34;&gt;escape hatch&lt;/a&gt;.  Others were of the same
mind, and we shared notes.&lt;/p&gt;
&lt;p&gt;With the events of the past few weeks, I strongly believe that Twitter&amp;rsquo;s demise is
going to bring all of the proprietary social silos crashing down.  People are starting
to realize that trading freedom for the alleged convenience of using a proprietary
network isn&amp;rsquo;t worth it.  Although not perfect, ActivityPub is eating the world: there&amp;rsquo;s
now a million new users a week and this number is growing.&lt;/p&gt;
&lt;h3 id=&#34;-which-brings-me-to-the-not-so-fun-part-the-things-that-arent-going-so-well&#34;&gt;&amp;hellip; which brings me to the not so fun part, the things that aren&amp;rsquo;t going so well.&lt;/h3&gt;
&lt;p&gt;Although the fediverse is a decentralized and disparate network with many different
groups with their own cultural norms, some of them have tried to enforce their cultural
norms on the new users.  This is normal and to be expected to some extent, as people
don&amp;rsquo;t like big changes.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t want to get into the nuances of some of these conversations.  What I do want
to say is that the fediverse is a diverse network of different people who bring their
own styles and approaches to posting and content curation.  It is entirely fine to
bring your whole self to the conversation in uncensored form if that is what you feel
is right to do.  Do what &lt;em&gt;you&lt;/em&gt; feel is right, and don&amp;rsquo;t worry about people muting or
blocking your account, because you&amp;rsquo;re not here for &lt;em&gt;them&lt;/em&gt;, you&amp;rsquo;re here for &lt;em&gt;yourself&lt;/em&gt;
and you will meet likeminded people regardless of who blocks you.&lt;/p&gt;
&lt;p&gt;The other problem is, of course, a question of scaling anti-abuse tools.  Many have
posted screenshots of abuse they have received, and it comes from a segment of the
larger network where the culture is most diplomatically described as &amp;ldquo;player vs
player.&amp;rdquo;  It is fine for those instances to exist, but we need to build better tools
so that newcomers can be aware of segments of the network that they may want to
exclude themselves from: what we have today where admins informally share threat data
with each other is hard to scale upwards.&lt;/p&gt;
&lt;p&gt;In general these are good problems to have, because they are easy to overcome.  Overall
the future is looking bright.&lt;/p&gt;
&lt;h3 id=&#34;which-instances-are-you-recommending-right-now&#34;&gt;Which instances are you recommending right now?&lt;/h3&gt;
&lt;p&gt;At the moment, I am trying to recommend instances which have a moderation policy
aligned with providing a safe space for marginalized identities like mine which
are also targeted at technical people.&lt;/p&gt;
&lt;p&gt;Some recommendations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://hachyderm.io&#34;&gt;hachyderm.io&lt;/a&gt;, running Mastodon 3.5.3 and administrated
by Kris Nova and other volunteers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://social.restless.systems&#34;&gt;social.restless.systems&lt;/a&gt;, running Mastodon 4.0 and
administrated by NCommander, a tech YouTuber.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://social.treehouse.systems&#34;&gt;social.treehouse.systems&lt;/a&gt;, run by me and other
volunteers.  It also runs Mastodon 4.0.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I recommend these instances because their administrative capabilities far
exceed those required by the Mastodon Server Covenant: they are run by teams
with marginalized backgrounds and extensive SRE experience.&lt;/p&gt;
&lt;p&gt;I am planning to put together a larger tool for finding instances which have been stood
up as part of this new wave of SRE-backed quasi-professional instances.&lt;/p&gt;
&lt;h3 id=&#34;what-next&#34;&gt;What next?&lt;/h3&gt;
&lt;p&gt;Next time, I will write a bit about how my own instance is put together and how it has
evolved over the past few months.  Stay tuned for that one.&lt;/p&gt;
</description>
      <source:markdown>
Earlier today, I deleted all of my tweets and left Twitter forever.  While I
plan on leaving a nightlight thread for a while, I will eventually close my
account, assuming Elon doesn&#39;t do it for me.

The past week has been an emotional rollercoaster for me as I have watched
everything play out.

I was one of the original fediverse users when Indymedia UK stood up the
`indy.im` StatusNet instance at the end of 2010.
After some time, Evan Prodromou got bored with the StatusNet code base
and started Pump instead, with the network losing the largest instance
at that time, `identi.ca`.
With the network fragmented as a result of that switch, I got bored of
it and started using Twitter instead.

Eventually StatusNet was forked by Matt Lee and a few other FSF staffers and
became GNU Social.
I was not really around during this time, but it was around that time that
GamerGate happened, which created a network where half of the users were
Indymedia contributors and the other half were the initial seeds of the
alt-right.

While I was not heavily involved from a development perspective in the early
days of what we now call the fediverse, this began to change in late 2016 when
Eugen Rochko started Mastodon.
I was an early adopter of Mastodon, deploying Mastodon 0.6 on Heroku, using the
`mastodon.dereferenced.org` domain for my account.
But running Mastodon on Heroku (and later Scalingo) was expensive.  I did not
want to manage a Rails application by hand, and I hadn&#39;t started using Docker or
Kubernetes yet.

In early 2018, a developer pseudonymously known as lain began adding ActivityPub
federation support to Pleroma, and he convinced me to try it out as an alternative
to running Mastodon.
I found Pleroma and developing with Elixir to be exciting and fresh, compared to
other technology I was working with at the time.  I felt empowered to start doing
serious hacking on ActivityPub as a result of writing patches to Pleroma and
sending them to lain.

After a while, I became a Pleroma developer with commit rights.
I felt like we could use the same strategy I used to promote Alpine to promote Pleroma:
build a coalition of willing influencers to demonstrate the value proposition of
self-hosted social networks for user freedom, and so I started working on building
a group around it.
Because I was showing it to friends I already had, Pleroma grew into being a
project where many of the contributors were from queer and marginalized backgrounds
similar to mine.
Everything was going fine.  As a team, we built a lot of features that are still
innovative in this space, such as MRF and building the LitePub profile of ActivityPub,
which shifted the protocol from being a Content *Distribution* protocol to being a
Content *Advertisement* protocol.

Towards the end of 2019, it started going to shit.  By that time, I was running a
public instance, and the database kept having index corruption issues on a daily
basis.
Around the same time, the Soapbox project was launched, and they decided to use
Pleroma as their backend.  This led to a lot of friction inside the project, because
the Soapbox author had a tendency to share [his ideological positions][ag-trans]
inside the project space as part of his anti-trans activism.
I wound up leaving Pleroma toward the middle of 2020 because of the scalability
issues in the database with Pleroma 2.0 and the lack of any effort to maintain a
welcoming space for everyone.

   [ag-trans]: https://blog.alexgleason.me/trans/

After leaving, I took a break from the fediverse, as I felt one was
warranted.  I decided to try Twitter in earnest during that time,
but to be honest, I&#39;ve never found using Twitter to be enjoyable in the same way
as I found the fediverse to be enjoyable.

As I said a few weeks ago, I think that [commercial microblogging][cmb] has been
an absolute disaster for our society.  Relationships on Twitter are parasocial
and transactional, which leads to poisonous behavior, while relationships in the
fediverse are largely grounded and mutual.

   [cmb]: https://ariadne.space/2022/10/27/the-internet-is-broken-due-to-structural-injustice/

In April of this year, Elon Musk announced his intention to buy Twitter.  Based
on the experience of watching a [rich fanatic purchase and then ruin something he
deeply cared about][leenode] and my experience of being a Tesla owner, I thought it
would be relevant to set up an [escape hatch][th-masto].  Others were of the same
mind, and we shared notes.

   [leenode]: https://ariadne.space/2021/05/20/the-whole-freenode-kerfluffle/
   [th-masto]: https://social.treehouse.systems/

With the events of the past few weeks, I strongly believe that Twitter&#39;s demise is
going to bring all of the proprietary social silos crashing down.  People are starting
to realize that trading freedom for the alleged convenience of using a proprietary
network isn&#39;t worth it.  Although not perfect, ActivityPub is eating the world: there&#39;s
now a million new users a week and this number is growing.

### ... which brings me to the not so fun part, the things that aren&#39;t going so well.

Although the fediverse is a decentralized and disparate network with many different
groups with their own cultural norms, some of them have tried to enforce their cultural
norms on the new users.  This is normal and to be expected to some extent, as people
don&#39;t like big changes.

I don&#39;t want to get into the nuances of some of these conversations.  What I do want
to say is that the fediverse is a diverse network of different people who bring their
own styles and approaches to posting and content curation.  It is entirely fine to
bring your whole self to the conversation in uncensored form if that is what you feel
is right to do.  Do what *you* feel is right, and don&#39;t worry about people muting or
blocking your account, because you&#39;re not here for *them*, you&#39;re here for *yourself*
and you will meet likeminded people regardless of who blocks you.

The other problem is, of course, a question of scaling anti-abuse tools.  Many have
posted screenshots of abuse they have received, and it comes from a segment of the
larger network where the culture is most diplomatically described as &#34;player vs
player.&#34;  It is fine for those instances to exist, but we need to build better tools
so that newcomers can be aware of segments of the network that they may want to
exclude themselves from: the informal sharing of threat data between
admins that we have today is hard to scale.

In general, these are good problems to have, because they are easy to overcome.  Overall
the future is looking bright.

### Which instances are you recommending right now?

At the moment, I am trying to recommend instances that have a moderation policy
aligned with providing a safe space for marginalized identities like mine, and
that are also aimed at technical people.

Some recommendations:

- [hachyderm.io](https://hachyderm.io), running Mastodon 3.5.3 and administered
  by Kris Nova and other volunteers.

- [social.restless.systems](https://social.restless.systems), running Mastodon 4.0 and
  administered by NCommander, a tech YouTuber.

- [social.treehouse.systems](https://social.treehouse.systems), run by me and other
  volunteers.  It also runs Mastodon 4.0.

I recommend these instances because their administrative capabilities far exceed
those required by the Mastodon Server Covenant: they are run by teams with
marginalized backgrounds and extensive SRE experience.

I am planning to put together a larger tool for finding instances which have been stood
up as part of this new wave of SRE-backed quasi-professional instances.

### What next?

Next time, I will write a bit about how my own instance is put together and how it has
evolved over the past few months.  Stay tuned for that one.
</source:markdown>
    </item>
    
    <item>
      <title>The internet is broken due to structural injustice</title>
      <link>https://ariadne.space/2022/10/26/the-internet-is-broken-due.html</link>
      <pubDate>Wed, 26 Oct 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/10/27/the-internet-is-broken-due.html</guid>
      <description>&lt;p&gt;Over the past few years, I&amp;rsquo;ve come to realize that the Internet as we know
it is utterly broken.  Lately, I&amp;rsquo;ve also been pondering how participants
in the modern Internet have enabled and perpetuated harm to society at
large.  Repeatedly, we have seen the independence of the commons chipped
away by powerful men who wish for participants to serve their own whims,
while those who raise concerns with these developments are either shunned,
banned or doxed.&lt;/p&gt;
&lt;p&gt;On Friday, October 28th, we will see another demonstration of these structural
injustices where the commons takes another loss to the whims of a powerful man.
Last time, &lt;a href=&#34;https://ariadne.space/2021/05/20/the-whole-freenode-kerfluffle/&#34;&gt;it was freenode&amp;rsquo;s takeover by Andrew Lee&lt;/a&gt;, and this time it
will be Twitter&amp;rsquo;s takeover by Elon Musk.  No, really, the deal is already
concluded: &lt;a href=&#34;https://seekingalpha.com/news/3896099-twitter-delisting-from-nyse-effective-on-friday-after-musk-completes-deal&#34;&gt;TWTR will be delisted from the NYSE on Friday&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Will this be the end of Twitter?  Probably not, but it will be the end of the
current relationship the commons shares with Twitter.  Instead of acting as
a self-described &amp;ldquo;public square,&amp;rdquo; it will further evolve into a chaotic
cacophony of trolling and counter-trolling driven in the name of algorithmic
engagement.  Some will move to other microblogging services and networks,
and will likely discover that everything which made Twitter horrible likely
applies in some way to the replacement.&lt;/p&gt;
&lt;h2 id=&#34;are-social-platforms-working-as-designed&#34;&gt;Are social platforms working as designed?&lt;/h2&gt;
&lt;p&gt;The reality is that &lt;strong&gt;microblogging sucks&lt;/strong&gt;, but Twitter managed to make it
addictive for a few reasons, &lt;a href=&#34;https://joinmastodon.org/&#34;&gt;which is &lt;em&gt;why&lt;/em&gt; the most popular alternative,
Mastodon&lt;/a&gt;, is basically a copy of the underlying formula, but tweaked to
work on the ActivityPub federated network (the so-called fediverse).&lt;/p&gt;
&lt;p&gt;The formula is not that hard to grasp if you understand how people
think and react to stimulation.  People are inherently social creatures,
and because of the formula used by Twitter, have tried their best to use
Twitter &lt;em&gt;despite&lt;/em&gt; the inherent conceptual flaws behind microblogging.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve ever sat down at a slot machine, you will likely note that they are
constantly making noises as you interact with them.  These sounds are
designed to stimulate the reward center in your brain and thus cause it
to release endorphins.  In the same way, microblogging and other social
platform formulas have built rich notification systems to ensure that
users experience pleasure from being online.  Don&amp;rsquo;t believe me?  Try
muting the notifications from Twitter or Mastodon and see if you remain
interested in it: odds are, after a while, you won&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;The other key part of the formula: sow discord amongst the users.  This
can be done organically (by users) or algorithmically.  People have an
inherent desire to be &lt;em&gt;right&lt;/em&gt;, and this keeps the engagement loop going
as people fight over stupid things like whether Android or iPhones are
better.  The things being argued over do not even have to have any basis
in reality: people are more than happy to hold positions which falter under
any modicum of dialectical analysis, such as whether &lt;em&gt;furries are actually
shitting in litter boxes in schools&lt;/em&gt; (obviously this is bullshit if you think
about it for more than 10 seconds).&lt;/p&gt;
&lt;p&gt;Eventually these pointless arguments evolve into arguments which have
actual societal impact: &lt;em&gt;are trans people legitimate&lt;/em&gt; and &lt;em&gt;do they deserve
rights&lt;/em&gt;?  Obviously, they are, and they do, but in a world where
microblogging discourse is the primary form of media ingestion, the consumer
is manipulated with fight-or-flight challenges to make their own
280-character thought piece on the discourse of the day, which leads them
to consider the possibility that &lt;em&gt;perhaps&lt;/em&gt; Chudlord18 &lt;em&gt;might&lt;/em&gt; be on to something
when he points out that George Soros was seen at the last Bilderberg
meeting, entirely ignoring the part where Chudlord18&amp;rsquo;s post was
disinformation.&lt;/p&gt;
&lt;p&gt;Sadly, as we see in the world today, it turns out that fascism is the
most optimized ideology available given the limited cognitive bandwidth
constraints of a 280-character post.  This is because the answer is always
simple with fascism: generally a death threat towards the marginalized group
of the day will do just fine, which easily fits into 280 characters:
&lt;em&gt;&amp;ldquo;Storm the Capitol building!&amp;rdquo;&lt;/em&gt;?  &lt;em&gt;&amp;ldquo;Hang Mike Pence!&amp;rdquo;&lt;/em&gt;?
Yep, even congressional members and vice presidents can be marginalized
under the right circumstances, &lt;em&gt;and&lt;/em&gt; it&amp;rsquo;s under 280 characters.&lt;/p&gt;
&lt;h2 id=&#34;spamming-and-scamming&#34;&gt;Spamming and scamming&lt;/h2&gt;
&lt;p&gt;Fascism is hardly the only problem that these networks face.  Almost every
day I get spam like this on either Twitter or Mastodon:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://ariadne.micro.blog/uploads/2025/15c86079bf.jpg&#34; alt=&#34;Spam messages for an affinity-fundraising scam from a Mastodon user&#34;&gt;&lt;/p&gt;
&lt;p&gt;Spam like this is a huge problem with Mastodon, but not with Pleroma, another
ActivityPub server, which provides a robust message filtering facility.
However, due to the combination of mismanagement of the Pleroma project and
an absolutely absurd fediverse turf war, admins of Pleroma instances are
written off by some Mastodon admins as being evil, even if they are otherwise
harmless.&lt;/p&gt;
&lt;p&gt;Between this and the architectural complexity of deploying a BEAM application
like Pleroma on Kubernetes, by comparison to how easy it is to deploy Mastodon
using Knative on Kubernetes, I am using Mastodon.  Since the project mismanagement
issues are largely resolved now, I might suck it up and convert the instance to
Pleroma in the near future just so I can deal with the spam in a more automated
way.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll probably continue to use Mastodon (or maybe Pleroma if I switch my instance
to it), but lately I&amp;rsquo;ve been using microblogging platforms less and less, as I
have realized that ultimately the format doesn&amp;rsquo;t provide the sense of community
I am looking for.&lt;/p&gt;
&lt;p&gt;And this is ultimately the problem with the fediverse: everything on the fediverse
is a clone of a proprietary platform, with basically the same social downsides.
It turns out when you take something useful, and turn it into a &amp;ldquo;social experience,&amp;rdquo;
you basically ruin its utility.&lt;/p&gt;
&lt;h2 id=&#34;social-tools-which-are-actually-respectful&#34;&gt;Social tools which are actually respectful&lt;/h2&gt;
&lt;p&gt;To me, social tools exist to facilitate communication with my friends, and perhaps
expansion of my friend group to others who have the same interests.  It turns out
that we already had good social tools for this all along: blogs and IRC.  Because of
certain realities &amp;ndash; it is inherently easier to clone an open protocol and turn it
into a proprietary service &amp;ndash; for most people, these social tools turned into
centralized platforms like Dreamwidth and Discord.&lt;/p&gt;
&lt;p&gt;Microblogging forces you to shout at people, while IRC (now for the most part Discord)
facilitates thoughtful conversation.  Social photo sharing encourages the editing of
photographs to make people appear more attractive for additional likes, while posting
photos of yourself to your blog removes that dopamine loop and lets you just focus on
living and occasionally documenting your life.&lt;/p&gt;
&lt;p&gt;Yes, the point is that these tools are largely boring.  They aren&amp;rsquo;t &lt;em&gt;meant&lt;/em&gt; to dominate
your life, they are meant to facilitate communication with your friends.  They exist to
serve the needs of the commons.&lt;/p&gt;
&lt;p&gt;Maybe somebody will eventually build the tools I am ultimately looking for.  In the
meantime, I&amp;rsquo;ve expanded my list of contact points to include services I previously
kept mostly private.&lt;/p&gt;
&lt;p&gt;But either way, for the most part, I won&amp;rsquo;t be investing my time in microblogging anymore,
be it on Twitter or Mastodon.&lt;/p&gt;
</description>
      <source:markdown>
Over the past few years, I&#39;ve come to realize that the Internet as we know
it is utterly broken.  Lately, I&#39;ve also been pondering how participants
in the modern Internet have enabled and perpetuated harm to society at
large.  Repeatedly, we have seen the independence of the commons chipped
away by powerful men who wish for participants to serve their own whims,
while those who raise concerns with these developments are either shunned,
banned or doxed.

On Friday, October 28th, we will see another demonstration of these structural
injustices where the commons takes another loss to the whims of a powerful man.
Last time, [it was freenode&#39;s takeover by Andrew Lee][fn], and this time it
will be Twitter&#39;s takeover by Elon Musk.  No, really, the deal is already
concluded: [TWTR will be delisted from the NYSE on Friday][twtr-delisting].

   [fn]: https://ariadne.space/2021/05/20/the-whole-freenode-kerfluffle/
   [twtr-delisting]: https://seekingalpha.com/news/3896099-twitter-delisting-from-nyse-effective-on-friday-after-musk-completes-deal

Will this be the end of Twitter?  Probably not, but it will be the end of the
current relationship the commons shares with Twitter.  Instead of acting as
a self-described &#34;public square,&#34; it will further evolve into a chaotic
cacophony of trolling and counter-trolling driven in the name of algorithmic
engagement.  Some will move to other microblogging services and networks,
and will likely discover that everything which made Twitter horrible likely
applies in some way to the replacement.

## Are social platforms working as designed?

The reality is that **microblogging sucks**, but Twitter managed to make it
addictive for a few reasons, [which is *why* the most popular alternative,
Mastodon][masto], is basically a copy of the underlying formula, but tweaked to
work on the ActivityPub federated network (the so-called fediverse).

   [masto]: https://joinmastodon.org/

The formula is not that hard to grasp if you understand how people
think and react to stimulation.  People are inherently social creatures,
and because of the formula used by Twitter, have tried their best to use
Twitter *despite* the inherent conceptual flaws behind microblogging.

If you&#39;ve ever sat down at a slot machine, you will likely note that they are
constantly making noises as you interact with them.  These sounds are
designed to stimulate the reward center in your brain and thus cause it
to release endorphins.  In the same way, microblogging and other social
platform formulas have built rich notification systems to ensure that
users experience pleasure from being online.  Don&#39;t believe me?  Try
muting the notifications from Twitter or Mastodon and see if you remain
interested in it: odds are, after a while, you won&#39;t.

The other key part of the formula: sow discord amongst the users.  This
can be done organically (by users) or algorithmically.  People have an
inherent desire to be *right*, and this keeps the engagement loop going
as people fight over stupid things like whether Android or iPhones are
better.  The things being argued over do not even have to have any basis
in reality: people are more than happy to hold positions which falter under
any modicum of dialectical analysis, such as whether *furries are actually
shitting in litter boxes in schools* (obviously this is bullshit if you think
about it for more than 10 seconds).

Eventually these pointless arguments evolve into arguments which have
actual societal impact: *are trans people legitimate* and *do they deserve
rights*?  Obviously, they are, and they do, but in a world where
microblogging discourse is the primary form of media ingestion, the consumer
is manipulated with fight-or-flight challenges to make their own
280-character thought piece on the discourse of the day, which leads them
to consider the possibility that *perhaps* Chudlord18 *might* be on to something
when he points out that George Soros was seen at the last Bilderberg
meeting, entirely ignoring the part where Chudlord18&#39;s post was
disinformation.

Sadly, as we see in the world today, it turns out that fascism is the
most optimized ideology available given the limited cognitive bandwidth
constraints of a 280-character post.  This is because the answer is always
simple with fascism: generally a death threat towards the marginalized group
of the day will do just fine, which easily fits into 280 characters:
*&#34;Storm the Capitol building!&#34;*?  *&#34;Hang Mike Pence!&#34;*?
Yep, even congressional members and vice presidents can be marginalized
under the right circumstances, *and* it&#39;s under 280 characters.

## Spamming and scamming

Fascism is hardly the only problem that these networks face.  Almost every
day I get spam like this on either Twitter or Mastodon:

![Spam messages for an affinity-fundraising scam from a Mastodon user](https://ariadne.micro.blog/uploads/2025/15c86079bf.jpg)

Spam like this is a huge problem with Mastodon, but not with Pleroma, another
ActivityPub server, which provides a robust message filtering facility.
However, due to the combination of mismanagement of the Pleroma project and
an absolutely absurd fediverse turf war, admins of Pleroma instances are
written off by some Mastodon admins as being evil, even if they are otherwise
harmless.

Between this and the architectural complexity of deploying a BEAM application
like Pleroma on Kubernetes, by comparison to how easy it is to deploy Mastodon
using Knative on Kubernetes, I am using Mastodon.  Since the project mismanagement
issues are largely resolved now, I might suck it up and convert the instance to
Pleroma in the near future just so I can deal with the spam in a more automated
way.

I&#39;ll probably continue to use Mastodon (or maybe Pleroma if I switch my instance
to it), but lately I&#39;ve been using microblogging platforms less and less, as I
have realized that ultimately the format doesn&#39;t provide the sense of community
I am looking for.

And this is ultimately the problem with the fediverse: everything on the fediverse
is a clone of a proprietary platform, with basically the same social downsides.
It turns out when you take something useful, and turn it into a &#34;social experience,&#34;
you basically ruin its utility.

## Social tools which are actually respectful

To me, social tools exist to facilitate communication with my friends, and perhaps
expansion of my friend group to others who have the same interests.  It turns out
that we already had good social tools for this all along: blogs and IRC.  Because of
certain realities -- it is inherently easier to clone an open protocol and turn it
into a proprietary service -- for most people, these social tools turned into
centralized platforms like Dreamwidth and Discord.

Microblogging forces you to shout at people, while IRC (now for the most part Discord)
facilitates thoughtful conversation.  Social photo sharing encourages the editing of
photographs to make people appear more attractive for additional likes, while posting
photos of yourself to your blog removes that dopamine loop and lets you just focus on
living and occasionally documenting your life.

Yes, the point is that these tools are largely boring.  They aren&#39;t *meant* to dominate
your life, they are meant to facilitate communication with your friends.  They exist to
serve the needs of the commons.

Maybe somebody will eventually build the tools I am ultimately looking for.  In the
meantime, I&#39;ve expanded my list of contact points to include services I previously
kept mostly private.

But either way, for the most part, I won&#39;t be investing my time in microblogging anymore,
be it on Twitter or Mastodon.
</source:markdown>
    </item>
    
    <item>
      <title>So you&#39;ve decided to start a free software consultancy...</title>
      <link>https://ariadne.space/2022/08/10/so-youve-decided-to-start.html</link>
      <pubDate>Wed, 10 Aug 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/08/11/so-youve-decided-to-start.html</guid>
      <description>&lt;p&gt;Recently a friend of mine told me that he was planning to start a
free software consultancy, and asked for my advice, as I have an
extensive background doing free software consulting for a living.
While I have already given him some advice on how to proceed, I
thought it might be nice to write a blog expanding on my answer,
so that others who are interested in pursuing free software
consulting may benefit.&lt;/p&gt;
&lt;h2 id=&#34;framing-the-value-proposition&#34;&gt;Framing the value proposition&lt;/h2&gt;
&lt;p&gt;There are many things to consider when launching a free software
consultancy, but the key one is how you frame the
value proposition of your consultancy.  A common mistake that new
founders make when starting their free software consultancies is
to frame the value proposition toward developers.  Rather than
doing this, you should frame your value proposition towards
management.&lt;/p&gt;
&lt;p&gt;For example, my friend described the value proposition of his
consultancy like this:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I help people manage their open source server stuff for money.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This is not a good way to frame the value proposition of a
consultancy, because the manager will inevitably ask a question
like:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Why can&amp;rsquo;t we just hire an intern to manage that?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In this case, the manager is right to ask a question like that,
because the value proposition is not correctly framed.
The purpose of a free software consultancy is to &lt;em&gt;augment&lt;/em&gt; the
business&amp;rsquo; IT competencies by leveraging the consultant&amp;rsquo;s
experience working in FOSS.  When you frame your value proposition
this way, it becomes clearer to management why they
should engage with your consultancy.&lt;/p&gt;
&lt;p&gt;When pitching your value proposition to a prospective client,
you should try to empathise with the needs of the client, and
tailor your value proposition around how your consultancy can
satisfy their needs.&lt;/p&gt;
&lt;h2 id=&#34;pricing-for-services&#34;&gt;Pricing for services&lt;/h2&gt;
&lt;p&gt;For serious engagements, pricing should be defined as a function
of the value gained for the client from the engagement.  For
example, if the client saves $250k as a result of the engagement,
then you should charge a percentage of that savings.&lt;/p&gt;
&lt;p&gt;Proof of concept engagements should be priced lower than your
standard rate, as they represent higher risk for the client.
Since they are priced lower, the scope of work should also be
reduced versus a normal engagement.  A common strategy is to
split proof of concept engagements into phases, so that the
client does not have to commit budget for the entire engagement
up front, which can provide an opportunity to charge a little
more for the overall engagement.&lt;/p&gt;
&lt;p&gt;When you are starting out, you will also want to focus on
recurring revenue.  This provides two key benefits: first,
you have a bottom line greater than $0 if you aren&amp;rsquo;t able
to close larger engagements, which happens from time to time,
especially during the summer months, as managers tend to go
on holiday.  Second, the recurring revenue customers, assuming
that you provide them with good service, will recommend your
consultancy to others, including large businesses.&lt;/p&gt;
&lt;p&gt;These types of engagements should be priced according to what
you believe to be a fair value for X hours of your time per
month.  As a general rule, most consultancies charge at least
$100 per hour for prepaid consultancy services.&lt;/p&gt;
&lt;p&gt;An example of a recurring service would be something like
server maintenance for a small business.  In this case, you
are augmenting the business with IT services, but the engagement
is likely to not require a large amount of time, meaning that
with automation, you can build out a customer portfolio of a
few hundred of these engagements.&lt;/p&gt;
&lt;h2 id=&#34;professional-services-networks&#34;&gt;Professional services networks&lt;/h2&gt;
&lt;p&gt;It is critical to pursue certifications like the RHCE.  The
value in these certifications is the access to the professional
services networks they provide.  They will also help with
customers who have compliance requirements that state that
the engineers working on a project have to be certified.&lt;/p&gt;
&lt;p&gt;Larger firms like Red Hat largely outsource their professional
services engagements to consultancies which have passed their
certification and joined their partner network.  These types of
relationships are critical: you get to leverage the power of
the larger firm&amp;rsquo;s sales capability to acquire new engagements
by bidding on them.&lt;/p&gt;
&lt;p&gt;Similarly, you should seek out partnerships with other
consultancies, as doing so will expand the range of capabilities
that your consultancy has.  For example, you might not have
familiarity with enterprise networking equipment, but if you have
a relationship with a consultancy that does have the ability to
take on managing enterprise networking equipment, then you can
join forces and bid on contracts which have that requirement
in their success criteria.&lt;/p&gt;
&lt;p&gt;All of the companies from Red Hat to AWS have professional
services networks.  Find the ones relevant to the skills your
consultancy has and join them.&lt;/p&gt;
&lt;h2 id=&#34;invoicing-and-payment&#34;&gt;Invoicing and payment&lt;/h2&gt;
&lt;p&gt;Larger engagements will &lt;strong&gt;always&lt;/strong&gt; be NET-30 at the least, where
NET-X means payment is due within X days of invoicing.  This gives the client
time to check your work and ensure they are satisfied with
what you have delivered.&lt;/p&gt;
&lt;p&gt;If you need the money sooner, there are a few options.  First,
you can offer a discount for paying early; an industry standard
is a 10% early payment discount.  Another option is to use a
factoring company.  Factoring works by selling the obligation
to a third party, which collects on your behalf for a fee and
advances you the payment.  Payment platforms such as QuickBooks
and Bill.com integrate with factoring companies, allowing you to
receive payment sooner.&lt;/p&gt;
&lt;h2 id=&#34;negotiation&#34;&gt;Negotiation&lt;/h2&gt;
&lt;p&gt;An engagement will always consist of a written contract with a
Statement of Work, frequently called an SOW.  The SOW lays out
the success criteria for the engagement.  SOWs can be open-ended
or they can be highly precise.  There are advantages to both
approaches when authoring an SOW, but an open-ended SOW can wind
up creating problems during the engagement, as the flexibility it
provides extends to both you &lt;em&gt;and&lt;/em&gt; your client, who may use
it to push the scope beyond what you planned.&lt;/p&gt;
&lt;p&gt;Always negotiate deals in writing; never take an engagement on
an oral promise alone.  If a deal requires a third party to provide
some of the success criteria, get their commitment in writing,
or you may be left holding the bag.&lt;/p&gt;
&lt;h2 id=&#34;following-up&#34;&gt;Following up&lt;/h2&gt;
&lt;p&gt;An engagement should ideally be thought of as a free-flowing
conversation that results in the resolution of the success
criteria stated in the SOW.  Accordingly, it is &lt;em&gt;vital&lt;/em&gt; to
keep the conversation going.&lt;/p&gt;
&lt;p&gt;This means that you should follow up with the client on a
regular basis to keep them informed of the progress of the
work being done as part of the engagement, and to solicit
feedback early.  It is far easier to change the course of
an engagement early on than after hundreds of hours have
gone into the work.&lt;/p&gt;
&lt;p&gt;When discussing the engagement, it should be considered an
active listening exercise: you lay out what your team is
building, and then the client provides feedback based on
your presentation.  From there, the conversation moves into
defining what forward progress looks like.&lt;/p&gt;
&lt;h2 id=&#34;takeaways&#34;&gt;Takeaways&lt;/h2&gt;
&lt;p&gt;These are just my observations from nearly 20 years of doing
professional consulting around FOSS.  There is no singular
right way of running a consultancy, but these are the key
aspects that helped me to maintain good working relationships
with my customers.&lt;/p&gt;
&lt;p&gt;Running a FOSS consultancy is hard work, but it can result in
a sustainable business if you are willing to put in the
work.&lt;/p&gt;
</description>
      <source:markdown>
Recently a friend of mine told me that he was planning to start a
free software consultancy, and asked for my advice, as I have an
extensive background doing free software consulting for a living.
While I have already given him some advice on how to proceed, I
thought it might be nice to write a blog post expanding on my answer,
so that others who are interested in pursuing free software
consulting may benefit.

## Framing the value proposition

There are many things to consider when launching a free software
consultancy, but the key aspect is how you frame the
value proposition of your consultancy.  A common mistake that new
founders make when starting their free software consultancies is
to frame the value proposition toward developers.  Rather than
doing this, you should frame your value proposition towards
management.

For example, my friend described the value proposition of his
consultancy like this:

&#34;I help people manage their open source server stuff for money.&#34;

This is not a good way to frame the value proposition of a
consultancy, because the manager will inevitably ask a question
like:

&#34;Why can&#39;t we just hire an intern to manage that?&#34;

In this case, the manager is right to ask a question like that,
because the value proposition is not correctly framed.
The purpose of a free software consultancy is to *augment* the
business&#39;s IT competencies by leveraging the consultant&#39;s
experience working in FOSS.  When you frame your value proposition
this way, it becomes clearer to management why they
should engage with your consultancy.

When pitching your value proposition to a prospective client,
you should try to empathise with the needs of the client, and
tailor your value proposition around how your consultancy can
satisfy their needs.

## Pricing for services

For serious engagements, pricing should be defined as a function
of the value gained for the client from the engagement.  For
example, if the client saves $250k as a result of the engagement,
then you should charge a percentage of that savings.

Proof of concept engagements should be priced lower than your
standard rate, as they represent higher risk for the client.
Since they are priced lower, the scope of work should also be
reduced versus a normal engagement.  A common strategy is to
split proof of concept engagements into phases, so that the
client does not have to commit budget for the entire engagement
up front, which can provide an opportunity to charge a little
more for the overall engagement.

When you are starting out, you will also want to focus on
recurring revenue.  This provides two key benefits: first,
you have a bottom line greater than $0 if you aren&#39;t able
to close larger engagements, which happens from time to time,
especially during the summer months, as managers tend to go
on holiday.  Second, the recurring revenue customers, assuming
that you provide them with good service, will recommend your
consultancy to others, including large businesses.

These types of engagements should be priced according to what
you believe to be a fair value for X hours of your time per
month.  As a general rule, most consultancies charge at least
$100 per hour for prepaid consultancy services.
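
To make the arithmetic concrete, here is a minimal sketch of the two
pricing models.  The 10% share and the 20 retainer hours in the
comments are hypothetical figures for illustration, not a rate card:

```c
/* A minimal sketch of the two pricing models described above.
 * All figures in the comments are hypothetical examples. */

/* Value-based pricing: charge a share of the savings that the
 * engagement produces for the client. */
double value_based_fee(double client_savings, double share)
{
    return client_savings * share;
}

/* Recurring revenue: a fair value for X hours of your time per
 * month, prepaid. */
double monthly_retainer(double hours_per_month, double hourly_rate)
{
    return hours_per_month * hourly_rate;
}

/* value_based_fee(250000.0, 0.10) is a $25k fee on $250k saved;
 * monthly_retainer(20.0, 100.0) is a $2,000-per-month retainer. */
```

The point is that value-based fees scale with the outcome you
deliver, while retainers scale with your time.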

An example of a recurring service would be something like
server maintenance for a small business.  In this case, you
are augmenting the business with IT services, but the engagement
is likely to not require a large amount of time, meaning that
with automation, you can build out a customer portfolio of a
few hundred of these engagements.

## Professional services networks

It is critical to pursue certifications like the RHCE.  The
value in these certifications is the access to the professional
services networks they provide.  They will also help with
customers whose compliance requirements state that the
engineers working on a project must be certified.

Larger firms like Red Hat largely outsource their professional
services engagements to consultancies which have passed their
certification and joined their partner network.  These types of
relationships are critical: you get to leverage the power of
the larger firm&#39;s sales capability to acquire new engagements
by bidding on them.

Similarly, you should seek out partnerships with other
consultancies, as doing so will expand the range of capabilities
your consultancy can offer.  For example, you might not have
familiarity with enterprise networking equipment, but if you have
a relationship with a consultancy that does, then you can
join forces and bid on contracts which have that requirement
in their success criteria.

Companies from Red Hat to AWS have professional
services networks.  Find the ones relevant to the skills your
consultancy has and join them.

## Invoicing and payment

Larger engagements will **always** be NET-30 at the least, where
NET-X means payment is due within X days of the invoice date.  This
gives the client time to check your work and ensure they are
satisfied with what you have delivered.

If you need the money sooner, there are a few options.  First,
you can offer a discount for paying early; an industry standard
is a 10% early payment discount.  Another option is to use a
factoring company.  Factoring works by selling the obligation
to a third party, which collects on your behalf for a fee and
advances you the payment.  If you use a payment platform such
as QuickBooks or Bill.com, these platforms have integrations with
factoring companies, allowing you to get paid sooner.
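
For a rough sense of the trade-off, here is a minimal sketch of the
net proceeds under each option.  The $50k invoice and the 3%
factoring fee are made-up figures (factoring fees vary by provider
and terms); the 10% discount is the industry-standard figure above:

```c
/* A minimal sketch of what actually lands in your account under
 * each option.  The $50k invoice and 3% factoring fee below are
 * made-up figures; real factoring fees vary by provider and terms. */

/* Net proceeds after the client takes an early payment discount. */
double net_after_discount(double invoice, double discount_rate)
{
    return invoice * (1.0 - discount_rate);
}

/* Net proceeds after selling the invoice to a factoring company,
 * which advances payment and collects for a fee. */
double net_after_factoring(double invoice, double factoring_fee)
{
    return invoice * (1.0 - factoring_fee);
}

/* On a $50k invoice, a 10% early payment discount nets $45k,
 * while factoring at a 3% fee nets $48.5k. */
```

Which option comes out ahead depends on the discount and fee you can
actually negotiate, and on how quickly you need the cash.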

## Negotiation

An engagement will always consist of a written contract with a
Statement of Work, frequently called an SOW.  The SOW lays out
the success criteria for the engagement.  SOWs can be open-ended
or they can be highly precise.  There are advantages to both
approaches when authoring an SOW, but an open-ended SOW can wind
up creating problems during the engagement, as it provides
flexibility for both you *and* your client.

Always negotiate deals in writing; never take an engagement on
an oral promise alone.  If a deal requires a third party to provide
some of the success criteria, get their commitment in writing,
or you may be left holding the bag.

## Following up

An engagement should ideally be thought of as a free-flowing
conversation that results in the resolution of the success
criteria stated in the SOW.  Accordingly, it is *vital* to
keep the conversation going.

This means that you should follow up with the client on a
regular basis to keep them informed of the progress of the
work being done as part of the engagement, and to solicit
feedback early.  It is far easier to change the course of
an engagement early on than after hundreds of hours have
gone into the work.

When discussing the engagement, it should be considered an
active listening exercise: you lay out what your team is
building, and then the client provides feedback based on
your presentation.  From there, the conversation moves into
defining what forward progress looks like.

## Takeaways

These are just my observations from nearly 20 years of doing
professional consulting around FOSS.  There is no singular
right way of running a consultancy, but these are the key
aspects that helped me to maintain good working relationships
with my customers.

Running a FOSS consultancy is hard work, but it can result in
a sustainable business if you are willing to put in the
work.
</source:markdown>
    </item>
    
    <item>
      <title>Free software grows as a function of social utility</title>
      <link>https://ariadne.space/2022/08/05/free-software-grows-as-a.html</link>
      <pubDate>Fri, 05 Aug 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/08/06/free-software-grows-as-a.html</guid>
      <description>&lt;p&gt;A frequent complaint I see from users and inexperienced contributors
concerning free software projects is that they are allegedly not doing
enough to grow the userbase, sometimes even asserting that a fork is
necessary to right the course of the project.&lt;/p&gt;
&lt;p&gt;Are these complaints missing the point, or do they have merit?
How do free software projects grow their userbase into thriving
communities?&lt;/p&gt;
&lt;p&gt;In general, these complaints go something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[PROJECT] developers have explicitly said they do not want the
project to grow.  The [PROJECT] is its own worst enemy, and this
is just the latest example of it I&amp;rsquo;ve seen.  I don&amp;rsquo;t trust the
direction of [PROJECT], and neither should you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The experienced maintainer understands that we must play the long game,
not the short game.  Tactics such as &lt;em&gt;embrace, extend, extinguish&lt;/em&gt;
are largely only effective when maintainers are looking at the short
term picture.  This is because organic growth in the use of a free
software package is a function of that package&amp;rsquo;s social utility: a
software package which provides utility to its community will experience
growth in adoption because its users will recommend that others try
the software package and join the project&amp;rsquo;s community.&lt;/p&gt;
&lt;p&gt;The social utility of a given software package is not necessarily
tied to mass-market adoption.  It is possible for a software package
to be extremely popular in a tight-knit community, while holding
very little social utility for the mass-market, and this is totally
fine.  In fact, this is the case for most software packages which
exist in the world.&lt;/p&gt;
&lt;p&gt;Likewise, it is possible to pursue new feature development as a
gamble on obtaining mass-market adoption, and destroy the social
utility of the product for its current userbase.  An experienced
maintainer will recognize that such gambles rarely pay off, and
usually wind up damaging the project rather than growing it.&lt;/p&gt;
&lt;p&gt;Unfortunately, society teaches us that we should grow at any cost,
which means that inexperienced maintainers can be swayed by such
arguments into making decisions that harm their project.  But if we
recognize that these types of arguments are inherently defective,
we can help maintainers to avoid taking them seriously.&lt;/p&gt;
</description>
      <source:markdown>
A frequent complaint I see from users and inexperienced contributors
concerning free software projects is that they are allegedly not doing
enough to grow the userbase, sometimes even asserting that a fork is
necessary to right the course of the project.

Are these complaints missing the point, or do they have merit?
How do free software projects grow their userbase into thriving
communities?

In general, these complaints go something like this:

&gt; [PROJECT] developers have explicitly said they do not want the
&gt; project to grow.  The [PROJECT] is its own worst enemy, and this
&gt; is just the latest example of it I&#39;ve seen.  I don&#39;t trust the
&gt; direction of [PROJECT], and neither should you.

The experienced maintainer understands that we must play the long game,
not the short game.  Tactics such as *embrace, extend, extinguish*
are largely only effective when maintainers are looking at the short
term picture.  This is because organic growth in the use of a free
software package is a function of that package&#39;s social utility: a
software package which provides utility to its community will experience
growth in adoption because its users will recommend that others try
the software package and join the project&#39;s community.

The social utility of a given software package is not necessarily
tied to mass-market adoption.  It is possible for a software package
to be extremely popular in a tight-knit community, while holding
very little social utility for the mass-market, and this is totally
fine.  In fact, this is the case for most software packages which
exist in the world.

Likewise, it is possible to pursue new feature development as a
gamble on obtaining mass-market adoption, and destroy the social
utility of the product for its current userbase.  An experienced
maintainer will recognize that such gambles rarely pay off, and
usually wind up damaging the project rather than growing it.

Unfortunately, society teaches us that we should grow at any cost,
which means that inexperienced maintainers can be swayed by such
arguments into making decisions that harm their project.  But if we
recognize that these types of arguments are inherently defective,
we can help maintainers to avoid taking them seriously.
</source:markdown>
    </item>
    
    <item>
      <title>Migrating away from WordPress</title>
      <link>https://ariadne.space/2022/08/03/migrating-away-from-wordpress.html</link>
      <pubDate>Wed, 03 Aug 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/08/04/migrating-away-from-wordpress.html</guid>
      <description>&lt;p&gt;Astute followers of this blog might have noticed that the layout has
dramatically changed.  This is because I migrated away from WordPress
last weekend, switching &lt;a href=&#34;https://gohugo.io/&#34;&gt;back to Hugo&lt;/a&gt; after a few years.  This
time around, the blog is fully self-hosted, rather than depending on
GitHub Pages, and the deployment pipeline is reasonably secure.
Perhaps we can call it a &amp;ldquo;secure blog factory&amp;rdquo; with some further work,
even.&lt;/p&gt;
&lt;p&gt;These days, when most people deploy static websites, they use a service
like Netlify or GitHub Pages.  These services are reasonable,
but when you do not own your own infrastructure, you are dependent on
a third party continuing to offer the service.  With the latest news
that GitLab has decided to &lt;a href=&#34;https://www.theregister.com/2022/08/04/gitlab_data_retention_policy/&#34;&gt;delete user data that has not been touched
in over a year&lt;/a&gt;, depending on third party services may be
something to start considering in your security and reliability posture.&lt;/p&gt;
&lt;h2 id=&#34;migrating-back-to-self-hosting&#34;&gt;Migrating back to self-hosting&lt;/h2&gt;
&lt;p&gt;Because I cannot really depend on third-party
services to &lt;a href=&#34;https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/&#34;&gt;conduct themselves in alignment with my own personal ethics&lt;/a&gt;
and expectations for service reliability, even when paying them to do
so, I started to self-host the majority of my services over the past
year.  This has had significant benefit, enabling me to
have actual visibility into the behavior of my services and to tune
their performance as needed.  Software such as Knative has
enabled me to work with my own infrastructure as if it were a managed
cloud service at one of the big providers.&lt;/p&gt;
&lt;p&gt;There was one problem, however.  Some of the services I adopted when I
decided to start seriously hosting my own services again, such as
WordPress, have a less than stellar security record.  While the core of
WordPress itself has an acceptable security record, as soon as you use
basically any plugin, it goes out the window.  Much of this was
mitigated by the fact that I ran a custom WordPress image as a Knative
service, which meant that if any of the plugins got compromised, I could
just restart the pod, and I would be back to normal, but I have always
thought that it could be done better.&lt;/p&gt;
&lt;h2 id=&#34;setting-up-gitea-because-github-introduced-copilot&#34;&gt;Setting up Gitea because GitHub introduced Copilot&lt;/h2&gt;
&lt;p&gt;Last year, GitHub announced Copilot, a neural model that was trained on
the entire corpus of publicly available source code posted to GitHub.
While Microsoft claims that this is allowed under fair use, the overwhelming
majority of experts disagree so far.  My personal opinion is that this
was a &lt;a href=&#34;https://ariadne.space/2022/07/01/a-silo-can-never-provide-digital-autonomy-to-its-users/&#34;&gt;breach of the public trust&lt;/a&gt; that the FOSS community
originally placed in GitHub, and a lesson that we must own our own
infrastructure in order to maintain &lt;a href=&#34;https://techautonomy.org/&#34;&gt;our autonomy in the digital world&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As a result of all of this, I wound up setting up my own &lt;a href=&#34;https://gitea.treehouse.systems/&#34;&gt;Gitea instance&lt;/a&gt;,
which I use to maintain my own source code.  In addition to Gitea, because
I needed a CI service, I deployed a &lt;a href=&#34;https://woodpecker.treehouse.systems/&#34;&gt;Woodpecker instance&lt;/a&gt;.  Both
of these services are very easy to deploy if you are already using Kubernetes,
and come highly recommended (it is also possible to use Tekton for CI with
Gitea, but it requires more work at the moment).&lt;/p&gt;
&lt;h2 id=&#34;automatically-publishing-the-blog-with-woodpecker&#34;&gt;Automatically publishing the blog with Woodpecker&lt;/h2&gt;
&lt;p&gt;If you look at the &lt;a href=&#34;https://gitea.treehouse.systems/ariadne/ariadne.space&#34;&gt;source code for my blog&lt;/a&gt;, you will notice that
it is largely a normal Hugo site with some basic plumbing around
Woodpecker.  I also use &lt;a href=&#34;https://github.com/chainguard-dev/apko&#34;&gt;apko&lt;/a&gt; to build a custom image that has
all of the tools needed to build and deploy the website, which is
self-hosted on the new OCI registry implemented in Gitea 1.17.  For those
interested, you can look at the &lt;a href=&#34;https://woodpecker.treehouse.systems/ariadne/ariadne.space/build/16&#34;&gt;logs of the deploy job&lt;/a&gt;
used to post this article!&lt;/p&gt;
&lt;p&gt;Almost all of the interesting stuff is in &lt;a href=&#34;https://gitea.treehouse.systems/ariadne/ariadne.space/src/branch/main/.woodpecker.yml&#34;&gt;the &lt;code&gt;woodpecker.yml&lt;/code&gt; file&lt;/a&gt;,
however, which does the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Builds an up-to-date Hugo image from scratch on every deploy using
apko.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Builds the new site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fetches the contents of the last announcement post (&lt;code&gt;newswire.txt&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deploys the new site using an SSH key stored as a secret, and a pinned
known SSH key also stored as a secret.  The latter is largely so I can
just update the secret if I change SSH keys on that host.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Checks if the last announcement post is different from the new one,
and if so, sends off a post &lt;a href=&#34;https://social.treehouse.systems/@ariadne&#34;&gt;to my Mastodon account&lt;/a&gt; using a
personal access token.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last point is the big deal (to me).  While ultimately it was not
difficult to set up, my original reason for using WordPress was that it
could do this type of automation out of the box.  Paying off the technical
debt of having to worry about WordPress being compromised has certainly
been worth it, however.&lt;/p&gt;
&lt;h2 id=&#34;future-improvements&#34;&gt;Future improvements&lt;/h2&gt;
&lt;p&gt;There are some future improvements I would like to do.  For example, I
would like to sign the Hugo image I create with &lt;code&gt;cosign&lt;/code&gt;.  The current
blocker on this is that Gitea does not support a configurable &lt;code&gt;audience&lt;/code&gt;
setting on its OpenID Connect implementation.  Once that is done, it
should be possible to start working towards allowing Gitea instances to
work as OpenID Connect identities for use with the Sigstore
infrastructure.  This will be very powerful when combined with the
new OCI registry support introduced in Gitea 1.17!&lt;/p&gt;
&lt;p&gt;I also have some opinions on Woodpecker and other self-hosted CI systems,
which I plan to cover in more detail in a near-future blog post.&lt;/p&gt;
&lt;p&gt;Hopefully the above provides some inspiration to play with self-hosting
your own website, or perhaps playing with apko outside of the GitHub
ecosystem.  Thanks to the &lt;a href=&#34;https://gitea.io/&#34;&gt;Gitea&lt;/a&gt; and &lt;a href=&#34;https://woodpecker-ci.org/&#34;&gt;Woodpecker&lt;/a&gt; developers for
making software that is easy to deploy, as well.&lt;/p&gt;
</description>
      <source:markdown>
Astute followers of this blog might have noticed that the layout has
dramatically changed.  This is because I migrated away from WordPress
last weekend, switching [back to Hugo][hugo] after a few years.  This
time around, the blog is fully self-hosted, rather than depending on
GitHub Pages, and the deployment pipeline is reasonably secure.
Perhaps we can call it a &#34;secure blog factory&#34; with some further work,
even.

   [hugo]: https://gohugo.io/

These days, when most people deploy static websites, they use a service
like Netlify or GitHub Pages.  These services are reasonable,
but when you do not own your own infrastructure, you are dependent on
a third party continuing to offer the service.  With the latest news
that GitLab has decided to [delete user data that has not been touched
in over a year][gl-deletion], depending on third party services may be
something to start considering in your security and reliability posture.

   [gl-deletion]: https://www.theregister.com/2022/08/04/gitlab_data_retention_policy/

## Migrating back to self-hosting

Because I cannot really depend on third-party
services to [conduct themselves in alignment with my own personal ethics][copilot]
and expectations for service reliability, even when paying them to do
so, I started to self-host the majority of my services over the past
year.  This has had significant benefit, enabling me to
have actual visibility into the behavior of my services and to tune
their performance as needed.  Software such as Knative has
enabled me to work with my own infrastructure as if it were a managed
cloud service at one of the big providers.

   [copilot]: https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/

There was one problem, however.  Some of the services I adopted when I
decided to start seriously hosting my own services again, such as
WordPress, have a less than stellar security record.  While the core of
WordPress itself has an acceptable security record, as soon as you use
basically any plugin, it goes out the window.  Much of this was
mitigated by the fact that I ran a custom WordPress image as a Knative
service, which meant that if any of the plugins got compromised, I could
just restart the pod, and I would be back to normal, but I have always
thought that it could be done better.

## Setting up Gitea because GitHub introduced Copilot

Last year, GitHub announced Copilot, a neural model that was trained on
the entire corpus of publicly available source code posted to GitHub.
While Microsoft claims that this is allowed under fair use, the overwhelming
majority of experts disagree so far.  My personal opinion is that this
was a [breach of the public trust][gh-cohost] that the FOSS community
originally placed in GitHub, and a lesson that we must own our own
infrastructure in order to maintain [our autonomy in the digital world][da].

   [gh-cohost]: https://ariadne.space/2022/07/01/a-silo-can-never-provide-digital-autonomy-to-its-users/
   [da]: https://techautonomy.org/

As a result of all of this, I wound up setting up my own [Gitea instance][gitea],
which I use to maintain my own source code.  In addition to Gitea, because
I needed a CI service, I deployed a [Woodpecker instance][woodpecker].  Both
of these services are very easy to deploy if you are already using Kubernetes,
and come highly recommended (it is also possible to use Tekton for CI with
Gitea, but it requires more work at the moment).

   [gitea]: https://gitea.treehouse.systems/
   [woodpecker]: https://woodpecker.treehouse.systems/

## Automatically publishing the blog with Woodpecker

If you look at the [source code for my blog][src], you will notice that
it is largely a normal Hugo site with some basic plumbing around
Woodpecker.  I also use [apko][apko] to build a custom image that has
all of the tools needed to build and deploy the website, which is
self-hosted on the new OCI registry implemented in Gitea 1.17.  For those
interested, you can look at the [logs of the deploy job][deploy-logs]
used to post this article!

   [src]: https://gitea.treehouse.systems/ariadne/ariadne.space
   [apko]: https://github.com/chainguard-dev/apko
   [deploy-logs]: https://woodpecker.treehouse.systems/ariadne/ariadne.space/build/16

Almost all of the interesting stuff is in [the `woodpecker.yml` file][wpcfg],
however, which does the following:

   [wpcfg]: https://gitea.treehouse.systems/ariadne/ariadne.space/src/branch/main/.woodpecker.yml

 * Builds an up-to-date Hugo image from scratch on every deploy using
   apko.

 * Builds the new site.

 * Fetches the contents of the last announcement post (`newswire.txt`).

 * Deploys the new site using an SSH key stored as a secret, and a pinned
   known SSH key also stored as a secret.  The latter is largely so I can
   just update the secret if I change SSH keys on that host.

 * Checks if the last announcement post is different from the new one,
   and if so, sends off a post [to my Mastodon account][masto] using a
   personal access token.

   [masto]: https://social.treehouse.systems/@ariadne

The last point is the big deal (to me).  While ultimately it was not
difficult to set up, my original reason for using WordPress was that it
could do this type of automation out of the box.  Paying off the technical
debt of having to worry about WordPress being compromised has certainly
been worth it, however.

## Future improvements

There are some future improvements I would like to do.  For example, I
would like to sign the Hugo image I create with `cosign`.  The current
blocker on this is that Gitea does not support a configurable `audience`
setting on its OpenID Connect implementation.  Once that is done, it
should be possible to start working towards allowing Gitea instances to
work as OpenID Connect identities for use with the Sigstore
infrastructure.  This will be very powerful when combined with the
new OCI registry support introduced in Gitea 1.17!

I also have some opinions on Woodpecker and other self-hosted CI systems,
which I plan to cover in more detail in a near-future blog post.

Hopefully the above provides some inspiration to play with self-hosting
your own website, or perhaps playing with apko outside of the GitHub
ecosystem.  Thanks to the [Gitea][gt] and [Woodpecker][wp] developers for
making software that is easy to deploy, as well.

   [gt]: https://gitea.io/
   [wp]: https://woodpecker-ci.org/
</source:markdown>
    </item>
    
    <item>
      <title>How efficient can cat(1) be?</title>
      <link>https://ariadne.space/2022/07/16/how-efficient-can-cat-be.html</link>
      <pubDate>Sat, 16 Jul 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/07/17/how-efficient-can-cat-be.html</guid>
      <description>&lt;p&gt;There have been a few initiatives in recent years to implement
a new userspace base system for Linux distributions as an
alternative to the GNU coreutils and BusyBox.  Recently, one of
the authors of one of these proposed implementations made the
pitch in a few IRC channels &lt;a href=&#34;https://vimuser.org/cat.c.txt&#34;&gt;that her cat implementation&lt;/a&gt;,
which was derived from OpenBSD’s implementation, was the most
efficient.  But is it actually?&lt;/p&gt;
&lt;h2 id=&#34;understanding-what-cat-actually-does&#34;&gt;Understanding what &lt;code&gt;cat&lt;/code&gt; actually does&lt;/h2&gt;
&lt;p&gt;At the most basic level, &lt;code&gt;cat&lt;/code&gt; takes one or more files and
dumps them to &lt;code&gt;stdout&lt;/code&gt;.  But do we need to actually use &lt;code&gt;stdio&lt;/code&gt;
for this?  In fact, we don’t, and most competent &lt;code&gt;cat&lt;/code&gt;
implementations at least use &lt;code&gt;read(2)&lt;/code&gt; and &lt;code&gt;write(2)&lt;/code&gt; if not
more advanced approaches.&lt;/p&gt;
&lt;p&gt;If we consider &lt;code&gt;cat&lt;/code&gt; as a form of buffer copy between an
arbitrary file descriptor and &lt;code&gt;STDOUT_FILENO&lt;/code&gt;, we can understand
what the most efficient strategy to use for &lt;code&gt;cat&lt;/code&gt; would be: splicing.
Anything which isn’t doing splicing, after all, involves unnecessary
buffer copies, and thus cannot be the most efficient.&lt;/p&gt;
&lt;p&gt;To get the best performance out of spliced I/O, we have to have
some prerequisites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The source and destination file descriptors should be unbuffered.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Any intermediate buffer should be a multiple of the filesystem
block size.  In general, to avoid doing a &lt;code&gt;stat&lt;/code&gt; syscall, we can
assume that a multiple of &lt;code&gt;PAGE_SIZE&lt;/code&gt; is likely acceptable.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;a-simple-cat-implementation&#34;&gt;A simple &lt;code&gt;cat&lt;/code&gt; implementation&lt;/h2&gt;
&lt;p&gt;The simplest way to implement &lt;code&gt;cat&lt;/code&gt; is the way that it is done in
BSD: using &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;write&lt;/code&gt; on an intermediate buffer.  This
results in two buffer copies, but has the best portability.  Most
implementations of &lt;code&gt;cat&lt;/code&gt; work this way, as it generally offers
good enough performance.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;/* This program is released into the public domain. */&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;err.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;limits.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;dumpfile&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; STDIN_FILENO;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; buf[PAGE_SIZE &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;];
	ssize_t nread, nwritten;
	size_t offset;

	&lt;span style=&#34;color:#75715e&#34;&gt;/* POSIX allows - to represent stdin. */&lt;/span&gt;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt;)
	{
		srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(path, O_RDONLY);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
			err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;open %s&amp;#34;&lt;/span&gt;, path);
	}

	&lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; ((nread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; read(srcfd, buf, &lt;span style=&#34;color:#66d9ef&#34;&gt;sizeof&lt;/span&gt; buf)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
	{
		&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (offset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;; nread &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;; nread &lt;span style=&#34;color:#f92672&#34;&gt;-=&lt;/span&gt; nwritten, offset &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; nwritten)
		{
			&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; ((nwritten &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; write(STDOUT_FILENO, buf &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; offset, nread)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
				err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;write stdout&amp;#34;&lt;/span&gt;);
		}
	}

	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; STDIN_FILENO)
		(&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt;) close(srcfd);
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;main&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; argc, &lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;argv[])
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;

	&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; argc; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;)
		dumpfile(argv[i]);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; EXIT_SUCCESS;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;implementing-spliced-io&#34;&gt;Implementing spliced I/O&lt;/h2&gt;
&lt;p&gt;Linux has no shortage of ways to perform spliced I/O.  For our &lt;code&gt;cat&lt;/code&gt;
implementation, we have two possible ways to do it.&lt;/p&gt;
&lt;p&gt;The first option is the venerable &lt;code&gt;sendfile&lt;/code&gt; syscall, which
was &lt;a href=&#34;https://yarchive.net/comp/linux/sendfile.html&#34;&gt;originally added to improve the file serving performance of web
servers&lt;/a&gt;. At first, &lt;code&gt;sendfile&lt;/code&gt; required the destination
file descriptor to be a socket, but this restriction was removed in
Linux 2.6.33.  Unfortunately, &lt;code&gt;sendfile&lt;/code&gt; is not perfect: because the
source file descriptor must support &lt;code&gt;mmap&lt;/code&gt;-like operations, we
must use a different strategy when copying from &lt;code&gt;stdin&lt;/code&gt;, which may be a pipe or a terminal.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;/* This program is released into the public domain. */&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdbool.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;err.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;limits.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;sys/sendfile.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;bool&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;spliced_copy&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd)
{
	ssize_t nwritten;
	off_t offset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;

	&lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
	{
		nwritten &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sendfile(STDOUT_FILENO, srcfd, &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&lt;/span&gt;offset,
				    PAGE_SIZE &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (nwritten &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
			&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; false;
	} &lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; (nwritten &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; true;
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;copy&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; buf[PAGE_SIZE &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;];
	ssize_t nread, nwritten;
	size_t offset;

	&lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; ((nread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; read(srcfd, buf, &lt;span style=&#34;color:#66d9ef&#34;&gt;sizeof&lt;/span&gt; buf)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
	{
		&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (offset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;; nread &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
		     nread &lt;span style=&#34;color:#f92672&#34;&gt;-=&lt;/span&gt; nwritten, offset &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; nwritten)
		{
			&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; ((nwritten &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; write(STDOUT_FILENO,
					      buf &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; offset, nread)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
				err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;write stdout&amp;#34;&lt;/span&gt;);
		}
	}
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;dumpfile&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; STDIN_FILENO;

	&lt;span style=&#34;color:#75715e&#34;&gt;/* POSIX allows - to represent stdin. */&lt;/span&gt;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt;)
	{
		srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(path, O_RDONLY);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
			err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;open %s&amp;#34;&lt;/span&gt;, path);
	}

	&lt;span style=&#34;color:#75715e&#34;&gt;/* Fall back to traditional copy if the spliced version fails. */&lt;/span&gt;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;spliced_copy(srcfd))
		copy(srcfd);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; STDIN_FILENO)
		(&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt;) close(srcfd);
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;main&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; argc, &lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;argv[])
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; stdout_flags;

	stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fcntl(STDOUT_FILENO, F_GETFL);
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fcntl(STDOUT_FILENO, F_GETFL)&amp;#34;&lt;/span&gt;);
	stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;=&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt;O_APPEND;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (fcntl(STDOUT_FILENO, F_SETFL, stdout_flags) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fcntl(STDOUT_FILENO, F_SETFL)&amp;#34;&lt;/span&gt;);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; argc; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;)
		dumpfile(argv[i]);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; EXIT_SUCCESS;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Another approach is to use &lt;code&gt;splice&lt;/code&gt; and a pipe.  This allows for true
zero-copy I/O, as the data never enters userspace: a pipe is simply a
ring buffer in the kernel (64KB by default).  In this case, we use two
splice operations per block of data we want to copy: one to move the data to
the pipe and another to move the data from the pipe to the output file.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#75715e&#34;&gt;/* This program is released into the public domain. */&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#define _GNU_SOURCE
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdbool.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;err.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;limits.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;fcntl.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#include&lt;/span&gt; &lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;sys/sendfile.h&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;#define BLOCK_SIZE ((PAGE_SIZE * 16) - 1)
&lt;/span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&lt;/span&gt;
&lt;span style=&#34;color:#66d9ef&#34;&gt;bool&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;spliced_copy&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; pipefd[&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;];
	ssize_t nread, nwritten;
	off_t in_offset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;bool&lt;/span&gt; ret &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; true;

	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (pipe(pipefd) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;pipe&amp;#34;&lt;/span&gt;);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;
	{
		nread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; splice(srcfd, &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&lt;/span&gt;in_offset, pipefd[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;], NULL,
			       BLOCK_SIZE, SPLICE_F_MOVE &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; SPLICE_F_MORE);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (nread &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		{
			ret &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; nread &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;?&lt;/span&gt; false &lt;span style=&#34;color:#f92672&#34;&gt;:&lt;/span&gt; true;
			&lt;span style=&#34;color:#66d9ef&#34;&gt;goto&lt;/span&gt; out;
		}

		nwritten &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; splice(pipefd[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;], NULL, STDOUT_FILENO, NULL,
				  BLOCK_SIZE, SPLICE_F_MOVE &lt;span style=&#34;color:#f92672&#34;&gt;|&lt;/span&gt; SPLICE_F_MORE);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (nwritten &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		{
			ret &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; false;
			&lt;span style=&#34;color:#66d9ef&#34;&gt;goto&lt;/span&gt; out;
		}
	} &lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; (nwritten &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;);

out:
	close(pipefd[&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;]);
	close(pipefd[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;]);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; ret;
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;copy&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; buf[PAGE_SIZE &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;16&lt;/span&gt;];
	ssize_t nread, nwritten;
	size_t offset;

	&lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; ((nread &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; read(srcfd, buf, &lt;span style=&#34;color:#66d9ef&#34;&gt;sizeof&lt;/span&gt; buf)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;)
	{
		&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (offset &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;; nread &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;;
		     nread &lt;span style=&#34;color:#f92672&#34;&gt;-=&lt;/span&gt; nwritten, offset &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; nwritten)
		{
			&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; ((nwritten &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; write(STDOUT_FILENO,
					      buf &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; offset, nread)) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
				err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;write stdout&amp;#34;&lt;/span&gt;);
		}
	}
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;dumpfile&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path)
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; STDIN_FILENO;

	&lt;span style=&#34;color:#75715e&#34;&gt;/* POSIX allows - to represent stdin. */&lt;/span&gt;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;path &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt;)
	{
		srcfd &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(path, O_RDONLY);
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
			err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;open %s&amp;#34;&lt;/span&gt;, path);

		(&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt;) posix_fadvise(srcfd, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, POSIX_FADV_SEQUENTIAL);
	}

	&lt;span style=&#34;color:#75715e&#34;&gt;/* Fall back to traditional copy if the spliced version fails. */&lt;/span&gt;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#f92672&#34;&gt;!&lt;/span&gt;spliced_copy(srcfd))
		copy(srcfd);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (srcfd &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; STDIN_FILENO)
		(&lt;span style=&#34;color:#66d9ef&#34;&gt;void&lt;/span&gt;) close(srcfd);
}

&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;main&lt;/span&gt;(&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; argc, &lt;span style=&#34;color:#66d9ef&#34;&gt;const&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;argv[])
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; i;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;int&lt;/span&gt; stdout_flags;

	stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; fcntl(STDOUT_FILENO, F_GETFL);
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fcntl(STDOUT_FILENO, F_GETFL)&amp;#34;&lt;/span&gt;);
	stdout_flags &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;=&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt;O_APPEND;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (fcntl(STDOUT_FILENO, F_SETFL, stdout_flags) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
		err(EXIT_FAILURE, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fcntl(STDOUT_FILENO, F_SETFL)&amp;#34;&lt;/span&gt;);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (i &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;; i &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; argc; i&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;)
		dumpfile(argv[i]);

	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; EXIT_SUCCESS;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;honorable-mention-copy_file_range&#34;&gt;Honorable mention: &lt;code&gt;copy_file_range&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;While &lt;code&gt;copy_file_range&lt;/code&gt; is not often relevant to a &lt;code&gt;cat&lt;/code&gt;
implementation, if both the source and output are regular files, you
can use it to get even faster performance than &lt;code&gt;splice&lt;/code&gt;, as the
kernel handles all of the details on its own.  An optimized &lt;code&gt;cat&lt;/code&gt;
might try this strategy first, then fall back to &lt;code&gt;splice&lt;/code&gt;,
&lt;code&gt;sendfile&lt;/code&gt;, and finally the normal &lt;code&gt;read&lt;/code&gt; and &lt;code&gt;write&lt;/code&gt; loop.&lt;/p&gt;
&lt;h2 id=&#34;performance-comparison&#34;&gt;Performance comparison&lt;/h2&gt;
&lt;p&gt;To measure the performance of each strategy, we can simply use
&lt;code&gt;dd&lt;/code&gt; as a sink, piping each &lt;code&gt;cat&lt;/code&gt; program into
&lt;code&gt;dd of=/dev/null bs=64K iflag=fullblock&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The figures in the table below are averaged across 1000 runs on an
8GB RAM Linode, using a 4GB file in tmpfs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cat-simple&lt;/code&gt; (&lt;code&gt;read&lt;/code&gt; and &lt;code&gt;write&lt;/code&gt; loop)&lt;/td&gt;
&lt;td&gt;3.6 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cat-sendfile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;6.4 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cat-splice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;11.6 GB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you are interested in using this code in your own
&lt;code&gt;cat&lt;/code&gt; implementation, you may do so under any license terms you
wish.&lt;/p&gt;
</description>
      <source:markdown>
There have been a few initiatives in recent years to implement
a new userspace base system for Linux distributions as an 
alternative to the GNU coreutils and BusyBox.  Recently, one of
the authors of one of these proposed implementations made the 
pitch in a few IRC channels [that her `cat` implementation][lc],
which was derived from OpenBSD’s implementation, was the most 
efficient.  But is it actually?

   [lc]: https://vimuser.org/cat.c.txt

## Understanding what `cat` actually does

At the most basic level, `cat` takes one or more files and
dumps them to `stdout`.  But do we actually need to use `stdio`
for this?  We don’t, and most competent `cat`
implementations at least use `read(2)` and `write(2)` if not
more advanced approaches.

If we consider `cat` as a form of buffer copy between an
arbitrary file descriptor and `STDOUT_FILENO`, we can understand
what the most efficient strategy to use for `cat` would be: splicing.
Anything which isn’t doing splicing, after all, involves unnecessary 
buffer copies, and thus cannot be the most efficient.

To get the best performance out of spliced I/O, we must satisfy
some prerequisites:

 * The source and destination file descriptors should be unbuffered.

 * Any intermediate buffer should be a multiple of the filesystem
   block size.  In general, to avoid doing a `stat` syscall, we can 
   assume that a multiple of `PAGE_SIZE` is likely acceptable.

## A simple `cat` implementation

The simplest way to implement `cat` is the way that it is done in
BSD: using `read` and `write` on an intermediate buffer.  This
results in two buffer copies, but has the best portability.  Most
implementations of `cat` work this way, as it generally offers
good enough performance.

```c
/* This program is released into the public domain. */
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;err.h&gt;
#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;

void dumpfile(const char *path)
{
	int srcfd = STDIN_FILENO;
	char buf[PAGE_SIZE * 16];
	ssize_t nread, nwritten;
	size_t offset;

	/* POSIX allows - to represent stdin. */
	if (*path != &#39;-&#39;)
	{
		srcfd = open(path, O_RDONLY);
		if (srcfd &lt; 0)
			err(EXIT_FAILURE, &#34;open %s&#34;, path);
	}

	while ((nread = read(srcfd, buf, sizeof buf)) &gt;= 1)
	{
		for (offset = 0; nread &gt; 0; nread -= nwritten, offset += nwritten)
		{
			if ((nwritten = write(STDOUT_FILENO, buf + offset, nread)) &lt;= 0)
				err(EXIT_FAILURE, &#34;write stdout&#34;);
		}
	}

	if (srcfd != STDIN_FILENO)
		(void) close(srcfd);
}

int main(int argc, const char *argv[])
{
	int i;

	for (i = 1; i &lt; argc; i++)
		dumpfile(argv[i]);

	return EXIT_SUCCESS;
}
```
## Implementing spliced I/O

Linux has no shortage of ways to perform spliced I/O.  For our `cat`
implementation, we have two possible ways to do it.

The first option is the venerable `sendfile` syscall, which
was [originally added to improve the file serving performance of web
servers][sf-origin]. At first, `sendfile` required the destination
file descriptor to be a socket, but this restriction was removed in
Linux 2.6.33.  Unfortunately, `sendfile` is not perfect: because the
source file descriptor must support `mmap`-like operations, we
must use a different strategy when copying from `stdin`, which may be a pipe or a terminal.

   [sf-origin]: https://yarchive.net/comp/linux/sendfile.html

```c
/* This program is released into the public domain. */
#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;err.h&gt;
#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/sendfile.h&gt;

bool spliced_copy(int srcfd)
{
	ssize_t nwritten;
	off_t offset = 0;

	do
	{
		nwritten = sendfile(STDOUT_FILENO, srcfd, &amp;offset,
				    PAGE_SIZE * 16);
		if (nwritten &lt; 0)
			return false;
	} while (nwritten &gt; 0);

	return true;
}

void copy(int srcfd)
{
	char buf[PAGE_SIZE * 16];
	ssize_t nread, nwritten;
	size_t offset;

	while ((nread = read(srcfd, buf, sizeof buf)) &gt;= 1)
	{
		for (offset = 0; nread &gt; 0;
		     nread -= nwritten, offset += nwritten)
		{
			if ((nwritten = write(STDOUT_FILENO,
					      buf + offset, nread)) &lt;= 0)
				err(EXIT_FAILURE, &#34;write stdout&#34;);
		}
	}
}

void dumpfile(const char *path)
{
	int srcfd = STDIN_FILENO;

	/* POSIX allows - to represent stdin. */
	if (*path != &#39;-&#39;)
	{
		srcfd = open(path, O_RDONLY);
		if (srcfd &lt; 0)
			err(EXIT_FAILURE, &#34;open %s&#34;, path);
	}

	/* Fall back to traditional copy if the spliced version fails. */
	if (!spliced_copy(srcfd))
		copy(srcfd);

	if (srcfd != STDIN_FILENO)
		(void) close(srcfd);
}

int main(int argc, const char *argv[])
{
	int i;
	int stdout_flags;

	/* sendfile refuses to write to a descriptor with O_APPEND set,
	 * so clear the flag on stdout. */
	stdout_flags = fcntl(STDOUT_FILENO, F_GETFL);
	if (stdout_flags &lt; 0)
		err(EXIT_FAILURE, &#34;fcntl(STDOUT_FILENO, F_GETFL)&#34;);
	stdout_flags &amp;= ~O_APPEND;
	if (fcntl(STDOUT_FILENO, F_SETFL, stdout_flags) &lt; 0)
		err(EXIT_FAILURE, &#34;fcntl(STDOUT_FILENO, F_SETFL)&#34;);

	for (i = 1; i &lt; argc; i++)
		dumpfile(argv[i]);

	return EXIT_SUCCESS;
}
```
Another approach is to use `splice` and a pipe.  This allows for true
zero-copy I/O, as the data never enters userspace at all: a pipe is
simply implemented as a 64KB ring buffer in the kernel.  In this case,
we use two splice operations per block of data we want to copy: one to
move the data into the pipe and another to move it from the pipe to
the output file.

```c
/* This program is released into the public domain. */
#define _GNU_SOURCE
#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;err.h&gt;
#include &lt;errno.h&gt;
#include &lt;limits.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;
#include &lt;sys/sendfile.h&gt;

#define BLOCK_SIZE ((PAGE_SIZE * 16) - 1)

bool spliced_copy(int srcfd)
{
	int pipefd[2];
	ssize_t nread, nwritten;
	off_t in_offset = 0;
	bool ret = true;

	if (pipe(pipefd) &lt; 0)
		err(EXIT_FAILURE, &#34;pipe&#34;);

	for (;;)
	{
		nread = splice(srcfd, &amp;in_offset, pipefd[1], NULL,
			       BLOCK_SIZE, SPLICE_F_MOVE | SPLICE_F_MORE);
		if (nread &lt;= 0)
		{
			ret = nread &lt; 0 ? false : true;
			goto out;
		}

		/* Drain the pipe completely: a single splice may move
		 * fewer bytes than were just buffered. */
		while (nread &gt; 0)
		{
			nwritten = splice(pipefd[0], NULL, STDOUT_FILENO, NULL,
					  nread, SPLICE_F_MOVE | SPLICE_F_MORE);
			if (nwritten &lt;= 0)
			{
				ret = false;
				goto out;
			}

			nread -= nwritten;
		}
	}

out:
	close(pipefd[0]);
	close(pipefd[1]);

	return ret;
}

void copy(int srcfd)
{
	char buf[PAGE_SIZE * 16];
	ssize_t nread, nwritten;
	size_t offset;

	while ((nread = read(srcfd, buf, sizeof buf)) &gt;= 1)
	{
		for (offset = 0; nread &gt; 0;
		     nread -= nwritten, offset += nwritten)
		{
			if ((nwritten = write(STDOUT_FILENO,
					      buf + offset, nread)) &lt;= 0)
				err(EXIT_FAILURE, &#34;write stdout&#34;);
		}
	}
}

void dumpfile(const char *path)
{
	int srcfd = STDIN_FILENO;

	/* POSIX allows - to represent stdin. */
	if (*path != &#39;-&#39;)
	{
		srcfd = open(path, O_RDONLY);
		if (srcfd &lt; 0)
			err(EXIT_FAILURE, &#34;open %s&#34;, path);

		(void) posix_fadvise(srcfd, 0, 0, POSIX_FADV_SEQUENTIAL);
	}

	/* Fall back to traditional copy if the spliced version fails. */
	if (!spliced_copy(srcfd))
		copy(srcfd);

	if (srcfd != STDIN_FILENO)
		(void) close(srcfd);
}

int main(int argc, const char *argv[])
{
	int i;
	int stdout_flags;

	/* splice refuses to write to a descriptor with O_APPEND set,
	 * so clear the flag on stdout. */
	stdout_flags = fcntl(STDOUT_FILENO, F_GETFL);
	if (stdout_flags &lt; 0)
		err(EXIT_FAILURE, &#34;fcntl(STDOUT_FILENO, F_GETFL)&#34;);
	stdout_flags &amp;= ~O_APPEND;
	if (fcntl(STDOUT_FILENO, F_SETFL, stdout_flags) &lt; 0)
		err(EXIT_FAILURE, &#34;fcntl(STDOUT_FILENO, F_SETFL)&#34;);

	for (i = 1; i &lt; argc; i++)
		dumpfile(argv[i]);

	return EXIT_SUCCESS;
}
```
## Honorable mention: `copy_file_range`

While `copy_file_range` is not usually relevant to a `cat`
implementation, when both the source and the output are regular files,
it can be even faster than `splice`, as the kernel handles all of the
details on its own.  An optimized `cat` might try this strategy first,
then fall back to `splice`, `sendfile`, and finally the normal `read`
and `write` loop.

## Performance comparison

To measure the performance of each strategy, we can simply use
`dd` as a sink, running each cat program piped into
`dd of=/dev/null bs=64K iflag=fullblock`.

The figures in the table below are averaged across 1000 runs on an
8GB RAM Linode, using a 4GB file in tmpfs.

| Strategy                               | Throughput |
|----------------------------------------|------------|
| `cat-simple` (`read` and `write` loop) | 3.6 GB/s   |
| `cat-sendfile`                         | 6.4 GB/s   |
| `cat-splice`                           | 11.6 GB/s  |

If you would like to use any of this code in your own `cat`
implementation, you may do so under any license terms you wish.
</source:markdown>
    </item>
    
    <item>
      <title>a silo can never provide digital autonomy to its users</title>
      <link>https://ariadne.space/2022/06/30/a-silo-can-never-provide.html</link>
      <pubDate>Thu, 30 Jun 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/07/01/a-silo-can-never-provide.html</guid>
      <description>&lt;p&gt;Lately there has been a lot of discussion about various silos and their activities, notably GitHub and an up-and-coming alternative to Tumblr called Cohost. By analyzing the behavior of both of these silos, I&amp;rsquo;d like to make the point that silos, by design, do not and cannot elevate user freedoms, even when they are run with the best of intentions.&lt;/p&gt;
&lt;p&gt;It is said that if you are not paying for a service, that you are the product. To look at this, we will start with GitHub, who have had a significant controversy over the past year with their now-commercial Copilot service. Copilot is a paid service which provides code suggestions using a neural network model that was trained using the entirety of publicly posted source code on GitHub as its corpus. As many have noted, this is likely a problem from a copyright point of view.&lt;/p&gt;
&lt;p&gt;Microsoft claims that this use of the GitHub public source code is ethically correct and legal, citing fair use as their justification for data mining the entire GitHub public source corpus. Interestingly, in the EU, there is a &amp;ldquo;text and data mining&amp;rdquo; exception to the copyright directive, &lt;a href=&#34;https://deliverypdf.ssrn.com/delivery.php?ID=380124069122109084081011069119068081059089022064027023064104069125083028119005007123033062000029047123108125065064093118008030058071007053078069071085069007101073030038014010096097074114126065017112027071084124110068123116074098119115105064007068091122&amp;amp;EXT=pdf&amp;amp;INDEX=TRUE&#34;&gt;which may provide for some precedent for this thinking&lt;/a&gt;. While the legal construction they use to justify the way they trained the Copilot model is interesting, it is important to note that we, as consumers of the GitHub service, enabled Microsoft to do this by uploading source code to their service.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s talk about &lt;a href=&#34;https://cohost.org&#34;&gt;Cohost&lt;/a&gt;, a recently launched alternative to Tumblr which is paid for by its subscribers, and promises that it will never sell out to a third party. While I think that Cohost will likely be one of the more ethically-run silos out there, it is still a silo, and like Microsoft&amp;rsquo;s GitHub, it has business interests (subscriber retention) which &lt;a href=&#34;https://techautonomy.org/&#34;&gt;place it in conflict with the goals of digital autonomy&lt;/a&gt;. Specifically, like all silos, Cohost&amp;rsquo;s platform is designed to keep users inside the Cohost platform, just as GitHub uses the network effect of its own silo to make it difficult to use anything other than GitHub for collaboration on software.&lt;/p&gt;
&lt;p&gt;Some have argued that, due to the network effects of silos, the only thing which can defeat a bad silo is a good silo. The problem with this argument is that it requires one to accept the supposition that there can be a good silo. Silos, by their very nature of being centralized services under the control of the privileged, cannot be good if you look at the power structures imposed by them. Instead, we should use our privilege to lift others up, something that commercial silos, by design, are incapable of doing.&lt;/p&gt;
&lt;p&gt;How do we do this though? One way is to embrace networks of consent. From a technical point of view, the IndieWeb people have worked on a number of simple, easy to implement protocols, which provide the ability for web services to interact openly with each other, but in a way that allows for a website owner to define policy over what content they will accept. From a social point of view, we should avoid commercial silos, such as GitHub, and use our own infrastructure, either through self-hosting or through membership to a cooperative or public society.&lt;/p&gt;
&lt;p&gt;Although I understand that both of these goals can be difficult to achieve, they make more sense than jumping from one silo to the next after they cross the line. You control where you choose to participate &amp;ndash; for me, that means I am shifting my participation so that I only participate in commercial silos when absolutely necessary. We should choose to participate in power structures which value our communal membership, rather than value our ability to generate or pay revenue.&lt;/p&gt;
</description>
      <source:markdown>
Lately there has been a lot of discussion about various silos and their activities, notably GitHub and an up-and-coming alternative to Tumblr called Cohost. By analyzing the behavior of both of these silos, I&#39;d like to make the point that silos, by design, do not and cannot elevate user freedoms, even when they are run with the best of intentions.

It is said that if you are not paying for a service, that you are the product. To look at this, we will start with GitHub, who have had a significant controversy over the past year with their now-commercial Copilot service. Copilot is a paid service which provides code suggestions using a neural network model that was trained using the entirety of publicly posted source code on GitHub as its corpus. As many have noted, this is likely a problem from a copyright point of view.

Microsoft claims that this use of the GitHub public source code is ethically correct and legal, citing fair use as their justification for data mining the entire GitHub public source corpus. Interestingly, in the EU, there is a &#34;text and data mining&#34; exception to the copyright directive, [which may provide for some precedent for this thinking](https://deliverypdf.ssrn.com/delivery.php?ID=380124069122109084081011069119068081059089022064027023064104069125083028119005007123033062000029047123108125065064093118008030058071007053078069071085069007101073030038014010096097074114126065017112027071084124110068123116074098119115105064007068091122&amp;EXT=pdf&amp;INDEX=TRUE). While the legal construction they use to justify the way they trained the Copilot model is interesting, it is important to note that we, as consumers of the GitHub service, enabled Microsoft to do this by uploading source code to their service.

Now let&#39;s talk about [Cohost](https://cohost.org), a recently launched alternative to Tumblr which is paid for by its subscribers, and promises that it will never sell out to a third party. While I think that Cohost will likely be one of the more ethically-run silos out there, it is still a silo, and like Microsoft&#39;s GitHub, it has business interests (subscriber retention) which [place it in conflict with the goals of digital autonomy](https://techautonomy.org/). Specifically, like all silos, Cohost&#39;s platform is designed to keep users inside the Cohost platform, just as GitHub uses the network effect of its own silo to make it difficult to use anything other than GitHub for collaboration on software.

Some have argued that, due to the network effects of silos, the only thing which can defeat a bad silo is a good silo. The problem with this argument is that it requires one to accept the supposition that there can be a good silo. Silos, by their very nature of being centralized services under the control of the privileged, cannot be good if you look at the power structures imposed by them. Instead, we should use our privilege to lift others up, something that commercial silos, by design, are incapable of doing.

How do we do this though? One way is to embrace networks of consent. From a technical point of view, the IndieWeb people have worked on a number of simple, easy to implement protocols, which provide the ability for web services to interact openly with each other, but in a way that allows for a website owner to define policy over what content they will accept. From a social point of view, we should avoid commercial silos, such as GitHub, and use our own infrastructure, either through self-hosting or through membership to a cooperative or public society.

Although I understand that both of these goals can be difficult to achieve, they make more sense than jumping from one silo to the next after they cross the line. You control where you choose to participate -- for me, that means I am shifting my participation so that I only participate in commercial silos when absolutely necessary. We should choose to participate in power structures which value our communal membership, rather than value our ability to generate or pay revenue.
</source:markdown>
    </item>
    
    <item>
      <title>it is correct to refer to GNU/Linux as GNU/Linux</title>
      <link>https://ariadne.space/2022/03/29/it-is-correct-to-refer.html</link>
      <pubDate>Tue, 29 Mar 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/03/30/it-is-correct-to-refer.html</guid>
      <description>&lt;p&gt;You&amp;rsquo;ve probably seen the &amp;ldquo;I&amp;rsquo;d like to interject for a moment&amp;rdquo; quotation that is frequently attributed to Richard Stallman about how Linux should be referred to as GNU/Linux. While I disagree with &lt;em&gt;that&lt;/em&gt; particular assertion, I do believe it is important to refer to GNU/Linux distributions as such, because GNU/Linux is a distinct operating system in the family of operating systems which use the Linux kernel, and it is technically correct to recognize this, especially as different Linux-based operating systems have different behavior, and different advantages and disadvantages.&lt;/p&gt;
&lt;p&gt;For example, besides GNU/Linux, there are the Alpine and OpenWrt ecosystems, and last but not least, Android. All of these operating systems exist outside the GNU/Linux space and have significant differences, both from GNU/Linux and from each other.&lt;/p&gt;
&lt;h2 id=&#34;what-is-gnulinux&#34;&gt;what is GNU/Linux?&lt;/h2&gt;
&lt;p&gt;I believe part of the problem which leads people to be confused about the alternative Linux ecosystems is the lack of a cogent GNU/Linux definition, in part because many GNU/Linux distributions try to downplay that they are, in fact, GNU/Linux distributions. This may be for commercial or marketing reasons, or it may be because they do not wish to be seen as associated with the FSF. Because of this, others, who are fans of the work of the FSF, tend to overreach and claim other Linux ecosystems as being part of the GNU/Linux ecosystem, which is equally harmful.&lt;/p&gt;
&lt;p&gt;It is therefore important to provide a technically accurate definition of GNU/Linux that provides actual useful meaning to consumers, so that they can understand the differences between GNU/Linux-based operating systems and other Linux-based operating systems. To that end, I believe a reasonable definition of the GNU/Linux ecosystem to be distributions which:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;use the GNU C Library (frequently referred to as glibc)&lt;/li&gt;
&lt;li&gt;use the GNU coreutils package for their base UNIX commands (such as &lt;code&gt;/bin/cat&lt;/code&gt; and so on).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a technical perspective, an easy way to check if you are on a GNU/Linux system would be to attempt to run the &lt;code&gt;/lib/libc.so.6&lt;/code&gt; command. If you are running on a GNU/Linux system, this will print the glibc version that is installed. This technical definition of GNU/Linux also provides value, because some drivers and proprietary applications, such as the nVidia proprietary graphics driver, only support GNU/Linux systems.&lt;/p&gt;
&lt;p&gt;Given this rubric, we can easily test a few popular distributions and make some conclusions about their capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Debian-based Linux distributions, including Debian itself, and also Ubuntu and elementary, meet the above preconditions and are therefore GNU/Linux distributions.&lt;/li&gt;
&lt;li&gt;Fedora and the other distributions published by Red Hat also meet the same criterion to be defined as a GNU/Linux distribution.&lt;/li&gt;
&lt;li&gt;ArchLinux also meets the above criterion, and therefore is also a GNU/Linux distribution. Indeed, the preferred distribution of the FSF, Parabola, describes itself as GNU/Linux and is derived from Arch.&lt;/li&gt;
&lt;li&gt;Alpine does not use the GNU C library, and therefore is not a GNU/Linux distribution. Compatibility with GNU/Linux programs should not be assumed. More on that in a moment.&lt;/li&gt;
&lt;li&gt;Similarly, OpenWrt is not a GNU/Linux distribution.&lt;/li&gt;
&lt;li&gt;Android is also not a GNU/Linux distribution, nor is Replicant, despite the latter being sponsored by the FSF.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;on-compatibility-between-distros&#34;&gt;on compatibility between distros&lt;/h2&gt;
&lt;p&gt;Even between GNU/Linux distributions, compatibility is difficult. Different GNU/Linux distributions upgrade their components at different times, and due to dynamic linking, a program built against one specific set of components and build configurations may or may not run successfully on another GNU/Linux system. Some amount of binary compatibility is possible, but only if you take care to account for these differences.&lt;/p&gt;
&lt;p&gt;On top of this, there is no binary compatibility between Linux ecosystems at large. GNU/Linux binaries require the gcompat compatibility framework to run on Alpine, and it generally is not possible to run OpenWrt binaries on Alpine or vice versa. The situation is the same with Android: without a compatibility tool (such as Termux), it is not possible to run binaries from other ecosystems there.&lt;/p&gt;
&lt;p&gt;Exacerbating the problem, developers also target specific APIs only available in their respective ecosystems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;systemd makes use of glibc-specific APIs, which are not part of POSIX&lt;/li&gt;
&lt;li&gt;Android makes use of bionic-specific APIs, which are not part of POSIX&lt;/li&gt;
&lt;li&gt;Alpine and OpenWrt both make use of internal frameworks, and these differ between the two ecosystems (although there are active efforts to converge both ecosystems).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a result, as a developer, it is important to note which ecosystems you are targeting, and it is important to refer to individual ecosystems, rather than saying &amp;ldquo;my program supports Linux.&amp;rdquo; There are dozens of ecosystems which make use of the Linux kernel, and it is unlikely that a program supports all of them, or that the author is even aware of them.&lt;/p&gt;
&lt;p&gt;To conclude, it is both correct and important, to refer to GNU/Linux distributions as GNU/Linux distributions. Likewise, it is important to realize that non-GNU/Linux distributions exist, and are not necessarily compatible with the GNU/Linux ecosystem for your application. Each ecosystem is distinct, with its own strengths and weaknesses.&lt;/p&gt;
</description>
      <source:markdown>
You&#39;ve probably seen the &#34;I&#39;d like to interject for a moment&#34; quotation that is frequently attributed to Richard Stallman about how Linux should be referred to as GNU/Linux. While I disagree with _that_ particular assertion, I do believe it is important to refer to GNU/Linux distributions as such, because GNU/Linux is a distinct operating system in the family of operating systems which use the Linux kernel, and it is technically correct to recognize this, especially as different Linux-based operating systems have different behavior, and different advantages and disadvantages.

For example, besides GNU/Linux, there are the Alpine and OpenWrt ecosystems, and last but not least, Android. All of these operating systems exist outside the GNU/Linux space and have significant differences, both from GNU/Linux and from each other.

## what is GNU/Linux?

I believe part of the problem which leads people to be confused about the alternative Linux ecosystems is the lack of a cogent GNU/Linux definition, in part because many GNU/Linux distributions try to downplay that they are, in fact, GNU/Linux distributions. This may be for commercial or marketing reasons, or it may be because they do not wish to be seen as associated with the FSF. Because of this, others, who are fans of the work of the FSF, tend to overreach and claim other Linux ecosystems as being part of the GNU/Linux ecosystem, which is equally harmful.

It is therefore important to provide a technically accurate definition of GNU/Linux that provides actual useful meaning to consumers, so that they can understand the differences between GNU/Linux-based operating systems and other Linux-based operating systems. To that end, I believe a reasonable definition of the GNU/Linux ecosystem to be distributions which:

- use the GNU C Library (frequently referred to as glibc)
- use the GNU coreutils package for their base UNIX commands (such as `/bin/cat` and so on).

From a technical perspective, an easy way to check if you are on a GNU/Linux system would be to attempt to run the `/lib/libc.so.6` command. If you are running on a GNU/Linux system, this will print the glibc version that is installed. This technical definition of GNU/Linux also provides value, because some drivers and proprietary applications, such as the nVidia proprietary graphics driver, only support GNU/Linux systems.

Given this rubric, we can easily test a few popular distributions and make some conclusions about their capabilities:

- Debian-based Linux distributions, including Debian itself, and also Ubuntu and elementary, meet the above preconditions and are therefore GNU/Linux distributions.
- Fedora and the other distributions published by Red Hat also meet the same criterion to be defined as a GNU/Linux distribution.
- ArchLinux also meets the above criterion, and therefore is also a GNU/Linux distribution. Indeed, the preferred distribution of the FSF, Parabola, describes itself as GNU/Linux and is derived from Arch.
- Alpine does not use the GNU C library, and therefore is not a GNU/Linux distribution. Compatibility with GNU/Linux programs should not be assumed. More on that in a moment.
- Similarly, OpenWrt is not a GNU/Linux distribution.
- Android is also not a GNU/Linux distribution, nor is Replicant, despite the latter being sponsored by the FSF.

## on compatibility between distros

Even between GNU/Linux distributions, compatibility is difficult. Different GNU/Linux distributions upgrade their components at different times, and due to dynamic linking, a program built against one specific set of components and build configurations may or may not run successfully on another GNU/Linux system. Some amount of binary compatibility is possible, but only if you take care to account for these differences.

On top of this, there is no binary compatibility between Linux ecosystems at large. GNU/Linux binaries require the gcompat compatibility framework to run on Alpine, and it generally is not possible to run OpenWrt binaries on Alpine or vice versa. The situation is the same with Android: without a compatibility tool (such as Termux), it is not possible to run binaries from other ecosystems there.

Exacerbating the problem, developers also target specific APIs only available in their respective ecosystems:

- systemd makes use of glibc-specific APIs, which are not part of POSIX
- Android makes use of bionic-specific APIs, which are not part of POSIX
- Alpine and OpenWrt both make use of internal frameworks, and these differ between the two ecosystems (although there are active efforts to converge both ecosystems).

As a result, as a developer, it is important to note which ecosystems you are targeting, and it is important to refer to individual ecosystems, rather than saying &#34;my program supports Linux.&#34; There are dozens of ecosystems which make use of the Linux kernel, and it is unlikely that a program supports all of them, or that the author is even aware of them.

To conclude, it is both correct and important, to refer to GNU/Linux distributions as GNU/Linux distributions. Likewise, it is important to realize that non-GNU/Linux distributions exist, and are not necessarily compatible with the GNU/Linux ecosystem for your application. Each ecosystem is distinct, with its own strengths and weaknesses.
</source:markdown>
    </item>
    
    <item>
      <title>the tragedy of gethostbyname</title>
      <link>https://ariadne.space/2022/03/26/the-tragedy-of-gethostbyname.html</link>
      <pubDate>Sat, 26 Mar 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/03/27/the-tragedy-of-gethostbyname.html</guid>
      <description>&lt;p&gt;A frequent complaint expressed on a certain website about Alpine is related to the deficiencies regarding the musl DNS resolver when querying large zones. In response, it is usually mentioned that applications which are expecting reliable DNS lookups should be using a dedicated DNS library for this task, not the &lt;code&gt;getaddrinfo&lt;/code&gt; or &lt;code&gt;gethostbyname&lt;/code&gt; APIs, but this is usually rebuffed by comments saying that these APIs are fine to use because they are allegedly reliable on GNU/Linux.&lt;/p&gt;
&lt;p&gt;For a number of reasons, the assertion that DNS resolution via these APIs under glibc is more reliable is false, but to understand why, we must look at the history of why a &lt;code&gt;libc&lt;/code&gt; is responsible for shipping these functions to begin with, and how these APIs evolved over the years. For instance, did you know that &lt;code&gt;gethostbyname&lt;/code&gt; originally didn&amp;rsquo;t do DNS queries at all? And, the big question: why are these APIs blocking, when DNS is inherently an asynchronous protocol?&lt;/p&gt;
&lt;p&gt;Before we get into this, it is important to again restate that if you are an application developer, and your application depends on reliable DNS performance, you must absolutely use a dedicated DNS resolver library designed for this task. There are many libraries available that are good for this purpose, such as &lt;a href=&#34;https://c-ares.org/&#34;&gt;c-ares&lt;/a&gt;, &lt;a href=&#34;https://www.gnu.org/software/adns/&#34;&gt;GNU adns&lt;/a&gt;, &lt;a href=&#34;https://skarnet.org/software/s6-dns/&#34;&gt;s6-dns&lt;/a&gt; and &lt;a href=&#34;https://github.com/OpenSMTPD/libasr&#34;&gt;OpenBSD&amp;rsquo;s libasr&lt;/a&gt;. As should hopefully become obvious at the end of this article, the DNS clients included with &lt;code&gt;libc&lt;/code&gt; are designed to provide basic functionality only, and there is no guarantee of portable behavior across client implementations.&lt;/p&gt;
&lt;h2 id=&#34;the-introduction-of-gethostbyname&#34;&gt;the introduction of &lt;code&gt;gethostbyname&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Where did &lt;code&gt;gethostbyname&lt;/code&gt; come from, anyway? Most people believe this function came from BIND, the reference DNS implementation developed by the Berkeley CSRG. In reality, it was introduced to BSD in 1982, alongside the &lt;code&gt;sethostent&lt;/code&gt; and &lt;code&gt;gethostent&lt;/code&gt; APIs. I happen to have a copy of the 4.2BSD source code, so here is the implementation from 4.2BSD, which was released in early 1983:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-c&#34; data-lang=&#34;c&#34;&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; hostent &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;
&lt;span style=&#34;color:#a6e22e&#34;&gt;gethostbyname&lt;/span&gt;(name)
	&lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;name;
{
	&lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;struct&lt;/span&gt; hostent &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;p;
	&lt;span style=&#34;color:#66d9ef&#34;&gt;register&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;char&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;**&lt;/span&gt;cp;

	sethostent(&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;);
	&lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; (p &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; gethostent()) {
		&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (strcmp(p&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;h_name, name) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
			&lt;span style=&#34;color:#66d9ef&#34;&gt;break&lt;/span&gt;;
		&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; (cp &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; p&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;h_aliases; &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;cp &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;; cp&lt;span style=&#34;color:#f92672&#34;&gt;++&lt;/span&gt;)
			&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (strcmp(&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;cp, name) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
				&lt;span style=&#34;color:#66d9ef&#34;&gt;goto&lt;/span&gt; found;
	}
found:
	endhostent();
	&lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; (p);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;As you can see, the 4.2BSD implementation only checks the &lt;code&gt;/etc/hosts&lt;/code&gt; file and nothing else. This also explains why &lt;code&gt;gethostbyname&lt;/code&gt; and its successor, &lt;code&gt;getaddrinfo&lt;/code&gt;, do DNS queries in a blocking way: DNS support was retrofitted into an API that was already synchronous, rather than introducing an asynchronous replacement for &lt;code&gt;gethostbyname&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-introduction-of-dns-to-gethostbyname&#34;&gt;the introduction of DNS to &lt;code&gt;gethostbyname&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;DNS resolution was first added to &lt;code&gt;gethostbyname&lt;/code&gt; in 1984, during the development of what became 4.3BSD. &lt;a href=&#34;https://github.com/dank101/4.3BSD-Reno/blob/00328b5a67ffe35e67baeba8f7ab75af79f7ae64/lib/libc/net/gethostnamadr.c#L213&#34;&gt;This version, which is too long to include here,&lt;/a&gt; also translated dotted-quad IPv4 addresses into a &lt;code&gt;struct hostent&lt;/code&gt;. In essence, the 4.3BSD implementation does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If the requested hostname begins with a number, try to parse it as a dotted quad. If this fails, set &lt;code&gt;h_errno&lt;/code&gt; to &lt;code&gt;HOST_NOT_FOUND&lt;/code&gt; and bail. Yes, this means 4.3BSD would fail to resolve hostnames like &lt;code&gt;12-34-56-78.static.example.com&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Attempt to do a DNS query using &lt;code&gt;res_search&lt;/code&gt;. If the query was successful, return the first IP address found as the &lt;code&gt;struct hostent&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the DNS query failed, fall back to the original &lt;code&gt;/etc/hosts&lt;/code&gt; searching algorithm above, now called &lt;code&gt;_gethtbyname&lt;/code&gt; and using &lt;code&gt;strcasecmp&lt;/code&gt; instead of &lt;code&gt;strcmp&lt;/code&gt; (for consistency with DNS).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A fixed version of this algorithm was also included with BIND&amp;rsquo;s &lt;code&gt;libresolv&lt;/code&gt; as &lt;code&gt;res_gethostbyname&lt;/code&gt;, and the &lt;code&gt;res_search&lt;/code&gt; and related functions were imported into BSD libc from BIND.&lt;/p&gt;
&lt;h2 id=&#34;standardization-of-gethostbyname-in-posix&#34;&gt;standardization of &lt;code&gt;gethostbyname&lt;/code&gt; in POSIX&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;gethostbyname&lt;/code&gt; and &lt;code&gt;getaddrinfo&lt;/code&gt; APIs were first standardized in the X/Open Networking Services Issue 4 specification (commonly referred to as XNS4), which itself was part of the X/Open Single Unix Specification, released in 1995. Of note, X/Open tried to deprecate &lt;code&gt;gethostbyname&lt;/code&gt; in favor of &lt;code&gt;getaddrinfo&lt;/code&gt; as part of the XNS5 specification, &lt;a href=&#34;https://pubs.opengroup.org/onlinepubs/009619199/netdbh.htm#tagcjh_06_02&#34;&gt;removing it entirely except for a mention in their specification for &lt;code&gt;netdb.h&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Later, it returned &lt;a href=&#34;https://pubs.opengroup.org/onlinepubs/009696799/functions/gethostbyaddr.html&#34;&gt;as part of POSIX issue 6, released in 2004&lt;/a&gt;. That version says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; In many cases it is implemented by the Domain Name System, as documented in RFC 1034, RFC 1035, and RFC 1886.&lt;/p&gt;
&lt;p&gt;POSIX issue 6, IEEE 1003.1:2004.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Oh no, what is this about, and do application developers need to care about it? Very simply, it is about the &lt;a href=&#34;https://en.wikipedia.org/wiki/Name_Service_Switch&#34;&gt;Name Service Switch&lt;/a&gt;, frequently referred to as NSS, which allows the &lt;code&gt;gethostbyname&lt;/code&gt; function to have hotpluggable implementations. The Name Service Switch was introduced in Solaris to allow support for Sun&amp;rsquo;s NIS+ directory service.&lt;/p&gt;
&lt;p&gt;As developers of other operating systems wanted to support software like Kerberos and LDAP, it was quickly reimplemented in other systems as well, such as GNU/Linux. These days, systems running systemd frequently use this feature in combination with a custom NSS module named &lt;code&gt;nss-systemd&lt;/code&gt; to force use of &lt;code&gt;systemd-resolved&lt;/code&gt; as the DNS resolver, which has different behavior than the original DNS client derived from BIND that ships in most &lt;code&gt;libc&lt;/code&gt; implementations.&lt;/p&gt;
&lt;p&gt;An administrator can disable support for DNS lookups entirely, simply by removing the &lt;code&gt;dns&lt;/code&gt; module from the &lt;code&gt;/etc/nsswitch.conf&lt;/code&gt; file. Application developers depending on reliable DNS service therefore need to care a lot about this: on systems with NSS, your application cannot depend on &lt;code&gt;gethostbyname&lt;/code&gt; to actually support DNS at all.&lt;/p&gt;
&lt;h2 id=&#34;musl-and-dns&#34;&gt;musl and DNS&lt;/h2&gt;
&lt;p&gt;Given the background above, it should be obvious by now that musl&amp;rsquo;s DNS client was written under the assumption that applications with specific requirements for DNS would use a specialized library for that purpose. The &lt;code&gt;gethostbyname&lt;/code&gt; and &lt;code&gt;getaddrinfo&lt;/code&gt; APIs are not really suitable for this task, since their behavior is entirely implementation-defined and largely built around blocking queries to a directory service.&lt;/p&gt;
&lt;p&gt;Because of this, the DNS client was written to behave as simply as possible. However, the use of DNS for bulk data distribution, such as in DNSSEC, DKIM and other applications, has led to a desire to implement support for DNS over TCP as an extension to the musl DNS client.&lt;/p&gt;
&lt;p&gt;In practice, this will fix the remaining complaints about the musl DNS client once it lands in a musl release, but application authors depending on reliable DNS performance should really use a dedicated DNS client library for that purpose: using APIs that were designed to simply parse &lt;code&gt;/etc/hosts&lt;/code&gt; and had DNS support shoehorned into them will always deliver unreliable results.&lt;/p&gt;
</description>
      <source:markdown>
A frequent complaint expressed on a certain website about Alpine is related to the deficiencies regarding the musl DNS resolver when querying large zones. In response, it is usually mentioned that applications which are expecting reliable DNS lookups should be using a dedicated DNS library for this task, not the `getaddrinfo` or `gethostbyname` APIs, but this is usually rebuffed by comments saying that these APIs are fine to use because they are allegedly reliable on GNU/Linux.

For a number of reasons, the assertion that DNS resolution via these APIs under glibc is more reliable is false, but to understand why, we must look at the history of why a `libc` is responsible for shipping these functions to begin with, and how these APIs evolved over the years. For instance, did you know that `gethostbyname` originally didn&#39;t do DNS queries at all? And, the big question: why are these APIs blocking, when DNS is inherently an asynchronous protocol?

Before we get into this, it is important to again restate that if you are an application developer, and your application depends on reliable DNS performance, you must absolutely use a dedicated DNS resolver library designed for this task. There are many libraries available that are good for this purpose, such as [c-ares](https://c-ares.org/), [GNU adns](https://www.gnu.org/software/adns/), [s6-dns](https://skarnet.org/software/s6-dns/) and [OpenBSD&#39;s libasr](https://github.com/OpenSMTPD/libasr). As should hopefully become obvious at the end of this article, the DNS clients included with `libc` are designed to provide basic functionality only, and there is no guarantee of portable behavior across client implementations.

## the introduction of `gethostbyname`

Where did `gethostbyname` come from, anyway? Most people believe this function came from BIND, the reference DNS implementation developed by the Berkeley CSRG. In reality, it was introduced to BSD in 1982, alongside the `sethostent` and `gethostent` APIs. I happen to have a copy of the 4.2BSD source code, so here is the implementation from 4.2BSD, which was released in early 1983:

```c
struct hostent *
gethostbyname(name)
	register char *name;
{
	register struct hostent *p;
	register char **cp;

	sethostent(0);
	while (p = gethostent()) {
		if (strcmp(p-&gt;h_name, name) == 0)
			break;
		for (cp = p-&gt;h_aliases; *cp != 0; cp++)
			if (strcmp(*cp, name) == 0)
				goto found;
	}
found:
	endhostent();
	return (p);
}
```

As you can see, the 4.2BSD implementation only checks the `/etc/hosts` file and nothing else. This answers the question of why `gethostbyname` and its successor, `getaddrinfo`, do DNS queries in a blocking way: the original function simply read a local file, so it was synchronous by nature, and the BSD developers did not want to introduce an asynchronous replacement API for `gethostbyname` when DNS support was added later.
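
That synchronous shape survives unchanged in the modern API. As a quick sketch (resolving `localhost` purely for illustration), the `getaddrinfo` call below does not return until the lookup has completed, however long that takes:

```c
#include &lt;netdb.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;arpa/inet.h&gt;
#include &lt;netinet/in.h&gt;

int main(void)
{
	struct addrinfo hints, *res;

	memset(&amp;hints, 0, sizeof hints);
	hints.ai_family = AF_INET;	/* IPv4 only, for a predictable result */
	hints.ai_socktype = SOCK_STREAM;

	/* Blocks the calling thread until resolution completes or fails. */
	int rc = getaddrinfo(&#34;localhost&#34;, NULL, &amp;hints, &amp;res);
	if (rc != 0) {
		fprintf(stderr, &#34;getaddrinfo: %s\n&#34;, gai_strerror(rc));
		return 1;
	}

	char buf[INET_ADDRSTRLEN];
	struct sockaddr_in *sin = (struct sockaddr_in *)res-&gt;ai_addr;
	inet_ntop(AF_INET, &amp;sin-&gt;sin_addr, buf, sizeof buf);
	printf(&#34;%s\n&#34;, buf);

	freeaddrinfo(res);
	return 0;
}
```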

## the introduction of DNS to `gethostbyname`

DNS resolution was first introduced to `gethostbyname` in 1984, when support for it was added to BSD. [This version, which is too long to include here](https://github.com/dank101/4.3BSD-Reno/blob/00328b5a67ffe35e67baeba8f7ab75af79f7ae64/lib/libc/net/gethostnamadr.c#L213), also translated dotted-quad IPv4 addresses into a `struct hostent`. In essence, the 4.3BSD implementation does the following:

1. If the requested hostname begins with a number, try to parse it as a dotted quad. If this fails, set `h_errno` to `HOST_NOT_FOUND` and bail. Yes, this means 4.3BSD would fail to resolve hostnames like `12-34-56-78.static.example.com`.
2. Attempt to do a DNS query using `res_search`. If the query was successful, return the first IP address found as the `struct hostent`.
3. If the DNS query failed, fall back to the original `/etc/hosts` searching algorithm above, now called `_gethtbyname` and using `strcasecmp` instead of `strcmp` (for consistency with DNS).

A fixed version of this algorithm was also included with BIND&#39;s `libresolv` as `res_gethostbyname`, and the `res_search` and related functions were imported into BSD libc from BIND.

## standardization of `gethostbyname` in POSIX

The `gethostbyname` and `getaddrinfo` APIs were first standardized in the X/Open Networking Services Issue 4 specification (commonly referred to as XNS4), which itself was part of the X/Open Single Unix Specification, released in 1995. Of note, X/Open tried to deprecate `gethostbyname` in favor of `getaddrinfo` as part of the XNS5 specification, [removing it entirely except for a mention in their specification for `netdb.h`](https://pubs.opengroup.org/onlinepubs/009619199/netdbh.htm#tagcjh_06_02).

Later, it returned [as part of POSIX issue 6, released in 2004](https://pubs.opengroup.org/onlinepubs/009696799/functions/gethostbyaddr.html). That version says:

&gt; **Note:** In many cases it is implemented by the Domain Name System, as documented in RFC 1034, RFC 1035, and RFC 1886.
&gt; 
&gt; POSIX issue 6, IEEE 1003.1:2004.

Oh no, what is this about, and do application developers need to care about it? Very simply, it is about the [Name Service Switch](https://en.wikipedia.org/wiki/Name_Service_Switch), frequently referred to as NSS, which allows the `gethostbyname` function to have hotpluggable implementations. The Name Service Switch was introduced in Solaris to allow support for Sun&#39;s NIS+ directory service.

As developers of other operating systems wanted to support software like Kerberos and LDAP, it was quickly reimplemented in other systems as well, such as GNU/Linux. These days, systems running systemd frequently use this feature in combination with a custom NSS module named `nss-systemd` to force use of `systemd-resolved` as the DNS resolver, which has different behavior than the original DNS client derived from BIND that ships in most `libc` implementations.

An administrator can disable support for DNS lookups entirely, simply by removing the `dns` module from the `/etc/nsswitch.conf` file. Application developers depending on reliable DNS service therefore need to care a lot about this: on systems with NSS, your application cannot depend on `gethostbyname` to actually support DNS at all.
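
For illustration, the relevant line of a typical `/etc/nsswitch.conf` looks like the excerpt below; deleting `dns` from it turns off DNS resolution through `gethostbyname` and `getaddrinfo` on that system:

```
# /etc/nsswitch.conf (excerpt)
hosts: files dns
```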

## musl and DNS

Given the background above, it should be obvious by now that musl&#39;s DNS client was written under the assumption that applications with specific requirements for DNS would use a specialized library for that purpose. The `gethostbyname` and `getaddrinfo` APIs are not really suitable for this task, since their behavior is entirely implementation-defined and largely built around blocking queries to a directory service.

Because of this, the DNS client was written to behave as simply as possible. However, the use of DNS for bulk data distribution, such as in DNSSEC, DKIM and other applications, has led to a desire to implement support for DNS over TCP as an extension to the musl DNS client.

In practice, this will fix the remaining complaints about the musl DNS client once it lands in a musl release, but application authors depending on reliable DNS performance should really use a dedicated DNS client library for that purpose: using APIs that were designed to simply parse `/etc/hosts` and had DNS support shoehorned into them will always deliver unreliable results.
</source:markdown>
    </item>
    
    <item>
      <title>how to refresh older stuffed animals</title>
      <link>https://ariadne.space/2022/02/11/how-to-refresh-older-stuffed.html</link>
      <pubDate>Fri, 11 Feb 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/02/12/how-to-refresh-older-stuffed.html</guid>
      <description>&lt;p&gt;As many of my readers are likely aware, I have a large collection of stuffed animals, but my favorite one is the first generation Jellycat Bashful Bunny that I have had for the past 10 years or so. Recently I noticed that my bunny was starting to turn purple, likely from the purple stain that is applied to my hair, which bleeds onto anything when given the opportunity to do so. As Jellycat no longer makes the first generation bashfuls (they have been replaced with a second generation that uses a different fabric), I decided that my bunny needed to be refreshed, and as there is not really any good documentation on how to clean a high-end stuffed animal, I figured I would write a blog on it.&lt;/p&gt;
&lt;h2 id=&#34;understanding-what-youre-dealing-with&#34;&gt;understanding what you&amp;rsquo;re dealing with&lt;/h2&gt;
&lt;p&gt;What the stuffed animal is made out of is important to know about before coming up with a strategy to refresh it. If the stuffed animal has plastic pellets to help it sit right (which the Jellycat Bashfuls do), then you need to use lower temperatures to ensure the pellets don&amp;rsquo;t melt. If there are glued on components (as is frequently the case with lower-end stuffed animals), forget about trying this and just buy a new one.&lt;/p&gt;
&lt;p&gt;If the stuffed animal has vibrant colors, you should probably avoid using detergent, or, at the very least, you should use less detergent than you would normally. These vibrant colors are created by staining white fabric rather than dyeing it; in other words, the pigment is sitting on the surface of the fabric rather than being part of the fabric itself. As with plastic components, you should use lower temperatures too, as the pigment used in these stains tends to wash away if the water is warm enough (around 40 degrees Celsius or so).&lt;/p&gt;
&lt;h2 id=&#34;the-washing-process&#34;&gt;the washing process&lt;/h2&gt;
&lt;p&gt;Ultimately I decided to play it safe and wash my stuffed bunny with cold water, some fabric softener and a Tide pod. However, the spin cycle was quite concerning to me, as it spins quite fast and with a lot of force. To ensure that the bunny was not harmed by the spin cycle, I put him in a pillowcase and tied the end of it. Put the washing machine on the delicate program to ensure it spends as little time as possible in the spin cycle. Also, I would not recommend washing a stuffed animal with other laundry.&lt;/p&gt;
&lt;p&gt;Come back when the program completes (about 30 minutes), and put the stuffed animal in the dryer. You should remove the stuffed animal from the pillowcase at this time and dry both the animal and the pillowcase separately. Put the dryer on the delicate program again, and be prepared to run it through multiple cycles. In the case of my bunny, it took a total of two 45-minute cycles to completely dry.&lt;/p&gt;
&lt;p&gt;Once done, your stuffed animal should be back to its usual self, and with the tumble drying, it will likely be a little bit fuzzier than it was before, kind of like it came from the factory.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/ariadneconill/status/1492417671966511110&#34;&gt;twitter.com/ariadneco&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bonus content: 1 minute of a tumbling bunny.&lt;/p&gt;
</description>
      <source:markdown>
As many of my readers are likely aware, I have a large collection of stuffed animals, but my favorite one is the first generation Jellycat Bashful Bunny that I have had for the past 10 years or so. Recently I noticed that my bunny was starting to turn purple, likely from the purple stain that is applied to my hair, which bleeds onto anything when given the opportunity to do so. As Jellycat no longer makes the first generation bashfuls (they have been replaced with a second generation that uses a different fabric), I decided that my bunny needed to be refreshed. Since there is not really any good documentation on how to clean a high-end stuffed animal, I figured I would write a blog post about it.

## understanding what you&#39;re dealing with

What the stuffed animal is made out of is important to know about before coming up with a strategy to refresh it. If the stuffed animal has plastic pellets to help it sit right (which the Jellycat Bashfuls do), then you need to use lower temperatures to ensure the pellets don&#39;t melt. If there are glued on components (as is frequently the case with lower-end stuffed animals), forget about trying this and just buy a new one.

If the stuffed animal has vibrant colors, you should probably avoid using detergent, or, at the very least, you should use less detergent than you would normally. These vibrant colors are created by staining white fabric rather than dyeing it; in other words, the pigment is sitting on the surface of the fabric rather than being part of the fabric itself. As with plastic components, you should use lower temperatures too, as the pigment used in these stains tends to wash away if the water is warm enough (around 40 degrees Celsius or so).

## the washing process

Ultimately I decided to play it safe and wash my stuffed bunny with cold water, some fabric softener and a Tide pod. However, the spin cycle was quite concerning to me, as it spins quite fast and with a lot of force. To ensure that the bunny was not harmed by the spin cycle, I put him in a pillowcase and tied the end of it. Put the washing machine on the delicate program to ensure it spends as little time as possible in the spin cycle. Also, I would not recommend washing a stuffed animal with other laundry.

Come back when the program completes (about 30 minutes), and put the stuffed animal in the dryer. You should remove the stuffed animal from the pillowcase at this time and dry both the animal and the pillowcase separately. Put the dryer on the delicate program again, and be prepared to run it through multiple cycles. In the case of my bunny, it took a total of two 45-minute cycles to completely dry.

Once done, your stuffed animal should be back to its usual self, and with the tumble drying, it will likely be a little bit fuzzier than it was before, kind of like it came from the factory.

[twitter.com/ariadneco...](https://twitter.com/ariadneconill/status/1492417671966511110)

Bonus content: 1 minute of a tumbling bunny.
</source:markdown>
    </item>
    
    <item>
      <title>JSON-LD is ideal for Cloud Native technologies</title>
      <link>https://ariadne.space/2022/02/10/jsonld-is-ideal-for-cloud.html</link>
      <pubDate>Thu, 10 Feb 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/02/11/jsonld-is-ideal-for-cloud.html</guid>
      <description>&lt;p&gt;Frequently I have been told by developers that it is impossible to have extensible JSON documents underpinning their projects, because there may be collisions later. For those of us who are unaware of more capable graph serializations such as JSON-LD and Turtle, this seems like a reasonable position. Accordingly, I would like to introduce you all to JSON-LD, using a practical real-world deployment as an example, as well as how one might use JSON-LD to extend something like OCI container manifests.&lt;/p&gt;
&lt;p&gt;You might feel compelled to look up JSON-LD on Google before continuing with reading this. My suggestion is to not do that, because &lt;a href=&#34;https://json-ld.org/&#34;&gt;the JSON-LD website&lt;/a&gt; is really aimed towards web developers, and this explanation will hopefully explain how a systems engineer can make use of JSON-LD graphs in practical terms. And, if it doesn&amp;rsquo;t, feel free to DM me on Twitter or something.&lt;/p&gt;
&lt;h2 id=&#34;what-json-ld-can-do-for-you&#34;&gt;what JSON-LD can do for you&lt;/h2&gt;
&lt;p&gt;Have you ever wanted any of the following in the scenarios where you use JSON:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conflict-free extensibility&lt;/li&gt;
&lt;li&gt;Strong typing&lt;/li&gt;
&lt;li&gt;Compatibility with the RDF ecosystem (e.g. XQuery, SPARQL, etc)&lt;/li&gt;
&lt;li&gt;Self-describing schemas&lt;/li&gt;
&lt;li&gt;Transparent document inclusion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you answered yes to any of these, then JSON-LD is for you. Some of these capabilities are also provided by the IETF&amp;rsquo;s &lt;a href=&#34;http://json-schema.org/&#34;&gt;JSON Schema project&lt;/a&gt;, but it has a much higher learning curve than JSON-LD.&lt;/p&gt;
&lt;p&gt;This post will be primarily focused on how namespaces and aliases can be used to provide extensibility while also providing backwards compatibility for clients that are not JSON-LD aware. In general, I believe strongly that any open standard built on JSON should actually be built on JSON-LD, and hopefully my examples will demonstrate why I believe this.&lt;/p&gt;
&lt;h2 id=&#34;activitypub-a-real-world-case-study&#34;&gt;ActivityPub: a real-world case study&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://www.w3.org/TR/activitypub&#34;&gt;ActivityPub is a protocol&lt;/a&gt; that is used on the federated social web (thankfully entirely unrelated to Web3), that is built on the ActivityStreams 2.0 specification. Both ActivityPub and ActivityStreams are RDF vocabularies that are represented as JSON-LD documents, but you don&amp;rsquo;t really need to know or care about this part.&lt;/p&gt;
&lt;p&gt;This is a very simplified representation of an ActivityPub actor object:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@context&amp;#34;&lt;/span&gt;: [
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.w3.org/ns/activitystreams&amp;#34;&lt;/span&gt;,
    {
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alsoKnownAs&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;as:alsoKnownAs&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sec&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://w3id.org/security#&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;owner&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:owner&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKey&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:publicKey&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKeyPem&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:publicKeyPem&amp;#34;&lt;/span&gt;
    }
  ],
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alsoKnownAs&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://corp.example.org/~alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;inbox&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice/inbox&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;name&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Person&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKey&amp;#34;&lt;/span&gt;: {
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice#key&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;owner&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKeyPem&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;...&amp;#34;&lt;/span&gt;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pay attention to the &lt;code&gt;@context&lt;/code&gt; variable here, it is doing a few things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It pulls in the entire ActivityStreams and ActivityPub vocabularies by reference. These can be downloaded on the fly or bundled with the application using context preloading.&lt;/li&gt;
&lt;li&gt;It then defines a few terms outside of those vocabularies: &lt;code&gt;alsoKnownAs&lt;/code&gt;, &lt;code&gt;sec&lt;/code&gt;, &lt;code&gt;owner&lt;/code&gt;, &lt;code&gt;publicKey&lt;/code&gt; and &lt;code&gt;publicKeyPem&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When an application that is JSON-LD aware parses this document, it will receive a document that looks like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@context&amp;#34;&lt;/span&gt;: [
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.w3.org/ns/activitystreams&amp;#34;&lt;/span&gt;,
    {
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;alsoKnownAs&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;as:alsoKnownAs&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sec&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://w3id.org/security#&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;owner&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:owner&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKey&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:publicKey&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;
      },
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;publicKeyPem&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sec:publicKeyPem&amp;#34;&lt;/span&gt;
    }
  ],
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Person&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;as:alsoKnownAs&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://corp.example.org/~alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;as:inbox&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice/inbox&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;as:name&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Alice&amp;#34;&lt;/span&gt;,
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sec:publicKey&amp;#34;&lt;/span&gt;: {
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice#key&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sec:owner&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.example.com/~alice&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sec:publicKeyPem&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;...&amp;#34;&lt;/span&gt;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This allows extensions to interoperate with minimal conflicts, as the application is operating on a normalized version of the document that has as many things namespaced as possible, without the user having to worry about it. It also allows a parser to easily ignore things it does not know about: terms that aren&amp;rsquo;t defined in the context (which does not actually have to be inlined; you can preload a root context) aren&amp;rsquo;t placed in a namespace, and so can be safely skipped.&lt;/p&gt;
&lt;p&gt;In other words, that &lt;code&gt;@context&lt;/code&gt; variable can be built into the application, or stored in an S3 bucket somewhere, or whatever you want to do. If you are planning to have an interoperable protocol, however, providing a useful &lt;code&gt;@context&lt;/code&gt; is crucial.&lt;/p&gt;
&lt;h2 id=&#34;how-oci-image-manifests-could-benefit-from-json-ld&#34;&gt;How OCI image manifests could benefit from JSON-LD&lt;/h2&gt;
&lt;p&gt;There was a discussion on Twitter this evening about how extending the OCI image spec with signature references has taken a year. If OCI used JSON-LD (ironically, its JSON vocabulary is already similar to several pre-existing JSON-LD ones), then implementations could just store the pre-existing metadata, mapped to a namespace. In the case of an OCI image, this might look something like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;{
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@context&amp;#34;&lt;/span&gt;: [
    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://opencontainers.org/ns&amp;#34;&lt;/span&gt;,
    {
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;sigstore&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://sigstore.dev/ns&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;reference&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@type&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sigstore:reference&amp;#34;&lt;/span&gt;
      }
    }
  ],
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;config&amp;#34;&lt;/span&gt;: {
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;mediaType&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;application/vnd.oci.image.config.v1+json&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sha256:d539cd357acb4a6df2a4ef99db5fe70714458349232dad0ec73e1ed65f6a0e13&amp;#34;&lt;/span&gt;,
    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;size&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;585&lt;/span&gt;
  },
  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;layers&amp;#34;&lt;/span&gt;: [
    {
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;mediaType&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;application/vnd.oci.image.layer.v1.tar+gzip&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;size&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;2818413&lt;/span&gt;
    },
    {
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;mediaType&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;application/vnd.example.signature+json&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;size&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;3514&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17&amp;#34;&lt;/span&gt;,
      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;reference&amp;#34;&lt;/span&gt;: {
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;mediaType&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;application/vnd.oci.image.layer.v1.tar+gzip&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;digest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3&amp;#34;&lt;/span&gt;,
        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;size&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;2818413&lt;/span&gt;
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The differences from a current OCI image manifest are minimal. Namely, &lt;code&gt;schemaVersion&lt;/code&gt; has been deleted, because JSON-LD handles this detail automatically, and the signature reference extension has been added as the &lt;code&gt;sigstore:reference&lt;/code&gt; property. Hopefully you can imagine how the rest of the document looks namespace-wise.&lt;/p&gt;
&lt;p&gt;One last thing about this example. You might notice that I am using URIs when I define namespaces in the &lt;code&gt;@context&lt;/code&gt;. This is a great feature of the RDF ecosystem: you can put up a webpage at those URIs defining how to make use of the terms defined in the namespace, meaning that JSON-LD tooling can have rich documentation built in.&lt;/p&gt;
&lt;p&gt;Also, since I am well aware that basically all of these OCI tools are written in Go, it should be noted that Go has an &lt;a href=&#34;https://pkg.go.dev/github.com/go-ap/jsonld&#34;&gt;excellent implementation of JSON-LD&lt;/a&gt;, and for those concerned that W3C proposals are sometimes not in touch with reality, the creator of JSON-LD has &lt;a href=&#34;http://manu.sporny.org/2014/json-ld-origins-2/&#34;&gt;some interesting words about it&lt;/a&gt;. Now, please, use JSON-LD and stop worrying about extensibility in open technology; this problem is totally solved.&lt;/p&gt;
</description>
      <source:markdown>
Frequently I have been told by developers that it is impossible to have extensible JSON documents underpinning their projects, because there may be collisions later. For those who are unaware of more capable graph serializations such as JSON-LD and Turtle, this seems like a reasonable position. Accordingly, I would like to introduce you all to JSON-LD, using a practical real-world deployment as an example, as well as how one might use JSON-LD to extend something like OCI container manifests.

You might feel compelled to look up JSON-LD on Google before continuing with reading this. My suggestion is to not do that, because [the JSON-LD website](https://json-ld.org/) is really aimed towards web developers, and this explanation will hopefully explain how a systems engineer can make use of JSON-LD graphs in practical terms. And, if it doesn&#39;t, feel free to DM me on Twitter or something.

## what JSON-LD can do for you

Have you ever wanted any of the following in the scenarios where you use JSON:

- Conflict-free extensibility
- Strong typing
- Compatibility with the RDF ecosystem (e.g. XQuery, SPARQL, etc)
- Self-describing schemas
- Transparent document inclusion

If you answered yes to any of these, then JSON-LD is for you. Some of these capabilities are also provided by the IETF&#39;s [JSON Schema project](http://json-schema.org/), but it has a much higher learning curve than JSON-LD.

This post will be primarily focused on how namespaces and aliases can be used to provide extensibility while also providing backwards compatibility for clients that are not JSON-LD aware. In general, I believe strongly that any open standard built on JSON should actually be built on JSON-LD, and hopefully my examples will demonstrate why I believe this.

## ActivityPub: a real-world case study

[ActivityPub is a protocol](https://www.w3.org/TR/activitypub) that is used on the federated social web (thankfully entirely unrelated to Web3), that is built on the ActivityStreams 2.0 specification. Both ActivityPub and ActivityStreams are RDF vocabularies that are represented as JSON-LD documents, but you don&#39;t really need to know or care about this part.

This is a very simplified representation of an ActivityPub actor object:

```json
{
  &#34;@context&#34;: [
    &#34;https://www.w3.org/ns/activitystreams&#34;,
    {
      &#34;alsoKnownAs&#34;: {
        &#34;@id&#34;: &#34;as:alsoKnownAs&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;sec&#34;: &#34;https://w3id.org/security#&#34;,
      &#34;owner&#34;: {
        &#34;@id&#34;: &#34;sec:owner&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;publicKey&#34;: {
        &#34;@id&#34;: &#34;sec:publicKey&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;publicKeyPem&#34;: &#34;sec:publicKeyPem&#34;
    }
  ],
  &#34;alsoKnownAs&#34;: &#34;https://corp.example.org/~alice&#34;,
  &#34;id&#34;: &#34;https://www.example.com/~alice&#34;,
  &#34;inbox&#34;: &#34;https://www.example.com/~alice/inbox&#34;,
  &#34;name&#34;: &#34;Alice&#34;,
  &#34;type&#34;: &#34;Person&#34;,
  &#34;publicKey&#34;: {
    &#34;id&#34;: &#34;https://www.example.com/~alice#key&#34;,
    &#34;owner&#34;: &#34;https://www.example.com/~alice&#34;,
    &#34;publicKeyPem&#34;: &#34;...&#34;
  }
}
```

Pay attention to the `@context` variable here, it is doing a few things:

1. It pulls in the entire ActivityStreams and ActivityPub vocabularies by reference. These can be downloaded on the fly or bundled with the application using context preloading.
2. It then defines a few terms outside of those vocabularies: `alsoKnownAs`, `sec`, `owner`, `publicKey` and `publicKeyPem`.

When an application that is JSON-LD aware parses this document, it will receive a document that looks like this:

```json
{
  &#34;@context&#34;: [
    &#34;https://www.w3.org/ns/activitystreams&#34;,
    {
      &#34;alsoKnownAs&#34;: {
        &#34;@id&#34;: &#34;as:alsoKnownAs&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;sec&#34;: &#34;https://w3id.org/security#&#34;,
      &#34;owner&#34;: {
        &#34;@id&#34;: &#34;sec:owner&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;publicKey&#34;: {
        &#34;@id&#34;: &#34;sec:publicKey&#34;,
        &#34;@type&#34;: &#34;@id&#34;
      },
      &#34;publicKeyPem&#34;: &#34;sec:publicKeyPem&#34;
    }
  ],
  &#34;@id&#34;: &#34;https://www.example.com/~alice&#34;,
  &#34;@type&#34;: &#34;Person&#34;,
  &#34;as:alsoKnownAs&#34;: &#34;https://corp.example.org/~alice&#34;,
  &#34;as:inbox&#34;: &#34;https://www.example.com/~alice/inbox&#34;,
  &#34;as:name&#34;: &#34;Alice&#34;,
  &#34;sec:publicKey&#34;: {
    &#34;@id&#34;: &#34;https://www.example.com/~alice#key&#34;,
    &#34;sec:owner&#34;: &#34;https://www.example.com/~alice&#34;,
    &#34;sec:publicKeyPem&#34;: &#34;...&#34;
  }
}
```

This allows extensions to interoperate with minimal conflicts: the application operates on a normalized version of the document in which as many terms as possible are namespaced, without the user having to worry about it. It also lets a parser safely ignore terms it does not know about: anything not defined in the context is never placed in a namespace, and so it simply drops out of the expanded document. (The context itself does not have to be embedded in the document; you can preload a root context.)

In other words, that `@context` variable can be built into the application, or stored in an S3 bucket somewhere, or whatever you want to do. If you are planning to have an interoperable protocol, however, providing a useful `@context` is crucial.
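
To make that normalization step concrete, here is a toy sketch, very much *not* a conforming JSON-LD processor (real code should use a proper JSON-LD library), of how a context maps short terms into namespaces and drops terms it does not know about; the `favoriteColor` extension term is hypothetical:

```python
# Toy sketch of JSON-LD-style term expansion. NOT a conforming
# processor; it only illustrates how a context maps short terms
# into namespaced ones, and how unknown terms drop out.
context = {
    "name": "as:name",
    "inbox": "as:inbox",
    "publicKeyPem": "sec:publicKeyPem",
}

def expand(doc, ctx):
    out = {}
    for key, value in doc.items():
        if key in ("id", "type"):   # aliases for JSON-LD keywords
            out["@" + key] = value
        elif key in ctx:            # term defined in the context
            out[ctx[key]] = value
        # terms absent from the context are silently ignored
    return out

doc = {
    "id": "https://www.example.com/~alice",
    "name": "Alice",
    "inbox": "https://www.example.com/~alice/inbox",
    "favoriteColor": "teal",        # hypothetical extension term
}

expanded = expand(doc, context)
# expanded now has @id, as:name and as:inbox; favoriteColor is gone
```
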

## How OCI image manifests could benefit from JSON-LD

There was a discussion on Twitter this evening about how extending the OCI image spec with signature references has taken a year. If OCI used JSON-LD (ironically, its JSON vocabulary is already similar to several pre-existing JSON-LD ones), then implementations could just store the pre-existing metadata, mapped to a namespace. In the case of an OCI image, this might look something like:

```json
{
  &#34;@context&#34;: [
    &#34;https://opencontainers.org/ns&#34;,
    {
      &#34;sigstore&#34;: &#34;https://sigstore.dev/ns&#34;,
      &#34;reference&#34;: {
        &#34;@type&#34;: &#34;@id&#34;,
        &#34;@id&#34;: &#34;sigstore:reference&#34;
      }
    }
  ],
  &#34;config&#34;: {
    &#34;mediaType&#34;: &#34;application/vnd.oci.image.config.v1+json&#34;,
    &#34;digest&#34;: &#34;sha256:d539cd357acb4a6df2a4ef99db5fe70714458349232dad0ec73e1ed65f6a0e13&#34;,
    &#34;size&#34;: 585
  },
  &#34;layers&#34;: [
    {
      &#34;mediaType&#34;: &#34;application/vnd.oci.image.layer.v1.tar+gzip&#34;,
      &#34;digest&#34;: &#34;sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3&#34;,
      &#34;size&#34;: 2818413
    },
    {
      &#34;mediaType&#34;: &#34;application/vnd.example.signature+json&#34;,
      &#34;size&#34;: 3514,
      &#34;digest&#34;: &#34;sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17&#34;,
      &#34;reference&#34;: {
        &#34;mediaType&#34;: &#34;application/vnd.oci.image.layer.v1.tar+gzip&#34;,
        &#34;digest&#34;: &#34;sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3&#34;,
        &#34;size&#34;: 2818413
      }
    }
  ]
}
```

The differences from a current OCI image manifest are minimal. Namely, `schemaVersion` has been deleted, because JSON-LD handles this detail automatically, and the signature reference extension has been added as the `sigstore:reference` property. Hopefully you can imagine how the rest of the document looks namespace-wise.
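
Note also that a client which is not JSON-LD aware loses nothing here: it can parse the manifest as ordinary JSON and treat `@context` as just another unrecognized key. A small sketch (the digests below are truncated placeholders, not real values):

```python
import json

# A trimmed manifest in the shape shown above; digests are
# placeholder values, shortened for brevity.
manifest_json = """
{
  "@context": ["https://opencontainers.org/ns"],
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:d539...",
    "size": 585
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:59bf...",
      "size": 2818413
    }
  ]
}
"""

manifest = json.loads(manifest_json)

# Legacy code paths keep working: config and layers are where they
# have always been, and "@context" is simply an unknown key to skip.
for layer in manifest["layers"]:
    print(layer["mediaType"], layer["size"])
```
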

One last thing about this example. You might notice that I am using URIs when I define namespaces in the `@context`. This is a great feature of the RDF ecosystem: you can put up a webpage at those URIs defining how to make use of the terms defined in the namespace, meaning that JSON-LD tooling can have rich documentation built in.

Also, since I am well aware that basically all of these OCI tools are written in Go, it should be noted that Go has an [excellent implementation of JSON-LD](https://pkg.go.dev/github.com/go-ap/jsonld), and for those concerned that W3C proposals are sometimes not in touch with reality, the creator of JSON-LD has [some interesting words about it](http://manu.sporny.org/2014/json-ld-origins-2/). Now, please, use JSON-LD and stop worrying about extensibility in open technology; this problem is totally solved.
</source:markdown>
    </item>
    
    <item>
      <title>how I wound up causing a major outage of my services and destroying my home directory by accident</title>
      <link>https://ariadne.space/2022/02/03/how-i-wound-up-causing.html</link>
      <pubDate>Thu, 03 Feb 2022 17:00:00 -0700</pubDate>
      
      <guid>http://ariadne.micro.blog/2022/02/04/how-i-wound-up-causing.html</guid>
      <description>&lt;p&gt;As a result of my FOSS maintenance and activism work, I have a significant IT footprint, to support the services and development environments needed to facilitate everything I do. Unfortunately, I am also my own system administrator, and I am quite terrible at this. This is a story about how I wound up knocking most of my services offline and wiping out my home directory, because of a combination of Linux mdraid bugs and a faulty SSD. Hopefully this will be helpful to somebody in the future, but if not, you can at least share in some catharsis.&lt;/p&gt;
&lt;h2 id=&#34;a-brief-overview-of-the-setup&#34;&gt;A brief overview of the setup&lt;/h2&gt;
&lt;p&gt;As noted, I have a cluster of multiple servers, ranging from AMD EPYC machines to ARM machines to a significant interest in a System z mainframe which I talked about at AlpineConf last year. These are used to host various services in virtual machine and container form, with the majority of the containers being managed by kubernetes in the current iteration of my setup. Most of these workloads are backed by an Isilon NAS, but some workloads run on local storage instead, typically for performance reasons.&lt;/p&gt;
&lt;p&gt;Using kubernetes seemed like a no-brainer at the time because it would allow me to have a unified control plane for all of my workloads, regardless of where (and on what architecture) they would be running. Since then, I’ve realized that the complexity of managing my services with kubernetes was not justified by the benefits I was getting from using kubernetes for my workloads, and so I started migrating away from kubernetes back to a traditional way of managing systems and containers, but many services are still managed as kubernetes containers.&lt;/p&gt;
&lt;h2 id=&#34;a-samsung-ssd-failure-on-the-primary-development-server&#34;&gt;A Samsung SSD failure on the primary development server&lt;/h2&gt;
&lt;p&gt;My primary development server is named treefort. It is an x86 box with AMD EPYC processors and 256 GB of RAM. It had a 3-way RAID-1 setup using Linux mdraid on 4TB Samsung 860 EVO SSDs. I use KVM with libvirt to manage various VMs on this server, but most of the server’s resources are dedicated to the treefort environment. This environment also acts as a kubernetes worker, and is also the kubernetes controller for the entire cluster.&lt;/p&gt;
&lt;p&gt;Recently I had a stick of RAM fail on treefort. I ordered a replacement stick and had a friend replace it. All seemed well, but then I decided to improve my monitoring so that I could be alerted to any future hardware failures, as having random things crash on the machine due to uncorrected ECC errors is not fun. In the process of implementing this monitoring, I learned that one of the SSDs had fallen out of the RAID.&lt;/p&gt;
&lt;p&gt;I thought it was a little weird that one drive out of the three had failed, so I assumed it was just due to maintenance; perhaps the drive had been reseated after the RAM stick was replaced. As the price of a replacement 4TB Samsung SSD is presently around $700 retail, I thought I would re-add the drive to the array, assuming that if it had actually failed, it would fall out of the array again during the rebuild.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# mdadm --manage /dev/md2 --add /dev/sdb3
mdadm: added /dev/sdb3
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I then checked /proc/mdstat, and it reported the array as healthy. I thought nothing of it, though in retrospect I should have found this suspicious: there was no indication of the array being in a recovery state; instead it was reported as healthy, with three drives present. Unfortunately, I figured “ok, I guess it’s fine” and left it at that.&lt;/p&gt;
&lt;h2 id=&#34;silent-data-corruption&#34;&gt;Silent data corruption&lt;/h2&gt;
&lt;p&gt;Meanwhile, the filesystem in the treefort environment, which is backed by local SSD storage for speed reasons, began to silently corrupt itself. Because most of my services, such as my mail server, DNS and network monitoring, are running on other hosts, there wasn’t really any indicator of anything wrong. Things seemed to be basically working fine: I had been compiling kernels all week long as I tested various mitigations for the execve(2) issue. What I didn’t know at the time was that with each kernel compile I was slowly corrupting the disk more and more.&lt;/p&gt;
&lt;p&gt;I was not aware of the data corruption issue until today, when I logged into the treefort environment and decided to fire up nano to finish up some work I had been doing that needed to be resolved this week. That led me to a rude surprise:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;treefort:~$ nano  
Segmentation fault
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This worried me: after all, why would nano crash if it was working yesterday and nothing had changed? So, I used apk fix to reinstall nano, making it work again. At this point, I was quite suspicious that something was up with the server, so I immediately killed all the guests running on it and focused on the bare metal host environment (what we would call the dom0 if we were still using Xen).&lt;/p&gt;
&lt;p&gt;I ran e2fsck -f on the treefort volumes and hoped for the best. Instead of a clean bill of health, I got lots of filesystem errors. But this still didn’t make any sense to me: I checked the array again, and it was still showing as fully healthy. Accordingly, I decided to run e2fsck -fy on the volumes and hope for the best. This took out the majority of the volume storing my home directory.&lt;/p&gt;
&lt;h2 id=&#34;the-loss-of-the-kubernetes-controller&#34;&gt;The loss of the kubernetes controller&lt;/h2&gt;
&lt;p&gt;Kubernetes is a fickle beast: it assumes you have set everything up with redundancy, including, of course, redundant controllers. I found this out the hard way when I took treefort offline, and the worker nodes got confused and took the services they were running offline as well, presumably because they were unable to talk to the controller.&lt;/p&gt;
&lt;p&gt;Eventually, with some help from friends, I was able to recover enough of the volume that the system could boot and the controller could come back up long enough to restore the services on the workers other than treefort. Much like the data in my home directory, however, the services that were running on treefort are likely permanently lost.&lt;/p&gt;
&lt;h2 id=&#34;some-thoughts&#34;&gt;Some thoughts&lt;/h2&gt;
&lt;p&gt;First of all, it is obvious I need to upgrade my backup strategy to something other than “I’ll figure it out later”. I &lt;a href=&#34;https://github.com/richfelker/bakelite&#34;&gt;plan on packaging Rich Felker’s bakelite tool&lt;/a&gt; to do just that.&lt;/p&gt;
&lt;p&gt;The other big elephant in the room, of course, is “why weren’t you using ZFS in the first place”. While it is true that Alpine has supported ZFS for years, I’ve been hesitant to use it due to the CDDL licensing. In other words, I chose the mantra about GPL compatibility, instilled in me since my GNU/Linux days, over pragmatism. And my prize for that decision was this mess. While I think Oracle and the Illumos and OpenZFS contributors should come together to relicense the ZFS codebase under MPLv2 to solve the GPL compatibility problem, I am starting to think that I should care more about having a storage technology I can actually trust.&lt;/p&gt;
&lt;p&gt;I’m also quite certain that the issue I hit is a bug in mdraid, but perhaps I am wrong. I am told that there is a dirty bitmap system and perhaps if all bitmaps are marked clean on both the good pair of drives and the bad drive, it can cause this kind of split-brain issue, but I feel like there should be timestamping on those bitmaps to prevent something like this. It’s better to have an unnecessary rebuild because of clock skew than to go split brain and have 33% of all reads causing silent data corruption due to being out of sync with the other disks.&lt;/p&gt;
&lt;p&gt;Nonetheless, my plans are to rebuild treefort with ZFS and SSDs from another vendor. Whatever happened with the Samsung SSDs has made me anxious enough that I don’t want to trust them for continued production use.&lt;/p&gt;
</description>
      <source:markdown>
As a result of my FOSS maintenance and activism work, I have a significant IT footprint, to support the services and development environments needed to facilitate everything I do. Unfortunately, I am also my own system administrator, and I am quite terrible at this. This is a story about how I wound up knocking most of my services offline and wiping out my home directory, because of a combination of Linux mdraid bugs and a faulty SSD. Hopefully this will be helpful to somebody in the future, but if not, you can at least share in some catharsis.

## A brief overview of the setup

As noted, I have a cluster of multiple servers, ranging from AMD EPYC machines to ARM machines to a significant interest in a System z mainframe which I talked about at AlpineConf last year. These are used to host various services in virtual machine and container form, with the majority of the containers being managed by kubernetes in the current iteration of my setup. Most of these workloads are backed by an Isilon NAS, but some workloads run on local storage instead, typically for performance reasons.

Using kubernetes seemed like a no-brainer at the time because it would allow me to have a unified control plane for all of my workloads, regardless of where (and on what architecture) they would be running. Since then, I’ve realized that the complexity of managing my services with kubernetes was not justified by the benefits I was getting from using kubernetes for my workloads, and so I started migrating away from kubernetes back to a traditional way of managing systems and containers, but many services are still managed as kubernetes containers.

## A Samsung SSD failure on the primary development server

My primary development server is named treefort. It is an x86 box with AMD EPYC processors and 256 GB of RAM. It had a 3-way RAID-1 setup using Linux mdraid on 4TB Samsung 860 EVO SSDs. I use KVM with libvirt to manage various VMs on this server, but most of the server’s resources are dedicated to the treefort environment. This environment also acts as a kubernetes worker, and is also the kubernetes controller for the entire cluster.

Recently I had a stick of RAM fail on treefort. I ordered a replacement stick and had a friend replace it. All seemed well, but then I decided to improve my monitoring so that I could be alerted to any future hardware failures, as having random things crash on the machine due to uncorrected ECC errors is not fun. In the process of implementing this monitoring, I learned that one of the SSDs had fallen out of the RAID.

I thought it was a little weird that one drive out of the three had failed, so I assumed it was just due to maintenance; perhaps the drive had been reseated after the RAM stick was replaced. As the price of a replacement 4TB Samsung SSD is presently around $700 retail, I thought I would re-add the drive to the array, assuming that if it had actually failed, it would fall out of the array again during the rebuild.

```
# mdadm --manage /dev/md2 --add /dev/sdb3
mdadm: added /dev/sdb3
```

I then checked /proc/mdstat, and it reported the array as healthy. I thought nothing of it, though in retrospect I should have found this suspicious: there was no indication of the array being in a recovery state; instead it was reported as healthy, with three drives present. Unfortunately, I figured “ok, I guess it’s fine” and left it at that.
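
In hindsight, one thing that can catch this kind of divergence, rather than trusting the array metadata, is an explicit consistency check (a "scrub") that actually re-reads every mirror and compares them. A sketch, assuming the array in question is md2:

```shell
# Ask md to re-read every mirror and compare them (a "scrub").
# Assumes the array is md2; adjust to your layout.
echo check > /sys/block/md2/md/sync_action

# Watch progress, then see how many sectors disagreed across mirrors;
# a non-zero mismatch_cnt means the mirrors have silently diverged.
cat /proc/mdstat
cat /sys/block/md2/md/mismatch_cnt
```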

## Silent data corruption

Meanwhile, the filesystem in the treefort environment, which is backed by local SSD storage for speed reasons, began to silently corrupt itself. Because most of my services, such as my mail server, DNS and network monitoring, are running on other hosts, there wasn’t really any indicator of anything wrong. Things seemed to be basically working fine: I had been compiling kernels all week long as I tested various mitigations for the execve(2) issue. What I didn’t know at the time was that with each kernel compile I was slowly corrupting the disk more and more.

I was not aware of the data corruption issue until today, when I logged into the treefort environment and decided to fire up nano to finish up some work I had been doing that needed to be resolved this week. That led me to a rude surprise:

```
treefort:~$ nano  
Segmentation fault
```

This worried me: after all, why would nano crash if it was working yesterday and nothing had changed? So, I used apk fix to reinstall nano, making it work again. At this point, I was quite suspicious that something was up with the server, so I immediately killed all the guests running on it and focused on the bare metal host environment (what we would call the dom0 if we were still using Xen).

I ran e2fsck -f on the treefort volumes and hoped for the best. Instead of a clean bill of health, I got lots of filesystem errors. But this still didn’t make any sense to me: I checked the array again, and it was still showing as fully healthy. Accordingly, I decided to run e2fsck -fy on the volumes and hope for the best. This took out the majority of the volume storing my home directory.

## The loss of the kubernetes controller

Kubernetes is a fickle beast: it assumes you have set everything up with redundancy, including, of course, redundant controllers. I found this out the hard way when I took treefort offline, and the worker nodes got confused and took the services they were running offline as well, presumably because they were unable to talk to the controller.

Eventually, with some help from friends, I was able to recover enough of the volume that the system could boot and the controller could come back up long enough to restore the services on the workers other than treefort. Much like the data in my home directory, however, the services that were running on treefort are likely permanently lost.

## Some thoughts

First of all, it is obvious I need to upgrade my backup strategy to something other than “I’ll figure it out later”. I [plan on packaging Rich Felker’s bakelite tool](https://github.com/richfelker/bakelite) to do just that.

The other big elephant in the room, of course, is “why weren’t you using ZFS in the first place”. While it is true that Alpine has supported ZFS for years, I’ve been hesitant to use it due to the CDDL licensing. In other words, I chose the mantra about GPL compatibility, instilled in me since my GNU/Linux days, over pragmatism. And my prize for that decision was this mess. While I think Oracle and the Illumos and OpenZFS contributors should come together to relicense the ZFS codebase under MPLv2 to solve the GPL compatibility problem, I am starting to think that I should care more about having a storage technology I can actually trust.

I’m also quite certain that the issue I hit is a bug in mdraid, but perhaps I am wrong. I am told that there is a dirty bitmap system and perhaps if all bitmaps are marked clean on both the good pair of drives and the bad drive, it can cause this kind of split-brain issue, but I feel like there should be timestamping on those bitmaps to prevent something like this. It’s better to have an unnecessary rebuild because of clock skew than to go split brain and have 33% of all reads causing silent data corruption due to being out of sync with the other disks.

Nonetheless, my plans are to rebuild treefort with ZFS and SSDs from another vendor. Whatever happened with the Samsung SSDs has made me anxious enough that I don’t want to trust them for continued production use.
</source:markdown>
    </item>
    
  </channel>
</rss>
