Building a security response team in Alpine

Starting this past month, thanks to the generous support of Google and the Linux Foundation, I’ve had the privilege of setting aside my usual Alpine-related consulting work to focus on various security initiatives in Alpine that we’ve needed to tackle for a long time.  Some of this work is purely technical; other parts involve formulating policy, planning, and recruiting volunteers to help with the security effort.

For example, my work to replace poorly maintained software with better-maintained alternatives is a purely technical security effort, while building a security response team involves social aspects as well as designing and building tools for the team to use.  Our security issue tracker has gone live and is presently being tested by the community, and with that work we’re already off to a great start at an organized security response.

If you don’t already know what Alpine Linux is, it is a popular Linux distribution with over a billion installations on Docker alone.  By building on efficient building blocks, such as the musl C library and busybox, Alpine maintains a slim installation image size while also providing the conveniences of a general-purpose Linux distribution.  As a result, Alpine has been deployed as the base of many Docker images, has been ported to hundreds of devices as the basis of postmarketOS, has been used to build hundreds of appliances with LinuxKit, and has been deployed everywhere from 5G networks to solar farms and oil rigs thanks to the work done by Zededa with Project EVE.  With all of this growth in the past few years, it’s important to rethink a lot of things in the distribution, including our approach to security.

Why having a security team matters

You might be wondering why distributions need a dedicated security team.  The answer is simple: having a dedicated security team helps to ensure that issues are fixed more quickly, and that the dissemination of security fix information is consistent.  The lack of consistent security fix information has been one of the major concerns for some potential users when evaluating Alpine for their projects, so improving upon this with a proper security team will hopefully assuage those concerns.

The other main benefit to having a dedicated security team is that the team can be a helpful resource for maintainers who need help figuring out how to fix their packages.  Sometimes figuring out which commits need to be backported in order to fix a vulnerability is tricky, because upstreams do not necessarily disclose what commits fix a problem.  Having more eyeballs that can be pinged to look at an issue is helpful here.

How Alpine has historically done security response

In the past, an anonymous user working under a pseudonym reported vulnerabilities to package maintainers by hand as she learned about them.  While this worked out fairly well for us, it has understandably not scaled as Alpine’s package count has grown.  Although the security tracker is not yet integrated into GitLab, it will soon provide a familiar experience for anyone who has worked on issues reported by the anonymous contributor.

Imagine my surprise when I found out that the account opening issues against packages with security vulnerabilities was not a bot, but an actual person doing the work by hand!  Needless to say, in 2021 it is time to come up with a better system, which is why I am working on starting an actual security team and building tools for it to use, such as the security tracker.

Future directions for security-related work in Alpine

One of the main initiatives I have been working on for the past few months has been getting Rust onto all of Alpine’s release architectures.  At the moment, we are still missing s390x, mips64 and riscv64 (under consideration as a release architecture beginning with 3.15).  While I was not able to complete this work in the 3.14 release cycle, we have established a great working relationship with the Rust team and have been working on detangling the musl port in Rust.  Once that is done, it will be trivial to bring up support for new architectures we add in the future (such as loongarch, for example), and bootstrap.sh can already build a Rust cross compiler for the targets we support today.

Another initiative is replacing poorly maintained software with modern alternatives.  I have a lot of plans in this area: for example, replacing GNU gettext with gettext-tiny, which will make container images smaller and allow Alpine to use musl’s native gettext routines, which are considerably more robust than GNU libintl.  Similarly, I am looking into replacing GNU ncurses with NetBSD’s implementation for the same reasons.  I am also planning to refactor apk-tools’ cryptographic code to allow using BearSSL as a backend instead of OpenSSL’s libcrypto.

We also want to harden Alpine against supply-chain attacks.  This work is largely going to be focused on two areas: integrating support into abuild for verifying signatures on downloaded source code, and rebooting the reproducible builds effort.  The work needed to support reproducible builds, however, is a blog post in and of itself, so more on that once we are able to start on it.

Getting involved

If working on these kinds of problems interests you, I’d love to get you on board and working on these initiatives.  Feel free to ping me on IRC or in Alpine’s GitLab.  There is a mountain of security-related work to be done in Alpine, so the possibilities are basically endless.

Also, if you work for a company that is interested in funding work on Alpine, there are many initiatives that could be more quickly implemented with funding, such as Laurent Bercot’s planned OpenRC replacement.

A tale of two envsubst implementations

Yesterday, Dermot Bradley brought up on IRC that gettext-tiny’s lack of an envsubst utility could be a potential problem, as many Alpine users use it to generate configuration from templates.  So I decided to look into writing a replacement, as the tool did not seem that complex.  That rewrite is now available on GitHub, and is already in Alpine testing for experimental use.

What envsubst does

The envsubst utility is designed to take a set of strings as input and replace variables in them, in the same way that shells do variable substitution.  Additionally, the variables that will be substituted can be restricted to a defined set, which is nice for reliability purposes.

Because it provides a simple way to perform substitutions in a file without having to mess with sed and other similar utilities, it is seen as a helpful tool for building configuration files from templates: you just install the cmd:envsubst provider with apk and perform the substitutions.
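For example, a typical template-filling workflow looks something like this (the file name and variables here are just for illustration):

% cat listener.conf.in
listen ${LISTEN_PORT};
server_name ${SERVER_NAME};
% LISTEN_PORT=8080 SERVER_NAME=example.com envsubst < listener.conf.in > listener.conf
% cat listener.conf
listen 8080;
server_name example.com;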

Unfortunately though, GNU envsubst is quite deficient in terms of functionality and interface.

Good tool design is important

When building a tool like envsubst, it is important to think about how it will be used.  One of the things that is really important is making sure a tool is satisfying to use: a tool which has non-obvious behavior or implies functionality that is not actually there is a badly designed tool.  Sadly, while sussing out a list of requirements for my replacement envsubst tool, I found that GNU envsubst has several deficiencies that are quite disappointing.

GNU envsubst does not actually implement POSIX variable substitution like a shell would

In POSIX, variable substitution is more than simply replacing a variable with the value it is defined to.  In GNU envsubst, the documentation speaks of shell variables, and then outlines the $FOO and ${FOO} formats for representing those variables.  The latter format implies that POSIX variable substitution is supported, but it’s not.

In a POSIX-conformant shell, you can do:

% FOO="abc_123"
% echo ${FOO%_*}
abc

Unfortunately, this isn’t supported by GNU envsubst:

% FOO="abc_123" envsubst
$FOO
abc_123
${FOO}
abc_123
${FOO%_*}
${FOO%_*}

It’s not yet supported by my implementation either, but it’s on the list of things to do.

Defining a restricted set of environment variables is bizarre

GNU envsubst describes taking an optional [SHELL-FORMAT] parameter.  The way this feature is implemented is truly bizarre, as seen below:

% envsubst -h
Usage: envsubst [OPTION] [SHELL-FORMAT]
...
Operation mode:
  -v, --variables             output the variables occurring in SHELL-FORMAT
...
% FOO="abc123" BAR="xyz456" envsubst FOO
$FOO
$FOO
% FOO="abc123" envsubst -v FOO
% FOO="abc123" envsubst -v \$FOO
FOO
% FOO="abc123" BAR="xyz456" envsubst \$FOO
$FOO
abc123
$BAR
$BAR
% FOO="abc123" BAR="xyz456" envsubst \$FOO \$BAR
envsubst: too many arguments
% FOO="abc123" BAR="xyz456" envsubst \$FOO,\$BAR
$FOO
abc123
$BAR
xyz456
$BAZ
$BAZ
% envsubst -v
envsubst: missing arguments
%

As discussed above, [SHELL-FORMAT] is a very strange thing to call this, because it is not really a shell variable substitution format at all.

Then there’s the matter of requiring variable names to be provided in this shell-like variable format.  That requirement gives a shell script author the ability to easily break their script by accident, for example:

% echo 'Your home directory is $HOME' | envsubst $HOME
Your home directory is $HOME

Because you forgot to escape $HOME as \$HOME, the substitution list was empty:

% echo 'Your home directory is $HOME' | envsubst \$HOME 
Your home directory is /home/kaniini

The correct way to handle this would be to accept HOME without having to describe it as a variable.  That approach is supported by my implementation:

% echo 'Your home directory is $HOME' | ~/.local/bin/envsubst HOME 
Your home directory is /home/kaniini

Then there’s the matter of not supporting multiple variables in the traditional UNIX style (as separate tokens).  Being forced to use a comma on top of using a variable sigil for this is just bizarre and makes the tool absolutely unpleasant to use with this feature.  For example, this is how you’re supposed to add two variables to the substitution list in GNU envsubst:

% echo 'User $USER with home directory $HOME' | envsubst \$USER,\$HOME 
User kaniini with home directory /home/kaniini

While my implementation supports doing it that way, it also supports the more natural UNIX way:

% echo 'User $USER with home directory $HOME' | ~/.local/bin/envsubst USER HOME 
User kaniini with home directory /home/kaniini

This is common with GNU software

This isn’t just about GNU envsubst.  A lot of other GNU software is equally broken.  Even the GNU C library has design deficiencies which are similarly frustrating.  The reason I wish to replace GNU software in Alpine is that, in many cases, it is defective by design.  Whether the design defects are caused by apathy or by politics does not matter; the end result is the same: we get defective software.  I want better security and better reliability, which means we need better tools.

We can talk about the FSF political issue, and many are debating that at length.  But the larger picture is that the tools made by the GNU project are, for the most part, clunky and unpleasant to use.  That’s the real issue that needs solving.

A Brief History of Configuration-Defined Image Builders

When you think of a configuration-defined image builder, most likely you think of Docker (which builds images for containers).  But before Docker there were several other projects, born out of a vibrant community of Debian-using sysadmins looking for better ways to build VM and container images, each building on the last to make something better.

Before KVM, there was Xen

The Xen hypervisor is likely something you’ve heard of, and that’s where this story begins.  The mainstream desire to programmatically create OS images came about as Xen became a popular hypervisor in the mid-2000s.  The first development in that regard was xen-tools, which automated installation of Debian, Ubuntu and CentOS guests by generating images for them using custom Perl scripts.  The world has largely moved on from Xen, but it still sees wide use.

ApplianceKit and ApplianceKit-NG

The methods used in xen-tools, while generally effective, lacked flexibility.  Hosting providers needed a way to allow end-users to customize the images they deployed.  In my case, we solved this by creating ApplianceKit.  That particular venture was sold to another hosting company, and for whatever reason, I started another one.  In that venture, we created ApplianceKit-NG.

ApplianceKit and ApplianceKit-NG took different approaches internally to solve the same basic problem: taking an XML description of a software image and reproducing it.  For example:

<?xml version="1.0" standalone="yes"?>
<appliance>
  <description>LAMP appliance based on Debian squeeze</description>
  <author>
    <name>Ariadne Conill</name>
    <email>ariadne@dereferenced.org</email>
  </author>
  <distribution>squeeze</distribution>
  <packagelist>
    <package>apache2</package>
    <package>libapache2-mod-php5</package>
    <package>mysql-server</package>
    <package>mysql-client</package>
  </packagelist>
</appliance>

As you can see here, the XML description describes a desired state for the image to be in at deployment time.  ApplianceKit did this through an actor model: different modules would act on elements of the configuration description.  ApplianceKit-NG instead treated this as a matter of compilation: first, a high-level pass converted the XML into a mid-level IR, then the mid-level IR was converted into a low-level IR, and finally the low-level IR was converted into a series of commands that were evaluated like a shell script.  (Had I known about skarnet’s execline at that time, I would have used it.)
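To give a feel for the final stage, here is a hypothetical sketch of the kind of command stream the low-level IR could compile down to for the XML above (the exact commands and paths are illustrative, not what ApplianceKit-NG actually emitted):

# provision the base system for the requested distribution
debootstrap squeeze /target
# install the requested package list into the image
chroot /target apt-get update
chroot /target apt-get install -y apache2 libapache2-mod-php5 mysql-server mysql-client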

Docker

Another company that was active in the Debian community and experimenting with configuration-defined image building was dotCloud.  dotCloud took a similar evolutionary path, with the final image building system they made being Docker.  Docker evolved further on the concept outlined in ApplianceKit-NG by simplifying everything: instead of explicitly configuring a desired state, you simply use image layering:

FROM debian:squeeze
MAINTAINER ariadne@dereferenced.org
RUN apt-get update && apt-get install -y apache2 libapache2-mod-php5 mysql-server mysql-client

By taking a simpler approach, Docker has won out.  Everything is built on top of Docker these days, such as Kubernetes, and this is a good thing.  Even though some projects like Packer have further advanced the state of the art, Docker remains the go-to for this task, simply because it’s simple enough for people to mostly understand.

The main takeaway is that simply advancing the state of the art is not good enough to make a project compelling.  It must advance the state of simplicity too.

Cryptocurrencies from 10000 feet: the good, the bad, and the fixes

I’ve followed cryptocurrency for a long time.  The first concept I read about was Hashcash, which was a mechanism designed to reduce e-mail spam by acting as a sort of “stamp”.  The proof of work concept introduced by Hashcash of course led to Bitcoin, which led to Ethereum and the other popular Proof of Work consensus blockchain-based cryptocurrency platforms out in the world today.  In fact, in the very early days of Bitcoin I mined around 50k BTC total, and well, looking at the price of Bitcoin today, obviously I wish I had hung on to them.  With that said, I think Proof of Work consensus is a massively inefficient way to solve the problem of decentralized finance.  Hopefully this essay convinces you of that too.

Basic Concepts

Before we dive into the technical details of how these schemes work, I think it is important to discuss the basic concepts: there are unfortunately a lot of cryptocurrency advocates who fundamentally do not understand the technology.  I believe that if you are going to invest in an investment vehicle, you should be fully literate in the concepts used to build it.  If you disagree and just want to trade assets that you don’t understand, fine — but this blog will probably not be that helpful to you.

Most decentralized finance platforms, like Bitcoin and Ethereum, are built on top of a shared blockchain.  In technical terms, a blockchain is an append-only binary log that is content-addressable.  Content-addressability is achieved because the blockchain is composed in such a way that it acts as a merkle tree, a type of acyclic graph where each node has a distinct identity (derived from a hash function).  These nodes are grouped into blocks, which contain pointers to each element of the graph.  When the blocks in the append-only log are replayed, you can reconstruct the entire tree.  A good analogy for a blockchain would therefore be the ledger your bank uses to track all of its transactions, or the abstract for your house.  However, the use of a blockchain is not required to have a functional decentralized finance system; it is just a common optimization used to provide assayability.
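To make the content-addressing idea concrete, here is a minimal hash-chain sketch in C++ (glossing over the full merkle tree structure, with std::hash standing in for a real cryptographic hash such as SHA-256): each block’s identity is derived from its contents, so replaying the log both rebuilds the chain and verifies it.

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Block {
    uint64_t parent;      // identity of the previous block in the log
    std::string payload;  // the transactions grouped into this block

    // A block's identity is derived from its contents, so anyone holding a
    // pointer to a block can verify it by re-hashing the block.
    uint64_t id() const {
        return std::hash<std::string>{}(std::to_string(parent) + payload);
    }
};

// Replaying the append-only log reconstructs the chain and checks that each
// block really points at the block before it.
bool replay(const std::vector<Block> &log) {
    uint64_t expected_parent = 0;
    for (const auto &block : log) {
        if (block.parent != expected_parent)
            return false;
        expected_parent = block.id();
    }
    return true;
}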

An asset is considered assayable if a party can ascertain its value.  For example, I can assay the value of an options contract by examining the strike price and quantity of shares it covers.  You can assay the value of your purse or wallet by putting currency in only a purse or wallet you trust.  Blockchains are used in many decentralized finance systems to provide assayability: anybody can examine the contents of any purse to determine the value of that purse by walking through the transactions that purse has been involved in.

An asset is considered fungible if we can transfer it to others in exchange for goods and services.  For example, currency is a fungible asset: you exchange currency for goods and services.  However, the asset itself may hold direct value, or its value may simply be symbolic.  Currency that is itself made from precious metal, such as a gold coin, would be considered fungible, but not symbolic.  Paper money such as bank notes are considered both fungible and symbolic.  Cryptographic assets are available in both forms, depending on how they are underwritten.

A smart contract is a program which also acts as a financial instrument.  Most people erroneously believe smart contracts were an invention of Ethereum, but in reality, Mark S. Miller’s seminal work on the topic dates all the way back to 2000.  Bitcoin and other tokens like Ethereum and its subtokens are all the results of various forms of smart contract.  Other forms of smart contracts also exist such as NFTs.

The Current State of the World

At present, there are two main decentralized finance systems that most people have heard something about: Bitcoin and Ethereum.  These are proof of work consensus blockchains, meaning that participant nodes have to solve a cryptographic puzzle (the work) in order to gain authorization (the proof) to append a block to the chain.  Participants are rewarded for appending blocks to the chain by the program which maintains the chain.

As more participants join the network, contention over who is allowed to append to the chain increases.  This is reflected by the difficulty factor, which controls the complexity of the work needed for a cryptographic proof to be accepted as authorization to append a new block to the chain.
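As a rough illustration of what the work and the difficulty factor look like, here is a sketch of a mining loop (again in C++, with std::hash and a simple leading-zero-bits rule purely for illustration; real chains use a cryptographic hash and a much larger difficulty target):

#include <cstdint>
#include <functional>
#include <string>

// Accept a hash with at least `difficulty_bits` leading zero bits
// (difficulty_bits is assumed to be in the range 0..63 here).
static bool meets_difficulty(uint64_t h, unsigned difficulty_bits) {
    return difficulty_bits == 0 || (h >> (64 - difficulty_bits)) == 0;
}

// Search for a nonce whose hash satisfies the difficulty factor; the nonce
// is the proof that authorizes appending the block.  Each additional bit of
// difficulty roughly doubles the expected amount of work.
uint64_t mine(const std::string &block_contents, unsigned difficulty_bits) {
    std::hash<std::string> hasher;
    for (uint64_t nonce = 0;; nonce++) {
        if (meets_difficulty(hasher(block_contents + std::to_string(nonce)),
                             difficulty_bits))
            return nonce;
    }
}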

The benefit of this design is that it keeps the rate at which blocks are appended to the chain mostly stable, which is desirable in systems where transactions are validated by their depth in the blockchain: it allows participants to estimate how long it will take for their transaction to be validated to the level they deem appropriate (e.g. a confirmation that the transaction is N blocks deep in the blockchain will take X minutes to obtain).

However, because these platforms reward appending new blocks to the chain with non-trivial amounts of currency, the competition over rights to append to these blockchains is increasing exponentially: the estimated energy consumption from Bitcoin mining now rivals that of Western Europe.  This obviously isn’t sustainable.

Proof of Stake Could Be A Solution

A different kind of blockchain governance model has emerged in recent history: proof of stake.  These platforms work by choosing a random stakeholder who meets the staking criteria to have the right to append the next block to the blockchain.  This mostly works well, but a lot of these blockchains still have other inefficiency problems; for example, the staking rules can effectively centralize authority over appends if selection is biased by the size of each stakeholder’s stake.
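A sketch of that selection step (the Stakeholder type here is hypothetical, and a real platform would also mix in a verifiable source of randomness rather than a local generator):

#include <cstddef>
#include <random>
#include <string>
#include <vector>

struct Stakeholder {
    std::string address;
    double stake;
};

// Pick the next block producer with probability proportional to stake.
// That proportionality is exactly how large stakeholders can come to
// dominate appends if nothing counterbalances it.
std::size_t pick_producer(const std::vector<Stakeholder> &stakers,
                          std::mt19937_64 &rng) {
    std::vector<double> weights;
    for (const auto &s : stakers)
        weights.push_back(s.stake);

    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}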

Proof of Stake, however, does get rid of traditional block mining and the energy consumption associated with it.  But these blockchains are still inefficient, because many of them store unnecessary data, which leads to slower lookups of data stored in the blockchain and slower tree building when the log is replayed.

There are many ways a proof of stake based platform can be optimized, including to not require a blockchain at all.  But even in a system where the basic functionality of the network does not require a blockchain, they can still be helpful as an optimization.

Ethereum 2.0 is an example of a credible proof of stake based platform that is still heavily tied to a blockchain.

Smart Contracts as Distributed Object Capabilities

Building on Mark S. Miller’s seminal work on smart contracts, we can describe a decentralized finance system as a set of distributed object capabilities.  We will assume public-key cryptography as the foundation of the network, such as djb’s Curve25519.  We also assume the presence of a distributed hash table running on participant nodes, such as Kademlia.

Each participant will have one or more keypairs which act as locators in the DHT, pointing to opaque capabilities.  An opaque capability may itself be a contract or a validation proof generated by execution of a contract.

In order to keep participants honest, the network will choose other participants to act as validators.  These validators, when given a proof, will themselves re-execute the underlying contract with the inputs from the proof and validate that the output is the same.  If it is, they will sign the proof and give it back to the party which requested validation.

Given this architecture, let’s consider a very simple purse for a token that exists on the network:

struct Purse {
    address_t m_owner;
    int64_t m_tokens;

    Purse(address_t p, int64_t t) : m_owner(p), m_tokens(t) {}

    // Split this purse: deduct the tokens sent and mint a purse for the recipient.
    pair<Purse, Purse> sendToAddress(address_t recipient, int64_t tokens) {
        if (tokens > m_tokens)
            throw NotEnoughMoneyException();

        m_tokens -= tokens;
        return {Purse(m_owner, m_tokens), Purse(recipient, tokens)};
    }

    // Merge another purse we own into a new purse, emptying both originals.
    Purse combineFromPurse(Purse &p) {
        if (p.m_owner != m_owner)
            throw NotOwnerException();

        Purse newPurse = Purse(m_owner, m_tokens + p.m_tokens);
        m_tokens = p.m_tokens = 0;

        return newPurse;
    }
};

Given this library, we can write a smart contract which mints a purse for a given address with 100 initial tokens, and returns it as the output:

import Purse;

auto Contract::main(address_t owner) -> Purse {
    return Purse(owner, 100);
}

Given the output of this contract, we can send our friend some of our tokens:

import Purse;

auto Contract::main(Purse originalPurse,
                    address_t target, int64_t tokens) -> pair<Purse, Purse> {
    auto [ourNewPurse, theirPurse] =
        originalPurse.sendToAddress(target, tokens);
    return {ourNewPurse, theirPurse};
}

Now that we have minted a purse for our friend using some of our own tokens, we simply sign the output of that smart contract and send the message to our friend:

import SendToFriendContract;
import Purse;
import Message;
import Wallet;

auto Contract::main(address_t target, int64_t tokens) -> Message {
    // Get our wallet.
    auto wallet = Wallet.getOurWallet();

    // Get our purses from the wallet.
    auto ourPurses = wallet.assetsFromIssuingContract(Purse);

    // Mint the new purse for our friend.
    auto [ourPurse, theirPurse] =
        SendToFriendContract(ourPurses[0], target, tokens);

    // Commit our purse to the top of the purse list in our wallet.
    wallet.prependAsset(ourPurse);

    // Seal the result of our execution in an opaque message.
    auto message =
        Message(wallet.address, target,
                [ourPurses[0], target, tokens],
                SendToFriendContract,
                [ourPurse, theirPurse]);

    return message;
}

When our friend receives the message, they can request its validation by forwarding it along to a validator:

import Message;
import ValidationContract;

auto Contract::main(Message input) -> Message {
    return ValidationContract(input);
}

The validator then can respond with a message confirming or denying the validity.  The validated message’s purse would then be used as an input for future transactions.

Note that I have yet to discuss a blockchain here.  In this case, a blockchain is useful for storing pointers to validations, which is helpful because it maintains assayability without needing to forward along a chain of previous validations as proof of custodianship.

While what I have outlined is a vast simplification of what is going on, there is a decentralized finance platform that operates basically under this principle: its name is Tezos.

Using Tezos is extremely efficient: compiling a smart contract on Tezos uses the same energy as compiling any other program of the same complexity.  Tezos is also capable of supporting sharding in the future should it become necessary, though the blockchain part itself is simply an optimization of how the network operates and Tezos could be adapted to operate without it.

Ultimately, I think if we are going to have decentralized finance, Tezos is the future, not Bitcoin and Ethereum.  But we can still do even better than Tezos.  Maybe I will write about that later.

Let’s build a new service manager for Alpine!

As many of you already know, Alpine presently uses a fairly modified version of OpenRC as its service manager.  Unfortunately, OpenRC maintenance has stagnated: the last release was over a year ago.

We feel now is a good time to start working on a replacement service manager, based on user feedback and design discussions we’ve had over the past few years, which can be summarized simply as “systemd done right.”  But what does “systemd done right” mean?

Our plan is to build a supervision-first service manager that consumes and reacts to events, using declarative unit files similar to systemd, so that administrators who are familiar with systemd can easily learn the new system.  In order to build this system, we plan to work with Laurent Bercot, a globally recognized domain expert on process supervision systems and author of the s6 software supervision suite.

This work will also build on the work we’ve done with ifupdown-ng, as ifupdown-ng will be able to reflect its own state into the service manager allowing it to start services or stop them as the network state changes.  OpenRC does not support reacting to arbitrary events, which is why this functionality is not yet available.

Corporate funding of this effort would have meaningful impact on the timeline that this work can be delivered in.  It really comes down to giving Laurent the ability to work on this full time until it is done.  If he can do that, like Lennart was able to, he should be able to build the basic system in a few months.

Users outside the Alpine ecosystem will also benefit from this work.  Our plan is to introduce a true contender to systemd that is completely competitive as a service manager.  If you believe real competition to systemd will be beneficial toward driving innovation in systemd, you should also want to sponsor this work.

Alpine has gotten a lot of mileage out of OpenRC, and we are open to contributing to its future maintenance while Alpine releases still include it as part of the base system, but our long-term goal is to adopt the s6-based solution.

If you’re interested in sponsoring Laurent’s work on this project, you can contact him via e-mail or via his Twitter account.

Why RMS should not be leading the free software movement

Earlier today, I was invited to sign the open letter calling for the FSF board to resign, which I did.  To me, it was obvious to sign the letter, which on its own makes a compelling argument for why RMS should not be an executive director at FSF.

But I believe there is an even more compelling reason.

When we started Alpine 15 years ago, we largely copied the way other distributions handled things… and so Alpine copied the problems that many other FOSS projects had as well.  Reviews were harsh and lacking in empathy, which caused problems with contributor burnout.  During this time, Alpine had some marginal amount of success: Docker did eventually standardize on Alpine as the platform of choice for micro-containers, and postmarketOS was launched.

Due to burnout, I turned in my commit privileges, resigned from the core team and wound up taking a sabbatical from the project for almost a year.

In the time I was taking a break from Alpine, an amazing thing happened: the core team decided to change the way things were done in the project.  Instead of giving harsh reviews as the bigger projects did, the project pivoted towards a philosophy of collaborative kindness.  And as a result, burnout issues went away, and we started attracting all sorts of new talent.

Robert Hansen resigned as the GnuPG FAQ maintainer today.  In his message, he advocates that we should demand kindness from our leaders, and he’s right: it gets better results.  Alpine would not be the success it is today had we not decided to change the way the project was managed.  I want that level of success for FOSS as a whole.

Unfortunately, it is proven again and again that RMS is not capable of being kind.

He screams at interns when they do not do work to his exact specification (unfortunately, FSF staff are forced to sign an NDA that covers almost every minutia of their experience being an FSF staffer so this is not easy to corroborate).

He shows up on e-mail lists and overrides the decisions made by his subordinates, despite those decisions having strong community consensus.

He is a wholly ineffective leader, and his continued leadership will ultimately be harmful to FSF.

And so, that is why I signed the letter demanding his resignation yet again.

NFTs: A Scam that Artists Should Avoid

Non-fungible tokens (NFTs) are the latest craze being pitched toward the artistic communities.  But they are ultimately meaningless tokens which fail to accomplish any of the things artists are looking for in an NFT-based solution.

Let me explain…

So, What are NFTs?

Non-fungible tokens are a form of smart contract (a program) which runs on a decentralized finance platform.

They are considered “non-fungible” because they reference a specific asset, while a fungible token would represent an asset that is not specific.  An example of a non-fungible token in the physical world would be the title to your car or house, while a fungible token would be currency.

These smart contracts could, if correctly implemented, represent title to a physical asset, but implementing an effective NFT regime would require substantive changes to contract law in order to be enforceable in the current legal system.

How do NFTs apply to artwork?

Well, the simple answer is that they don’t.  You might hear differently from companies that are selling NFT technology to you as an artist or art collector, but there is no actual mechanism to enforce the terms of the smart contract in a court, nor to ensure that the guarantees of the smart contract itself cannot be bypassed.

NFT platforms like ArtMagic try to make it look like it is possible to enforce restrictions on your intellectual property using NFTs.  For example, this NFT is listed as being limited to 1000 copies:

However, there is no mechanism to actually enforce this.  It is possible that only 1000 instances of the NFT can be sold, but this does not restrict the actual number of copies to 1000.  To demonstrate this, I have made a copy of this artwork and reproduced it on my own website, which exists outside of the world where the NFT has any enforcement mechanism.

As you can see, there are now at least 1001 available copies of this artwork.  Except you can download that one for free.

All of these platforms have various ways of finding the master copy of the artwork and thus enabling you to make your own copy.  I am planning on writing a tool soon to allow anyone to download their own personal copy of any NFT.  I’ll write about the internals of the smart contracts and how to get the goods later.

Well, what about archival?

Some companies, like NFTMagic, claim that your artwork is stored on the blockchain forever.  In practice, this is a very bold claim, because it requires that:

  • Data is never evicted from the blockchain in order to make room for new data.
  • The blockchain will continue to exist forever.
  • Appending large amounts of data to the blockchain will always be possible.

Let’s look into this for NFTMagic, since they make such a bold claim.

NFTMagic runs on the Ardor blockchain platform, which is written in Java.  This is already somewhat concerning because Java is not a very efficient language for writing this kind of software in.  But how is the blockchain stored to disk?

For that purpose, the Ardor software uses H2.  H2 is basically an SQLite clone for Java.  So far, this is not very confidence inspiring.  By comparison, Bitcoin and Ethereum use LevelDB, which is far more suited for this task (maintaining a content-addressable append-only log) than an SQL database of any kind.

How is the data actually archived to the blockchain?  In the case of Ardor, it works by having some participants act as archival nodes.  These archival nodes maintain a full copy of the asset blockchain — you either get everything or you get nothing.  By comparison, other archival systems, like IPFS, allow you to specify what buckets you would like to duplicate.

How does data get to the archival nodes?  It is split into 42KB chunks and committed to the chain with a manifest.  Those 42KB chunks and manifest are then stored in the SQL database.

Some proponents of the NFTMagic platform claim that this design ensures that there will be at least one archival node available at all times.  But I am skeptical, because the inefficient data storage mechanism in combination with the economics of being an archival node make this unlikely to be sustainable in the long term.  If things play out the way I think they will, there will be a point in the future where zero archival nodes exist.

However, there is a larger problem with this design: if some archival nodes choose to be evil, they can effectively deny the rightful NFT owner’s ability to download the content she has purchased.  I believe implementing an evil node on the Ardor network would actually not be terribly difficult to do.  A cursory examination of the getMessage API does not seem to provide any protections against evil archival nodes.  At the present size of the network, a nation state would have more than sufficient resources to pull off this kind of attack.

Conclusion

In closing, I hope that by looking at the NFTMagic platform and its underlying technology (Ardor), I have effectively conveyed that these NFT platforms are not terribly useful to artists for more than a glorified “certificate of authenticity” solution.  You could, incidentally, do a “certificate of authenticity” by simply writing one and signing it with something like DocuSign.  That would be more likely to be enforceable in a court, too.

I also hope I have demonstrated that NFTs by design cannot restrict anyone from making a copy.  Be careful when evaluating the claims made by NFT vendors.  When in doubt, check your fingers and check your wallet, because they are most assuredly taking you for a ride.

The End of a Short Era

Earlier this year, I started a project called Jejune and migrated my blog to it.  For various reasons, I have decided to switch to WordPress instead.

The main reason why is because WordPress has plugins which do everything I wanted Jejune to do, so using an already established platform provides more time for me to work on my more important projects.

For posting to the fediverse, I plan to use a public Mastodon or Pleroma instance, though most of my social graph migrated back to Twitter so I probably won’t be too active there.  After all, my main reason for using social platforms is to communicate with my friends, so I am going to be where my friends actually are.  Feel free to let me know suggestions, though!

Using OTP ASN.1 support with Elixir

The OTP ecosystem which grew out of Erlang has all sorts of useful applications included with it, such as support for encoding and decoding ASN.1 messages based on ASN.1 definition files.

I recently began work on Cacophony, which is a programmable LDAP server implementation, intended to be embedded in the Pleroma platform as part of the authentication components. This is intended to allow applications which support LDAP-based authentication to connect to Pleroma as a single sign-on solution. More on that later, that’s not what this post is about.

Compiling ASN.1 files with mix

The first thing you need to do in order to make use of the asn1 application is install a mix task to compile the files. Thankfully, somebody already published a Mix task to accomplish this. To use it, you need to make a few changes to your mix.exs file:

  1. Add compilers: [:asn1] ++ Mix.compilers() to your project function.
  2. Add {:asn1ex, git: "https://github.com/vicentfg/asn1ex"} in the dependencies section.

After that, run mix deps.get to install the Mix task into your project.
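For reference, the relevant parts of the resulting mix.exs might look something like this (the application name and version here are placeholders):

defmodule Cacophony.MixProject do
  use Mix.Project

  def project do
    [
      app: :cacophony,
      version: "0.1.0",
      # Run the ASN.1 compiler in addition to the usual compilers.
      compilers: [:asn1] ++ Mix.compilers(),
      deps: deps()
    ]
  end

  defp deps do
    [
      {:asn1ex, git: "https://github.com/vicentfg/asn1ex"}
    ]
  end
end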

Once you’re done, you just place your ASN.1 definitions file in the asn1 directory, and it will generate a parser in the src directory when you compile your project. The generated parser module will be automatically loaded into your application, so don’t worry about it.

For example, if you have asn1/LDAP.asn1, the compiler will generate src/LDAP.erl and src/LDAP.hrl, and the generated module can be called as :LDAP in your Elixir code.

How the generated ASN.1 parser works

ASN.1 objects are marshaled (encoded) and demarshaled (parsed) to and from Erlang records. Erlang records are essentially tuples which begin with an atom that identifies the type of the record.

Elixir provides a module for working with records, which comes with some documentation that explains the concept in more detail, but overall the functions in the Record module are unnecessary and not really worth using; I just mention it for completeness.

Here is an example of a record that contains sub-records inside it. We will be using this record for our examples.

message = {:LDAPMessage, 1, {:unbindRequest, :NULL}, :asn1_NOVALUE}

This message maps to an LDAP unbindRequest, inside an LDAP envelope. The unbindRequest carries a null payload, which is represented by :NULL.

The LDAP envelope (the outer record) contains three fields: the message ID, the request itself, and an optional access-control modifier, which we don’t want to send, so we use the special :asn1_NOVALUE parameter. Accordingly, this message has an ID of 1 and represents an unbindRequest without any special access-control modifiers.

Encoding messages with the encode/2 function

To encode a message, you must represent it in the form of an Erlang record, as shown in our example. Once you have the Erlang record, you pass it to the encode/2 function:

iex(1)> message = {:LDAPMessage, 1, {:unbindRequest, :NULL}, :asn1_NOVALUE}
{:LDAPMessage, 1, {:unbindRequest, :NULL}, :asn1_NOVALUE}
iex(2)> {:ok, msg} = :LDAP.encode(:LDAPMessage, message)
{:ok, <<48, 5, 2, 1, 1, 66, 0>>}

The first parameter is the Erlang record type of the outside message. An astute observer will notice that this signature has a peculiar quality: it takes the Erlang record type as a separate parameter as well as the record. This is because the generated encode and decode functions are recursive-descent, meaning they walk the passed record as a tree and recurse downward on elements of the record!

Decoding messages with the decode/2 function

Now that we have encoded a message, how do we decode one?  Well, let’s use our msg as an example:

iex(6)> {:ok, decoded} = :LDAP.decode(:LDAPMessage, msg)
{:ok, {:LDAPMessage, 1, {:unbindRequest, :NULL}, :asn1_NOVALUE}}
iex(7)> decoded == message
true

As you can see, decoding works the same way as encoding, except the input and output are reversed: you pass in the binary message and get an Erlang record out.

Hopefully this blog post is useful in answering questions that I am sure people have about making use of the asn1 application with Elixir.  There is basically no documentation and there are no guides for it anywhere, which is why I wrote this post.

Demystifying Bearer Capability URIs

Historically, there has been a practice of combining URIs with access tokens containing sufficient entropy to make them difficult to brute force. A few different techniques have been implemented to do this, but those techniques can be considered implementation specific. One of the earliest and most notable uses of this technique can be observed in the Second Life backend APIs.

An example of a capability URL being used in the fediverse today would be in the way that Pleroma refers to ActivityPub objects and activities: https://socially.whimsic.al/objects/b05e883b-b66f-421b-ac46-c6018f539533 is a capability URL that allows access to a specific object.

However, while it is difficult for capability URLs to be brute forced, since they contain tokens with sufficient entropy to make guessing them expensive, they have a serious flaw: because the access token is part of the resource being accessed, they tend to leak the access token in two ways:

  • Browser Referer headers
  • Server access and error logs

Since the usage of capability URLs has become widespread, a wonderful specification has been released by the IETF: OAuth 2.0. OAuth provides a possible solution in the form of bearer tokens, which can be passed around using an Authorization header.

Bearer Capability URIs (aka bearcaps) are essentially a construct which combines a resource with an access token that happens to grant some form of access to that resource, while keeping that access token out of band. Since the access token is out of band, it cannot be leaked in a Referer header or in a server log.

But what does a bearcap URI look like?  It’s actually quite simple.  Let’s build on the example URL I gave above; its bearcap URI would be something like: bearcap:?u=https://socially.whimsic.al/notice/9nnmWVszgTY13FduAS&t=b05e883b-b66f-421b-ac46-c6018f539533.

Let’s break this down into parts.

bearcap is the name of the URI scheme, and then that is further broken down into a pair of URI query parameters:

  • u is the URI that the supplied token happens to be able to access in some way
  • t is the access token itself

In this case, a client which wanted to consume this bearcap (say to fetch the object behind it) would make a GET request to the URI specified, with the token specified:

GET /notice/9nnmWVszgTY13FduAS HTTP/1.1
Host: socially.whimsic.al
Authorization: Bearer b05e883b-b66f-421b-ac46-c6018f539533

HTTP/1.1 200 OK
[...]
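For example, here is a minimal sketch (in Elixir, using only the standard URI module) of how a client might split a bearcap into its resource and token before making that request; the values are the ones from the example above:

bearcap =
  "bearcap:?u=https://socially.whimsic.al/notice/9nnmWVszgTY13FduAS&t=b05e883b-b66f-421b-ac46-c6018f539533"

%URI{scheme: "bearcap", query: query} = URI.parse(bearcap)
%{"u" => url, "t" => token} = URI.decode_query(query)

# url   => "https://socially.whimsic.al/notice/9nnmWVszgTY13FduAS"
# token => "b05e883b-b66f-421b-ac46-c6018f539533"
# The client then requests `url` with an "Authorization: Bearer " <> token
# header, as shown in the exchange above.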

In a future post, we will go into how bearcap URIs can help to bring some aspects of capability-based security to ActivityPub.