Building fair webs of trust by leveraging the OCAP model

Since the beginning of the Internet, determining the trustworthiness of participants and published information has been a significant point of contention. Many systems have been proposed to solve these underlying concerns, usually pertaining to specific niches and communities, but these pre-existing solutions are nebulous at best. How can we build infrastructure for truly democratic Webs of Trust?

Fairness in reputation-based systems

When designing a reputation-based system, fairness must be paramount, but what is fairness in this context? A reputation-based system can be considered fair if it appropriately balances the concerns of the data publisher, the data subject, and the data consumer. Regulatory frameworks such as the GDPR attempt to provide guidance concerning how this balance can be struck when building internet services in general, but these frameworks are large and complicated, which makes it difficult to derive from them a definition that is adequate for a reputation-based trust system.

To understand how these concerns must be balanced, we must understand the underlying risks for each participant in a reputation-based system:

  • The data subject is at risk of harm to their professional reputation due to annotations they did not consent to, and mistakes in those annotations. This is a problem which has already captured regulatory ire, as I will explain later.
  • The data publisher is at risk of being sued for defamation due to the annotations they publish.
  • The data consumer is at risk of being misled by inaccurate annotations they consume.

A fair reputation-based system must attempt to provide an adequate balance between these concerns through active harm reduction in its design:

  • The harm to the data subject from misleading annotations can be reduced by blinding the identity of the data subject.
  • The harm to the data publisher from misleading annotations can also be reduced by blinding the identity of the data subject.
  • The harm to the data consumer from misleading annotations can be reduced by allowing them to consume annotations from multiple sources.

Shinigami Eyes, or how designing for fairness can be difficult

The Shinigami Eyes browser extension was designed to help people establish trust in various web resources using a reputation-based system. In general, the author attempted to make thoughtful choices to ensure the system was reasonably fair in its design. However, the system has a number of flaws, both technical and social, which highlight how building systems of trust requires a detailed understanding of how the underlying primitives interact and of the consequences of those interactions.

Shinigami Eyes and blinding

As already noted, a fair reputation-based system must blind the identity of the data subject to protect both the data subject and data publisher. The approach used by Shinigami Eyes was to use a bloom filter constructed with a 32-bit FNV-1a hash.

The FNV family is a set of non-cryptographic hash functions available in sizes from 32 to 1024 bits. The original variant, FNV-1, processes each input byte by multiplying the current hash value by the designated FNV prime and then XORing the result with the byte's value. The alternate variant, FNV-1a, swaps the two steps, XORing first and multiplying second; this is the variant used by Shinigami Eyes.
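For reference, the two step orderings can be sketched in a few lines of Python. The constants are the published 32-bit FNV offset basis and prime; the function names are illustrative and are not taken from the Shinigami Eyes source.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # published 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # published 32-bit FNV prime
MASK32 = 0xFFFFFFFF              # keep the running value within 32 bits


def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply by the prime first, then XOR in the byte."""
    value = FNV32_OFFSET_BASIS
    for byte in data:
        value = (value * FNV32_PRIME) & MASK32
        value ^= byte
    return value


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR in the byte first, then multiply by the prime."""
    value = FNV32_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV32_PRIME) & MASK32
    return value


if __name__ == "__main__":
    # Empty input returns the offset basis unchanged for both variants.
    print(hex(fnv1_32(b"")), hex(fnv1a_32(b"")))
    print(hex(fnv1_32(b"example.com")), hex(fnv1a_32(b"example.com")))
```

Either way the output is only 32 bits wide, which is the property that matters for the discussion that follows.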

The use of a bloom filter is an acceptable blinding method, assuming that the underlying hash provides sufficient resolution, such as a 256-bit or 512-bit hash. Presumably, due to the constraints of having to run as a JavaScript extension, the weak 32-bit FNV-1a hash was used instead. Because of this, while the reputation lists used by Shinigami Eyes were acceptably blinded, there was an extremely high risk of false positives caused by hash collisions.
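To get a rough sense of why hash width matters, the standard birthday approximation can be used to estimate the chance that at least two distinct entries share a hash value. This is only an illustration of the effect of hash width: it does not model the actual filter parameters or list sizes used by Shinigami Eyes, and the function name is hypothetical.

```python
import math


def collision_probability(entries: int, hash_bits: int) -> float:
    """Approximate probability that at least two of `entries` distinct
    inputs share a hash value, using the birthday approximation
    1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** hash_bits
    exponent = entries * (entries - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (32, 256):
        for entries in (10_000, 100_000, 1_000_000):
            p = collision_probability(entries, bits)
            print(f"{bits:>3}-bit hash, {entries:>9,} entries: {p:.3g}")
```

With a 32-bit hash, the estimate reaches roughly even odds at a little over 77,000 entries, whereas with a 256-bit hash it remains vanishingly small at any realistic list size.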

Concerns about the technical implementation of the Shinigami Eyes extension led Datatilsynet, the Norwegian GDPR regulatory agency, to ban the extension at the end of 2021, and development of the extension appears to have ended as a result of their initial inquiry.

Can we build systems like Shinigami Eyes more robustly?

The main reason Shinigami Eyes gained the attention of Datatilsynet was the centralized nature of its data processing. Can we build a system which avoids centralized data processing and promotes democratic participation? Yes, and it is quite easy, but as with most things, the challenge will be delivering a good user experience.

Leveraging the OCAP model to build a robust solution

The largest problem in building this system is ensuring that the published reputation data is reliably blinded. To this end, I propose that each feed be a simple dataset containing a set of blinded hashes and their annotations. The physical representation of the dataset does not matter, though keeping it as simple as possible will expand the number of places where the data can be consumed.

In the Object Capability model, we can think of the physical feed as an object, and a blinding key as a capability to access that object in a useful way. You have to have both in order for either to be useful.

A participant can publish multiple copies of their feed, with different blinding keys for each friend they wish to share it with, or they can choose to publish a single key and share the same key with every friend, or even the public at large. Users can then choose which feeds they want to use when making trust decisions from the collection of feeds and blinding keys they have been given.
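A minimal sketch of this idea, assuming HMAC-SHA3-256 as the blinding function (the primitive suggested later in this article) and using Python's standard hmac module. The function names, the example identifiers, and the in-memory dictionary representation are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def blind(key: bytes, identifier: str) -> str:
    """Blind an identifier under a key; without the key the output is opaque."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


def publish_feed(key: bytes, annotations: dict) -> dict:
    """Produce a feed (blinded hash -> annotation) for one blinding key."""
    return {blind(key, ident): note for ident, note in annotations.items()}


def lookup(feed: dict, key: bytes, identifier: str):
    """A consumer holding both the feed and its key can query it."""
    return feed.get(blind(key, identifier))


# The publisher's private annotations (hypothetical example data).
annotations = {"example.com": "trusted", "example.net/spam": "distrusted"}

# One copy of the feed per friend, each under its own blinding key.
key_for_alice = secrets.token_bytes(32)
key_for_bob = secrets.token_bytes(32)
feed_for_alice = publish_feed(key_for_alice, annotations)
feed_for_bob = publish_feed(key_for_bob, annotations)

if __name__ == "__main__":
    print(lookup(feed_for_alice, key_for_alice, "example.com"))  # Alice's key works
    print(lookup(feed_for_alice, key_for_bob, "example.com"))    # Bob's key does not
```

Because each copy is keyed separately, the two feeds share no hash values even though they describe the same subjects, so holding one friend's feed reveals nothing about another's.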

Compared to Shinigami Eyes, this better satisfies the conditions for fairness: with a sufficiently wide keyed hash the risk of a false positive is negligible, the contents of the reputation lists remain private, and publishers can choose to consent to data sharing requests however they wish.

Choosing a reasonable set of primitives

To build such a system, I would choose HMAC-SHA3-256 as the blinding primitive. This provides a good balance between collision protection, cryptographic strength, and hash resolution. A scheme which provides fewer than 256 bits of hash resolution should be avoided due to the risk of collisions.

I would distribute the feeds as CSV files. This would give users the most flexibility in managing feeds: they could distribute different feeds with different meanings, and include extended data alongside the blinded hash as a form of annotation.
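As a sketch of what such a feed might look like, the following writes and reads a small CSV feed with Python's standard csv module. The column names (blinded_hash, annotation, note) are assumptions chosen for illustration; the article does not prescribe a schema.

```python
import csv
import io

# Hypothetical column layout: the blinded hash plus free-form extended data.
FIELDS = ["blinded_hash", "annotation", "note"]


def write_feed(rows: list) -> str:
    """Serialize feed rows (dicts keyed by FIELDS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_feed(text: str) -> dict:
    """Parse CSV text into a mapping of blinded hash -> row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["blinded_hash"]: row for row in reader}


# Placeholder hashes; real entries would be 64 hex digits of keyed hash output.
rows = [
    {"blinded_hash": "ab" * 32, "annotation": "distrust", "note": "seen twice, by two friends"},
    {"blinded_hash": "cd" * 32, "annotation": "trust", "note": ""},
]

if __name__ == "__main__":
    text = write_feed(rows)
    print(text)
    print(read_feed(text)["ab" * 32]["annotation"])
```

A plain file like this can be hosted anywhere, edited by hand, and diffed, which fits the goal of keeping the physical representation simple.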

On the client side, I would calculate a blinded hash for each successively shorter form of the URI, all the way up to the parent domain. By doing so, a single feed entry could match a large number of child URIs instead of each one having to be listed manually.
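One way to enumerate the candidates is sketched below. The normalization rules here are assumptions: the scheme, query string, and fragment are dropped, and parent domains are found by naively stripping labels down to two, whereas a real client would more likely consult the Public Suffix List. Each candidate would then be blinded with a feed's key and looked up.

```python
from urllib.parse import urlsplit


def uri_candidates(uri: str) -> list:
    """Return the lookup candidates for a URI, most specific first."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    segments = [segment for segment in parts.path.split("/") if segment]

    candidates = []
    # Path prefixes on the full host, longest first.
    for end in range(len(segments), 0, -1):
        candidates.append(host + "/" + "/".join(segments[:end]))

    # The host itself, then each parent domain down to two labels
    # (a naive stand-in for a proper registrable-domain check).
    labels = host.split(".")
    for start in range(max(len(labels) - 1, 1)):
        candidates.append(".".join(labels[start:]))
    return candidates


if __name__ == "__main__":
    for candidate in uri_candidates("https://blog.example.com/posts/2021/hello?ref=feed"):
        print(candidate)
```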

Implementations should store the learned hashes in a radix trie. Because the hashes have a fixed length, lookups take a bounded number of steps regardless of how many hashes are stored, which is effectively constant time. A trie also allows for automatic bucketing, which can be helpful for implementing quorum requirements.
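The sketch below illustrates the idea with a plain 16-way trie keyed on hex digits, standing in for a true radix trie (which would additionally compress single-child paths). Each leaf records which feeds listed the hash, so a quorum check is a matter of counting them. The class and method names are hypothetical.

```python
class HashTrie:
    """Uncompressed 16-way trie over the hex digits of fixed-length hashes."""

    def __init__(self):
        self.root = {}

    def add(self, hex_hash: str, feed_name: str) -> None:
        """Record that `feed_name` lists `hex_hash`."""
        node = self.root
        for digit in hex_hash:
            node = node.setdefault(digit, {})
        # Digit keys are single characters, so "feeds" cannot clash with them.
        node.setdefault("feeds", set()).add(feed_name)

    def feeds_for(self, hex_hash: str) -> set:
        """Return the feeds listing `hex_hash`; one step per hex digit."""
        node = self.root
        for digit in hex_hash:
            node = node.get(digit)
            if node is None:
                return set()
        return node.get("feeds", set())

    def meets_quorum(self, hex_hash: str, quorum: int) -> bool:
        """True if at least `quorum` distinct feeds list the hash."""
        return len(self.feeds_for(hex_hash)) >= quorum


if __name__ == "__main__":
    trie = HashTrie()
    listed = "ab" * 32  # placeholder for a 256-bit blinded hash
    trie.add(listed, "alice")
    trie.add(listed, "bob")
    print(trie.meets_quorum(listed, 2))
    print(trie.meets_quorum("cd" * 32, 1))
```

A lookup walks at most 64 nodes for a 256-bit hash, whether the trie holds ten entries or ten million.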

Things we can build with this

The use of friend-to-friend reputation-based systems can be powerful. They provide accountability (as you know who you are getting your data from) and collaboration (your friends can consume your data in exchange).

They can be used in the way Shinigami Eyes was used: to allow interested parties to identify resources they should trust or distrust, but they can also be used to enable collaborative blocking amongst friends and system administrators.

They can also be used to determine if e-mail domains or URLs inside e-mails are actually trustworthy. The possibilities are truly endless.