SSI-on-Blockchain is Objectively a Bad Thing

Summary / TLDR: Blockchain (or “DLT”) adds no functionality to a SSI/Identity system that is not equally well, if not better, provided by a QR code on paper. None of the benefits Blockchain is supposed to bring hold up under mild scrutiny. Blockchain adds significant complexity and cost, as well as usability issues and serious privacy concerns. “Blockchain” in SSI exists for PR only, not for engineering reasons.

Note: I am only going to talk about the “blockchain” part of Self-sovereign Identity. Many things, good and bad, can be said about self-sovereign identity, but in order to keep the scope of this document manageable, I’ll leave the broader SSI-discussion to others.

Terminology
SSI and Blockchain
Put Your Engineering Hat On and Nothing Adds Up
- The Bad
- The Ugly
A Tour Through Literature & Projects
Parting Words

Terminology

Blockchain, or DLT (“Distributed Ledger Technology”), typically describes a database system (“ledger”) similar to the one cryptocurrencies, like Bitcoin, are built on. In theory, they are supposed to be permissionless (anyone can write and read the ledger), decentralized (no single authority controls the ledger). They are typically constructed as immutable, append-only structures - meaning you can never modify, even delete, data from it. (Certain optimizations exist that allow participants to phase out old data in some DLTs, but deletion cannot be enforced - i.e., it is insufficient, or at least a gray area, for GDPR purposes)
Self-Sovereign Identity (SSI) is the idea that, instead of having a central party (e.g. government, Google/Apple, …) issue you an ID, you issue your ID yourself and have the third parties notarize it. This is frequently sold as a “take the power back” approach from Big Tech (or Governments). The most well-known standards for digital identity are W3C’s Decentralized Identifiers (DID) and Verifiable Credentials (VC). These are heavily sponsored & driven by blockchain-based startups.

SSI and Blockchain

The intersection of Self-Sovereign Identity and Blockchain is a great example for how Blockchain technology does not live up to it’s hype. Government institutions, such as the EU, pump millions into research directly, as well as indirectly - the top 2 blockchain use-cases on the European Blockchain Services Infrastructure (EBSI) website are “Identity” and the SSI-based “Diploma”. The top Google result for “real world blockchain uses” prominently features Identity-startups.

But it’s certainly possible to use Digital Identity / VCs without any blockchain: The EU’s digital vaccination certificates use W3C’s VCs. The EU has managed to scale it up to 1.7 Billion certificates - but that’s after they got rid of all the blockchain stuff, which they unsuccessfully experimented with before. So if SSI works just fine without blockchain, the obvious question is, what value does Blockchain add to SSI?

It turns out, even proponents of the SSI-on-Blockchain idea don’t seem to have a clear narrative - everyone seems to claim something else. There are, however, 6 commonly touted benefits:

Blockchain increases trust in the digital identity credentials, e.g. your digital passport [e.g. IBM SSI Blog: “Why Blockchain”],
Blockchain increases trust in the issuers of the credentials [IBM SSI Blog: ‘Trusted Labels’, Forbes]
Blockchain makes fraud harder [e.g. Techtarget, Fintech News, TU Delft],
Blockchain allows trusted time-stamping, i.e. proofs that credentials were issued before some specific point in time [EU Blockchain Forum],
Blockchain avoids third parties for storage/access [IBM again]
Users have control over their data (found in virtually every blog on the subject, and even in academic papers, e.g. this or this)

All of these claims fall apart when even mild scrutiny is applied.

It is almost comical how every article or paper on this subject avoids describing, on a high level, the tradeoffs in terms of risk, or trust, or engineering.

In a problem space that deals exclusively with (chains, webs, direct, indirect) trust, you’d think someone would write a comparison of DLT and non-DLT, on “which third parties do I need to trust, and what happens when they fail, or they are malicious”. It seems like engineering 101 - yet such comparisons are completely absent from anything I encountered while researching this article.

It is somewhat mind-boggling how the “blockchain is the future”-assumption just keeps getting cargo-culted through the pitch decks - and even through scientific papers -, and nobody seems to fact-check that assumption, ever.

Put Your Engineering Hat On and Nothing Adds Up

The Bad

The following discusses the claimed common benefits mentioned before, and how blockchain adds - in the best case - no value, but frequently makes the problem area worse.

Your digital identity does not become more trusted. If everyone can write to the blockchain, the fact that “your identity is on the blockchain” doesn’t mean anything by itself. Therefore there is no difference in authenticity (or trustworthiness) of some “piece of data” whether it’s on a QR code on your smartphone, or in a public blockchain.

Authenticity is typically established through digital signatures, e.g. a federal agency would digitally sign a document stating “Taylor Swift is 21+ years old”. But that signature can be easily be embedded in a QR code too (and so can a “chain” of signatures and certificates, allowing for trust hierarchies). These systems exist, and work (see the vaccination passport example).
The identity issuers do not become more trusted. The previous point holds for the other side of the equation, the identity of the signature issuer (Certificate Authority), too. There are (by necessity) only going to be few trusted entities that can act as CAs - after all, for the verification to work, both issuer and subject need to trust the same CA. Coordinating this will by necessity only leave a few government agencies, or maybe (very large) trusted corporations (such as nationwide utilities, or some Big Tech companies).
1. If there are only few, and they are well-known - for example, a few government entities per country - then you can hard-code them in your app - which is exactly what’s being done today through your OS’ or browsers CA store.
2. If there are multiple smaller entities - for example, each landlord issuing digital keys, in every city, worldwide - you will need to delegate trust, typically through a hierarchy (the landlords are trusted by the city, which in turn are trusted by the federal government, or a national utility company - who’s certificates are hard-coded in your app). This is the exact structure (virtually) all internet encryption is based on. Additionally, the existing system - X509 - supports cross-signing, having credentials signed by multiple parties - despite claims to the opposite, this is not a unique feature of W3C VCs.
Detection of forgery or conflicting credentials is not made easier. A very common argument is that storing claims on a DLT would make it impossible for someone to show one certificate to person A and another to person B: The verifier could just look at the public ledger, and see the conflict! Variations of this claim include Sovrin’s “The blockchain can tell you which [certificate] is the most recent one”.

But this system requires the verifier to look up other credentials that match some identifying information - for example, they need to find all IDs named Angela Merkel. Being able to perform such lookups requires that the data on the blockchain is PII, and storing PII on an immutable ledger is a straight-forward GDPR (and generally, privacy) violation.

There are approaches that try to address this - e.g. using hashes, encryption, or zero-knowledge proofs, to obscure the data and make the problem a bit harder, but the fundamental problem remains the same - you need to be able to look things up for the forgery/conflict detection to work, and that lookup necessarily exposes PII.

Additionally, there might be valid reasons for a person to have conflicting versions of their IDs over time. People change names, change genders, typos happen etc. - applications need to be able to handle these. Consider someone in a witness protection program, where the existence of a conflicting ID would be outright life-threatening.
Verified timestamping adds no practical value, and can be dangerous. The publication timestamp is the only bit of verifiable data that a blockchain actually adds. In theory, this proves that a piece of data was created no later than a specific time T - i.e. that a document has not been “backdated”.

Now, this sounds like a useful feature to have, but, as before, it falls apart under closer inspection - mainly because there is no real-world situation where this is truly relevant: You’d need to construct a situation where you already trust the issuer to make true statements about everything that’s in the document (e.g. in a passport, Name, biometric data, issuance and expiry date), but you somehow don’t trust them that they issued (printed) the document in a timely manner.

This is simply not a problem anyone is having, but it gets worse:
- Even if this was an actual problem: Nothing stops the document issuer from simply issuing multiple entries at different timestamps. As in the previous point (“forgery”), this is impossible to detect for third parties.
- As above, there might be very valid use cases for having a document being backdated. Having an anomaly in the issuance timestamp can be anything from uncomfortable to outright dangerous.
- This only works when you trust your Blockchain API gateway - the “anchoring timestamp” is part of the block, not the statement, so an untrustworthy API gateway can simply change it.¹
Of course, timestamping services don’t actually require blockchain. RFC3161, the most popular timestamping standard, has existed since 2001 and can be reasonably decentralized. But even that standard is rarely used: a quick github search for RFC3161 finds only 28 repositories and 1k commits at the time of writing - compared to 29k repositories and tens of millions of commits for the search term “identity”. DKIM covers this usecase too, and is standard for most emails today.

You are relying on the same amount of third parties, but likely less trustworthy ones. There are two “third parties” to consider here: The cryptographic trustee, and the data provider. The trust model in the former is unchanged, as discussed in the first two points.

The latter third party stores the blockchain for you, and provides you with the data through some API. This is required if the DID-system is to scale to a large population, like a country or the EU: The blockchain would typically contain some entry for every DID or VC; whether that’s the document itself, a hashed/encrypted version, or an audit log entry. You would then end up with billions of entries, not something that will fit on an average phone, or that you’d want to keep updated day and night.

(A few projects, like Sovrin, store only a handful - maybe a few 100 - “root” DIDs in a private ledger. As only sovrin can add or revoke from the ledger, the “added third party” trust problem remains the same.)

Adding the dependency on a “blockchain API” is equivalent to adding a dependency on a Government or Big Tech, though in most of the existing examples the gateway is operated by a far smaller (and/or less reputable) company. What if they are offline? What if they get hacked? What if that company has financial incentives to not be honest at all times?
You actually give up control over your data, not gaining more. The “you own your data” claim is simply not true, the exact opposite is true. When you upload data to a blockchain, it stays there forever. Any Big Tech company that you tried to avoid by using DID can happily read along.

Some systems claim they provide the ability to delete data, but even if that were true - by necessity, you pushed the data onto a public system, you don’t know if all systems actually act on your deletion request. The NSA probably won’t.

The “control” argument also often seems rooted in some kind of encryption scheme, where only the user holds the private key. But this is orthogonal to blockchain - if the data is encrypted, you might as well store it at some Big Tech provider, even if you don’t trust them. At least then you have one entity to sue if your data gets mis-handled. From a privacy perspective, you’re almost certainly better off with that approach, too: Metadata analysis companies like Chainalysis make a living, and headlines, by correlating data that’s properly encrypted, but on a blockchain.

The idea that you gain control over your data by publicly posting it on the internet is so obviously stupid, it’s hard to understand why people insist on repeating it.

The Ugly

The “Engineering” section would not be complete without pointing out a few obvious additional drawbacks when using DLTs:

Additional complexity. Adding more moving parts in any IT system should always be done with caution, even more so when the system is exposed to the internet, and even more so when it is deployed on end-user devices you can’t control and thus can’t centrally coordinate. Complexity adds cost, slows development, and adds failure modes.
A dependency on an internet connection. You cannot get the most up-to-date state of the blockchain when you’re in a tunnel, in the mountains, or the connection is slow and unreliable at a festival where the 5G network is overloaded. This dependency of course also means higher latency independent of how good your connection is - if you need to pull additional data, things go slower.
Transaction cost. Distributed ledgers - virtually by design - can’t utilize economies of scale, making transactions computationally more expensive. In addition, all commonly used public ledgers charge fees (and private ledgers require you to trust an additional party).

None of these above would be deal breakers, of course, if they were offset by some additionally gained functionality. But there simply is nothing.

A Tour Through Literature & Projects

Obviously, thousands of blog posts, articles and papers have been written on the subject. I’ve tried to collect a representative sample from the most popular (per google ranking or citations) below:

Blogs & Articles

EBSI, the European Blockchain initiative, is advertising digital identity as one of their 4 usecases. Technically, 2 others - “Diploma verification” and “Health Insurance verification” are close enough to DID that it might count, too. They have documentation, but it too is devoid of any *actual *information what the blockchain actually does. Their documentation actually implies the opposite: All the usecases that they have envisioned so far seem to only write to the ledger. None of them actually read from it. Seriously, even the “verify credentials” use-case only writes an audit log to the ledger, which is apparently never read.

The IBM Blog mentioned earlier talks a lot about how they avoid trusted third parties, and establish trust through the blockchain. They conveniently ignore that all of IBMs blockchain products are based on their Hyperledger product, a private blockchain developed and typically hosted by IBM - a third party to the application. In the case of Maersk Tradelens, a product that had some success through being forced onto Maersk’s customers, all interaction with the blockchain requires an IBM.com account.

Wikipedia’s Self-Sovereign Identity article states that SSI is “verified using public-key cryptography anchored on a distributed ledger.”, but the term “anchored” is never defined.

The source for that statement is a document titled “Blockchain and Identity” from the EU Blockchain Observatory. The word “anchored” does not appear in it. Despite the title, this 27-page document discusses the “Blockchain” aspect on less than 1 ½ pages, in which they merely hint at possible options - with plenty “can”, “could”, “might”. Eventually, they identify ‘timestamped data for eIDAS [the EUs digital identity system]‘ as the only viable use case. As discussed above, this is infeasible in practice. They also mention plenty of problems around GDPR, and that they really don’t know whether digital signatures on a DLT hold any legal weight in the EU.

The Forbes article mentioned earlier claims that with DID, “In the future [online shopping] deliveries could be made to the person, wherever they may be at that time, rather than just relying on a home or office address.”. I was not aware that this was a problem of presenting an ID. According to that article, DIDs are “biometrically secure” (the author does not provide a reference for that claim).

In Academia

I went through the first 5 results on Google Scholar for the search query “self-sorvereign identity blockchain”.

The Title Self-Sovereign Identity Solutions: The Necessity of Blockchain Technology certainly sounds promising, but unfortunately, the abstract already lowers ones expectations:

We conclude that blockchain technology is not explicitly required for a Self-Sovereign Identity solution but it is a good foundation to build up on, due to various technical advantages that the blockchain has to offer.

The paper does not substantiate that claim - all they do is compare 9 Blockchain-based solutions with 2 non-Blockchain-based solutions.

The paper states that they only focus on self-sovereign identity, and have thus excluded some identity solutions, and then find that almost all the remaining ones use blockchain technology! How very convenient, even more convenient that they left out OpenID or PGP/GPG, arguably two large non-blockchain players in (or close) to the DID space.

Still they can’t seem to manage to create a convincing argument, resorting to statements like “Still it is more difficult to prove that there are no hidden algorithms when not using blockchain. So blockchain is definitely a better base for this property.“ In another spot they casually mention

“However, if a private key is lost it will be difficult to change or remove the data.”

- likely illegal under GDPR (because the existence of data that a private-key can change or delete implies that there is identifiable data on the blockchain - even public keys could be PII under GDPR).

In the parts of their comparison where the assumed blockchain solution is not in direct violation of privacy laws, they helpfully point out that either DLT or non-DLT would get the job done.

The big reference in the Wikipedia article is “In Search of Self-Sovereign Identity Leveraging Blockchain Technology”. Of its 21 pages, only 3 are actually dedicated to Blockchain, half of which defines terminology, the other half briefly presents 4 projects that they identify serious shortcomings with - but they claim *might *work in the future. They briefly sketch a few use-cases, noting that all of them would work without blockchain too, and conclude with “[…] In fact, there have been a few attempts in the form of different blockchain-based self-sovereign identity systems. However, as per our analysis, none of them satisfies all the properties of a self-sovereign identity system. […]”.

In the paper A Survey on Essential Components of a Self-Sovereign Identity, Section II, they claim that the Blockchain takes the place of “the registrar in a classic DID system”, without explaining what their job actually is. The key points though are

In order to accept the identity, the relying party needs to have a trustful relationship with the claim issuer. and The actual identity claim is stored in the user controlled storage, typically off-chain for privacy considerations. The relying party, also called claim-verifier, can then compare the publicely [sic] available identifier with the identifier in the claim that has been handed to him by the user. This makes zero sense at all. The user hands the claim-verifer the claim, the claim is signed by the issuer, the claim-verifier has a trusting relationship with the issuer. We’re done! There is no added benefit to comparing the ID to a database that anyone can write to.

Deployment of a Blockchain-Based Self-Sovereign Identity is actually reasonably well readable. They sketch some parts of how an on-blockchain system would work, a bit more in-depth than other papers, but at the expense of completely ignoring any privacy concerns - while they do come up with a sketch for (but not an actual description) of a zero-knowledge system to ~encrypt the claims, they fail to describe how the other claimed benefit, the audit log, would work with this: You can either publicly audit claims, or you can have privacy. Not both. The paper ignores this.

They do repeat frequently that the user has to be in “control” over what’s on the blockchain, but this can never be fully true - a blockchain is a one-way street, once published, you can’t “control” the data back out of it. They seem to be implying that a “personalized Blockchain” is required, a parallel blockchain akin to a branch in IOTAs Tangle (but neither would the desired level of control be available in the IOTA system, nor has IOTA ever created even a working (decentralized) prototype, also apparently IOTA is still a thing in 2022 lol).

They did, however, solve how to identify terrorists at the airport: You just need to write “Not a terrorist” on the Blockchain!

Lastly, in the worst case, some claims may require real-time proof of correctness (not being a terrorist when checking in to the airport).

The paper actually points out that a lot of this stuff would work without blockchain, too:

The claims do not require any blockchain to be evaluated and may be shared with other platforms.

They never actually specify why it is worth going through all that trouble with Blockchain. Instead, they refer to another paper:

The concept of using blockchains as vessels for identity has been explained by Zyskind et al. [5]

That paper, Decentralizing Privacy: Using Blockchain to Protect Personal Data, unsurprisingly, also fails to make an argument how blockchain is any better - as with all the papers above, it merely describes complexity that would need to be added to make something work at all:

One of the major contributions of this paper is demonstrating how to overcome the public nature of the blockchain.

The only explanation of benefits is a vague “Given this model, only the user has control over her data.” - based on ownership of the encryption key, orthogonal to the blockchain; and that an adversary cannot “corrupt the network”. Unfortunately, in the very next paragraph point out that adversaries can actually corrupt data, unless it is sufficiently replicated:

Note that while data integrity is not ensured in each node, since a single node can tamper with its local copy or act in a byzantine way, we can still in practice minimize the risk with sufficient distribution and replication of the data.

This is unfortunately as far as that section goes.

Finally, Towards Self-Sovereign Identity using Blockchain Technology looks like a Bachelor or Master thesis, and I’m not in the mood of trash-talking someone’s thesis. TL; DR is that It falls into the same traps as discussed above, although it looks a bit nicer.

Real-World implementations?

The article In Search of Self-Sovereign Identity Leveraging Blockchain Technology was from June 2019, so what happened to the 4 projects they mentioned?

Uport has shut down. It split into serto.id, which seems dead, as their twitter feed and blog stop in Oct ‘21 - and veramo.io, which pivoted to “A JavaScript framework for verifiable data”, and seems to have completely dropped anything “blockchain/DLT” - a google search only shows 3 pages on their site that mention those terms, all in passing.
Jolo, who’s homepage now prominently features a blog post titled “Self Sovereign Identity ≠ Blockchain”, containing “In the past two years, the SSI community has emancipated itself from Blockchain technology” and “Why you do not need a Blockchain for Self Sovereign Identities.”.
Blockcert seemed to have stopped their core work in 2017 - which is the date of the most recent blog post and most of the documentation - and their main product seems to not have left prototype phase: it hardcodes the wallet password to <empty> and seems to skip quite a few certificate verification steps [1], [2] (!!).

The FAQ, under the section ”Why use a blockchain instead of a PKI infrastructure?” claims that the blockchain provides tamper-proofs (in the presence of a trusted issuer key/identity, a problem solved since 1977). They do use the trusted timestamp argument, with a PoW chain, claiming that this avoids a trusted third party. According to their code they depend on the third parties blockchain.info, learningmachines.com and their own blockcerts.org website, the trust model for those is not discussed. Neither is discussed how trust with the certificate issuer is established. The FAQ refers to the wiki for further information, which is empty.
Sovrin, which is likely the biggest player in the space. Their paper “What goes on the ledger” explicitly states that ordinary users would never store identifiers on the DLT - in fact, only larger entities, such as governments or corporations, would write their “public DIDs” to the Blockchain. Ordinary people would keep their “private DIDs” to themselves, those work just fine without blockchain. But Sovrin doesn’t explain what the public DIDs gain from being on the blockchain. Also, the security of the current internet already has a very similar model of ~150 public government & corporate “Root CAs” - Sovrin doesn’t explain how their approach is fundamentally better (or even different, apart from “blockchain!”).

The architecture diagram and websitefor their (only?) real-world implementation, IATA Travel Pass, does not mention anything blockchain related. I reached out to them and asked whether they used blockchain at all; they confirmed that they do use hyperledger (a permissioned blockchain only they can write to), but that IATA, the main stakeholder, also keeps their own, separate list of trusted certificates - making it fundamentally unclear why Sovrin even bothers running their blockchain instance.

They also confirmed that Sovrin is moving away from a pure blockchain focus for their products.

Additionally, the “Necessity of Blockchain Technology” article mentioned

IDChainZ - at the time of writing, the images on the website don’t load, making it hard to evaluate the architecture. The “Download or Brochure” link 404s.
EverID - all their social media went dead in April ‘21.
LifeID, which is dead, because the author pivoted to a phone number-as-NFT project.
ShoCard, acquired by a company called Ping Identity in 2020. There is no indication that Ping’s PingOne Cloud Platform has anything to do with Blockchain.
SelfKey, whose website and Twitter just screams “2017 ICO scam”. That being said though, in their whitepaper, they seem almost excited about how they don’t use any DLT for SSI either, and somewhat argue that they shouldn’t. Their blockchain is only used to pay for attestations (i.e., pay a utility company to sign your credentials using their blockchain-based token).

It cannot be stressed enough that the only known wide-scale deployments of anything related to W3C’s DID concept are the QR-code-on-paper based EU vaccination certificate, and India’s equivalent “DIVOC” passport. The fact that EBSI & co. are waving around DID as a means to get grants, in 2022 (most of the projects above died years ago, the papers were mostly writen in ‘18/’19) is saddening.

Parting Words

A lot of Blockchain + SSI projects start with the goal “take the power back”. This is, at it’s core, a noble goal. But instead of solving the hard problems, like

how to do ID recovery,
how to ensure privacy and metadata resistance,
how to deal with identity theft,
how to ensure decentralization and interoperability,
how to make SSI work for tech-illiterate people

they instead burned all the funding on trying - unsuccessfully - to make Blockchain as useful as a piece of paper. Imagine how much further we would be in Digital Identity today, if a huge chunk of the funding would not have been burned on a fundamentally incompatible technology.

Blockchain adds a serious amount of complexity, drawbacks & costs to something that can demonstrably be done without. This is an “the emperor has no clothes” situation - many people have heavily invested in this. Nobody wants to admit they’re wrong. But the reality is: A QR code is the superior tech.

This could partially be mitigated using SPV-style verification, but this requires
- a PoW chain with reasonably high difficulty - unlikely for any practical application, and
- that the end-device is capable of tracking at least the block headers. Technically feasible for chains with long block intervals (e.g. Bitcoin) but still somewhat resource intensive).
Hardly a good selling point. ↩