In this series of crypto-bites, you're in for a treat! We introduce and explain Riposte[CGBM15], an anonymizing micro-blogging platform for millions of users. This brainchild of Henry Corrigan-Gibbs, Dan Boneh, and David Mazières, uses innovative cryptographic techniques to thwart mass-surveillance. It is so resilient that it maintains anonymity of writers even if \(n-1\) out of \(n\) compromized Riposte servers are colluding in de-anonymizing users. Unlike classic anonymizing systems that use mix-nets, Riposte is based on a variation of PIR (private information retrieval) database, where the database server has no means to know which user wrote a specific row.
Keywords: anonymity set, meta-data, secret sharing, writable pir, distributed point function, zero-knowledge proofs.
Setting the Stage
We're living in Geoge Orwell's disturbingly prophetic dystopia Nineteeneightyfour. Big Brother is watching us, all of us, 24/7/365, and there's no hiding from the Eye of Sauron and the Ears of Alexa, Siri, and Cortana. Just like the slaves from times past had to pay their own shackles, we're paying premium prices for the telescreens in our homes, and for our shiny tracking bracelets on the go, which thank to their planned obsolency, we have to replace every 2 to 4 years, so they can grow even better eyes, ears, and tracking abilities.
We're also busily writing our own Stasi file (still free of charge... for now), and we've all but sacrificed our minted freedom on the altar of transparent and freezable plastic money. The last whistleblowers sought political asylum in safe places with freezing winters and in embassies, and the powers that be are happily crushing the last remnants of dissent at home, and spreading wars, religious extremism, and general mayhem abroad. Things are looking bleak indeed, and everybody has bowed to the inevitable. Everybody? A small little village of cryptographers begs to differ...
"We Kill People Based on Metadata"
General Michael Hayden (retired), former head of the NSA, is on record for saying "We kill people based on metadata":
In a world of ubiquitous end-to-end encryption and the internet increasingly "going dark" thanks to crypto, the NSA's and other members' of the international Intelligence Community only hope to fulfill their controversial mission of global surveillance is
- to circumvent ciphers by using weaknesses in key management, implementations and communication protocols, as well as leveraging backdoors in operating systems and hardware,
- to collect meta-data on a huge scale and correlate those meta-data to discover their targets' social relationships graph to enable them pin-point people.
In this article, we'll focus on meta-data.
Please don't get me wrong: even though for dramaturgic reasons, I'm casting the NSA in a bad light, I don't think they're the bad guys they are cracked up to be by privacy advocates all around the world. On the contrary: their role is important in catching the bad guys out there. I also don't think that they already have the resources or the will to spy into everyone's bedroom for their own perverse pleasure (but boys will be boys...). And if NSA collects Kompromat on the German Chancellor's phone, at least it has the upside of making for more compliant and malleable commercial rivals allies, doesn't it? It's good to be the king... of secrets.
Anyway, enough politics. It's time to get get crypto!
Dan Boneh's Riposte
Dan Boneh, a cryptography professor at Stanford University, developed with his student Henry Corrigan-Gibbs and his colleague David Mazières an astonishing system for anonymous micro-blogging, i.e. a Twitter-like platform for whistleblowers. They aptly named it Riposte. It is best introduced by Dan himself in this distinguished CSE lecture at UW's Paul G. Allen School. The real presentation starts at 12:55, so you may want to skip the introduction, but be advised that you'll miss an off-topic, albeit faszinating topic[BSRBL12] which is well worth learning about:
I'll explain Riposte in this article for those who didn't watch the presentation or who need additional food for thought. You may also grab the academic paper [CGBM15].
Silent Whispers, Ltd.
Silent Whispers, Ltd[1] is a fictional off-shore company founded by crypto rebels. Its continuing mission: to provide an anonymous micro-blogging platform to their customers, to save democracy from itself by restoring anonymous free speech, to boldly hide metadata that no one has hid before... (and to make a buck or two, get noticed by the big tech giants, and sell out for billions of dollars to the highest bidder, but shttt, don't spell the beans til the deal is closed).
Silent Whispers offers Whispers, an app which connects to a cluster of Riposte servers located mostly outside the jurisdiction of the Fourteen Eyes, but the company insists that even if most of their servers were seized, its customers' identities can't be associated with the 160-characters short whispers (whisps for the afficionados) that they post with the app.
Whispers' users go in the millions, with 10,000 up to 100,000 whisps being sent on a slow-news day and 10 to 100 times more on busy days by countless whistleblowers, file sharers, teenagers, bored housewives, trolls, and spammers. All those whisps are showing up in various categories in the app (which gets updates every two minutes) for the eager readership to digest ad nauseam.
In the very beginning, Silent Whispers used a mix-net based backend: the app encapsulated each whisp like an onion in multiple layers of encryption and sent it down a maze of mixers. Each mixer removed one layer of encryption, shuffled the whisps, and sent them out towards the next mixer... until it arrived at it final destination. Sadly, this system didn't scale well: the more users posted whisps, the slower the mixers... until they came to a grinding halt. The culprit was quickly uncovered: to detect malicious mixers under control of hostile entities, each mixer had to continously run a so-called zero-knowledge proof (ZK-proof) of permutation to convince the other mixers that it did indeed its job as specified. Any seized mixer which pulled some shenigans such as dropping all incoming whisps except for one in order to deanonymize a specific user, couldn't successfully complete its ZK-proof and stood out like a sore thumb before its mixer peers as being compromized. Unfortunately, ZK-proofs ain't cheap: the more work a mixer had to do, the slower the ZK-proof for him and all its peers. A mix-net based system just didn't cut it.
By replacing the ageing mix-net based backend with Riposte, a new system invented by a group of talented cryptographers at Stanford University, Silent Whispers managed to scale its operations tremendously. Without Riposte, the anonymization specialists platform wouldn't be the go-to point on the Internet for all privacy-conscious users that it is today.
Fun fact: Did you know that each time your Whispers app fetches a batch of new whisps, you automatically join the anonymity set of the writing whisperers without writing anything, and help them blend in with all readers? The more whisps you read, the stronger everyones' anonymity.
Secret Sharing + Writable PIR + Distributed Point Function = Riposte
Corrigan-Gibbs, Boneh, and de Mazières, combined three cryptographic primitives, one of them in a novel way, to come up with the core of Riposte:
- Secret Sharing
- Writable PIR
- Distributed Point Function
To this, they added an audit server (which we cover later) to rule out misbehaving or cheating users who don't adhere to the Riposte protocol from thashing the database for everyone.
Secret Sharing
When Alice wants to submit a whisp to a cluster of \(n\) Riposte servers, her app turns that whisp into \(n\) different messages, which she sends to the servers. She hides the whisp from prying eyes and even from the servers by using a form of secret sharing due to Yao [Y82]. To understand secret sharing, look at this 2-server example:
Alice wants to send whisp \(m_a\) to Riposte servers \(R_1\) and \(R_2\). But she doesn't want \(R_1\) to be able to know \(m_a\) without the help of \(R_2\). On the other hand, she also doesn't want \(R_2\) to know \(m_a\) without the help of \(R_1\). She doesn't object that \(m_a\) is revealed later, when \(R_1\) and \(R_2\) are comparing notes.
So, how would she do that? If we interpret \(m_a\) as a number in a finite field \(\mathbb F\) where \(\mathbb F\) is the set \(\{0, 1, 2, \cdots, 2^{8 \times 160} - 1\}\) of all possible 160 chars whisps, and if all additions and subtractions are performed in that field (i.e. modulo \(|\mathbb F|\)):
- Alice picks a random number \(r \overset{R}{\leftarrow} {\mathbb F}\).
- Alice sends \(r\) to \(R_1\)
- Alice sends \(-r + m_a\) to \(R_2\)
It is obvious that as long as \(R_1\) and \(R_2\) aren't comparing notes, both are merely seeing random numbers. The message \(m_a\) is well hidden and can't be deduced, neither by \(R_1\) nor by \(R_2\)... on their own. But when both servers come together, they exchange their data, and each computes independenty the sum, revealing \(m_a\):
\[ r + (-r + m_a) = m_a \]
Note: Although we've called this scheme "secret sharing", it is not to be confused with "key sharing", which is a totally different concept.
Writable PIR
The second primitive is writable PIR. PIR stands for Private Information Retrieval. It is a well-established technique that allows a user to secretly fetch a specific row from a database, such that the database doesn't know which row the user is actually interested in. This may sound like voodoo, but in fact, it isn't all that mysterious, once you've seen it in action. A simple query that selects all rows is a trivial, albeit not very efficient way to PIR-select a database. There exist much more efficient methods with distributed databases[CKGS98], which we won't go into here.
Writable PIR is a novel way to PIR: here, the user submits data to be stored at a specific row to a database, such that the database doesn't know which row the user actually updated. This is somewhat more tricky than regular PIR. If we follow the paradigm of regular PIR shown above, we deduce that the user needs to update all rows of the database, because othewise, the database server would know which rows the user didn't touch. But if the database server simply updated its rows with the user-supplied rows, it would have thrashed all other users' rows.
But, surprisingly, there is a solution to this problem in the context of Riposte! We'll show that solution later in this series.
Distributed Point Function (DPF)
If you've already read the article on the Riposte v1 with two servers, you'll know by now that each time a user wants to submit a whisp, she must send a whole vector of random numbers to each of the servers. If, instead of a hosting 5 slots as in the toy example, a realistic Riposte server hosted \(1000000\) slots, which is a reasonable size for a period of 2 minutes, and each slot contains \(160\) chars or \(8 \times 160\) bits, she would have to send a total of \(n \times 1000000 \times 160\) bytes per whisp, or approximately \(160\) megabytes to each server. Now if that ain't overkill! And since all those random numbers have very high entropy, the 160 MB are actually incompressible: so much data will have to be transmitted. Good luck with your mobile data plan and battery charge!
Fortunately, there is a very effective way to reduce the amount of data that needs to be transmitted to each Riposte server: a Distributed Point Function. In this case, using DPF reduces the amount of data to be transmitted from 160 megabytes down to some 160 kilobytes per whisp, which is more than affordable.
Distributed Point Functions are also the key that unlocked the potential of multi-server Riposte clusters with more than 2 servers, such that the anonymity guarantee holds as long as at most \(n - 1\) out of \(n\) servers are colluding.
We'll have more to say about Distributed Point Functions in another article of this series.
Before we show Riposte v1, it is useful to have a look at the traditional way of anonymizing messages: mix-nets. This is the topic of the next article.
Notes
- [1] Any name collision with real-world past, present and future corporations is unintended and purely coincidental.
- [CGBM15] Henry Corrigan-Gibbs, Dan Boneh, and David Mazières: Riposte: An Anonymous Messaging System Handling Millions of Users. IEEE Symposium on Security and Privacy (Oakland) May 18-20, 2015, San Jose, California. (PDF, Slides, Prototype).
- [CKGS98] Benny Chor, Eyal Kushilevitz, Oded Goldreich, Madhu Sudan: Private Information Retrieval. Journal of the ACM (JACM), Volume 45 Issue 6, Nov. 1998, Pages 965-981 (acm.org, paywalled, full pdf).
- [BSRBL12] Hristo Bojinov, Daniel Sanchez, Paul Reber, Dan Boneh, Patrick Lincoln: Neuroscience Meets Cryptography: Designing Crypto Primitives Secure Against Rubber Hose Attacks. (abstract, full paper).
- [Y82] Andrew C. Yao: Protocols for Secure Computations (extended abstract). (IEEE (paywalled). full pdf)