Having introduced Riposte at a very high level, the next step will be to study a toy example with 2 Riposte servers to get a feeling for writable PIR. But before diving in, let's first see how anonymization is done the good ol' traditional way: with mix-nets.
Mix-nets are an important construction to study: most traditional anonymizers descend from them, and Tor's onion routing grew out of the same line of ideas. So during the intermission let's see what mix-nets are. We'll return to our regularly scheduled programming in the next article.
Hiding metadata in a haystack
Let's recall what we want to achieve. We want registered (and therefore known) users Alice, Bob, Carol, David, Eve, ... to post messages \(m_1\), \(m_2\), \(m_3\), etc. to the platform in such a way that an attacker can't tell who wrote what. As an example, it should be hard to tell that \(m_2\) was posted by David, or that \(m_1\) was posted by Alice.
If the server to which the users connect isn't compromised and is trusted, there ain't much to do: simply display the messages (without the authors' names, of course) in the order they come in. After all, because the messages don't contain the author's name, anybody reading that server's output can't tell who wrote what. Seems secure... but is it?
Unfortunately, an eavesdropper listening on that server's incoming links will see that Alice connects first, and David connects second. Looking at the list of messages \(m_1\), \(m_2\) etc. displayed by that server, she concludes that \(m_1\) must have been posted by Alice, and that \(m_2\) must have been posted by David.
There's another problem with this simplistic setup: by immediately outputting received messages, any relatively quiet, underused server would give away Alice's authorship of \(m_1\) to an eavesdropper who monitors Alice and a couple of other writers.
The scheme just described leaks two kinds of metadata:
- the order in which the server received the messages
- the times at which it received them
Shuffling to the rescue
Obviously, to hide both kinds of metadata, the server needs to buffer and then shuffle the messages before outputting them. Picture a dealer in a poker game shuffling a deck of cards and you'll get a pretty accurate representation of what a server could do to protect the anonymity of its users. We divide the work of the server into 3 distinct phases:
- collect messages from the users
- shuffle the messages in a random order
- display the shuffled list of messages
This sequence of phases then repeats over and over.
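To make the three phases concrete, here is a minimal sketch of such a server in Python. The class name `Mixer` and the batch size are invented for illustration; a real server would of course also need networking, authentication, and careful handling of its randomness.

```python
import random

class Mixer:
    """Toy anonymizing server: collect a batch, shuffle it, display it, repeat."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.buffer = []

    def collect(self, message):
        """Phase 1: buffer an incoming message (the author's name is never stored)."""
        self.buffer.append(message)

    def flush(self):
        """Phases 2 and 3: once the batch is full, shuffle it and output it."""
        if len(self.buffer) < self.batch_size:
            return None                       # keep collecting
        batch, self.buffer = self.buffer, []
        random.SystemRandom().shuffle(batch)  # random permutation of the batch
        return batch                          # the output order no longer reveals arrival order
```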
If the collection phase is long enough, and the number of users connecting and submitting messages is large enough, a shuffling server (we'll use the term mixer) is a pretty good anonymizer... with one major caveat: it must not be compromised.
Cascading mixers, a.k.a. mix-nets
Traditionally, i.e. in the pre-Riposte era, the solution to the problem of compromised mixers is to cascade many of them, each operated at a different location. The hope is that at least one mixer won't be compromised and will properly mix the messages. Such a network of mixers is called a mix-net.
Picture a mix-net like this: Alice, Bob, etc. connect to mixer \(M_1\) and submit messages. \(M_1\) buffers and then shuffles all received messages. It then forwards the shuffled sequence of messages to mixer \(M_2\). \(M_2\) shuffles the messages again and forwards the shuffled sequence to mixer \(M_3\)... The last mixer in the chain mixes the messages one last time and finally outputs them.
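A sketch of such a linear cascade, reusing the toy `Mixer` from above (again purely illustrative; real mixers run on separate machines, and note that the messages here still travel in the clear, which is exactly what onion routing below addresses):

```python
def run_cascade(mixers, messages):
    """Push one batch of messages through a linear cascade of mixers."""
    batch = list(messages)
    for mixer in mixers:
        for m in batch:
            mixer.collect(m)
        batch = mixer.flush() or []   # each mixer re-shuffles the whole batch
    return batch                      # output of the last mixer in the chain

# Three mixers in a row; each one shuffles the batch once more.
cascade = [Mixer(batch_size=3) for _ in range(3)]
print(run_cascade(cascade, ["m1", "m2", "m3"]))
```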
There are many variations of mix-nets. The topology of the cascade of mixers just described was linear. But nothing prevents us from building more complex topologies. And it is not necessary that all users connect to the same initial mixer, so long as each message traverses at least one non-compromised mixer.
Onion Routing
Suppose that an attacker compromised the first mixer \(M_1\). When Alice connects and submits \(m_1\), the attacker immediately knows that \(m_1\)'s author is Alice. Obviously, we need to fix that, pronto.
In the new and improved setting, Alice puts multiple layers of encryption around \(m_1\), one layer for each mixer in the cascade. Suppose that the public key of \(M_1\) is \(k_1\), the public key of \(M_2\) is \(k_2\), and the public key of the last mixer \(M_3\) is \(k_3\). If \(\{m\}_k\) denotes \(m\) encrypted with key \(k\), i.e. \(\operatorname{Enc}(k, m)\), then Alice sends \(\{\{\{m_1\}_{k_3}\}_{k_2}\}_{k_1}\) to \(M_1\). The first mixer strips the outer layer of encryption using its private key, shuffles the messages, and sends \(\{\{m_1\}_{k_3}\}_{k_2}\), as part of the shuffled sequence, to \(M_2\). The second mixer strips the next layer of encryption with its private key, shuffles the messages again, and sends, among others, \(\{m_1\}_{k_3}\) to \(M_3\). The last mixer strips the last layer of encryption, shuffles the messages again, and finally outputs all messages, including \(m_1\).
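Here is a minimal sketch of this layering in Python, using PyNaCl's `SealedBox` as one possible choice of public-key encryption (the library and the three hard-coded mixers are my own assumptions for illustration, not part of any particular mix-net design):

```python
from nacl.public import PrivateKey, SealedBox  # pip install pynacl

# Each mixer M1, M2, M3 has a key pair; Alice only ever sees the public keys k1, k2, k3.
sk1, sk2, sk3 = PrivateKey.generate(), PrivateKey.generate(), PrivateKey.generate()
k1, k2, k3 = sk1.public_key, sk2.public_key, sk3.public_key

# Alice wraps m1 in three layers: innermost for M3, outermost for M1.
m1 = b"my anonymous message"
onion = SealedBox(k1).encrypt(SealedBox(k2).encrypt(SealedBox(k3).encrypt(m1)))

# Each mixer strips exactly one layer with its private key (shuffling omitted here).
layer_for_m2 = SealedBox(sk1).decrypt(onion)         # M1 forwards this to M2
layer_for_m3 = SealedBox(sk2).decrypt(layer_for_m2)  # M2 forwards this to M3
plaintext    = SealedBox(sk3).decrypt(layer_for_m3)  # M3 finally outputs this
assert plaintext == m1
```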
If \(M_1\) were compromised, the attacker still couldn't read \(m_1\), because Alice's message would still be protected by 2 layers of encryption. If the attacker compromised \(M_2\) as well, she still couldn't read \(m_1\). Of course, if she compromised \(M_3\) too, then it's game over for Alice.
Zero-knowledge Proofs of Permutation
Unfortunately, not all is well with our onion routing. Suppose that an attacker compromised \(M_1\). She could wait until Alice connects and submits \(\{\{\{m_1\}_{k_3}\}_{k_2}\}_{k_1}\) as before. Now, the attacker would replace the content of all of \(M_1\)'s slots except Alice's with random junk, properly encrypted as \(\{\{junk\}_{k_3}\}_{k_2}\), and send the messages (with or without shuffling) to \(M_2\). At the end of the chain, \(M_3\) will display a sequence of \(junk\) messages... and a single meaningful message \(m_1\). The attacker deduces that Alice is indeed \(m_1\)'s author.
What enabled this attack is that the other servers have no means of detecting that \(M_1\) didn't properly shuffle the incoming messages. In other words, \(M_2\) and \(M_3\) couldn't verify that \(M_1\) executed the shuffle faithfully.
To solve that problem, \(M_1\) could prove to \(M_2\) and \(M_3\) that it did a proper shuffle. How can it do this? Well, it could send the original sequence of messages alongside the shuffled messages to its peers as a proof of shuffle. \(M_2\) and \(M_3\) would verify that the shuffled list is indeed a permutation of the original sequence, and accept \(M_1\)'s proof. If \(M_1\) were compromised somehow, it would send a broken proof, which \(M_2\) and \(M_3\) would fail to verify.
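Boiled down to code, that naive verification is just a multiset comparison; the function name below is made up for illustration:

```python
from collections import Counter

def verify_naive_shuffle_proof(pre_shuffle, post_shuffle):
    """Accept iff post_shuffle is a permutation of pre_shuffle.

    Note that this naive check only works because M1 discloses the
    pre-shuffle sequence, which, as discussed below, defeats the
    whole purpose of shuffling in the first place.
    """
    return Counter(pre_shuffle) == Counter(post_shuffle)
```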
Let's set aside for now that a compromised \(M_1\) could simply forge a proof claiming that the \(junk\) it created is what the users actually submitted. We assume that the peers' verifiers somehow have access to enough information that they don't have to rely solely on \(M_1\) when checking the proof. This issue has solutions, but we won't dive into it for the sake of brevity.
If you've been paying attention, you'll probably say: "Wait a sec! If \(M_1\) sends both the pre-shuffle and the post-shuffle sequence to \(M_2\) and \(M_3\), what's the point of shuffling the messages in \(M_1\) at all?" This is an absolutely valid objection! Sending the pre-shuffle sequence as proof of shuffle is therefore not an option. What is needed is for \(M_1\) to prove that it did indeed correctly shuffle a sequence, without disclosing the original pre-shuffle sequence. Or, seen from the perspective of the verifiers \(M_2\) and \(M_3\): how can they verify \(M_1\)'s proof with absolutely zero knowledge of the pre-shuffle sequence?
And that's exactly what a zero-knowledge proof (zk-proof) is all about.
In the case of mix-nets, in addition to onion routing, the mixers need to execute a zero-knowledge proof of shuffling. There is an extensive literature on so-called zk-proofs for secret shuffling (e.g. [Neff01, Astolfi11, FS01]), but suffice it to say that those proofs ain't cheap: the time it takes to verify them grows at least linearly with the size of the shuffled sequence, times some non-trivial constant factor. If we got a zk-proof that takes \(16N\) operations, with \(N\) being the size of the shuffled sequence, we would be very lucky indeed.
Riposte, a completely different construction, can do without zk-proofs of shuffle, and is thus much more scalable than mix-net constructions.
Understanding zero-knowledge proofs
If zero-knowledge proofs are still a mysterious concept to you (I know they were for me when I first met them), here is a beautiful presentation of zk-proofs by Prof. (emeritus) Michael Rabin (the Rabin of Miller-Rabin Primality Test fame), using "Where is Waldo?" and Sudoku examples:
And now, back to our regularly scheduled programming.
Literature
- [Neff01] C. Andrew Neff: A Verifiable Secret Shuffle and its Application to E-Voting. 2001.
- [Astolfi11] Andrea Astolfi: On Proving Permutations in Zero-Knowledge. Master's thesis, 2011.
- [FS01] Jun Furukawa, Kazue Sako: An Efficient Scheme for Proving a Shuffle. In: Lecture Notes in Computer Science. Advances in Cryptology. CRYPTO 2001, pp. 368-387. (full pdf, alternative source)
- [GMW91] Oded Goldreich, Silvio Micali, Avi Wigderson: Proofs that Yield Nothing But Their Validity or All Languages in NP Have Zero-Knowledge Proof Systems. (full pdf)
- [Goldreich10] Oded Goldreich: Zero-Knowledge: a tutorial.
- [GMR89] Shafi Goldwasser, Silvio Micali, Charles Rackoff: The Knowledge Complexity of Interactive Proof Systems (pdf).
- [Ryan14] Ryan Henry: Efficient Zero-Knowledge Proofs and Applications. PhD thesis, 2014.
- A Website on Zero-Knowledge Proofs (website, github sources)