From Mix-nets to Writable PIR

: Last Updated: February 11 2019

If you didn't skip the intermission, you'll recall that one necessary, albeit not sufficient, condition to guarantee anonymity was the shuffling of collected user messages before publication. In mix-net based systems, the shuffling was done by the server(s), which we called mixers.

A Gedankenexperiment

What if we moved the burden of mixing messages from the mixers to the users? Let's do a little silly Gedankenexperiment: Alice selects a random row index \(r_a\) to put her message \(m_a\) in. She connects to \(R_0\), and sends the pair \((r_a, m_a)\). \(R_0\) briefly verifies that \(0 \le r_a \lt N\) where \(N\) is the number of rows in the table, and then updates his table by unconditionally overwriting row \(r_a\) with \(m_a\). In other words, Alice "mixes" her message \(m_a\) into \(R_0\)'s table at a random location \(r_a\). Now, Bob comes along and mixes his message \(m_b\) at random location \(r_b\) of \(R_0\)'s table. Then Carol, David, and Eve do the same ... until the period is over, at which point \(R_0\) simply publishes the table, and starts a new period with a fresh empty table.

Note that no shuffle was performed, neither by the server, nor by the users. The mixing was implicit by users selecting random rows to add their messages in.

Exercise: Does this system provide anonymity in the sense that we are pursuing, i.e. that at the end of the period, it is impossible to tell who added \(m_a\) and \(m_b\)? If not, what needs to be fixed? Assume that all users connect to \(R_0\) with secure channels (providing secrecy and integrity). Assume however that a passive eavesdropper can see who connects to \(R_0\) and when. Discuss.

Analyzing the Gedankenexperiment

Good news first: the messages at the end of the period are indeed in a random order. A passive eavesdropper watching Alice, Bob, etc. establish secure channels to \(R_0\) can't deduce from the order of connections that the first message published by \(R_0\) at the end of the period belongs to the user that connected first. The same holds true for the second message and user connecting second, and so on. Since \(R_0\) collects a sufficient large number of messages before releasing the table, the passive eavesdropper can't mount a time-based traffic analysis to deanonymize users.

Bad news next: Alice and Bob could both select at random the same row \(r\). Alice submits \((r, m_a)\), and Bob submits \((r, m_b)\). Depending on who came last, his or her message will overwrite the other message. Of course, that would be a collision. While pesky, does this collision deanonymize Alice or Bob? Not necessarily, if at least two different rows have been populated by Alice, Bob, and possibly other users. Indeed, the only case where the passive eavesdropper can deanonymize a user is if all users collided on a single row \(r\): in this case, the eavesdropper concludes that the last user connecting to \(R_0\) must be the author of the message at row \(r\). As soon as another row \(R_1\) is populated, the eavesdropper can't tell anything with certainty... However, the more collisions occur in one period, the weaker the anonymity guarantee. If enough users submit messages at random locations and if the table is big enough, occasional collisions don't significantly hurt anonymity.

Really bad news last, but no least: If the server \(R_0\) is compromized, all users lose their anonymity, because \(R_0\) knows exactly on which row Alice, Bob, etc. are mixing their messages in.

Secret sharing to the rescue?

Remember secret sharing from the high-level introduction to Riposte? This is how Alice could send \(m_a\) to \(R_0\) and \(R_1\), such that neither \(R_0\) nor \(R_1\) could read \(m_a\) without the help of the other server [Y82]:

Alice picks a random \(r\): \(r \overset{R}{\leftarrow} \mathbb F\)
Alice sends \(r\) to \(R_0\)
Alice sends \((-r + m_a)\) to \(R_1\)

All additions understood modulo \(|\mathbb F|\). At the end of the period, \(R_0\) and \(R_1\) came together and added their messages to reveal the secret \(m_a\):

\[ r + (-r + m_a) = m_a \]

Could we somehow integrate secret sharing into our Gedankenexperiment? Let's try it. Assume two non-colluding servers \(R_0\) and \(R_1\), i.e. at most one of them may be compromized, but not the other.

Alice may want to "blind" her message \(m_a\) with a random \(r\) just as she did with secret sharing. Therefore, she picks a random blinding \(r\) and a random row index \(r_a\). She then sends the pair \((r_a, r)\) to \(R_0\), and the pair \((r_a, -r + m_a)\) to \(R_1\). Then Bob does the same with \(m_b\) at location \(r_b\), and so on. At the end of the period, \(R_0\) and \(R_1\) would add their tables row-wise. At row index \(r_a\), they'll recover \(m_a\), at index \(r_b\), they'll recover \(m_b\), etc.

Looks like a neat trick! But does it satisfy our anonymity requirements?

Let's assume that \(R_1\) is compromized. While Alice successfully hid \(m_a\) from the prying eyes of \(R_1\), she leaked \(r_a\) when she sent \((r_a, -r + m_a)\) to \(R_1\). In other words, the attacker at \(R_1\) now knows that barring collisions, the message at \(r_a\) whatever it is, belongs to Alice. When, at the end of the period, after adding the rows with \(R_0\), \(m_a\) shows up in row \(r_a\), the attacker in \(R_1\) will easily deduce that Alice submitted \(m_a\).

So, we're getting closer, but we're not there yet: Alice needs not only to blind \(m_a\), she also needs to blind \(r_a\) somehow.

The Path to Writable PIR

Effectively blinding the location of the updated row can be extremely tricky. Instead, Alice now decides to update all rows of \(R_0\) and of \(R_1\), hiding her message \(m_a\) in one of these rows. This is what she could do, assuming that the database tables contain \(L\) rows:

Alice picks \(L\) random numbers: \((r_0, r_1, r_2, \cdots, r_{L-1}) \overset{R}{\leftarrow} {\mathbb F}^L\)
Alice picks a random row index \(r_a \overset{R}{\leftarrow} \{0, 1, 2, \cdots, L-1\}\)
Alice sends \((0, r_0), (1, r_1), \cdots, (r_a-1, r_{r_a-1}), (r_a, r), (r_a+1, r_{r_a+1}), \cdots, (L-1, r_{L-1})\) to \(R_0\)
Alice sends \((0, -r_0), (1, -r_1), \cdots, (r_a-1, -r_{r_a-1}), (r_a, -r + m_a), (r_a+1, -r_{r_a+1}), \cdots, (L-1, -r_{L-1})\) to \(R_1\)

Since all users now update all rows, we can simplify the protocol, by omitting the indices. Steps 3. and 4. change to:

Alice sends \((r_0, r_1, \cdots, r_{r_a-1}, r, r_{r_a+1}, \cdots, r_{L-1})\) to \(R_0\)
Alice sends \((-r_0, -r_1, \cdots, -r_{r_a-1}, -r + m_a, -r_{r_a+1}, \cdots, -r_{L-1})\) to \(R_1\)

In other words, she sends different random numbers to all rows except to the row \(r_a\) that will contain her blinded message: \(r\) for \(R_0\) and \(-r + m_a\) for \(R_1\). An attacker sitting in either \(R_0\) or \(R_1\) can't tell which row Alice selected for her message, i.e. \(r_a\) is well-hidden. Since Alice updated every row with what looks like random values, neither \(R_0\) nor \(R_1\) knows which row Alice selected for \(m_a\), nor do they know \(m_a\) before the end of the period.

Unfortunatly, if the servers \(R_0\) and \(R_1\) overwrote all their rows with the new values provided by Alice as in the Gedankenexperiment, they'll lose all other users' messages (and random numbers) that they previously stored in their tables. Did Alice just thrash the common database by updating all rows? Well, yes, of course she did!

Fixing the database thrashing problem

We modify the Gedankenexperiment one last time: instead of overwriting the rows with user-supplied values, \(R_0\) and \(R_1\) are now supposed to add those values to the already existing values. In other words, the code than \(R_0\) and \(R_1\) are running looks something like this:

for (int i=0; i != table.size(); ++i) { 
    // table[i] = user_values[i]; 
    table[i] += user_values[i]; 
}

Of course, the addition is meant modulo \(|\mathbb F|\).

Have you noticed that the random values we've sent to \(R_1\) are the additive inverse of those we've sent to \(R_0\)? This was intentional and not a typo.

What happens when Bob comes along after Alice? Let's run a complete concrete walked example on a set of two servers with only 5 rows to see what happens.

A Toy Example

Suppose we have 2 Riposte servers, \(R_0\) and \(R_1\). Each one of them maintains a table in RAM with 5 rows. Each row can store a 160 chars whisp, which we interpret as a number in the finite field \(\mathbb F = \{0, 1, 2, \cdots, 2^{8 \times 160} - 1\}\). As usual with finite fields, all additions and subtractions on elements of \(\mathbb F\) occur modulo \(|\mathbb F|\).

In practice, Riposte servers will store tables with many more rows than 5. Something like 1000000 rows seem appropriate. But for now, 5 rows suffice.

Both \(R_0\) and \(R_1\) are synchronized in the sense that they agree when a new period starts. At the beginning of a new period, their tables are empty, i.e. all zeroes:

\(R_0: (0, 0, 0, 0, 0)\)
\(R_1: (0, 0, 0, 0, 0)\)

Alice wants to send message \(m_a \in \mathbb F\). She randomly choses a row, and decides that row to be 2 (we start counting rows at index 0). So she generates the vector \((0, 0, m_a, 0, 0)\). Then, she creates a shared secret on this vector like this:

\( v_1 := (r_0, r_1, r_2, r_3, r_4) \overset{R}{\leftarrow} {\mathbb F}^5 \)
\( v_2 := (-r_0, -r_1, -r_2+m_a, -r_3, -r_4) \)

Here, \(v_1\) is a vector of random numbers chosen uniformly in \(\mathbb F\), and the negative values are understood to be taken modulo \(|\mathbb F|\), of course, so they are still in \(\mathbb F\).

Note that to an outsider, both \(v_1\) and \(v_2\) look perfectly random.

Now, Alice logs into \(R_0\) and \(R_1\), and sends \(v_1\) to \(R_0\) and \(v_2\) to \(R_1\). Both Riposte servers first authenticate Alice, and then add the vectors they've just received to their own tables. Since they just started a new period:

\(R_0: (0, 0, 0, 0, 0) + v_1 = (r_0, r_1, r_2, r_3, r_4) \)
\(R_1: (0, 0, 0, 0, 0) + v_2 = (-r_0, -r_1, -r_2+m_a, -r_3, -r_4) \)

Now, Bob comes along and wants to send message \(m_b\). He randomly choses row 4 and computes:

\( w_1 := (s_0, s_1, s_2, s_3, s_4) \overset{R}{\leftarrow} {\mathbb F}^5 \)
\( w_2 := (-s_0, -s_1, -s_2, -s_3, -s_4+m_b) \)

Again, \(w_1\) and \(w_2\) look perfectly random. You've guessed it: he logs into \(R_0\) and into \(R_1\) and sends \(w_1\) to \(R_0\) and \(w_2\) to \(R_1\). Both servers, having verified Bob's credentials, add \(w_1\) and \(w_2\) to their tables:

\(R_0: (r_0, r_1, r_2, r_3, r_4) + w_1 = (r_0 + s_0, r_1 + s_1, r_2 + s_2, r_3 + s_3, r_4 + s_5) \)
\(R_1: (-r_0, -r_1, -r_2+m_a, -r_3, -r_4) + w_2 = (-r_0-s_0, -r_1-s_1, -r_2+m_a-s_2, -r_3-s_3, -r_4-s_4+m_b) \)

Then Carol, David, and Eve would come along and the cycle repeats... until the end of a period. We assume for now that only two whisps were accepted.

At this point, \(R_0\) and \(R_1\) momentarily stop accepting new vectors. They come together and add their tables. Both compute

\[ (r_0 + s_0, r_0 + s_1, r_2 + s_2, r_3 + s_3, r_4 + s_4) + (-r_0-s_0, -r_1-s_1, -r_2+m_a-s_2, -r_3-s_3, -r_4-s_4+m_b) \]

and lo and behold!, they both get \((0, 0, m_a, 0, m_b)\).

In other words, Alice's \(m_a\) and Bob's \(m_b\), by the magic of shared secret, materialized there where Alice and Bob have decided to put them. All other rows are empty / unused.

How does this provide anonymity?

1. Before the period ends

From the perspective of \(R_0\), Alice and Bob are merly sending junk. At no point in time does \(R_0\) know that Alice is the author of \(m_a\), nor that Bob is the author of \(m_b\). More interestingly, neither does \(R_1\): even though \(m_a\) and \(m_b\) are hidden in the vectors submitted by Alice and Bob to \(R_1\), since these values are blinded by random junk, \(R_1\) still can't distinguish both cases.

Even though it might seem that all users are sending random junk to \(R_0\) and blinded messages hidden in random junk to \(R_1\), it doesn't have to be that way. In this simple example, Bob could just as well send \(w_1\) to \(R_1\) and \(w_2\) to \(R_0\). In this case, the secret sharing method is symmetric, because addition modulo \(|\mathbb F|\) is commutative.

The main point though is that Alice touches all rows of \(R_0\) and of \(R_1\) (and so does Bob). Even if, say, \(R_1\) were compromized, knowing that \(v_2\) was indeed sent by Alice would still reveal nothing... at least as long as the period is still running its course.

2. After the period ends

When \(R_0\) and \(R_1\) exchange their tables so that they may add them to their own, the messages \(m_a\) by Alice and \(m_b\) by Bob are revealed, as well as are the rows where Alice and Bob have put them in. Does this de-anonymize Alice and Bob? Or, asked more precisely since the registered users Alice and Bob are obviously known to the Riposte servers, does it allow \(R_0\), \(R_1\), or the outside world to tell that \(m_a\) was from Alice and that \(m_b\) was from Bob?

Let's stop and think about it for a second. What does \(R_0\) see w.r.t Alice and Bob?

\(R_0\) sees \(v_1\) from Alice: \((r_0, r_1, r_2, r_3, r_4)\)
\(R_0\) sees \(w_1\) from Bob: \((s_0, s_1, s_2, s_3, s_4)\)
\(R_0\) sees \(ss_2 := (-r_0-s_0, -r_1-s_1, -r_2+m_a-s_2, -r_3-s_3, -r_4-s_4+m_b)\) from \(R_1\)
\(R_0\) computes \(v_1 + w_1 + s_2 = (0, 0, m_a, 0, m_b)\)

Steps 1. and 2. reveal absolutely nothing about Alice and Bob.

What about \(ss_2\)? No matter what \(R_0\) does with \(v_1\), \(w_1\), and \(so_2\), there's no way to isolate \(m_a\) or \(m_b\), save form exposing both simultaneously. Therefore, it is impossible to identify Alice's or Bob's message. Think about it if you're not convinced.

The same holds true for \(R_1\): by looking at \(v_2\), \(w_2\) and \(ss_1 := (r_0 + s_0, r_1 + s_1, r_2 + s_2, r_3 + s_3, r_4 + s_4)\), \(R_1\) can't isolate \(m_a\) or \(m_b\). Here too, all \(R_1\) can do is to expose \(m_a\) and \(m_b\) simultaneously.

The crucial point is that \(R_0\) doesn't reveal the intermediary steps needed to compute \(ss_1\) to \(R_1\), and vice-versa that \(R_1\) doesn't reveal the intermediary steps needed to compute \(ss_2\) to \(R_0\). By not revealing these steps, and only exposing the aggregated state at the end of the period, both servers are said to not have colluded.

3. What if both servers colluded?

Then, all bets are off. Recall that colluding means to reveal intermediary steps to the other server. The following scenarios come to mind:

If \(R_0\) had seen \(v_2\) from \(R_1\), this would have immediately exposed Alice as the author of \(m_a\) to \(R_0\)
If \(R_0\) had seen \(w_2\) from \(R_1\), this would have immediately exposed Bob as the author of \(m_b\) to \(R_0\)
If \(R_1\) had seen \(v_1\) from \(R_0\), this would have immediately exposed Alice as the author of \(m_a\) to \(R_1\)
If \(R_1\) had seen \(w_1\) from \(R_0\), this would have immediately exposed Bob as the author of \(m_b\) to \(R_1\)

As long as at least one of the servers stays honest and doesn't collude, Alice and Bob's identities as the authors of a specific message remain hidden.

Of course, the anonymity set (here \(\{Alice, Bob\}\)) has to contain at least two writers. The more, the better.

4. What about the rest of the readers?

The users of the app only see the final state after the period has completed. I.e. they see \((0, 0, m_a, 0, m_b\)). In practice, the app may blend out the empty slots, so it would display \((m_a, m_b)\), but this doesn't change anything. As long as Alice and Bob chose their slots randomly in a uniform matter, neither compromized servers, nor users can deduce anything about the writers. Without this requirement, Alice could easily leak her identity, if she had some bias in choosing her rows.

5. What can the NSA do?

The NSA has the following options at her disposal:

She could seize \(S_1\) or \(S_2\). If she seizes both, all bets are off, and Alice and Bob lose. If she seizes only one of both, she's in the same difficult spot we were above, when the other server doesn't collude.
She could intercept the synchronization traffic between \(S_1\) and \(S_2\) at the end of the period. All she would see are the aggregated states of both servers. But since she lacks the intermediary steps (careful here, there be dragons!) she won't be able to associate Alice to \(m_a\) or Bob to \(m_b\).
She could modify that post-period synchronization traffic. This would merely corrupt the end-result, but won't help her at all
She could modify the period synchronization signals between \(S_1\) and \(S_2\), such that both servers, even when not compromized, are no more in sync.
She could perform traffic analysis, but since the Riposte servers wait a period for the anonymity set to grow reasonably large before publishing their final state, even if the NSA controls \(n-1\) out of \(n\) servers, and observes their traffic, it would still be no good (or does it?)
Other options

We'll elaborate on the warning in item 2 in just a moment. But let's first concentrate on item 4. I didn't find anything in the Riposte paper about this special kind of attack, and although I didn't try to prove it, my feeling is that even without colluding, even honest servers, just by being mis-synchronized, could leak identites somehow. This should really be investigated in future work.

An important caveat

If the NSA, thanks to her trememdous interception capabilites, can passively observe the traffic of all Riposte servers of a cluster, all bets are off. Why is that so? With the current version of the Riposte protocol, all the NSA has to do is run her own sets of shadow Riposte servers, and to send copies of users' traffic to those servers, which would perform the exact same calculations than the legitimate servers. Note that the internal states of those servers would faithfully mirror those of the legitimate servers. But, and that's the kicker, since the NSA would have full control of \(n\) out of her \(n\) shadow servers, she can have them all collude, and again, it's game over for Alice and Bob (and Carol, and David, and Eve...).

Of course, if the NSA can't eavesdrop on at least one ot the servers of the Riposte cluster, all is well.

The conclusion of this is: at least one non-colluding server is needed, and the traffic to this server needs to be hidden from a global adversary.

IMHO, future work should focus on mitigating this attack. We need to update the Riposte model/protocol in such a way that servers include secret information, so that a passive eavesdropper on that server's traffic still can't clone it. Presently, the only line of defense is that the Riposte clients and servers used secure channels (encrypted and authenticated), which hinders mirroring. But that seems rather fragile. Hardening the Riposte servers against cloning of the full server farm may be needed and should be investigated.

Improving Riposte v1

We need to take care of the following points in subsequent articles of this mini-series:

We'll reduce the traffic form clients to the Riposte servers with Distributed Point Functions
We'll increase the amount of whisps that servers can store, without increasing the number of rows
We'll prevent malicious users who don't send consistent vectors to the servers from thashing the database by adding an audit server
We'll move from \(2\) to \(n\) Riposte servers setup, of which \(n - 1\) may collude without adverse effects to the anonymity of the users.

Literature

[Y82] Andrew C. Yao: Protocols for Secure Computations (extended abstract). (IEEE (paywalled). full pdf)