One drawback of Riposte v1, and of writable PIR schemes in general, was the huge amount of traffic from the clients to the Riposte servers. Distributed point functions (DPF), a cryptographic primitive, save the day by dramatically reducing this amout of traffic. In this article, we'll introduce distributed point functions, In the following article, we'll see how \((2,1)\)-distributed point functions, a special case of DPF for the two servers setup, are implemented. The more general \((n, n-1)\)-distributed point functions we'll investigate later.

The story so far

Suppose that Alice wants to insert a whisp \(m_a\) at a random location \(1\) of a 5-rows table on two toy Riposte servers:

  1. \(\vec{m} := (m_0, m_1, m_2, m_3, m_4) \leftarrow (0, m_a, 0, 0, 0)\)
  2. \(\vec{k_0} := (k_0, k_1, k_2, k_3, k_4) \overset{R}{\leftarrow} {\mathbb F}^5\)
  3. \(\vec{k_1} := -\vec{k_0} + \vec{m}\)

She then sends \(\vec{k_0}\) to server \(R_0\) and \(\vec{k_1}\) to server \(R_1\). Both servers then add those vectors to their tables.

But, of course, sending \(\vec{k_0}\) and \(\vec{k_1}\) is very expensive, the bigger they get. We want to somehow "compress" the amount of data sent to \(R_0\) and \(R_1\).

A distributed Client/Server computation

If you think about it in very general terms, the scheme above is a distributed computation, with Alice on the client side, and the Riposte servers on the server side each performing some algorithms. Let's formalize what each side really does.

  • On the client side, Alice runs some algorithm \(\vec{m} \mapsto (\vec{k_0}, \vec{k_1})\) to generate the data to send to the servers. She then sends \(\vec{k_0}\) to \(R_0\) and \(\vec{k_1}\) to \(R_1\).
  • On the server side, each of the servers \(R_i\) run a state-transforming algorithm on their tables \((\vec{t},\vec{k}) \mapsto \vec{t}+\vec{k}\), where \(\vec{t}\) is the table of the server, \(\vec{k}\) is the vector received from Alice, and \(+\) is a vector addition (component-wise modulo \(|\mathbb F|\)), with \(L\) being the number of rows of the table \(t\)).

Introducing Gen and Eval

In the previous section, \(\vec{m}\) was an arbitrary vector. But in reality, it is very special: only one coordinate contained the whisp \(m_a\), and all other coordinates were \(0\). In other words, \(\vec{m}\) was sparse. Because of this, we don't really need to pass all of \(\vec{m}\) to both client-side and server-side algorithms. Passing the index \(l\) and the value \(m\) (the message) suffices.

We therefore introduce a client-side algorithm \(\operatorname{Gen}: (l,m) \mapsto (\vec{k_0}, \vec{k_1})\), with \(m\) being the message to insert in the servers' tables at row number \(l\). Note that this algorithm isn't deterministic, it is randomized.

On the server-side, we introduce an Algorithm \(\operatorname{Eval}: (\vec{k}, l) \mapsto m'\), which given a vector \(\vec{k}\) from a client as generated by algorithm \(\operatorname{Gen}(l,m)\), computes the value \(m'\) that the server is supposed to add to the value already stored at row number \(l\). \(\operatorname{Eval}\) is supposed to be executed for each row of the table \(\vec{t}\) by the state-transforming algorithm like this:

\[ (\vec{t}, \vec{k}) \mapsto \vec{t} + \begin{pmatrix} \operatorname{Eval}(\vec{k}, 0) \\ \operatorname{Eval}(\vec{k}, 1) \\ \cdots \\ \operatorname{Eval}(\vec{k}, L-1) \end{pmatrix} \]

Riposte v1's Gen and Eval

As an example, \(\operatorname{Gen}\) and \(\operatorname{Eval}\) for the Riposte v1 shown above, can be specified like this:

Algorithm \(\operatorname{Gen}(l,m)\):

  1. \(\vec{k_0} := (k_0, k_1, k_2, \cdots, k_{L-1}) \overset{R}{\leftarrow} {\mathbb F}^L\)
  2. \(\vec{k_1} := -\vec{k_0}\)
  3. \(\vec{k_1}[l] \leftarrow \vec{k_1}[l] + m\)
  4. Return \((\vec{k_0}, \vec{k_1})\)

Algorithm \(\operatorname{Eval}(\vec{k}, l)\):

  1. Return \(\vec{k}[l]\)

Spoiler alert: reducing the traffic

All this formalism with \(\operatorname{Gen}(l,m)\) and \(\operatorname{Eval}(\vec{k}, l)\) didn't reduce the traffic between a client and the Riposte servers. But we're not bound to use the specific algorithms for \(\operatorname{Gen}\) and \(\operatorname{Eval}\) that we specified for Riposte v1. If we could somehow come up with algorithms that generate shorter "keys" \(\vec{k_0}\) and \(\vec{k_1}\) on the client side, and which would "expand" such a key \(\vec{k}\) on the server side to a vector of table size \(L\), this would elegantly solve our bandwidth problem.

Such a solution is indeed possible, thanks to the work of Gilboa and Ishai who introduced Distributed Point Functions, which are themselves a special case of Function Secret Sharing[BGI15], a very important cryptographic primitive.

Formal Definition of Distributed Point Functions

Gilboa and Ishai[GI14] defined and introduced \((2,1)\)-distributed point functions for a 2-server setup, where at most one server can be compromized without destroying anonymity. Corrigan-Gibbs, Boneh, and Mazières[CgBM15] generalized the definitions to \((n, n-1)\)-distributed point functions to better suit an \(n\)-server setup, of which up to \(n-1\) servers can collude, yet still preserve anonymity. We'll repeat their definition here.

First of all, let's define a (non-distributed) point function, to justify the name "distributed point function:"

Definition: (Point function) Fix a positive integer \(L\) and a finite field \(\mathbb F\). For all \(l\) in \(\mathbb Z_L\), and \(m\) in \(\mathbb F\), the point function \(P_{l,m} : \mathbb Z_{L} \rightarrow \mathbb F\) is a function such that \(P_{l,m}(l) = m\) and \(P_{l,m}(l') = 0\, \forall l' \ne l\).

In other words, \(P_{l,m} = 0\) everywhere except at index \(l\), where it is equal to \(m\). The whole "weight" \(m\) of the function is concentrated in row \(l\).

In the context of Riposte, \(L\) is the number of rows of a Riposte server. Each row contains a whisp in \(\mathbb F\). Examples for \(L\) is \(5\) for the toy example with 5 rows, or \(1,000,000\) for a more realistic example. Picture \(\mathbb F\) as a single whisp, e.g. of 160 bytes: in this case \(\mathbb F = \{0, 1, 2, \cdots, 2^{8 \times 160}-1\}\).

Now is time to define an \((s,t)\)-distributed point function. Intuitively, it is a function for \(s\) servers, of which up to \(t\) may collude.

Definition: (\((s,t)\)-distributed point function) Fix a positive integer \(L\) and a finite field \(\mathbb F\). An \((s,t)\)-distributed point function consists of a pair of possibly randomized algorithms that implement the following functionalities:

  • \(\operatorname{Gen}(l,m) \mapsto (k_0, \cdots, k_{s-1}) \): given an integer \(l \in \mathbb Z_L\) and a value \(m \in \mathbb F\), output a list of \(s\) keys \(k_i \in \mathcal K\).
  • \(\operatorname{Eval}(k,l') \mapsto m'\): given a key \(k \in \mathcal K\) generated using \(\operatorname{Gen}\) and an index \(l' \in \mathbb Z_L\), return a value \(m' \in \mathbb F\).

As you can see, the Riposte v1 algorithms \(\operatorname{Gen}\) and \(\operatorname{Eval}\) that we've defined in the previous section define an \((2, 1)\)-distributed point function.

An \((s,t)\)-distributed point function generates \(s\) keys with the algorithm \(\operatorname{Gen}\). The domain \(\mathcal K\) of those keys wasn't defined in [CgBM15]: it is up to the implementation to pick a suitable domain. The idea is that key \(k_0\) is being sent to server \(R_0\), key \(k_1\) ot server \(R_1\), ... and key \(k_{s-1}\) to server \(R_{s-1}\). Those keys encode both the message \(m \in \mathbb F\), as well as the destination row index \(l \in \mathbb Z_L = \{0, 1, 2, \cdots, L-1\}\) of the \(L\)-rows table that we want \(m\) to be stored in.

When a key \(k\) arrives at a server \(R_i\), this server will then apply \(\operatorname{Eval}(k,l')\) to each row index \(l'\) of its \(L\)-element table to restore a value \(m' \in \mathbb F\) that it will add to the value that is already stored there, just as we explained in the previous section.

Correctness and Privacy of DPFs

The above definition alone isn't sufficient to give the desired semantic connection between \(\operatorname{Gen}(l,m)\) and \(\operatorname{Eval}(k,l')\) that we've just described. To tie both together, we need the properties of correctness and privacy.

Definition: (correctness of DPF) \(\forall l,l' \in \mathbb Z_L, \forall m \in \mathbb F\):

\[ \operatorname{Pr}[(k_0, \cdots, k_{s-1}) \leftarrow \operatorname{Gen}(l,m): \sum_{i=0}^{s-1} \operatorname{Eval}(k_i, l') = \operatorname{P}_{l,m}(l') ] = 1 \]

Here, \(\operatorname{Pr}\) is the probability that the event in square brackets occurs. In other words, if we send the keys \((k_0, \cdots, k_{s-1})\) generated by \(\operatorname{Gen}(l,m)\) at the client side to the servers \(R_0, \cdots, R_{s-1}\) respectively, and if each server \(R_i\) calculates the vector \(\vec{m_i}' := \begin{pmatrix} \operatorname{Eval}(k_i, 0) \\ \operatorname{Eval}(k_i, 1) \\ \cdots \\ \operatorname{Eval}(k_i, L-1) \end{pmatrix} \), then when all those  servers come together and add those vectors \(\vec{m}' := \sum_{i=0}^{s-1} \vec{m_i}'\), we get the vectorial output of the point function \(\begin{pmatrix} P_{l,m}(0) \\ P_{l,m}(1) \\ \cdots \\ P_{l,m}(L-1) \end{pmatrix}\), or, in other words, the vector \((0, \cdots, 0, m, 0, \cdots, 0)\) with \(m\) at index \(l\).

Intuitively, this means that each of the keys \(k_i\), when evaluated by server \(R_i\), generates a vector \(\vec{m_i}'\), which contains parts of the shared secret of their cross-servers sum, which inserts \(m\) in the row index \(l\), without touching the remaining rows.

Definition: (privacy of DPF) Let \(S\) be any subset of \(\{0,1, \cdots, s-1\}\), such that \(|S| \le t\). Then \(\forall l \in \mathbb Z_L, \forall m \in \mathbb F\), let \(D_{s,l,m}\) denote the distribution of the keys \(\{(k_i) | i \in S\}\) induced by \((k_0, \cdots, k_{s-1}) \leftarrow \operatorname{Gen}(l,m)\). An \((s,t)\)-distributed function maintains privacy if there exists a probabilistic polynomial time (p.p.t) algorithm \(\operatorname{Sim}\), such that the following distributions are computationally indistinguishable:

\[ D_{s,l,m} \approx_c \operatorname{Sim}(S) \]

In other words: no subset of the keys \(\{k_0, \cdots, k_{s-1}\}\) with \(t\) or fewer keys leaks any information about the encoded message \(m\) or its location \(l\).

If you're interested, [CgBM15] uses these definitions to prove that a cluster of \(s\) Riposte servers, of which up to \(t\) servers may collude, which is realized with an \((s,t)\)-distributed point function that is correct and that maintains privacy does indeed behave as intended in collecting all messages sent to it (correctness) and that this collusion won't breach anonymity of the senders w.r.t. who sent what message (privacy).

How Lurkers Can Increase The Anonymity Set For Free

Before we conclude this episode, let's see how (read-only) readers can increase the anonymity set of a Riposte cluster for free. This is a nice property of the writable PIR scheme.

In a real-world setting, we expect that the number of lurkers, i.e. or users who read but don't write, will be many orders of magnitude that of writers. For example, a single active whistleblower's whisps could be read by many thousands of interested subscribers, who in turn aren't that much interested in whispering. Interestingly, we can modify the Riposte protocol in such a way that merely reading would be equivalent to submitting a message \(0\) to a random row of the distributed PIR database. Since adding \(0\) to a row doesn't change its content, we'll ultimately get a no-op upon consolidation at the end of the epoch. Yet, each read operation still forces the servers to update their local tables... and since they can't establish by their own what was written, they can only assume that the lurkers are also writing, i.e. that they are part of the anonymity set: eventually, every client that connects to the servers in a period is a potential writer, as far as the servers go.

In the next episode...

You've already seen how Riposte v1 was realized with the help of \((2,1)\)-distributed point function using a specific set of algorithms \(\operatorname{Gen}\) and \(\operatorname{Eval}\). However, each key \(k\) had a length of the order \(O(L)\) of the whole table.

\[ |k| = O(L), \forall k \in \mathcal K = \mathbb F^L\ \]

In the next article, we'll implement a \((2,1)\)-distributed point function which generates keys of length \(O(\sqrt{L})\).