The Sum of Our Earnings

: Last Updated: January 15 2019

In the previous episode, we saw how to compute the AND function of two inputs using the secure multiparty computation (MPC) paradigm. Recall that in MPC, parties contribute inputs to a function \(f\) which they collectively compute, but without knowing the inputs provided by the other parties. In this crypto bite, we'll take first steps towards computing the SUM of \(n\) inputs in secure multiparty computation. We won't show the full MPC scheme here yet. Instead, we'll whet your appetite by computing the sum of the earnings of all the people sitting in a room using a (weak) form of secret sharing called masking, due to Yao [Y82].

How much do we make together?

Suppose that \(n\) people \(P_1, P_2, \dots, P_n\) are sitting in a room. \(P_1\) earns \(x_1\), \(P_2\) earns \(x_2\) and so on. For some reason, they need to know how much they earn together, i.e. the sum of their earnings \(\sum_{i=1}^n x_i\). Of course, they don't want to tell the other people in the room how much they earn. They also don't want the other people to be able to infer by the computations how much they make.

There's a general solution to this problem, using Shamir's Threshold Scheme, which is a clever method of implementing secret sharing. In this crypto bite, we'll present a simpler, albeit not so general solution, and defer Shamir's scheme to the next episodes.

A Trivial Solution

The trivial solution involves an additional trusted third party \(P_0\): each participant \(P_1, P_2, \dots, P_n\) secretly tells \(P_0\) his/her earnings \(x_i\), and \(P_0\) privatly computes the sum \(y = \sum_{i=1}^n x_i\). Finally \(P_0\) announces \(y\) to all parties.

Now, how can we do without \(P_0\)? Can the parties \(P_1, P_2, \dots, P_n\) collectively compute the sum, without revealing their earnings \(x_1, x_2, \dots, x_n\) to the others? Surprisingly, they can!

A Privacy-preserving Solution

The idea here is to let the parties compute the sum on masked data. What does it mean? Each party should get the partial sum computed by the predecessors, but this sum should not be the real sum.

Assumptions

The \(n\) people are sitting in a circle, and are only allowed to talk to their direct neighbors.
We can estimate an upper bound \(M\) of the sum of all earnings. \(M\) doesn't have to be very tight, but it shouldn't be too high.
All computations occur modulo \(M\).

The algorithm

Blinded Sum

\(P_1\) runs the following algorithm:

\(r \overset{R}{\leftarrow} \{0, 1, \dots, M-1\}\)
\(m_1 = x_1 + r \pmod{M}\)
Send \(m_1\) to \(P_2\)
Receive \(m_n\) from \(P_n\)
\(y = m_n - r \pmod{M}\)
Announce \(y\) to all parties

In other words, \(P_1\) uniformly samples a random \(r\) from the set of all possible sums \(\{0, 1, \dots, M-1\}\) and adds this random value to his own earnings \(x_1\). He then sends this sum \(m_1\) to his next neighbor \(P_2\). That's it. Then \(P_1\) waits until all other parties finish their computations, and will receive a result \(m_n\) from his neighbor \(P_n\) on the other side. After subtracting \(r\) from \(m_n\), he announces the result \(y\) to all parties.

We'll see in a moment why \(y\) will be the sum of all \(x_i\), while at the same time preserving privacy. But for this, we first need to show the algorithm ran by the other parties.

\(P_i \colon i \in \{2, 3, \dots, n\}\) runs the following algorithm:

Receive \(m_{i-1}\) from \(P_{i-1}\)
\(m_i = x_i + m_{i-1} \pmod{M}\)
Send \(m_i\) to \(P_{i+1}\)

That's it! Each party \(P_i\) except \(P_1\) simply adds its own earnings \(x_i\) to the message received from his predecessor, and sends the result to his successor \(P_{i+1}\).

Why does this work?

First of all, both algorithms are well-defined. \(P_1\) only uses its own earnings \(x_1\), the random \(r\) it sampled, and the final message \(m_n\) it got from \(P_n\). Each party \(P_i, i \in \{2, 3, \dots, n\}\) only uses data it is supposed to know: its own earnings \(x_i\), and the message \(m_{i-1}\) it got from its predecessor.

Finally, the computed result \(y\) is indeed the desired sum:

\[ \underbrace{\underbrace{\underbrace{r + x_1}_{m_1} + x_2}_{m_2} + \dots + x_n}_{m_n} - r = \sum_{i=1}^n x_i \pmod{M} \]

modulo \(M\), but since \(M\) was an upper bound for all likely sums, \(y\) is actually the sum itself.

Why does it preserve privacy?

It may be obvious to the experienced cryptologist, but in case you have any doubts, here's an informal argument. As an exercise, try to turn it into a formal proof:

\(P_2\) can't infer \(x_1\) from \(m_1 = x_1 + r\), because \(r\) was a random number uniformly sampled from a rather large set. We say that \(P_1\) masked \(x_1\) by adding \(r\) to it. As far as \(P_2\) is concerned, \(m_1\) is indistinguishable from a random number drawn uniformly from the set \(\{0, 1, 2, \dots, M-1\}\).

\(x_1\) is only really hidden, if all additions are performed modulo \(M\), with a suitable value for \(M\). If we computed the sums in \(\mathbb{N}\), there would be some leakage, even if \(r\) is uniformly random. We won't delve into this here, but feel free to further explore this issue.

If you've read this crypto bite on Riposte, you may recognize that we're using the very same secret hiding scheme.

\(P_3\) can't infer \(x_1\) from \(m_2\) for the very same reason. Furthermore, \(P_3\) can't infer \(x_2\) from \(m_2\). Why? Because as far as \(P_3\) is concerned, it merely got something of the form \(x_2 + r'\), with \(r'\) being indistinguishible from a random number uniformly sampled from the same set as \(r\).

For the same reasons, all other parties \(P_i\) can't infer \(x_1, x_2, \dots, x_{i-1}\) from \(m_{i-1}\).

In the other direction, \(P_i\) can't infer \(x_{i+1}, x_{i+2}, \dots, x_n\), since he/she has no access to \(m_{i+1}, m_{i+2}, \dots, m_n\).

Exercise: What about \(P_1\)? Since \(P_1\) knows \(r\), he can unmask the sum. Since he's some kind of "special party", does he have more powers than his peers? In other words, can he also infer, say, \(x_{n}\)? \(x_{n-1}\)? \(x_2\)? What does this tell us about the minimum value of \(n\)?

Exercise: Can any of the parties \(P_1, P_2, \dots P_n\) infer something from the announced sum \(y\) at the end of the computations, except their own \(x_i\)? Could this lead to the discovery of an \(x_j, j \ne i\)?

To summarize: in a nutshell, for each \(i \in \{2, 3, \dots, n\}\), all messages received by \(P_i\) are indistinguishable from a random \(r'\) sampled uniformly from the set of all possible sums. This indistinguishability from a uniform distribution is what makes this scheme private.

What about cheating?

Gossipy, but otherwise honest parties

If some party violated the protocol by being gossipy, all bets are off. This is the case even when all parties are faithfully following the protocol (except for being gossipy), i.e. if they were otherwise honest.

For example: suppose that \(P_2\) was gossipy and sent \(m_2\) not only to \(P_3\) but also to her best friend forever \(P_4\).

Blinded Sum With Gossipy Cheater

\(P_4\) would therefore get:

\(m_3 = x_3 + x_2 + x_1 + r \pmod{M}\) from \(P_3\) as required by the protocol
\(m_2 = x_2 + x_1 + r \pmod{M}\) from her gossipy friend \(P_2\) in violation of the protocol

and would simply subtract both values:

\(m_3 - m_2 = x_3 \pmod{M} = x_3\)

In other words, \(P_4\) will be able to infer earnings \(x_3\) of \(P_3\).

Exercise: Can \(P_4\) also infer \(x_1\) or \(x_2\) now that he/she got \(x_3\)? Can \(P_4\) infer even more, after getting the sum \(y\) from \(P_1\) at the end of the computations?

Dishonest parties

Some parties may be totally dishonest by not following the protocol at all, und doing "their own thing". What could happen? Here are some possible outcomes:

The wrong sum is computed: party \(P_i\) could add something else than his earnings \(x_i\) or previous message \(m_{i-1}\)
There's no masking: \(P_1\) selects 0 for \(r\). In fact, this case didn't get much attention above, since we explicitly allowed \(0\) in the protocol. Explain!
Denial of Service: \(P_i\) could refuse to send his message \(m_i\) to \(P_{i+1}\), stopping the computation

In other words, we get totally undefined behavior. In case you were already familiar with MPC: for this scheme to break down, we don't need a dishonest majority. A single dishonest party is all it takes.

Secret Sharing with Masked Values

The algorithm above leveraged a secret sharing scheme by having \(P_1\) mask \(x_1\) by adding his private secret \(r\) and having all other parties compute masked partial sums \(m_i\). At the end, \(P_1\) unmasked \(m_n\) by subtracting \(r\), revealing the real sum.

In other words, this is what happens, on a high level:

\(P_1\) masks \(x_1\) by adding \(r\)
\(P_i, i \in \{2, 3, \dots, n\}\) compute masked partial sums \(m_i\)
\(P_1\) reveals the sum by unmasking \(m_n\), i.e. by subtracting \(r\).

If you like, this is a form of mask-compute-unmask, but not yet encrypt-compute-decrypt scheme. Before you get too excited, that's not fully homomorphic encryption (FHE), it's not even somewhat homomorphic encryption (SWHE), since masking isn't encyption at all. It's just a way of hiding secrets. We'll get to FHE and SWHE much later.

We can do better

Computing a sum on masked data is already a good step forward towards the general solution of computing the sum using secure multiparty computation (MPC). But we're no there yet! The main problem here was cheating parties: a party could be gossipy, and ruin the privacy for some other parties. Even worse, a dishonest party could inject wrong data in the computation, ruining the result without this being detected at all.

What we want is to somehow encrypt the data instead of merely masking it. This way, gossipy parties will be less of a problem, and dishonest parties can't do harm undetected. We'll see later how each party can encrypt partial results with their own key, and yet how the final result can be extracted, without having to bring all the parties' keys back together. This is what Shamir's threshold scheme is all about.

The Bigger Picture

This was an instance of the general Secure Multiparty Computation cryptographic primitive. In MPC, \(n\) parties \(P_1, P_2, \dots, P_n\) want to collectively compute a function \(y = f(x_1, x_2, \dots, x_n)\), in such a way that each party \(P_i\) contributes input \(x_i\) to the computation, but with the added constraint that during the whole computation up to and including its end, no party knows the inputs contributed by the other parties. In other words, \(\forall i, j \in \{1, 2, \dots, n\} \colon P_i\) doesn't know \(x_j, j \ne i\). In this crypto bite, function \(f\) happens to be the sum of its inputs:

\[ y = f(x_1, x_2, \dots, x_n) := \sum_{i=1}^n{x_i} \]

The algorithm shown here is secure MPC when all parties are honest[R04], but it breaks down even if a single party is gossipy, but otherwise honest. In the next crypto bite we'll introduce Shamir's threshold scheme, which is a better secret sharing scheme that will help us to MPC-compute the sum even in the presence of gossipy, but otherwise honest parties.

Source

This example, inspired by Yao [Y82] has been shown by Tal Rabin in a lecture at Technion Computer Engineering 2014 summer school on Secure Multiparty Computation. Part 1, from 07:19 to 16:31.

Literature

[R04] As stated by Tal Rabin in her talk.
[Y82] Andrew C. Yao: Protocols for Secure Computations (extended abstract). (IEEE (paywalled). full pdf)