In this article, we'll explore Claude Shannon's notion of perfect secrecy. In a nutshell, a secret encryption scheme $\mathcal E = (G, E, D)$ defined over $(\mathcal M, \mathcal C, \mathcal K)$ is perfectly secret if a ciphertext $c \in \mathcal C$ leaks no information whatsoever about the corresponding plaintext $m \in \mathcal M$. We'll present two equivalent definitions: Shannon's original one, based on a priori and a posteriori probabilities, and perfect indistinguishability, which is defined with an attack game and is often easier to work with. We'll then show that perfect secrecy requires the keys $k \in \mathcal K$ to be at least as long as the messages, and that the one-time pad, whose keys are exactly as long as the messages, is perfectly secret. In the next crypto bite on semantic security, we'll relax the requirements on the key sizes to define security models for real-world secret encryption schemes, where the keys are, of course, usually much shorter than the messages.

Some Basic Definitions

Definition of a Secret Encryption Scheme

We start by fixing a tuple $(\mathcal M, \mathcal C, \mathcal K)$, where $\mathcal M$ is a set of plaintext messages, $\mathcal C$ is a set of ciphertexts, and $\mathcal K$ is a set of keys.

Moreover, we define a secret encryption scheme $\mathcal E = (G, E, D)$ as a tuple of three algorithms:

• $G$ is a probabilistic polynomial time (ppt) key generation algorithm which, given the security parameter $1^n$ (a fancy way of representing the integer $n$ as a string in unary), samples a key $k \in \mathcal{K}$ uniformly at random. We write $k \leftarrow G(1^n)$, or simply $k \leftarrow \mathcal{K}$.
• $E$ is a probabilistic polynomial time (ppt) encryption algorithm which, given a key $k \in \mathcal{K}$ generated by $G$ and a plaintext message $m \in \mathcal{M}$, outputs a ciphertext $c \in \mathcal{C}$. We write $c \leftarrow E(k, m)$. Note that $E$ may output a different ciphertext at each invocation, even if the key and the plaintext message remain the same. Such probabilistic encryption is very important for security against chosen-plaintext attacks (CPA).
• $D$ is a deterministic polynomial time decryption algorithm which, given a key $k \in \mathcal{K}$ generated by $G$ and a ciphertext $c \in \mathcal{C}$, outputs a plaintext message $m \in \mathcal{M}$. We write $m := D(k, c)$. Since the decryption algorithm is deterministic, under a given key every ciphertext decrypts to exactly one plaintext.

Furthermore, $E$ and $D$ are related with the following correctness property:

$\forall k \in \operatorname{codomain}(G(1^n)), \forall m \in \mathcal{M} \colon D(k, E(k, m)) = m$

In other words, for any key $k \in \mathcal{K}$ that $G$ generates, any ciphertext of a message $m$ decrypts back to $m$ under the same key.
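For a tiny scheme, the correctness property can be checked mechanically by enumerating all keys and messages. The following is a minimal Python sketch; the toy scheme (a shift cipher over $\{0, \ldots, 7\}$ with $E_k(m) = (m + k) \bmod 8$) and the helper names `G`, `E`, `D`, and `is_correct` are hypothetical illustrations, not part of the definition above.

```python
import secrets
from typing import Callable, Iterable

# Hypothetical toy scheme: K = M = C = {0, ..., 7}, a shift cipher mod 8.
SIZE = 8
KEYS = range(SIZE)
MESSAGES = range(SIZE)


def G() -> int:
    """Sample a key uniformly at random from K."""
    return secrets.randbelow(SIZE)


def E(k: int, m: int) -> int:
    """Toy encryption: shift the message by the key (deterministic here)."""
    return (m + k) % SIZE


def D(k: int, c: int) -> int:
    """Toy decryption: undo the shift."""
    return (c - k) % SIZE


def is_correct(keys: Iterable[int], messages: Iterable[int],
               enc: Callable[[int, int], int],
               dec: Callable[[int, int], int],
               trials: int = 1) -> bool:
    """Check D(k, E(k, m)) == m for every key k and message m.

    For a probabilistic enc, trials > 1 re-runs it to sample more coin
    tosses; such sampling can reveal violations but cannot rule them out.
    """
    messages = list(messages)
    return all(dec(k, enc(k, m)) == m
               for k in keys for m in messages for _ in range(trials))


print(is_correct(KEYS, MESSAGES, E, D))               # True
print(is_correct(KEYS, MESSAGES, E, lambda k, c: c))  # False: ignores the key
```

Exhaustive enumeration only works because the toy spaces are tiny; for realistic key and message spaces, correctness has to be proven, not tested.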

A few remarks are in order:

• We want $G$, $E$, and $D$ to be efficiently computable. The most general way to capture this is to require that the algorithms run in time polynomial in the size of their inputs. In particular, we want the encryption algorithm to run in time $\operatorname{poly}(|m|)$, where $|m|$ is the length of the message $m \in \mathcal{M}$ (e.g. in bits). We also want the decryption algorithm to run in time polynomial in the size of the ciphertext, i.e. in time $\operatorname{poly}(|c|)$. Here, $\operatorname{poly}$ is any polynomial, e.g. $p(x) = 2x$, $p(x) = x^a$, or $p(x) = x^{a^b}$ for fixed constants $a$ and $b$, but not an exponential function like $2^x$. In practice, some polynomials are far too big, say $p(x) = x^{1000000}$, but we'll stick to the usual definition in complexity theory: everything that runs in time polynomial in the size of its input is deemed efficiently computable.
• $G$ and $E$ are probabilistic. This means that they are allowed to toss coins while generating their output, and that this output depends on the outcomes of those coin tosses. Equivalently, they are Turing machines with access to an additional input tape preloaded with as many random bits as they need (at most polynomially many, since they run in polynomial time). In the notation, we use a left arrow $\leftarrow$ to indicate that the algorithm may return different values at each invocation, even with the same input values. Contrast this with $D$: here, the algorithm is deterministic, meaning that given the same input parameters, it always outputs the same value. We denote this with $:=$ instead of $\leftarrow$.
• A very common shorthand notation: $E(k, m) = E_k(m) = \{m\}_k$. We also often write $D(k, c) = D_k(c)$.

Exercise: Using the above definition of a secret encryption scheme, what happens if we try to decrypt ciphertexts $c \in \mathcal{C}$ that were not generated by the encryption algorithm $E$? What about ciphertexts generated by $E$, but with a different key? What does the definition imply exactly? Note that we didn't define a special "reject" value $\bot$ for the decryption algorithm. Discuss.

A First Taste of Probability Theory

The elements in $\mathcal M$ need not occur with the same probability. Indeed, in the real world, we tend to encrypt some messages more often than others. Who would care to encrypt 8 GB of /dev/urandom, when encrypting 8 GB of the latest blockbuster makes a lot more sense? We therefore assume a probability distribution over $\mathcal M$, and let $M$ denote a random variable distributed according to it. If you're unfamiliar with probability theory, think of $M$ as a kind of function that returns some $m \in \mathcal M$ each time it is mentioned, sampled from $\mathcal M$ according to the probability distribution.

As an example: suppose that $\mathcal M = \{m_0, m_1\}$, with $m_0$ occurring 75% of the time and $m_1$ occurring 25% of the time. Writing $x_0 := M, x_1 := M, x_2 := M, \cdots$ then assigns $m_0$ to about 75% of the $x_i$ and $m_1$ to about 25% of them; in the limit of an infinitely long sequence $(x_i)_{i \in \mathbb N}$, these fractions approach exactly 75% and 25%.
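This frequency interpretation can be simulated. The sketch below assumes Python's `random.choices` with weights 3:1 as a stand-in for the distribution of $M$; the seed and sample size are arbitrary choices, and the printed fractions only approach 0.75 and 0.25 as the sample grows.

```python
import random
from collections import Counter
from typing import List


def sample_M(rng: random.Random, n: int) -> List[str]:
    """Draw n independent samples of M with Pr[M = m0] = 3/4 and Pr[M = m1] = 1/4."""
    return rng.choices(["m0", "m1"], weights=[3, 1], k=n)


rng = random.Random(2024)        # fixed seed so the run is reproducible
xs = sample_M(rng, 100_000)      # x_0 := M, x_1 := M, ...
freq = Counter(xs)
print(freq["m0"] / len(xs), freq["m1"] / len(xs))  # close to 0.75 and 0.25
```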

We write $\operatorname{Pr}[M = m]$ to designate the probability that the random variable $M$ returns $m$. This probability is a real number $p$ with $0 \le p \le 1$. In the previous example, $\operatorname{Pr}[M = m_0] = \frac{3}{4}$ and $\operatorname{Pr}[M = m_1] = \frac{1}{4}$.

Furthermore, we write $\operatorname{Pr}[M = m \mid C = c]$ to designate the probability that the random variable $M$ returns $m$, conditioned on the random variable $C$ having returned $c$. This conditional probability is only defined for $\operatorname{Pr}[C = c] \gt 0$. Formally, given events $E_0$ and $E_1$, the conditional probability of $E_0$ given $E_1$ is defined as:

$\operatorname{Pr}[E_0 \mid E_1] := \frac{\operatorname{Pr}[E_0 \wedge E_1]}{\operatorname{Pr}[E_1]}$

Working with conditional probabilities often calls for Bayes' Theorem, which allows us to hop from $(E_0 \mid E_1)$ to $(E_1 \mid E_0)$ and vice versa. This will be extremely useful in proofs:

Theorem 1 (Bayes' Theorem): If $\operatorname{Pr}[E_1] \ne 0$, then:

$\operatorname{Pr}[E_0 \mid E_1] = \frac{\operatorname{Pr}[E_1 \mid E_0] \cdot \operatorname{Pr}[E_0]}{\operatorname{Pr}[E_1]}$
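Bayes' Theorem can be checked with exact arithmetic on a small example. The following sketch uses a fair six-sided die as the probability space (an illustrative assumption), with $E_0$ = "the roll is even" and $E_1$ = "the roll is at least 4", and Python's `fractions.Fraction` to avoid rounding errors.

```python
from fractions import Fraction

OMEGA = range(1, 7)  # outcomes of one fair six-sided die, all equally likely


def pr(event) -> Fraction:
    """Pr[event] under the uniform distribution on OMEGA."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def pr_given(e0, e1) -> Fraction:
    """Pr[e0 | e1] := Pr[e0 and e1] / Pr[e1], defined only if Pr[e1] > 0."""
    return pr(lambda w: e0(w) and e1(w)) / pr(e1)


def even(w: int) -> bool:  # event E0: the roll is even
    return w % 2 == 0


def high(w: int) -> bool:  # event E1: the roll is at least 4
    return w >= 4


lhs = pr_given(even, high)                        # Pr[E0 | E1]
rhs = pr_given(high, even) * pr(even) / pr(high)  # Bayes' Theorem
print(lhs, rhs)  # 2/3 2/3
```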

Last, but not least, the following property is crucial:

Definition (Independent events): The events $E_0$ and $E_1$ are independent iff

$\operatorname{Pr}[E_0 \wedge E_1] = \operatorname{Pr}[E_0] \cdot \operatorname{Pr}[E_1]$
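Independence can be tested the same way. In this sketch, the probability space of two fair coin flips and the three events are illustrative assumptions: the outcomes of the two flips are independent, while "the first flip is heads" and "both flips are heads" are not.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product("HT", repeat=2))  # two fair coin flips: 4 equally likely outcomes


def pr(event) -> Fraction:
    """Pr[event] under the uniform distribution on OMEGA."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def independent(e0, e1) -> bool:
    """True iff Pr[e0 and e1] == Pr[e0] * Pr[e1]."""
    return pr(lambda w: e0(w) and e1(w)) == pr(e0) * pr(e1)


def first_heads(w) -> bool:
    return w[0] == "H"


def second_heads(w) -> bool:
    return w[1] == "H"


def both_heads(w) -> bool:
    return w == ("H", "H")


print(independent(first_heads, second_heads))  # True
print(independent(first_heads, both_heads))    # False: 1/4 != 1/2 * 1/4
```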

A priori and a posteriori probabilities

(... to be written)

Shannon's Perfect Secrecy

Formally Defining Perfect Secrecy with Probabilities

We formally define perfect secrecy like this:

Definition 1 (Perfect Secrecy): A secret encryption scheme $\mathcal E = (G, E, D)$ defined over $(\mathcal M, \mathcal C, \mathcal K)$ is perfectly secret, iff for every probability distribution over $\mathcal M$, for every message $m \in \mathcal M$, and for every ciphertext $c \in \mathcal C$ for which $\operatorname{Pr}[C = c] \gt 0$:

$\operatorname{Pr}_{k,m}[M = m \mid C = c] = \operatorname{Pr}_m[M = m]$

where the probability on the left-hand side is taken over the choice of the message according to the distribution over $\mathcal M$, the choice of the key $k \leftarrow G(1^n)$, and the random coins of $E$ used to compute $c \leftarrow E_k(m)$, and the probability on the right-hand side is taken over the choice of the message alone.

In other words: even if the adversary has prior knowledge of the typical distribution of the expected plaintext messages, i.e. if he knows (or guesses) the a priori probabilities $\operatorname{Pr}[M = m]$, he learns absolutely nothing new by observing a ciphertext $c$ and computing the corresponding a posteriori probabilities $\operatorname{Pr}[M = m \mid C = c]$, because both probability distributions are identical.

Or, said differently: if a secret encryption scheme is perfectly secret, allowing the adversary to observe the ciphertexts doesn't give him any advantage at all compared to what he otherwise might already know (like the typical distribution of the plaintext messages).
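For a toy scheme, Definition 1 can be verified by brute force: compute the joint distribution of $(M, C)$ exactly and compare every a posteriori probability with the corresponding a priori one. The Python sketch below assumes a deterministic encryption algorithm and uniformly distributed keys; the function names and the toy shift cipher mod 8 are hypothetical. Since it checks a single given prior, it can refute perfect secrecy, but it only illustrates the "for every distribution" quantifier.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, Hashable, Iterable


def joint_distribution(prior: Dict[Hashable, Fraction], keys: Iterable,
                       enc: Callable) -> Dict:
    """Exact Pr[M = m, C = c], assuming uniform keys and a deterministic enc."""
    keys = list(keys)
    joint = defaultdict(Fraction)  # missing entries default to Fraction(0)
    for m, p_m in prior.items():
        for k in keys:
            joint[(m, enc(k, m))] += p_m / len(keys)
    return joint


def is_perfectly_secret(prior, keys, enc) -> bool:
    """Check Pr[M = m | C = c] == Pr[M = m] for all m and all c with Pr[C = c] > 0."""
    joint = joint_distribution(prior, keys, enc)
    pr_c = defaultdict(Fraction)
    for (_, c), p in joint.items():
        pr_c[c] += p
    return all(joint[(m, c)] / pr_c[c] == prior[m]
               for c in pr_c if pr_c[c] > 0 for m in prior)


weights = [1, 2, 3, 4, 5, 6, 7, 8]  # a non-uniform prior over M = {0, ..., 7}
prior = {m: Fraction(w, sum(weights)) for m, w in enumerate(weights)}
shift = lambda k, m: (m + k) % 8     # toy shift cipher mod 8

print(is_perfectly_secret(prior, range(8), shift))  # True: 8 keys
print(is_perfectly_secret(prior, range(2), shift))  # False: only 2 keys
```

With only two keys, the ciphertext $c = 0$ can only come from $m = 0$ or $m = 7$, so observing it rules out every other message, and the a posteriori distribution differs from the a priori one.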

Perfect Indistinguishability

Another definition is perfect indistinguishability.

Definition 2 (Perfect Indistinguishability): A secret encryption scheme $\mathcal E = (G, E, D)$ defined over $(\mathcal M, \mathcal C, \mathcal K)$ is perfectly indistinguishable, iff for every probability distribution over $\mathcal M$, for every two plaintext messages $m_0, m_1 \in \mathcal M$, and for every ciphertext $c \in \mathcal C$ for which $\operatorname{Pr}[C = c] \gt 0$:

$\operatorname{Pr}_{k}[C = c \mid M = m_0] = \operatorname{Pr}_{k}[C = c \mid M = m_1]$

where the probabilities are taken over the choice of the key $k \leftarrow G(1^n)$ (and the random coins of $E$), with the message fixed to $m_0$ or $m_1$ respectively.

In other words: given two plaintext messages $m_0$ and $m_1$ and a ciphertext $c$, an adversary can't tell from the distribution of the observed ciphertexts which of the two plaintext messages was encrypted to $c$, even if he knows the a priori probabilities of both plaintext messages.
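Definition 2 is just as easy to check by enumeration, because $\operatorname{Pr}_k[C = c \mid M = m]$ is the fraction of keys that encrypt $m$ to $c$. This sketch assumes uniform keys and a deterministic encryption function; the helper names and the toy shift cipher are hypothetical. It checks every $c \in \mathcal C$, which is harmless: for a ciphertext that never occurs, both sides are $0$.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Iterable


def pr_c_given_m(c, m, keys: list, enc: Callable) -> Fraction:
    """Pr_k[C = c | M = m]: the fraction of (uniform) keys that encrypt m to c."""
    return Fraction(sum(1 for k in keys if enc(k, m) == c), len(keys))


def is_perfectly_indistinguishable(messages: Iterable, ciphertexts: Iterable,
                                   keys: Iterable, enc: Callable) -> bool:
    """Check Pr[C = c | M = m0] == Pr[C = c | M = m1] for all m0, m1 and c."""
    keys, ciphertexts = list(keys), list(ciphertexts)
    return all(pr_c_given_m(c, m0, keys, enc) == pr_c_given_m(c, m1, keys, enc)
               for m0, m1 in combinations(messages, 2) for c in ciphertexts)


shift = lambda k, m: (m + k) % 8  # toy shift cipher mod 8
print(is_perfectly_indistinguishable(range(8), range(8), range(8), shift))  # True
print(is_perfectly_indistinguishable(range(8), range(8), range(2), shift))  # False
```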

This definition is often easier to work with, because it can be turned into an attack game between a challenger and an adversary.

Attack game (Perfect Indistinguishability): The challenger and the adversary have a common input $1^n$.

• The adversary selects two messages $m_0, m_1 \in \mathcal M$. He can pick them at random, or in any other way he sees fit. He then sends both messages to the challenger.
• The challenger samples a random bit $b \leftarrow \{0,1\}$ and a random key $k \leftarrow G(1^n)$, both uniformly, and encrypts the plaintext $m_b$ to the ciphertext $c \leftarrow E(k, m_b)$. The challenger sends $c$ to the adversary.
• The adversary guesses which of the two messages, $m_{b'}$ with $b' \in \{0,1\}$, the challenger encrypted to $c$, and outputs his guess $b'$.

We define the advantage of the adversary as

$\operatorname{Adv_{PerfIND}} = \operatorname{Pr}[b' = b]$

where the probability is taken over all coin tosses of the challenger and the adversary.

The secret encryption scheme $\mathcal E$ is perfectly indistinguishable iff $\operatorname{Adv_{PerfIND}} = \frac{1}{2}$ for every adversary, i.e. no adversary, however powerful, can do better than guessing $b$ at random.
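The attack game can be simulated for one concrete adversary. In this hypothetical sketch, the adversary always submits $m_0 = 00000000$ and $m_1 = 11111111$ and guesses $b' = 0$ iff the ciphertext equals $m_0$. It is run against an 8-bit one-time pad and against a deliberately broken "encryption" that outputs the plaintext unchanged. Python's `random` module is not cryptographically secure, which is acceptable for a simulation but not for real keys. Note that the definition quantifies over all adversaries, while this only estimates $\operatorname{Pr}[b' = b]$ for a single one.

```python
import random


def play_once(enc, rng: random.Random) -> bool:
    """One run of the attack game with 8-bit keys; True iff the adversary wins."""
    m0, m1 = 0b00000000, 0b11111111  # adversary's fixed choice of messages
    b = rng.randrange(2)             # challenger's secret bit
    k = rng.getrandbits(8)           # challenger's uniform 8-bit key
    c = enc(k, (m0, m1)[b])
    b_guess = 0 if c == m0 else 1    # adversary: "if c looks like m0, say 0"
    return b_guess == b


def estimate_success(enc, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Pr[b' = b] for this particular adversary."""
    rng = random.Random(seed)
    return sum(play_once(enc, rng) for _ in range(trials)) / trials


otp = lambda k, m: m ^ k  # 8-bit one-time pad
leaky = lambda k, m: m    # hypothetical broken "encryption": the identity

print(estimate_success(leaky))  # 1.0
print(estimate_success(otp))    # close to 0.5
```

Against the one-time pad, this adversary outputs $0$ only when $c = 0$, which happens with probability $\frac{1}{256}$ regardless of $b$, so its success probability is exactly $\frac{1}{2} \cdot \frac{1}{256} + \frac{1}{2} \cdot \frac{255}{256} = \frac{1}{2}$.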

Equivalence of both definitions

A simple application of Bayes' Theorem can be used to prove the following theorem:

Theorem (equivalence of Perfect Secrecy and Perfect Indistinguishability): The secret encryption scheme $\mathcal E = (G, E, D)$ satisfies Perfect Indistinguishability iff $(G, E, D)$ satisfies Perfect Secrecy.

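Proof: We show that Perfect Secrecy implies Perfect Indistinguishability; the converse direction follows by a similar application of Bayes' Theorem. Since Perfect Secrecy holds for every distribution over $\mathcal M$, we may fix one, e.g. the uniform distribution, under which every message has nonzero probability. Then $\forall m \in \mathcal M$ and $\forall c \in \mathcal C$ with $\operatorname{Pr}[C = c] \gt 0$, Perfect Secrecy guarantees that $\operatorname{Pr}[M = m] = \operatorname{Pr}[M = m \mid C = c]$.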

Now, we have

\begin{align} \operatorname{Pr}[C = c \mid M = m_0] & = \frac{\operatorname{Pr}[M = m_0 \mid C = c] \operatorname{Pr}[C = c]}{\operatorname{Pr}[M = m_0]} & \textrm{(Bayes' Theorem)} \\ & = \frac{\operatorname{Pr}[M = m_0] \operatorname{Pr}[C = c]}{\operatorname{Pr}[M = m_0]} & \textrm{(Perfect Secrecy)} \\ & = \operatorname{Pr}[C = c] & \\ \end{align}

By the same argument, replacing $m_0$ with $m_1$, we have

$\operatorname{Pr}[C = c \mid M = m_1] = \operatorname{Pr}[C = c]$

Therefore:

$\operatorname{Pr}[C = c \mid M = m_0] = \operatorname{Pr}[C = c \mid M = m_1]$

which is the definition of Perfect Indistinguishability. $\Box$

Limitations of Perfect Secrecy

Perfect Secrecy is an extremely strong definition of secrecy. In particular, it implies that [KL15, Theorem 2.10]:

Theorem (Shannon): In a perfectly secret encryption scheme $\mathcal E = (G, E, D)$, there must be at least as many keys as plaintext messages: $|\mathcal K| \ge |\mathcal M|$.

Proof: Assume to the contrary that $|\mathcal K| \lt |\mathcal M|$. Consider the uniform distribution over $\mathcal M$, and some $c \in \mathcal C$ with $\operatorname{Pr}[C = c] \ne 0$. Define $\mathcal{M}(c)$ to be the set of all plaintext messages that are possible decryptions of the ciphertext $c$:

$\mathcal M(c) := \{ m \in \mathcal M \mid \exists k \in \mathcal K \colon m = D_k(c) \}$

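Because algorithm $D$ is deterministic, each key $k$ yields exactly one decryption $D_k(c)$, so $|\mathcal M(c)| \le |\mathcal K|$. Since, by our assumption, $|\mathcal K| \lt |\mathcal M|$, there must exist some $m' \in \mathcal M$ such that $m' \notin \mathcal M(c)$. By correctness, $m'$ is never encrypted to $c$ (otherwise $D_k(c) = m'$ for the key $k$ used), so its a posteriori probability is $0$, whereas its a priori probability under the uniform distribution is $\frac{1}{|\mathcal M|} \gt 0$: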

$\operatorname{Pr}[M = m' \mid C = c] = 0 \ne \operatorname{Pr}[M = m']$

so the cipher $\mathcal E$ can't be perfectly secret. $\Box$
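The counting argument can be made concrete with a toy scheme that has too few keys. This sketch assumes a hypothetical shift cipher over $\mathcal M = \{0, \ldots, 7\}$ restricted to the four keys $\{0, 1, 2, 3\}$ and computes $\mathcal M(c)$ directly from its definition.

```python
MESSAGES = set(range(8))  # |M| = 8
KEYS = range(4)           # |K| = 4 < |M|


def dec(k: int, c: int) -> int:
    """Hypothetical toy decryption: shift cipher mod 8, but with only 4 keys."""
    return (c - k) % 8


def possible_decryptions(c: int) -> set:
    """M(c) = { D_k(c) : k in K }."""
    return {dec(k, c) for k in KEYS}


M_c = possible_decryptions(5)
print(sorted(M_c))             # [2, 3, 4, 5]
print(sorted(MESSAGES - M_c))  # [0, 1, 6, 7]
# For every ciphertext, at least |M| - |K| messages are impossible decryptions:
print(all(len(MESSAGES - possible_decryptions(c)) >= 4 for c in range(8)))  # True
```

Observing $c = 5$ therefore tells the adversary that the plaintext is not $0$, $1$, $6$, or $7$, which is exactly the information leak the proof exploits.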

In other words: when keys and messages are bit strings, a perfectly secret cipher requires keys that are at least as long as the plaintext messages, i.e. in general $|k| \ge |m|$. If they were shorter, there wouldn't be enough of them for every potential message!

Achieving Perfect Secrecy with One Time Pads (OTP)

The question is: how tight is the bound $|\mathcal K| \ge |\mathcal M|$? It turns out that the bound is tight: the One Time Pad (OTP) achieves perfect secrecy with $|\mathcal K| = |\mathcal M| (= |\mathcal C|)$.

Definition of the One Time Pad (OTP)

A One Time Pad [V17] $\mathcal{OTP} = (G, E, D)$ over $(\mathcal M, \mathcal C, \mathcal K)$ is a secret encryption scheme with the following properties:

• $|\mathcal K| = |\mathcal M| = |\mathcal C|$. Without loss of generality (w.l.o.g.) we set $|\mathcal K| = 2^\ell$ for an integer $\ell$, and $\mathcal K = \mathcal M = \mathcal C = \{0,1\}^\ell$, the set of all $\ell$-bit strings.
• $\forall m \in \mathcal M, \forall c \in \mathcal C, \forall k \in \mathcal K \colon |m| = |c| = |k|$
• The key generation algorithm $G$, given $1^n$ as input, randomly and uniformly selects a key from $\mathcal K$.
• The encryption algorithm $E$ computes $E_k(m) = m \oplus k$.
• The decryption algorithm $D$ computes $D_k(c) = c \oplus k$.

where $\oplus$ denotes the bitwise exclusive or (XOR) of two strings of length $\ell$.

Note in particular that the key size matches the size of the plaintext (and ciphertext) messages.
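Here is a minimal Python sketch of the OTP, assuming that $\ell$ is a multiple of 8 so that keys, messages, and ciphertexts can be represented as byte strings. It uses the standard library's `secrets.token_bytes` as the source of uniform randomness; unlike the definition above, this hypothetical `G` takes the key length in bytes rather than $1^n$.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("the OTP requires |k| == |m|")
    return bytes(x ^ y for x, y in zip(a, b))


def G(length: int) -> bytes:
    """Sample a key uniformly from {0,1}^ell with ell = 8 * length."""
    return secrets.token_bytes(length)


def E(k: bytes, m: bytes) -> bytes:
    return xor_bytes(m, k)  # E_k(m) = m XOR k


def D(k: bytes, c: bytes) -> bytes:
    return xor_bytes(c, k)  # D_k(c) = c XOR k


m = b"attack at dawn"
k = G(len(m))
c = E(k, m)
assert D(k, c) == m  # correctness: D_k(E_k(m)) = m
```

Rejecting inputs of different lengths, instead of silently truncating as `zip` alone would, mirrors the requirement $|m| = |c| = |k|$.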

The One-Time Pad is Perfectly Secret

The claim that the One Time Pad is Perfectly Secret is easy to prove.

Theorem: The One Time Pad is Perfectly Secret.

Proof: Fix some $m, c \in \{0, 1\}^\ell$.

We have

\begin{align} \operatorname{Pr}_k[C = c \mid M = m] & = \operatorname{Pr}_k[m \oplus k = c] \\ & = \operatorname{Pr}_k[k = m \oplus c] \\ & = \frac{1}{2^\ell} \\ \end{align}

Since this probability is the same for every $m$, we have $\forall c \in \mathcal C, \forall m_0, m_1 \in \mathcal M$:

$\operatorname{Pr}_k[C = c \mid M = m_0] = \operatorname{Pr}_k[C = c \mid M = m_1]$

In other words, the OTP is perfectly indistinguishable, and by the equivalence of perfect indistinguishability and perfect secrecy, the OTP is therefore perfectly secret. $\Box$
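For a small $\ell$, the key step of the proof can be confirmed exhaustively. This sketch assumes $\ell = 3$ and encodes the 3-bit strings as the integers $0, \ldots, 7$; for every pair $(m, c)$, exactly one key satisfies $m \oplus k = c$, so every conditional probability equals $\frac{1}{2^3}$.

```python
from fractions import Fraction
from itertools import product

ELL = 3
SPACE = range(2 ** ELL)  # K = M = C = {0,1}^3, encoded as the integers 0..7


def pr_c_given_m(c: int, m: int) -> Fraction:
    """Pr_k[C = c | M = m] for the OTP E_k(m) = m XOR k with k uniform over SPACE."""
    return Fraction(sum(1 for k in SPACE if m ^ k == c), len(SPACE))


table = {(m, c): pr_c_given_m(c, m) for m, c in product(SPACE, SPACE)}
print(set(table.values()))  # {Fraction(1, 8)}
```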

The Case of Multi-Message Encryptions

It should be noted that the perfect secrecy of One Time Pads completely breaks down if more than one message is being OTP-encrypted with the same key (thus turning the one time pad into a two- or more time pad). Indeed, given two messages $m_0, m_1$ being encrypted with the same key $k$, i.e. $c_0 = m_0 \oplus k$ and $c_1 = m_1 \oplus k$, the adversary can compute

\begin{align} c_0 \oplus c_1 & = (m_0 \oplus k) \oplus (m_1 \oplus k) \\ & = (m_0 \oplus m_1) \oplus (k \oplus k) \\ & = (m_0 \oplus m_1) \oplus 0 \\ & = m_0 \oplus m_1 \\ \end{align}

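thus learning the exclusive or of both messages. When the messages are redundant, e.g. natural-language text, this is often enough to recover both of them, as the VENONA project demonstrated [VENONA].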
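The attack is easy to reproduce. This sketch uses Python's `secrets.token_bytes` for the key and two hypothetical 15-byte messages; the second assertion shows that an adversary who knows or guesses one plaintext immediately recovers the other.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two byte strings (both are 15 bytes long here)."""
    return bytes(x ^ y for x, y in zip(a, b))


m0 = b"meet me at noon"
m1 = b"the vault is 42"
k = secrets.token_bytes(len(m0))  # one key, (mis)used twice
c0 = xor_bytes(m0, k)
c1 = xor_bytes(m1, k)

leak = xor_bytes(c0, c1)
assert leak == xor_bytes(m0, m1)  # the key cancels out
assert xor_bytes(leak, m0) == m1  # knowing m0 reveals m1
```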

The takeaway from this is: Never, ever, reuse a One Time Pad!

Another takeaway is: a scheme can be perfectly secret, yet still totally insecure when misused. Don't confuse perfect secrecy and security: they are not the same!

Conclusion

This limitation $|\mathcal K| \ge |\mathcal M|$ of perfect secrecy is pretty bad in real life. Imagine that Alice wants to send an encrypted message to Bob. Both of them would have to agree on a secret key that is at least as long as the message itself. But if they had a secure channel to exchange such a long key, they could just as well use it to send the plaintext directly. In practice, such long secret keys are a nuisance. We want shorter keys. But, as we've just seen, shorter keys mean that the secret key encryption scheme can no longer be perfectly secret. However, if we're willing to sacrifice this strong information-theoretic notion of security for the weaker notion of semantic security, as we'll see in the next crypto bite, we may use shorter keys.

Source

Perfect Secrecy is very well explained in [KL15, Chapter 2]. The original paper by Claude Shannon is [S49].

Shafi Goldwasser explains these concepts in Lecture "L1: Introduction, Perfect Secrecy, One-Time Pad" of an advanced course on Cryptography. Shannon's definition of Perfect Secrecy starts at 39:37, and the equivalent definition of Perfect Indistinguishability at 57:15. The OTP starts at 1:06:00.

Dan Boneh gives another introduction to perfect secrecy.

Jonathan Katz has a great introduction to Perfect Secrecy in his lecture "Perfect Secrecy Part II", which is part of a complete intermediate course on Coursera.