Suppose that Alice and Bob wanted to communicate securely over the public Internet. Both will use a symmetric cipher like AES to keep their communications secret. How can Alice and Bob agree on a key for AES? Alice could select a random key (a session key) and send it to Bob. Then, both Alice and Bob will use that key to AES-encrypt and -decrypt their messages. There's a problem though: Alice needs a secure channel to transmit that key to Bob at the very beginning, because she doesn't want her nosy neighbor, Wyle E. Hacker, to grab that key off her home wifi before she and Bob switched to all-encrypted. Unfortunately, unlike the presidents of the USA and Russia, Alice and Bob don't have a secure direct communications link at their disposal. Yet, billions of encrypted session are securely established every day over the Internet. How is that possible? Enters the Diffie-Hellman Key Exchange, introduced by Whitfield Diffie and Martin E. Hellman in a groundbreaking seminal paper in 1976[DH76].
The Diffie-Hellman Key Exchange is a widely used cryptographic protocol that allows Alice and Bob to securely agree on a shared key, without having to transmit this key over a secure channel. In fact, DH is so ubiquitous, that chances are you are using it or one of its variants right now to access this page via HTTPS: let your browser display the technical details of the connection and watch out for ECDHE, which stands for Elliptic Curve Diffie-Hellman (Key) Exchange.
So you may be wondering how Alice and Bob can simultaneously agree on a key, without telling each other what that key is supposed to be. That's the beauty of Diffie-Hellman.
How does it work? Let's see.
A gentle video introduction
If you've never heard of Diffie-Hellman before, now is the time to grab your notebook, a bottle of diet coke, and head right off to your Community College / University for a quick refresher. Don't worry, you'll be sitting in a room full of sophomores who are just as clueless as you are, and the teacher knows it. You'll have plenty of time to take notes, to think and to absorb the basics. If you're really short on time, you could (discreetly) leave after some 15 minutes, but staying for the mathematical background is highly recommended:
Diffie-Hellman Key Exchange in a Nutshell
There are two phases in this protocol:
- A setup phase, where global parameters \(p\) and \(g\) are fixed.
- A key-exchange phase, where Alice and Bob argee on a shared key \(k_{AB}\).
The setup phase doesn't need to be online. The parameters \(p\) and \(g\) could as well be computed once, and then hard-wired into the software. The key-exchange phase on the other hand is executed online each time Alice and Bob want to establish a new session and need a new shared key.
The setup phase
- Select a random prime number \(p\)
- Compute a generator \(g\) of \({\mathbb Z}_p^{*}\)
If you've been a good student, you've stayed til the end of Prof. Paar's lecture above and will understand
- that we are working in the multiplicative group \(({\mathbb Z}_p^{*}, \times)\), where \(\times\) is the multiplication modulo \(p\),
- what \({\mathbb Z}_p^{*}\) actually is (answer: the set of all invertible elements \(x\) in \({\mathbb Z}_p\), i.e. for which \(\operatorname{gcd}(x, p) = 1\), which happen to be \(\{1, 2, 3, \cdots, p-1\}\) if \(p\) is prime)
- why \(p\) has to be a prime number (answer: so that \({\mathbb Z}_p^{*}\) is a cyclic group)
- what a generator is (answer: an element of \({\mathbb Z}_p^{*}\) of maximal order, also called a primitive element, i.e. an element \(g\) such that we can generate all etements of \({\mathbb Z}_p^{*}\) by computing \(g^i\ \forall i \in \{0, 1, 2, \cdots, p-1\}\))
- what the order of an element \(x\) is (answer: \(\operatorname{ord}(x)\) is the smallest integer \(i\) such that \(x^i \equiv 1 \pmod{p}\), i.e. the length of the circle before the sequence \((x^i \pmod{p})_i\) repeats).
- that a cyclic group must have at least one generator (so that a \(g\) exists at all)
- but that not all elements are generators (so that we need to actively search for one)
With this in mind, computing \(g\) is a simple matter of iterating \(x\) through \({\mathbb Z}_p^{*}\), computing \(\operatorname{ord}(x)\) as we go, until we get an maximal order \(p-1\). We set \(g \leftarrow x\) where \(\operatorname{ord}(x) = p-1\).
The key-exchange phase
Alice:
- \(k_{pr}(A) := a \overset{R}{\leftarrow} \{2, 3, ..., p-2 \}\) (uniformly pick a random element \(a\) as my private key)
- \(k_{pub}(A) := A \leftarrow g^{a} \pmod{p}\) (compute my public key \(A\))
- Send my public key \(A\) to Bob (via the insecure channel)
- Receive Bob's public key \(B\) (via the insecure channel)
- Compute \(k_{AB} \leftarrow B^{a} \pmod{p}\)
- Use \(k_{AB}\) as the shared key.
Bob:
- \(k_{pr}(B) := b \overset{R}{\leftarrow} \{2, 3, ..., p-2 \}\) (uniformly pick a random element \(b\) as my private key)
- \(k_{pub}(B) := B \leftarrow g^{b} \pmod{p}\) (compute my public key \(B\))
- Send my public key \(B\) to Alice (via the insecure channel)
- Receive Alice's public key \(A\) (via the insecure channel)
- Compute \(k_{AB} \leftarrow A^{b} \pmod{p}\)
- Use \(k_{AB}\) as the shared key.
Note how the scheme is perfectly symmetric.
For this scheme to work, we need to make sure that both Alice and Bob compute the same value for $k_{AB}$. Indeed: Alice computes
\[ B^a = (g^b)^a = g^{ba} = g^{ab} \]
and Bob computes
\[ A^b = (g^a)^b = g^{ab} \]
which is the same value. This is a proof of correctness.
A simplified presentation
Since we will explore variations of the Diffie-Hellman Key Exchange protocol in upcoming articles, we'll need one presentation of this protocol that focuses on the main aspects, and also emphasizes the time line. Such a simplified presentation for unauthenticated DH looks like this:
- Alice sends \(g^a\) to Bob
- Bob sends \(g^b\) to Alice
- Alice and Bob independently compute and use \(g^{ab}\)
It is understood that
- the private keys \(a\) and \(b\) are selected at random
- \(g^a\) and \(g^b\) are the corresponding public keys
- Seeing \(g^x\) doesn't mean that one also sees \(x\). Only the result of the exponentiation is available, not the exponent.
- all numbers are to be taken modulo \(p\). Actually it's \(g^a \pmod{p}\) and \(g^b \pmod{p}\) that are sent over the wire. But writing \(g^x\) is perfecty fine as long as we are aware of the fact that \(g^x \in {\mathbb Z}_p^{*}\).
Why does it work?
Wyle E. Hacker, the passive eavesdropper, can only see \(g^a\), \(g^b\) by sniffing the traffic between Alice and Bob, and he knows \(p\) and \(g\) by analyzing the crypto-software used by Alice and Bob. Can he compute \(k_{AB} = g^{ab}\) all by himself? The answer is no. He certainly could, if he was able to efficiently compute \(a\) from \(g^a\) and \(b\) from \(g^b\). But this computation is nothing more or less than finding the discrete logarithm of \(g^a\) and of \(g^b\). Efficiently computing this amounts to solving the discrete logarithm problem, which is as of this writing considered a hard problem of number theory.
The previous paragraph is an example of a simple security proof by reduction: we reduced the capability of a passive eavesdropper to break Diffie-Hellman to that of solving a hard mathematical problem. If the former were possible, the latter would be too. Contradiction. Therefore the former is impossible. This kind of proof technique is very common in cryptography and we'll see it very frequently.
Man in the middle attack on unauthenticated Diffie-Hellman
Obviously, Alice has no means to verify thad \(B\) really belongs to Bob (and vice-versa for Bob to verify that \(A\) really belongs to Alice). An active attacker, Eve, could execute a man in the middle attack by impersonating Bob and Alice to Alice and Bob respectively. In the simplified presentation, such an attack looks like this:
- Alice sends \(g^a\) to Eve
- Eve send \(g^e\) to Alice
- Eve sends \(g^e\) to Bob
- Bob sends \(g^b\) to Eve
Now
- Alice and Eve share \(g^{ae}\)
- Bob and Eve share \(g^{be}\)
Or, if you prefer to see the full protocol:
Alice:
- \(k_{pr}(A) := a \overset{R}{\leftarrow} \{2, 3, ..., p-2 \}\) (uniformly pick a random element \(a\) as my private key)
- \(k_{pub}(A) := A \leftarrow g^{a} \pmod{p}\) (compute my public key \(A\))
- Send my public key \(A\) to Eve (via the insecure channel), thinking I'm sending it to Bob
- Receive Eve's public key \(E\) (via the insecure channel), thinking it is Bob's
- Compute \(k_{AE} \leftarrow E^{a} \pmod{p}\)
- Use \(k_{AE}\) as the shared key with Eve, thinking it is the shared key with Bob
Eve:
- \(k_{pr}(E) := e \overset{R}{\leftarrow} \{2, 3, ..., p-2 \}\) (uniformly pick a random element \(e\) as my private key)
- \(k_{pub}(E) := E \leftarrow g^{e} \pmod{p}\) (compute my public key \(E\))
- Send my public key \(E\) to Alice (via the insecure channel), impersonating Bob
- Send my public key \(E\) to Bob (via the insecure channel), impersonating Alice
- Receive Alice's public key \(A\) (via the insecure channel),
- Compute \(k_{AE} \leftarrow A^{e} \pmod{p}\)
- Receive Bob's public key \(B\) (via the insecure channel),
- Compute \(k_{BE} \leftarrow B^{e} \pmod{p}\)
- Use \(k_{AE}\) as the shared key with Alice
- Use \(k_{BE}\) as the shared key with Bob
- Relay (without or with modifications) all messages between Alice and Bob
Bob:
- \(k_{pr}(B) := b \overset{R}{\leftarrow} \{2, 3, ..., p-2 \}\) (uniformly pick a random element \(b\) as my private key)
- \(k_{pub}(B) := B \leftarrow g^{b} \pmod{p}\) (compute my public key \(B\))
- Send my public key \(B\) to Eve (via the insecure channel), thinking I'm sending it to Alice
- Receive Eve's public key \(E\) (via the insecure channel), thinking it is Alice's
- Compute \(k_{BE} \leftarrow E^{b} \pmod{p}\)
- Use \(k_{BE}\) as the shared key with Eve, thinking it is the shared key with Bob
Now Eve has full control of the channel between Alice and Bob. She can not only decrypt the messages and listen to the conversation, she could even insert, delete, and modify messages at will, creating all kinds of mayhem. All this without Alice's and Bob's knowledge.
Mitigations
The man in the middle attack against Diffie-Hellman was due to the fact that the public keys \(A\) and \(B\) were not really tied to the identities of Alice and Bob respectively. An attacker was able to dissociate the public keys from their identities and substitute itself in this association. Alice and Bob could publish long-term public keys (not \(A\) and \(B\) that are session-specific short-term public keys) in a public key infrastructure (PKI), and sign A and B with their long-term private keys. Upon receipt of signed \(A\) and \(B\), they could verify the signatures before proceeding. This scheme, authenticated Diffie-Hellman, may seem obvious, but coming up with a secure construction is quite difficult. We'll come back to this in the next article.
Odds and Ends
- In the man in the middle scenario, \(k_{AE} \ne k_{BE}\). Therefore Alice and Bob could potentially detect this discrepancy by sending each other what they think is the shared key, once they've switched to all-encrypted mode. Unfortunatly, Eve knows \(k_{AE}\) and \(k_{BE}\) as well, and since she has full control of the session, she could detect them in real-time and try to substitute one for the other. Say Bob sends \(k_{BE}\) towards Alice via Eve. Eve would substitute \(k_{BE}\) with \(k_{AE}\) and let the modified message go through towards Alice. Alice will have no obvious way to detect this tampering and will think that Bob has indeed the same shared key than her's. Of course, Bob could protect \(k_{BE}\) with integrity codes, but Eve could then strip them and recompute them with \(k_{AE}\), if she is smart enough and knows about Bob's sneakiness. Either way, once Eve has control of the communication channel, it is basically game over for Alice and Bob.
- Of course, \(p\) has to be big enough for the discrete logarithm problem to be practically hard. Basically, brute-forcing the search for \(a\) given \(g^a \pmod{p}\) would require on average \(\frac{p-1}{2}\) operations, since \(a\) could be any of the \(p - 1\) elements of \({\mathbb Z}_p^{*}\). So we need to make sure that \(p\) is big enough even for the fastest computers.
- To find a big prime \(p\) in the setup phase, we need a primality test that is faster than factorization. One test that is widely used in practice is the Miller-Rabin Primality Test.
- Computing exponentiations \(g^x \pmod{p}\) with big private keys \(x\) is best done with the repeated squaring algorithm.
- We've shown above the (classic) Diffie-Hellman in the multiplicative group \(({\mathbb Z}_p^{*}, \times)\). In practice, we could also use Diffie-Hellman with the additive group of elliptic curves \((EC_{curve}, +)\) Instead of numbers, we would use points on an elliptic curve, and instead of exponentiation, we would use the multiplication operation (iterated point-addition) on EC points. This would be the Elliptic curve-based Diffie-Hellman Key Exchange, or ECDHE, that is so widely used today.
- There can be a bit of a mismatch between the shared key \(k_{AB}\), and the AES key that Alice and Bob will ultimatly use to encrypt and decrypt the messages. An AES key must be 128-bit, 192-bit or 256-bit long, but \(k_{AB}\) may be of any length. In practice, Alice and Bob will feed \(k_{AB}\) to a key derivation function (KDF) and use \(KDF(k_{AB})\) as the key for AES. This slight adjustment is of no consequence to the Diffie-Hellman protocol per se.