The compression side channel – Attacks on the TLS Record Protocol
21.5.2 The compression side channel
Back in 2002, John Kelsey, a cryptographer working for Certicom at that time, published a paper [98] describing how lossless data compression, when used in cryptosystems, can leak information about the encrypted plaintext.
More precisely, Kelsey showed that under certain circumstances, the size of the ciphertext leaks information about the plaintext when lossless compression is applied to the plaintext. He referred to this information as the compression side channel.
The existence of a compression side channel is quite dangerous because it means that Eve can learn certain things about the plaintext from its corresponding ciphertext without ever breaking the encryption algorithm. In addition, Kelsey observed that the compression side channel is fundamentally different from side channels, such as the algorithm’s power consumption or execution time (which we will cover in the next chapter), in that it is a property of the algorithm, not the implementation. That is, any implementation that uses lossless compression is vulnerable.
As we saw previously, lossless compression works by identifying repeating patterns in the input data and replacing them with references. Thus, if Eve knows the length of encrypted messages that contain s or is able to append s to the messages before they are compressed and encrypted, she can determine the presence of s in a previously unseen encrypted message with non-negligible probability.
Kelsey argued that such string detection attacks are feasible in practice and gave the following example for a long string detection with partially chosen plaintext.
Assume that Eve wants to determine whether some long string s appears often in a set of messages m0,m1,…,mn−1. She can do this using a string detection attack such as this one:
- First, Eve acquires the compressed and encrypted versions of all mi. From these messages, she learns their compressed output lengths.
- In the next step, Eve acquires the compressed and encryption versions of all messages mi∥s.
- In the third step, Eve determines the length of compressed string s when the corresponding lossless compression algorithm is used.
- Finally, Eve computes the average a = (∑ mi∥s − mi)∕n. If a is substantially less than the length of compressed s, it is very likely that s is present in many of the messages.
Let’s apply this example in the internet setting. To authenticate users, websites use cookies. If the cookie is compressed and encrypted, Mallory cannot simply read it. However, if she can send HTTP requests from Bob’s web browser, Mallory can determine the cookie’s value using a string detection attack.
Assume that the value of the secret cookie is abcd1234efgh5678. Say Mallory correctly guesses part of the secret value and sends the following GET request:
GET www.alice.com/app.html?some_parameter=abcd
In this case, the size of the compressor’s output and, in turn, the ciphertext will be smaller because the string abcd occurs multiple times in the HTTP request:
GET /app.html?some_parameter=abcd HTTP/1.1
Host: www.alice.com
Cookie: sessionid=abcd1234efgh5678
As a result, by making guesses for several bytes and checking the resulting size of the encrypted HTTP request, Mallory will be able to determine the cookie’s value.
Importantly, Mallory doesn’t have to guess the entire cookie at once (which would be practically infeasible if the cookie is large enough). Instead, she only needs to guess a few bytes at a time, and this can be easily done because a k-byte string can only have 28k values.