Lecture notes by
Lynette I. Millett
Revised by Jed Liu
Some terminology:
One should note that it is far easier to break codes than it is to create them. In fact, only a few people in the world really know how to create codes; hence, we should be very skeptical of any newly proposed cryptosystem.
Today we will discuss a few primitive cryptographic systems in order to build intuition about cryptanalysis.
The first scheme is called a monoalphabetic substitution cipher. In this cipher, we encrypt a given letter in the message by shifting it to the right (in the alphabet) by some number n. For example, in the Caesar cipher, n = 3. That is, an 'a' becomes 'd', 'b' becomes 'e' and so on. This works for any n not greater than 25 (assuming a 26 letter alphabet). In this cipher, the number n is the 'key.'
Another—somewhat cryptographically stronger—example of a monoalphabetic substitution cipher is to use an arbitrary permutation of the alphabet, rather than shifting by a certain number. Rather than only 25 possible keys, we have 26! (26 factorial, the number of permutations of the alphabet, assuming a 26 letter alphabet), or roughly 4 x 1026. A brute force approach to cracking this cipher, even if one spends only 1 microsecond per permutation, would take roughly 10 trillion years. It is important to note that two successive encryptions do not increase the strength of this cryptosystem because the product of two permutations (keys) is another permutation (key).
In actuality, 10 trillion years are not needed to crack a monoalphabetic code. We can take advantage of the fact that sentences in natural language (for the sake of argument, assume English) do not have uniform frequency distributions over the entire alphabet. Thus, we can calculate the frequency distribution for each character in the cipher text and compare it to what we know about the English language. (For example, 'e' is the most frequent letter, occurring 14% of the time, 't' occurs 9.85%, while 'q' only occurs 0.26%.) Monoalphabetic substitution ciphers have the property that frequency distributions are preserved. Therefore, if we know the original language, and we know that a monoalphabetic substitution was used, then we have a good chance of cracking the code. This kind of attack is referred to as a ciphertext only attack.
There are several types of cryptanalytic attacks:
How can such a substitution be cracked? Note that if we know the key length (how many different permutations are used), then, in the above example we would know that the first, fifth, ninth, etc. characters were encrypted under the same permutation. Thus, knowing that the key length is n reduces the problem to cracking n monoalphabetic substitution ciphers. Cryptanalysis of a polyalphabetic substitution cipher is therefore accomplished in three steps:
THIS IS AN EXAMPLE OF A MESSAGEbecomes
TEAHX IAMSME PSILSSEA GAOENFThis distributes the n-grams, and once the transposition is done a substitution can also be performed. To attack such a transposition, one needs to determine the dimension of the transposition matrix (in the above example: 10x3). This adds one more level of complexity, but is not fundamentally different than the other schemes we have examined. Spreading information through a message like this is called diffusion.
Many ciphers in existence only provide computational security. That is, the encoded information is kept secret only because of limits in our current processing capacity. However, given enough computing power, one can always launch a brute force attack to break the cipher.
One might ask, then, what are the characteristics of a perfect substitution cipher? Such a cipher would provide us with unconditional secrecy, where the probability that a message can be decrypted is unaltered by knowledge of the ciphertext for that message. This can be accomplished with a one-time pad, an infinite sequence of random bits. This sequence is XORed with the message. If it is a random sequence, then knowing any one bit will tell you nothing about what the next bit will be. Moreover, decoding is simple because (key XOR message) XOR key = message. The problem is that unlimited key material is needed, and absolute synchrony is necessary between the sender and the receiver.
There are two properties of an encryption scheme that are desirable:
In January 2002, Michael Rabin of Harvard University announced a scheme for what he called "everlasting security". Here, he uses a bounded storage model, a refinement of computational security, in which it is assumed that any adversary has a bounded amount of storage available, but no assumption is made about computational power.
To encryt using this scheme, the sender A and the receiver B use a publicly accessible sequence α of random bits. This sequence should be longer than their adversary's storage capacity and can come, for example, from an intercepted digital streams from a TV channel. The key that A and B share is an algorithm for picking out bits from α to use for a one-time pad. To solve the problem of perfect synchony, it will suffice for A and B to use a self-synchronizing stream, such as that from a digital TV broadcast or a satelite signal.
To see why this works, suppose an adversary captures the ciphertext and later captures the key. But since the adversary lacks the ability to store α, she will not be able to construct the one-time pad, and so, will not be able to decrypt the message.