RESTRICTIONS: You may work alone or in pairs, and you may not use set! or any other procedure that performs side effects.

As you can see, this problem set is quite lengthy and the mathematics used here is quite involved. Although you are not responsible for understanding all the math presented in the Appendix, it's always a good idea to start early. Don't feel you are stuck just because you don't know the math. Abstract the ideas behind the problem set and work on the abstraction.

If you choose to work in pairs, you must submit only one copy of your solution. Every effort is made to maintain academic integrity in our class. If you share solutions with anyone except your partner, you risk facing serious consequences.

If you still have any trouble with the course software, or if you feel you are completely lost, then please come to our office hours or consulting hours. Again, you are not responsible for the math details in the Appendix. (i.e. you will not see them in prelims and exams). The Appendix is provided to you just for fun.

Before proceeding with the programming assignments, read the high-level description of the scenario, titled "Cryptography". The mathematical details relevant to each problem will be provided in the appropriate sections. Remember one important word: abstraction.

Cryptography

In our modern society, we have many real-life applications requiring secure communication over insecure channels. Specifically, the Internet represents a insecure channel over which we wish to communicate in confidence. In many cases, we are concerned about message secrecy and authenticity. Cryptography is the study of secret writing, which aims at rendering a message incomprehensible to an unauthorized reader.

In this problem set you are going to use and extend a cryptographic library. Here, "library" refers to a collection of algorithmic routines, much as it does in the phrase "C standard library". Recall that in lecture one we talked about computational abstraction. We take a computational process and give it a name, then we build a "black box" around it. A library is in fact a collection of these black boxes. It exposes routines that a programmer can use to perform certain tasks.

What should a cryptographic library contain? Certainly we need to have routines that will encrypt and decrypt messages. Many people know these two processes conceptually well, but as you will see, there are significant subtleties behind the scenes.

Encryption and Decryption

Encryption actually takes a message, does some "scrambling" on it, and gives you an encrypted version of the original message. We will describe this by saying that encryption transforms plaintext to ciphertext. Decryption is just the inverse of encryption (i.e. ciphertext to plaintext). However, if you just build a black box that takes one input (either plain-text or cipher-text) and spits out the output, it is easy to see that your encryption black box will spit out the same cipher-text for a certain plain-text, and everyone who uses your encryption black box will end up with exactly the same input-output pairs. This is obviously undesirable.

To solve this problem, we introduce the concept of keys. A key is an abstract representation of identity. The idea is that Alice owns a distinct key (that is, nobody has that same key). Therefore, that key represents Alice. Now you can rebuild both your black boxes (encryption and decryption) to take one additional input: the key. Now since each person has a different key, everyone who uses your encryption black box will end up with different outputs even if the input text is exactly the same. Note as well that those different ciphertexts will be decrypted back into exactly the same plain-text when used with the correct key.

Key Management

Unfortunately, this doesn't solve the problem. Why? You are the only one who can decrypt an encrypted message encrypted with your own key. Remember that no one else has the key and both encryption and decryption processes require your own key. This isn't useful. Now Alice needs to share her own key with the message recipient. It's possible that Alice sets up one key for each possible recipient, and Alice uses a particular key when she wants to send a secret message to the corresponding recipient. There are also other smarter schemes that share keys. But how would Alice share her keys with the recipient securely?

It's easy to see this "scheme" doesn't quite work. If you have a secure channel to send the key, why bother using cryptography? You can just send your message over this secure channel! The reason you deploy cryptography is exactly because you don't have such a channel. So how do we get around this chicken-and-egg like question?

Public Key Cryptography

The solution is what is known as public key cryptography. The system that we described above is called private key cryptography, in which the key must be kept private among the communicating parties (e.g. Alice and her friends). Some people actually prefer to call that system secret key cryptography because of a possible confusion with another term used later on. We will use secret key cryptography to avoid confusion in this problem set as well.

Secret key cryptographic systems have the intrinsic problem of the key management issue above. So cryptographers developed a new breed called public key cryptography. In a public key system, each principal (user) has a key pair, namely her public key and her private key. In this system, encryption takes the public key as the extra input; decryption then takes the private key.

These two keys are mathematically interrelated and that's why the encryption and decryption processes can take them separately yet produce coherent results. But there is the interesting and useful property that if you only get someone's public key it's extremely hard to infer the corresponding private key. With such a nice property, Alice can now announce to the world what her public key is. In fact, every principal's public key can be announced to the world. Now when Alice wants to send some secret message to her friend Bob, she just needs to get Bob's public key--which is known to the public--to encrypt the message, at which point only someone with Bob's private key (that is, Bob himself) can decrypt the message.

As an interim summary, public key cryptographic system contains an encryption and a decryption routine. When A wants to send a secret message to B, A takes B's public key, encrypts the message with B's public key, and then B uses his own private key to decrypt the message. In our system, it's also possible that A uses his own private key to encrypt and then B use A's public key to decrypt. (consequence of the mutual inverse property.) Don't worry about the math behind this - the mathematics in the encryption and decryption is just not important because of our abstraction.

Message Authenticity

We've solved the message secrecy problem. But remember that we also want message authenticity. That means we want to be able to verify the identity of the sender and the genuineness of the message.

It's easy to see why we need this. Consider a public key system where Alice receives an encrypted message from someone claimed to be Bob. Now Alice has every intention to verify if it's really from Bob. But in this system, all Alice knows is that the sender has her public key. But everybody has her public key because she announces it to the world. How can Alice be sure that Bob really sent this letter?

Message Signature

To provide message authenticity, we introduce a new abstraction called a message signature. A message signature is a digest of the message. This digest is small and compact when compared to the original message. In this problem set, for example, the message signature is an integer computed from the message contents.

Different messages will usually end up with different message signatures. (Observe that the number of possible messages is much larger than the number of signatures because of the size differences.) In fact, cryptographers have come up with very effective signature routines such that similar messages are guaranteed to have different signatures. Ideally, changing one bit in the message will flip 50% of the bits in the signature at random positions.

Making Signatures

Certain public key systems, like the one we developed in this problem set, have another nice property: the public key and the private key are interchangeable. That means, the owner of the private key can take it and use it for encrypting a message, and then the recipient can take the sender's public key and decrypt it. (Make sure you distinguish the two keys.) What this means is you can prepare an encrypted message and the whole world can decrypt it because you announce to the world your public key.

Combining the above property with message signatures guarantees message authenticity. Here's the idea: let's say Bob wants to propose to Alice. Bob writes up his declaration first. Then he computes the message-signature of the declaration. Finally, he encrypts the message-signature using his private key, and encrypts the message using Alice's public key and sends his declaration together with the signature to Alice.

Verifying Signature

Once Alice gets the message-signature and message, she first decrypts the message with her own private key and decrypts the message-signature with Bob's public key. She then computes the message-signature of the decrypted message and compares that with the received signature. If they are the same, then she can be (practically) 100% certain it's really from Bob. Why? Presumably, only Bob has Bob's private key. (Remember that it's very difficult to infer the private key from the public key.)

But we said "practically 100%", not exactly 100%. What goes wrong in our method? Nothing, but there is a slight possibility for an adversary can come up with a fake signature-like input and encrypts it with a fake private key in the hope that with Bob's real public key, Alice will get exactly the real signature when she decrypts the fake encrypted signature. Complicated enough... it's almost practically impossible given the difficulty in the mathematical analysis required and the sensitivity of our signature computing routine.

Modular Arithmetic

Before we can talk about the programming assignments, we need to review modular arithmetic. You may have heard about an arithmetic operation called "mod" before. In C, you use the operator %. In essence, if c is in 0 <= c < b, then

But this is not the end of the story. We can further define other arithmetic operations like add under modulo with respect to a positive integer n. Basically you first do the operation and get the result X, then compute X mod n and return that as the result. In this problem set, we are particularly interested in modulo-add. Here are some examples with the modulus chosen to be 3:

Written Exercises

Problem 1: Mathematical Induction

Prove by induction that (xⁿ - yⁿ) is divisible by (x - y), for (x != y) and (n >= 1). X and Y do not have to be integers.

TURN IN: Hand in your complete proof of the above property. Be sure to specify what you are doing induction on and what your induction hypothesis is. Please be sure to look at this example for a guide to how you should write up an induction proof.

Problem 2: Structural Induction

This function sums the numbers in a list. For example, (sum-list '(1 2 3)) evaluates to 6. Prove that sum-list indeed sums the numbers in a list for all lists using induction and the substitution model. That is, prove for all lists of numbers that:

TURN IN: Hand in your complete proof of sum-list's correctness. Please be sure to look at this example for a guide to how you should write up an induction proof.

Programming Exercises

In your program, you should load the ps2.ss source code. This should be done in the same fashion as problem set 1, namely using the command

We occasionally define a variable in a problem to help you test your code and use it for the rest of the problem set. result1 is such a variable. Make sure you define them when you test your program.

Here's a table listing the functions provided in our cryptographic library, the arguments they take and a high level description of what they do. There are some other helper functions that are not here, but their code should be self-explanatory. However, since you are also extending the library, some essential helper methods are documented as well. We will have more details about some individual functions you may need in related questions.

Problem 3: RSA Decryption

In this problem we will complete the library by developing the RSA-decrypt. RSA is actually the initial letters of the three inventors of a popular public key cryptographic system, which is the one we implement in our library. (They are Ron Rivest, Adi Shamir, and Leonard Adleman.)

In RSA, the encryption and decryption processes operate on integers. They take integers as input and output integers. We have provided you with a standard way to transform between strings of characters and numbers in our library. You are not responsible for these two methods. Try these commands:

When we encrypt a message, we first transform the message into a list of integers and process the integer list with RSA-encrypt-list.

Here, you are also told that a key pair in RSA contains a key-modulus. We will call it n. Just take it as a carefully chosen integer.

You may guess that we will just convert the string to an integer list and then encrypt each individual integer. But this makes cracking the code much easier. (See the appendix for an explanation of why.) Instead, we encrypt the last number in the list, modulo-add that to the second-to-last integer (using n as the modulus), encrypt the result and modulo-add that to the third-to-last integer and encrypt and so on. Observe that each resulted integer in this process now depends on all the integers after it in the input list.

We now define a method for you called RSA-transform. In fact, it is the most crucial part of implementing RSA. RSA-transform takes two arguments, an integer and a key. It corresponds to the encrypt process mentioned in the previous paragraph.

RSA has an interesting property: the encryption and the decryption processes are the same! You encrypt an integer by using RSA-transform with a public key, then you can decrypt the result and get the original integer back by using RSA-transform with the corresponding private key. (Remember, you don't use private key just for decryption.)

where [X] represent the RSA-transform of X with respect to the key-modulus. Check to make sure you really know what RSA-encrypt-list does and why it works exactly as described above. Then think why don't we do the following:

But wait. Where do the keys come from? We have generated some key pairs to use in this problem set. To learn how to generate them, please see the Appendix. You are not responsible for key generation.

Your task is to write RSA-decrypt-list. It takes an encrypted integer list and the correct key. (HINT: RSA-decrypt-list is very similar to RSA-encrypt-list. If you find yourself doing something much more complicated, then you are barking up the wrong tree. Ask for help if necessary.)

TURN IN: For this problem, turn in a listing of your procedure RSA-decrypt-list, the sample encryption and decryption of the test message, and a sample encryption and decryption (using test-key pair1 and test-key-pair2) of some message of your choice. Remember that samples are to be pasted in the code as comments.

Problem 4: Envelopes

In this problem, we are going to deal with enveloped data. In the real world, very often you need to send a single secret message to more than one person. But RSA can only handle one recipient. So how can we send a secret message to, say, 20 people? An obvious solution is to do the encryption process 20 times - one copy of encrypted message for each recipient. This is highly undesirable because if one copy of the encrypted message is 100 KB then 20 copies is 2000 KB. It's definitely not using bandwidth wisely. One way to get around this problem is by introducing envelopes. An enveloped message for n recipients contains the following items:

That is, it's a two-tuple. A session key is just a key, and it is usually much smaller than the message itself. Let's check that each recipient can really decrypt the message in this format. First they filter the recipient list for their own public key, then get the encrypted session key. Since that encrypted session key is encrypted using their own public key, their private key can of course be used to decrypt the session key. Once they have the session key, the encrypted message can easily be decrypted.

A signed envelope data is just a three-tuple. The extra information is the signature as the last element. The signature of a message is computed by encrypting the message first, and then computing the signature, which we do by calling compress.

The first step in adding enveloped messages is to create the data type for enveloped messages. For this question, you should use higher-order functions to create the type of enveloped messages. You need to write functions make-envelope, envelope-rlist, envelope-msg, and envelope-sig to fulfill the following specification:

Note that make-envelope is a function taking a recipient list, a message, and a signature and returning an enveloped-message. Additionally, envelope-rlist, envelope-msg and envelope-sig should take as input a enveloped-message. The recipient list will be represented as a list of recipients, where a recipient is the result of evaluating (make-recipient pubkey enckey). pubkey is the public key of the recipient, while enckey is the encrypted session key used to encrypt the message. The session key is encrypted with each recipients public-key.

Your function make-envelope should be implemented using higher order procedures (that is, make-envelope returns a function), provided you can use it to write envelope-rlist, envelope-msg and envelope-sig to satisfy the specification.

TURN IN: For this problem, you should hand in your listings for make-envelope, envelope-rlist, envelope-msg, and envelope-sig, as well as output showing that your functions satisfy the above specification.

Problem 5: Creating the Encrypted Message

to create the secret message. secret-rlist, secret and secret-signature are both defined in the ps2.ss file. Later on you will be cracking the RSA system and figuring out what the message is.

Now write encrypt-and-sign, a procedure that takes as arguments a message to be encrypted and signed, the sender's private key, and a list of the recipients' public keys. The procedure should generate a random RSA key-pair (using a key size of around 8), compute a digital signature for the message which is encrypted with the private-key of the sender's key pair, encrypt the message with the public-key of the random key-pair, and encrypt the private-key of the random key-pair for each recipient with his public-key, creating a list of recipients combine these to produce a signed message.

Now implement the inverse transformation authenticate-and-decrypt, which takes as arguments the received enveloped-message, the sender's public key, and the recipient's key-pair (both public and private keys). If the signature is authentic, the procedure should produce the decrypted message. If the signature is not authentic, the procedure should indicate this. Test your procedures by trying

to recover the original message. Note that it is necessary for both of these expressions to evaluate to the original message. Additionally, you should show that the procedure catches fraudulent signatures.

TURN IN: For this problem turn in a listing of your procedures encrypt-and-sign and authenticate-and-decrypt together with a demonstration that they work. Don't forget to demonstrate that they catch fraudulent signatures. You can use the two test keys for this.

Problem 6: Finding divisors

In this problem, you will implement the method smallest-divisor, which given an integer n (>= 2) finds the smallest integer (> 1) that divides n with no remainder. Note that we don't specify the behavior for n equals to 1 and your implementation won't be checked for cases where n <= 1. We require you use the simple divisor-trial scheme. i.e. consider integers that could divide n. Do not implement other complicated schemes even if you know any of those.

In this problem we stress efficiency. You will be heavily penalized if you don't follow the grading scheme below. We will measure efficiency by the number of trials. How can you minimize the number of trials required to find the smallest divisor of n? Here are two factors that we will consider when we grade your solution:

Also, write a procedure unique-prime-decompose that returns the list of unique prime factors given positive integer n >= 2. For example, (unique-prime-decompose 12) will give (2 3), (unique-prime-decompose 24) will also give (2 3) and (unique-prime-decompose 17) will give (17). You may want to use smallest-divisor as a helper method here.

Much like smallest-divisor, we stress efficiency here too. The grading factors are:

TURN IN: For this problem turn in a listing of your smallest-divisor and unique-prime-decompose. Be efficient. Show several examples of smallest-divisor and unique-prime-decompose working. In particular, show the results of (smallest-divisor 6355441) and (unique-prime-decompose 21288129609051). (It won't take more than 1 minute even on the slowest campus CIT machine if your code is written properly.)

Problem 7: Cracking RSA Encryption

You now have a basic implementation of an RSA cryptosystem, complete with facilities for encryption, decryption, digital signatures and signature authentication, envelope-messages and generation of new keys. Since we have used such small primes to generate keys, you should also be able to crack the system. In order to crack an RSA system, recall you must factor the modulus n into its component prime factors p and q. You can do this using the procedure smallest-divisor. Note that you only need to find one prime divisor p; the other divisor is q = n/p. Write a procedure crack-RSA that, given a public key, returns the associated private key. Test your procedure using the pairs test-key-pair1 and test-key-pair2 to show that it generates the correct private keys, given the public keys.

TURN IN: For this problem turn in a listing of your procedure crack-RSA, together with demonstrations that it works.

Problem 8: Cracking our Private Message

Use your crack-RSA and authenticate-and-decrypt procedures to determine who sent the envelope-message in secret-message, who the message was intended for and what the content of the message is. The message and the public keys for various possible senders and receivers are at the end of the file ps2.ss.

TURN IN: For this problem turn in the results of using these procedures crack-RSA and authenticate-and-decrypt to decrypt the message.

Appendix

This appendix is prepared for you so that you can learn more about RSA, both some of the math ideas and how RSA can be used in other areas such as digital time stamping. You are not responsible for the material presented here, i.e. you won't be tested on it during exams. Some understanding of the mathematical details of RSA definitely helps doing the problem set faster, but it should not be a critical factor of successfully completing the problem set.

The story begins in 1976 when Diffie and Hellman discovered a new approach to encryption and decryption: public key cryptography. As said, public key cryptography differs from private key cryptography that each user now gets two keys, namely the public key and the private key. Its major advantage over private key cryptography is that the public key, which is required for encryption, can be made known to the world. This property makes key management easier because we no longer require a secure channel just for the keys.

But public key cryptography is not equivalent to RSA. In fact RSA is a popular implementation of the theory laid down in Diffie and Hellman's paper:

W. Diffie and M. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, IT-22:6, 1976, pp. 644-654.

From now on, we will focus on how RSA works and how we make RSA works for us in the problem set.

How RSA works

RSA does not work with characters, but with integers. The ASCII standard representation of a single character is a 7-bit integer. We will represent each block of two characters as a 14-bit integer obtained by concatenating the ASCII representations of the two characters. Code have been provided to convert character strings to lists of 14-bit integers and vice versa.

In the RSA scheme, you select two large prime numbers p and q. Recall that a prime number is a positive integer with no divisors other than itself and 1. You then define

n = pq
m = (p-1)(q-1)

The significance of these definitions is that

with very few exceptions, almost all numbers less than n, when raised to the power m and the result is taken modulo n, give 1 -- x < n: x^m mod n = 1;
no one knows how to compute m efficiently, even knowing n.

(Notation: [a=b] mod m means that a mod m = b mod m, where a mod m is the remainder obtained when dividing a by m using ordinary integer division; equivalently, a-b is divisible by m. For example, [17 = 22] mod 5.)

Now you pick a number e < m relatively prime to m, i.e., such that e and m have no factors in common except 1; then compute a number d such that [de = 1] mod m, i.e. de = km+1 for some k. We'll see how to do this below. Your public key, which you can advertise to the world, is the pair (n,e). Your private key is (n,d). Anyone who wants to send you a secret message s (represented by an integer) encrypts it by computing E(s), where

E(s) = s^e mod n

That is, if the plaintext is represented by the number s, then the ciphertext E(s) is obtained by raising s to the power e, then taking the remainder modulo n. The decryption process is exactly the same, except that d is used instead of e:

D(s) = s^d mod n

Why does that work? In fact, the functions E(s) and D(s) are inverses.

D(E(s)) = (s^e)^d mod n
             = s^de mod n
             = s^1+km mod n
             = s(s^m)^k mod n
             = s(1)^k mod n
             = s mod n
             = s

The integer s representing the plaintext must be less than n. That's why we break the message up into character chunks. Also, this only works if s is relatively prime to n, i.e., has no factors in common with n other than 1. If n is the product of two large primes, then all but negligibly few messages s < n satisfy this property. If by some freak chance s and n turned out not to be relatively prime, then the code would be broken; but the chances of this happening by accident are insignificantly small. It is easy to compute d efficiently from e and m. We show how below. Thus, to use the RSA system:

pick large primes p and q;
compute n = pq and m = (p-1)(q-1);
choose e and use this to compute d such that [de = 1] mod m;
publish the pair (n,e) as your public key, but keep d secret to yourself

How secure is RSA? At present, the only known way to obtain d from e and n is to factor n into its prime factors p and q, then compute m and proceed as above. But no one knows how to factor large integers efficiently, despite centuries of effort by number theorists. Using known methods, factoring n = pq where p and q are 1000-digit primes would require years on today's fastest supercomputers. Until someone comes up with an efficient way to factor, or discovers some other way to compute d from e and n, the system is reasonably secure for all practical purposes.

Digital Signatures

So how do we do signatures? As we have outlined in the high-level description, We apply a compression function (also called a hash function) that transforms the message to a single, relatively small number h. In general, there will be many messages that produce the same hash value. Now transform the hash value h using your private key to give D(h). This is your digital signature, which you transmit along with the message. Anyone who receives the message can authenticate the signature by transforming it using your public key, giving E(D(h)) = h, and checking that this gives the same result as applying the compression function to the message.

Anyone who wanted to forge a message claiming to be from you must produce the number D(h). Anyone can compute the hash value h of the message, since the compression function is public. But as long as you are the only one who knows your private key, only you can produce D(h).

Digital Time Stamping

One problem that needs to be solved before electronic documents can be considered legally binding is the need to verify the time at which the document was created. This is called a digital time stamp. It is much like using a public notary, but in fact can be easier, cheaper, and more secure. Typically, a digital time stamper should be a neutral party such as the government (hmmm...). One means of digital time stamping is to have the owner of the document calculate the hash value of the document and send this to the time-stamper. The time-stamper then uses its own private key to encrypt a message containing that hash value and the time that the request was received. This is the time-stamped message. The time-stamper then returns the time-stamped message to the sender, who can append it to the document. If at a later date someone challenges the date when the document was written, they can decrypt the time stamp using the public key of the time-stamper and verify that the hash value of the document matches the hash value included in the time stamp. The security of the system rests in the security of the time-stamper's private key; a person who gains access to the private key of the time-stamper would be able to forge another date.

Added security can be built in by getting a document time-stamped by multiple time-stamping authorities. A person would then have to obtain the private key of all the authorities to be able to forge a date. This feature would probably be implemented by a hierarchy of time-stampers; that is, when one sends a request for a time-stamp, the time-stamper would include a time-stamp from another authority, and so on. This chain can only be forged if the forger know the private keys of all the authorities on the chain. Digital time-stamping also has the added advantage that the sender does not need to reveal the whole document to the time-stamper, as one does with a public notary, because only the hash value is sent. Finally, altering the document would be very difficult, since the time-stamp would be invalid for any document that did not have the same hash value.

To illustrate a use for digital time-stamping, let's consider this example with Bill and Hillary. After exchanging a series of torrid secret email messages for a couple of hours, Hillary and Bill decide to get married the next day. Hillary wants to have a prenuptial agreement because she makes much more money than Bill. Hillary's lawyer, who is vacationing in Europe, cannot give them a physical copy of the prenuptial agreement to sign, so she emails them an online version. After they both read the document, they sign it using the method discussed in the previous section. They calculate the hash value of the document with signatures and have it time-stamped. A few years later when they file for divorce, Bill claims that Hillary got access to his private key and forged the document after they got married. Hillary produces the time-stamp, proving that the prenuptial agreement was signed before they were married. Bill's only recourse now is to try to claim that Hillary knew his digital signature back before they were married.

Implementing RSA

We have talked about key representation in the problem set. But we have not covered how we implemented RSA-transform:

  (define (RSA-transform number key)
    (expmod number (key-exponent key) (key-modulus key)))

This implements both the encryption and decryption operations, depending on whether the key is public or private.

To generate RSA keys, we first need a way to generate primes. The most straightforward way is to pick a random number in some desired range and start testing successive numbers from there until we find a prime. The following procedure starts searching at a randomly chosen integer between smallest and (+ smallest range):

(define (choose-prime smallest range)
  (let ((start (+ smallest (rand range))))
    (search-for-prime
      (if (even? start)
        (+ start 1)
        start))))

(define (search-for-prime guess)
  (if (fast-prime? guess)
    guess
    (search-for-prime (+ guess 2))))

The test for primality is the Fermat test. This is a probabilistic method, which with high probability determines whether or not n is prime. It is based on Fermat's Theorem, which states that if n is prime, then all numbers a in the range 0 < a < n satisfy [a^n-1 = 1] mod n; and if n is not prime, then at most half do (with a relatively small set of exceptions called Carmichael numbers, which are composite numbers that look prime as far as Fermat's test is concerned. There are so few of these we won't worry about hitting one).

(define (fermat-test n a)
  (= (expmod a n n) a))

(define (fast-prime? n)
  (if (> n 7)
    (and (fermat-test n 2) (fermat-test n 3)
         (fermat-test n 5) (fermat-test n 7))
    (or (= n 2) (= n 3) (= n 5) (= n 7))))

Now we can generate a public RSA key and matching private key. We'll represent this as a list of two keys, along with functions make-key-pair, public-key and private-key to create and access parts of a pair of keys:

(define public-key  first)
(define private-key second)
(define (make-key-pair public private)
  (list public private))

The following procedure generates an RSA key pair. It picks primes p and q that are in the range from 2^r to 2*2^r so that n = pq will be in the range 2^2r to 2^2r+2. In order to encode two characters per number, we need primes larger than 2¹⁴. In this problem set, we're using small values of n (2¹⁸ to 2²⁰) because we want you to play around with cracking an RSA system. By starting with larger random numbers, you can use the same method to produce a system that really is secure.

(define (generate-RSA-key-pair r)
  (let* ((size (expt 2 r))
         (p (choose-prime size size))
         (q (choose-prime size size)))
    (if (= p q)       ; check that we haven't chosen the same prime twice
      (generate-RSA-key-pair r) ; VERY unlikely
      (let* ((n (* p q))
             (m (* (- p 1) (- q 1)))
             (z (select-keys m))
             (e (first z))
             (d (second z)))
        (make-key-pair (make-key n e) (make-key n d))))))

Computing the Keys

The public and private keys e and d must satisfy:

[de = 1] mod m

i.e., e and d must be multiplicative inverses mod m. One can show that a solution to this equation exists if and only if the greatest common divisor (gcd) of e and m is 1. We can pick e randomly in the range 0 < e < m, then calculate the gcd of e and m using the Euclidean algorithm. If the gcd is not 1, we pick again. The number d falls out as a by-product of this computation.

The Euclidean algorithm is based on the fact that for any positive integers k and m, there is a unique quotient q and remainder r such that

k = qm + r
0 <= r < m

Moreover, the gcd of k and m, call it g, is the same as the gcd of m and r. This gives us a recursive means of calculating g and integers s and t such that sk + tm = g: recursively compute u and v such that um + vr = g; then observe

g = um + vr
  = um + v(k - qm)
  = vk + (u - vq)m

so we can take s = v and t = u - vq.

(define (Euclid m n)
  (if (zero? n)
    (list 1 0 m)
    (let* ((q (floor (/ m n)))
           (r (modulo m n))
           (z (Euclid n r))
           (u (first z))
           (v (second z))
           (g (third z)))
      (list v (- u (* q v)) g))))

Thus the call (Euclid e m) can be used to test whether the gcd of e and m is 1. If so, it also returns s and t such that se + tm = 1. The desired multiplicative inverse d of e is s mod m (we have to reduce mod m because the s returned by euclid might be negative).

Finally, to generate digital signatures for encrypted messages, we need a standard compression function. In this problem set, we'll simply add the integers modulo 2¹⁸. (In practice, people use more complicated compression schemes than this.)

(define (compress intlist)
  (letrec ((add-loop
            (lambda (l)
              (if (null? l)
                0
                (+ (head l) (add-loop (tail l)))))))
    (modulo (add-loop intlist) (expt 2 18))))

Real Life

In real life, we seldom do encryption/decryption like what we did in the problem set because RSA, as well as other secure public-key system, are quite slow compared to many private-key crypto-systems. So how can we get the best of both worlds? Software like PGP actually use a public key algorithm to encrypt a session-key that is sent with each message. That session-key is actually the key for a private key system. The session-key is small (compared to a general message), and so you save a lot of time by using a fast algorithm for a larger part but still get the roughly same level of security. This not only makes the encryption/decryption process faster, it also provides the advantage of enveloped data. Another difference between our system and reality is that signatures are usually embedded in the encrypted message, i.e. we sign the original message, then we encrypt the signed message.

Let's take a look at message signing. How should we choose a good algorithm to compute the signature? Or we could ask what quality makes a signature computing algorithm good? Should it reveal any information about the original message? A nice property would be that the signature changes half (on average) of its bits if one bit in the original message flipped. However, we didn't implement that in our system.

It may seem crazy to factor such a big number in part 3. However, that number is only 45 bit long. What do we use in real life? Numbers like 1024 and 2048 are very common. How long does it take to factor such a number by brute-force? Each bit roughly doubles the time required... And if you are very interested in this kind of brute-force code breaking, try visit distributed.net and see their RC5-64 project.

Function	Arguments	Description
RSA-encrypt	msg, key	encrypt msg using key
RSA-decrypt	msg, key	decrypt msg using key
generate-RSA-key-pair	r	generates an RSA key pair of size 2^r
RSA-transform	number, key	fundamental RSA transformation of number by key
compress	msg	computes an integer value for the msg; used for signatures

Issued: Tuesday, September 7 Due: Thursday, September 23, 4PM

Issued: Tuesday, September 7
Due: Thursday, September 23, 4PM