COS 126: RSA Public-Key Cryptosystem

COS 126

RSA Public-Key Cryptosystem

Programming Assignment

Due: 11:59pm

Overview. Write a program to implement the RSA public-key cryptosystem. The RSA (Rivest-Shamir-Adleman) cryptosystem is widely used for secure communication in browsers, bank ATM machines, credit card machines, mobile phones, smart cards, and the Windows operating system. It works by manipulating integers. To thwart eavesdroppers, the RSA cryptosystem must manipulate huge integers (hundreds of digits). The built-in C type int is only capable of dealing with 16 or 32 bit integers, providing little or no security. You will design, implement, and analyze an extended precision arithmetic data type that is capable of manipulating much larger integers. You will use this data type to write a client program that encrypts and decrypts messages using RSA.

Note: the training wheels are off on this assignment - you're starting with a blank screen, not our code. This means that you must pay particular attention to the process of building up your program from scratch. Consider carefully how your program should be structured and how you are going to implement the various functions before plunging in. Remember to thoroughly test and debug each function as you write it.

The RSA cryptosystem. As discussed in Lecture S1, the RSA public key cryptosystem involves three integer parameters d, e, and n that satisfy certain mathematical properties. The private key (d, n) is known only by Bob, while the public key (e, n) is published on the Internet. If Alice wants to send Bob a message (e.g., her credit card number) she encodes her message as an integer M that is between 0 and n-1. Then she computes:

E(M) = M^e mod n

and sends the integer E(M) to Bob. As an example, if M = 2003, e = 7, d = 2563, n = 3713, then Alice computes

E(M) = 2003⁷ mod 3713 = 129,350,063,142,700,422,208,187 mod 3713 = 746.

When Bob receives the encrypted communication E(M), he decrypts it by computing:

M = E(M)^d mod n.

Continuing with the example above, Bob recovers the original message by computing:

M = 746²⁵⁶³ mod 3713 = 2003.

To check your arithmetic, you may use bc, maple, or xmaple.

Part 1 (extended precision arithmetic). The RSA cryptosystem is easily broken if the private key d or the modulus n are too small (e.g., 32 bit integers). The built-in C types int and long can typically handle only 16 or 32 bit integers. Your main challenge is to design, implement, and analyze an extended precision arithmetic data type that can manipulate large (nonnegative) integers. To make it easier to check your work, we recommend working with integers represented using familiar decimal (base 10) notation. (We note that you could achieve superior performance by using a base that is a power of 2, e.g., 256 or 32,768.) Your data type will support the following operations: addition, subtraction, multiplication, division, modular exponentiation, and comparison. You may use the header file XP.h.

Data structure. The first and most critical step is to choose an appropriate data structure to represent N-digit extended precision integers. A natural approach is to store each digit as an element in an array. To avoid pointers and memory allocation and deallocation headaches, consider packaging the array up in a struct as follows:

#define N  100         // maximum number of digits supported
#define NN (2*N + 1)   // number of digits allocated

typedef struct {
   unsigned char digit[NN];
} XP;

We declare our array to have slightly more than N elements to give ourselves a bit of extra scratch space to store intermediate results - the exact reason for using 2N + 1 will become clear later. Advanced students (who like pointers) may wish to make it a true first class ADT as in Sedgewick, p. 181.

Initialization. Implement a function XPinitDecimal() that takes a string of decimal digits as input, and returns the extended precision integer corresponding to this string. If you represent integers using decimal notation, this will be straightforward; if you use base 2 or 256, consider using Horner's method to do the conversion.

Display. Implement a function XPshowDecimal() that takes an extended precision integer as input and prints out a (decimal) representation of that integer.

Addition. Implement the function XPadd(). It should take two extended precision integers as input and return a third extended precision integer that is the sum of the two. Implement the method you learned in grade school. Observe that if a and b are 2N-digit integers, the sum will have at most 2N + 1 digits.

Once you've written the above three function, you should be able to write code like:

XP a, b, c;                       // declare extended precision variables
a = XPinitDecimal("123456789");   // a   = 123456789
b = XPinitDecimal("54545454");    // b   =  54545454
c = XPadd(a, b);                  // c   =  a + b
XPshowDecimal(c);                 // print 178002243

Random number generation. Implement a function XPrand() that takes as input an integer n and returns a pseudo-random extended precision integer of at most n digits by choosing each digit at random. This will be useful in debugging the subsequent code, and also in analyzing its complexity.

Parity. Implement the function XPisOdd() that takes an extended precision integer as input and return 1 if and only if it is odd. You may assume that the base is even, so that this amount to checking the parity of the least significant bit.

Comparisons. Implement the following functions: XPgreater(), XPless(), and XPeq(). They should take two (2N+1)-digit extended precision integers as input and return 1 if and only if the first integer is greater than (less than, equal to) the second.

Subtraction. Implement the functions XPsub(). It should take two (2N+1)-digit extended precision integers as input and return a third extended precision integer that is the difference of the two. You should check that the first integer is greater than or equal to the second before subtracting; otherwise output an error message.

Submit everything above for the warmup.

Multiplication. Implement the function XPmul(). It should take two extended precision integers as input and return a third extended precision integer that is the product of the two. Observe that if a and b are N-digit integers, the product will have at most 2N digits.

Division. Integer division is the most complicated of the arithmetic functions. Here is one of the simplest algorithms to compute q = a / b and r = a mod b, where a and b are extended precision integers, with b not equal to 0. This algorithm requires the least amount of code, but its correctness may not be immediately apparent.

div(a, b) {
    if (a < b)
        return (0, a)
    (q, r) = div(a, 2b)
    q = 2q
    if (r < b)
        return (q, r)
    else
        return (q + 1, r - b)
}

Note that you should use extended precision arithmetic to perform each of the intermediate computations. The natural way to implement this algorithm is using recursion. Observe that if a and b are 2N-digit numbers, then the intermediate results will never have more than 2N+1 digits.

Part 2 (encryption and decryption). Now it's time to put all your hard work to use. The encryption and decryption processes are identical, except that they involve different encryption/decryption keys. Write a program rsa.c that takes in one command line arguments (the name of the encryption key), reads in a decimal string from standard input, and applies the RSA function to the message. It should work as follows. In real applications, you might want to send an ASCII text message instead of a decimal integer; however, you are only responsible for handling decimal inputs.

% gcc126 rsa.c xp.c -o rsa                          # compile
% rsa bob.pub < message.txt > message.encrypted     # Alice encrypts
% mail bob@princeton.edu < message.encrypted        # Alice emails Bob
% rsa bob.pri < message.encrypted                   # Bob decrypts

RSA function. The key step is to implement the RSA function (a.k.a. modular exponentiation): given three extended precision integers a (the base), b (the exponent), and n (the modulus), compute c = a^b mod n. You will implement this as the function XPrsa() in xp.c.

A natural way to compute c is to set x = 1; then multiply it by a, b times; then set c = x mod n. This method suffers from two serious performance bugs.

First, observe that if the inputs a, b, and n are N-digit decimal integers, then the intermediate result x could contain as many as 10^N digits. To see the great importance of this, observe that if N is 30, the intermediate result could consume a terrabyte of memory!

The second problem is that repeating anything b times will be slow. If b is even moderately large (say 30 decimal digits), it will take centuries to execute, even on the fastest supercomputer.

We use two ideas to overcome the above obstacles. First, the following mathematical identity is useful to keep the intermediate numbers small.

xy mod n = ((x mod n) * (y mod n)) mod n

Second, we use the technique of divide-and-conquer or repeated squaring. We incorporate both ideas into the following pseudocode. This allows us to perform modular exponentiation in a reasonable amount of time (when an appropriate base case is added).

  t = a ^ (b / 2) mod n
  c = (t * t) mod n
  if (b is odd)
      c = (c * a) mod n
  return c

Note here that the division is integer division, and each of the intermediate computations is performed using extended precision arithmetic. The exponentiation computation is naturally implemented using recursion. As above, if a, b, and n are N-digit numbers, then the intermediate results in this function will never have more than 2N digits. However, since t * t could be 2N digits long, the intermediate results in the mod (division) function might be as many as 2N + 1 digits. This explains why we allocate 2N + 1 digit.

Part 3 (analysis). Perform a complexity analysis of the following operations for N-digit extended precision arithmetic: addition, multiplication, division, RSA encryption (RSA exponent is small, say 7), and RSA decryption (RSA exponent has roughly N digits) algorithms. For each operation, give the exponent and coefficient of the leading term in its asymptotic running time, e.g., 1.3 × 10^-5 N² seconds. Justify your answers with empirical and/or theoretical evidence. Using this analysis, estimate the largest input (measured in number of digits) that each of your functions could handle in 1 day (86,400 seconds). Depending on your conclusions, you may wish to modify your design and implementation to improve performance. You may find it helpful to use the program timing.c to estimate the running times of your functions.

Legal notice. It is a violation of US law to export your solution for this assignment to foreign governments or embargoed destinations (Cuba, Iran, Iraq, Libya, North Korea, Serbia, Sudan, Syria, and Taliban-controlled areas of Afghanistan as of January 2000). It is also illegal to import your solution into several countries, including France, Iran, Iraq, and Russia.

Extra credit. Write a program that compute two large pseudo-random prime numbers p and q of a specified number of digits. Compute φ = (p - 1) (q - 1), and select a small integer e that is relatively prime with φ. Use these to generate a public key (e, n) and its companion secret key (d, n).

This assignment was created by Bob Sedgewick and Kevin Wayne.