COS 226 Programming Assignment 8

Factoring

Write a program to factor huge integers. Read an integer N from the command line, then print it as the product of two integers (or note that it is prime) as in the following brute-force solution:
     #include <stdio.h>
     #include <limits.h>
     main(int argc, char *argv[])
       { int d;
         int N = atoi(argv[1]);
         for (d = 2; d < N/d; d++)
           {
             if (N % d == 0) break;
           }
         if (N % d == 0) 
              printf("%d = %d*%d\n", N, d, N/d);
         else printf("%d is prime\n", N);
       }
This code simply checks each integer greater than 1 to see whether it divides evenly into N. The header limits.h gives access to constants like INT_MAX that are not used by this program, but which you may need to avoid overflow in your own code.

We stop after finding any factor since we know that we could find all factors, if desired, by using the same method for the factors.

The solution just given is not quite a brute-force solution because it already contains a test that improves performance by a substantial amount: If we do not find a factor by the time that we reach the square root of N, then we certainly will not find a factor larger than that value, and we can declare N to be prime. A truly naive brute-force implementation might use time proportional to N (try every integer less than N; this improvement improves the running time to be proportional to the square root of N, so we can use it to factor any 32-bit integer in a fraction of a second. In this assignment, we explore the problem of factoring larger integers.

There are two reasons that the above code is not useful for factoring huge integers. First, the standard int data type supports 32-bit integers, so we would get overflow and/or incorrect results if we were to try to use it for larger numbers. Second, it is too slow. Even if we could deal with (say) 128-bit integers, we could never afford to try 2^64 divide operations to check for factors.

For this assignment, you will develop an implementation of a faster algorithm for factoring that is known as Pollard's rho method. The method is brought to us by the magic of number theory that is beyond the scope of this course, but it is an easy computation, based on iterating the function f(x) = x*x + c (mod N). Specifically, we start at some value x = x0 and maintain, for i = 1, 2, 3, ... , one variable a that tracks the result of i iterations of f and another variable b that tracks the result of 2i iterations of f, computing the greatest common divisor d of |a - b| and N. It turns out that whenever we find that d is not 1, we can infer that it is a factor of N, so we are done. (If you know number theory and are interested in details, see Pollard's paper in BIT 15, 1975, pp. 331-334.) For example, the following table traces the computation when we use it to factor 299.
x0 = 1
 c = 1
             a    b  |a-b|  d
           -------------------
             2    5    3    1
             5   79   74    1
            26  174  148    1
            79  105   26   13

          299 = 13*23
The key to a simple implementation is to note that we can maintain the values of a and b as required by simply setting a = f(a) and b = f(f(b)) in the inner loop. The loop always terminates, since a and b must eventually be equal (think about why this must be true!) and the greatest common divisor of 0 and N is N. Pollard's method is a randomized algorithm that factors with high probability if the starting value of a and the additive constant c are chosen at random.

Your first task for this assignment is to write a program small.c to factor int N that uses Pollard's method to compute the factorization. Instrument your code to also print the chosen values of x0 and c, the number of times the loop iterates, and, for N less than 1000, a table like the one shown above. Use your program to factor 713, 6319, 10609, and 42037. Completing this part of the assignment successfully is worth 6 points. Note that the numbers you have to factor are so small that you do not have to worry about overflow (see the first Extra Credit below).

You will need to implement a function to compute the greatest common divisor of two integers. For this purpose, use a recursive gcd function based on the fact that the greatest common divisor of a and b is the same as the greatest common divisor of b and a % b. (This classic method is known as Euclid's algorithm.) Incidentally, use a simple if test, not a library function, to compute the absolute value.

We will hardly notice the difference in performance between Pollard's algorithm and the brute-force method unless we want to factor huge integers. To develop an implementation for this purpose, we need an abstract data type (ADT) that supports all the usual arithmetic operations on such objects. This is an exercise in ADT design and implementation, a topic which is covered briefly in COS 126; in some detail in Chapter 4 of Algorithms in C; and in great detail in Hanson's book C Interfaces and Implementations and in COS 217. For this assignment, you can use the interface bignum.h. At Princeton, there are hundreds of implementations of this interface lying around, because it is a COS 217 assignment. A trustworthy implementation of this ADT is provided in ~cs226/prog8_files/ by including the objects bignum.o and libgmp.a during your compilation.

Your second task for this assignment is to write a program big.c that uses the BigNum ADT to factor huge integers, using Pollard's algorithm. Use your program to factor 5352499, 670726081, 9449868410449, 1082154235955237, 121932633334857493, and 13565005454706599869.

Experiment with other huge integers, and include in your readme file a discussion of the number of iterations that you think are required, in general.

Extra credit A. Implement small.c such that it is functionally equivalent to the sample program given above (but much faster). That is, your program needs to factor any int, avoiding overflow in the intermediate calculations.

Extra credit B. Find the largest prime pair (N and N+2 both prime) that you can in about ten seconds of computer time. An impressively large value of N will earn you an extra point.

Due: 11:59PM Sunday, April 18.