Princeton University
COS 217: Introduction to Programming Systems

Assignment 4: A Primality Tester Program


Purpose

The purpose of this assignment is to help you learn about team software development. It also will give you more experience with advanced C programming, creating and using ADTs, and the GNU/UNIX programming tools, especially emacs, gcc, gdb, and make.


Your Task

Your task in this assignment is to work within a team to write a C program that tests large non-negative integers for primality.


Background: Primality Testing

Large integers can be difficult to break into prime factors. Modern cryptographic methods depend on this fact. In the RSA public-key cryptography system, someone who wishes to receive messages publishes an integer k that is the product of two large (e.g., 200-digit) prime integers p and q. Anyone knowing k can encode a message, but only the person who knows p and q can decode the message.

How can you, the message receiver, find some large primes p and q? The simplest way is to choose a random 200-digit integer, and test whether it is prime. But how do you test whether an integer n is prime? You can try dividing by each prime integer up to sqrt(n). But there are approximately 4 · 1097 primes less than sqrt(10200), so at the rate of one per microsecond it would take you 1074 times the age of the universe to test n in this way.

Fortunately, it is possible to learn that an integer is composite (not prime) without even learning its factors. This can be done in a more reasonable length of time. You can use the following mathematical fact about prime integers: for a prime integer p and an integer a in the range 1 ≤ a < p:

ap-1 mod p = 1

But for a typical composite c, ac-1 mod c ≠ 1 for at least half the a's.

You can test an integer n for primality by choosing a random a, raising it to the (n-1)st power modulo n, and seeing if you get 1. If you get something other than 1, then you know n is composite. If for k random a's you get 1, then the chance that n isn't prime is about 2-k. So if you do, say, forty tries and always get 1, then n is almost certain to be prime.

So, to make a primality testing program, you read in an integer n, choose forty random a's and compute an-1 mod n. If you always get 1, you report "probably prime." Otherwise you report "composite."


Background: Big Integers

Now the only problem is that an int on the hats computers (when building via the gcc217 command) can hold only a 32-bit (10 decimal digit) integer, not a 200-digit integer. So how do you do exponentiation and modulus on these big integers?

You should represent a big integer as an array of digits in the base b. You are used to base 10, of course; but you should use base b = 4294967296, since that's convenient on the computer. An unsigned int can hold integers in the range 0..4294967295, so a 200-digit decimal integer is just an array of about 21 unsigned ints.

Big integers can be added and subtracted by the same rules that you learned in grade school (before they let you use pocket calculators). Remember carries and borrows? Just start at the low-order digit and work your way up.

To multiply, you can use a recursive approach based upon these mathematical rules:

b = 0  ⇒ a · b = 0
b even ⇒ a · b = (a · b/2) · 2
b odd  ⇒ a · b = ((a · b/2) · 2) + a

where "/" is truncating integer division. (Recall that multiplying and dividing by 2 on a digital computer is easy.)

To implement the division and modulus operations, again you can use a recursive approach. These are some pertinent mathematical rules:

a < b  ⇒ a/b  = 0 rem a
r < b  ⇒ a/(2·b) = q rem r ⇒ a/b = (2·q) rem r
r ≥ b  ⇒ a/(2·b) = q rem r ⇒ a/b = (2·q) + 1 rem (r-b)

You also have the problem of exponentiating big integers. To raise a to the power k, all you have to do is multiply a by itself k times. The problem is that if k = n-1, and n is a 200-digit integer, this will take much longer than the age of the universe. But you can use the following identities:

k even ⇒ ak = (ak/2)2
k odd  ⇒ ak = a · ak-1

But there's still a problem here. The integer a is likely to be about 200 decimal digits, and if you take it to the 10200 power then you get an integer with more digits than atoms in the universe; this won't fit onto your computer. But you don't really have to compute an-1; you have to compute an-1 mod n. That is a much smaller integer, of about 200 decimal digits. You can use the following mathematical identity while doing the exponentiation:

(a · b) mod n = ((a mod n) · (b mod n)) mod n

Thus, you can keep all the intermediate results down to 400 digits (or 200, if you're clever during the multiply, but don't worry about that).


The Programs

Create a program named prime. The program should read a line from the standard input stream. The line should be a non-empty sequence of decimal digits, that is, ASCII characters in the range 0-9, terminated by an end-of-line mark. The program should convert the digit sequence to a big integer. To do that, the program can multiply values in the range 0..9 by powers of ten and sum them up.

The program should test the big integer for primality. Then the program should write to the standard output stream a line consisting of (1) the sequence of decimal digits, (2) a space, and (3) the message "probably prime" or "composite" as appropriate. The program should repeat until end-of-file of the standard input stream.

If the prime program reads 0 from the standard input stream, then the program should write to the standard output stream a line consisting of 0, a space, and the message "neither". Similarly, if the prime program reads 1 from the standard input stream, then the program should write to the standard output stream a line consisting of 1, a space, and the message "neither".

If the program reads a malformed line from the standard input stream, then it should write to the standard output stream a line consisting of (1) the sequence of decimal digits, (2) a space, and (3) the message "malformed". Then it should continue.

For example, if the program reads these four lines from the standard input stream:

204587985712111
000013
98457b458112
                   ⇐ blank line
55743769241161
0
1

then the program should write these four lines to the standard output stream:

204587985712111 probably prime
000013 probably prime
98457b458112 malformed
 malformed
55743769241161 composite
0 neither
1 neither

In addition to the prime program, you also should write "testing programs," that is, programs that test your add, subtract, multiply, divide, etc. functions. You also might find it helpful to write a function that prints a given big integer in decimal. To convert a big integer to decimal format, you must start by calculating the last digit; just divide by ten and take the remainder. Your testing programs might find such a function handy.

Finally, you should develop a test data file, or perhaps several test data files, containing large non-negative integers that are known to be prime or composite. You then should run your prime program on those files to verify that your program works properly.

Incidentally, the Linux dc command might be helpful when developing your test data files. The Linux bc command also might be helpful.


The Team Approach

We will assign you to a software development team composed of a subset of students from your precept. Together with your teammates, your job is to write module interfaces. Then, as an individual, your job is to write module implementations (or partial module implementations) which, when combined with the module implementations (or partial module implementations) written by the other members of your team, define the prime program and your testing programs.

Each team will have an adviser. Your team's adviser will be either your preceptor or some CS faculty member or student who has volunteered for the role. It is up to the team, with guidance from the adviser, to decide how to divide the programs into modules.

Each team also will have a grader. In many cases, the team's grader and adviser will be the same person.

Your team should choose three students to serve leadership roles:

The choice of Manager, Architect, and Builder are subject to the adviser's approval. The Architect must have done well (that is, must have scored in the high-90s) on the course's first three programming assignments.

The mapping of team members to (partial) module implementations also is subject to the adviser's approval. The mapping should be balanced: the Architect, Manager, and Builder should write one or more (partial) module implementations, and other team members should write two or more (partial) module implementations. The work assignments should be redundant: you should assign at least two team members to the development of each (partial) module implementation. Modules that are specific to testing programs count as "(partial) module implementations." So do test data files.

Your team will need to communicate to write module interfaces, and to decide which team members will write which (partial) module implementations. At least two precepts will be devoted to team meetings. We anticipate that your team will need to schedule additional meetings outside of precept time.

Your team also should communicate by e-mail. You should create an e-mail address list containing the addresses of all members of your team, your team's adviser, and your team's grader. Use that list for all communication between your teammates. (Your adviser needs to be part of your e-mail conversations to provide advice. Your grader needs to be part of your e-mail conversations to evaluate your work properly.)


Logistics

Develop on hats. Use emacs to create source code. Use make to build. Use gdb to debug. Use meminfo to check for dynamic memory management errors, and to make sure that the program frees all dynamically allocated memory. Use splint and critTer to check for stylistic errors.

Your team will have a code repository. Your team code repository should contain the module interface (.h) files that you and your teammates write together. It also should contain the (partial) module implementation (.c) files that you and your teammates write individually. Each (partial) implementation file in the repository should have a name that is prefixed by the initials of the team member who wrote it. For example, your repository might contain files named AA_file1.c or BD_file2.c.

Use Subversion to implement your code repository. The supplementary document entitled Version Control with Subversion describes the code repository that we have created for your team and how to use Subversion (via the svn command) to manipulate it.


Collaboration Rules

The normal assignment collaboration rules, as described in the course "Policies" Web page, apply to this assignment. For this assignment these additional rules apply.

You are allowed to:

You are not allowed to:

Regrettably, we must bring violations of the collaboration rules to the attention of the University's Committee on Discipline. Please be careful!!!


Deliverables

The assignment covers a two-week (that is, a four-precept) period. These are the deliverables throughout that period:

By the end of the day of the first precept: a "startup" e-mail.

The Manager should create a team e-mail list, and send a "startup" e-mail to that list. It should indicate the names of the Manager, Architect, and Builder. Remember that the team's adviser and grader should be members of the e-mail list.

By the end of the day of the second precept: a set of interfaces in the code repository, and a "work assignments" document.

The Architect should place in the team's code repository a complete set of interface (.h) files. You can change the interface files subsequently, with notification and justification sent to the team's e-mail list.

The Manager should create a document named WorkAssignments in the team's code repository. The document should indicate which team members are writing each (partial) module implementation.

By the end of the day of the third precept: a "status report" document.

The Manager should create a document named StatusReport in the team's code repository. The document should contain a description of work completed so far by each team member. The Manager may compose the descriptions, or may collect them from the team members. One or two sentences per team member is sufficient.

By the assignment due date: your complete submission.

You, as an individual, should submit:

The first dependency rule in your makefile should have the target "all". That rule should cause make to build your prime executable file and your team's testing executable binary files. The remaining dependency rules should build (1) .o files using the .h and .c files that you submitted, and (2) executable files using the .o files. Your makefile should encode file dependencies to allow for partial builds.

Your readme file should contain:

You should submit your individual work electronically on hats using these commands:

submit 4 implementationFilesThatYouWrote
submit 4 complementaryImplementationFiles
submit 4 interfaceFiles
submit 4 makefile
submit 4 readme

Remember that testing programs and test data files count as "implementation files."


Notes

Create "private" header files.

Suppose a program contains a module named m, consisting of interface file m.h and implementation file m.c. Furthermore, let's say that m.c is large -- so large that it would be unreasonable for one programmer to write all of it. To facilitate multi-programmer development, the team could split the implementation into two files: m1.c and m2.c. Then some team members could work on m1.c, and others independently could work on m2.c.

Further suppose, however, that both m1.c and m2.c need to use the same data types. It would be stylistically bad to place the definitions of those types in m.h; doing so would reveal the data types to clients, thus violating encapsulation. It would be stylistically horrible to define the data types redundantly within both m1.c and m2.c; redundancy introduces the possibility of inconsistency, especially over time. So where should the shared data types be defined?

One solution is to create another file named mprivate.h. mprivate.h is a "private" header file for the m module which defines the data types used by m1.c and m2.c. Both m1.c and m2.c #include mprivate.h, and thus have access to the definitions of the data types that they need. But mprivate.h is not advertised to clients, and so encapsulation is not violated.

For this assignment, your team should create "private" header files as required.

Write reasonably efficient code.

Your grade for the assignment won't be based primarily upon the efficiency of your code. Nevertheless, your code should be reasonably efficient. As general guidelines...

When built normally (that is, using gcc217 commands with no flags), and when performing forty tests, our reference program consumes approximately 29 seconds of hats CPU time to determine that this integer (split into multiple lines for readability):

6864797660130609714981900799081393217269435300
1433054093944634591855431833976560521225596406
6145455497729631139148085803712198799971664381
2574028291115057151

is "probably prime".

Incidentally, it's possible to speed up your program by specifying flags to the gcc217 command:

When built with the -O3 and -D NDEBUG flags, when given the aforementioned integer, and when performing forty tests, our reference program consumes approximately 20 seconds of hats CPU time to reach its "probably prime" conclusion.

Also incidentally, you can use the Linux time command to determine how much CPU time your program consumes:

time prime < somefile

Use opaque pointer types.

As described in lectures and precepts, you should use the opaque pointer type technique to make sure that clients of your objects cannot access the fields of your objects. If you don't use that technique, then you should discuss the matter ahead of time with your adviser and your grader, and should provide a thorough justification, perhaps based upon efficiency, in your readme file.

Understand all of the code.

It couldn't hurt for you to understand the algorithms used by the modules that you don't have to implement -- if only because exam questions could be based upon those algorithms.


Grading

Your grade for the assignment will have a team component and an individual component. The team component will be based upon the quality of your team's interface files. The individual component will be based upon the quality of your individual (partial) implementation files.

We will grade your work on quality from the user's and programmer's points of view. From the user's point of view, your code has quality if it behaves as it should, that is, if it accurately classifies given integers as prime or composite, and does so with reasonable efficiency.

From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. In part, good style is defined by the splint and critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document. The more course-specific style rules listed in the previous assignment specifications also apply. Proper function-level and file-level modularity will be a prominent part of your grade. To encourage good coding practices, we will deduct points if gcc217 generates warning messages.

The quality of your participation during the precepts devoted to team meetings will affect your class participation grade for the course substantially.


This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.