The purpose of this assignment is to help you learn about team software development. It also will give you more experience with advanced C programming, creating and using ADTs, and the GNU/UNIX programming tools, especially
Your task in this assignment is to work within a team to write a C program that tests large non-negative integers for primality.
Large integers can be difficult to break into prime factors. Modern cryptographic methods depend on this fact. In the RSA public-key cryptography system, someone who wishes to receive messages publishes an integer
k that is the product of two large (e.g., 200-digit) prime integers
q. Anyone knowing
k can encode a message, but only the person who knows
q can decode the message.
How can you, the message receiver, find some large primes
q? The simplest way is to choose a random 200-digit integer, and test whether it is prime. But how do you test whether an integer
n is prime? You can try dividing by each prime integer up to
sqrt(n). But there are approximately
4 · 1097 primes less than
sqrt(10200), so at the rate of one per microsecond it would take you
1074 times the age of the universe to test
n in this way.
Fortunately, it is possible to learn that an integer is composite (not prime) without even learning its factors. This can be done in a more reasonable length of time. You can use the following mathematical fact about prime integers: for a prime integer
p and an integer
a in the range
1 ≤ a < p:
ap-1 mod p = 1
But for a typical composite
ac-1 mod c ≠ 1 for at least half the
You can test an integer
n for primality by choosing a random
a, raising it to the
(n-1)st power modulo
n, and seeing if you get 1. If you get something other than 1, then you know
n is composite. If for
a's you get 1, then the chance that
n isn't prime is about
2-k. So if you do, say, forty tries and always get 1, then
n is almost certain to be prime.
So, to make a primality testing program, you read in an integer
n, choose forty random
a's and compute
an-1 mod n. If you always get 1, you report "probably prime." Otherwise you report "composite."
Now the only problem is that an
int on the hats computers (when building via the
gcc217 command) can hold only a 32-bit (10 decimal digit) integer, not a 200-digit integer. So how do you do exponentiation and modulus on these big integers?
You should represent a big integer as an array of digits in the base
b. You are used to base 10, of course; but you should use base
b = 4294967296, since that's convenient on the computer. An
unsigned int can hold integers in the range 0..4294967295, so a 200-digit decimal integer is just an array of about 21
Big integers can be added and subtracted by the same rules that you learned in grade school (before they let you use pocket calculators). Remember carries and borrows? Just start at the low-order digit and work your way up.
To multiply, you can use a recursive approach based upon these mathematical rules:
b = 0 ⇒ a · b = 0 b even ⇒ a · b = (a · b/2) · 2 b odd ⇒ a · b = ((a · b/2) · 2) + a
where "/" is truncating integer division. (Recall that multiplying and dividing by 2 on a digital computer is easy.)
To implement the division and modulus operations, again you can use a recursive approach. These are some pertinent mathematical rules:
a < b ⇒ a/b = 0 rem a r < b ⇒ a/(2·b) = q rem r ⇒ a/b = (2·q) rem r r ≥ b ⇒ a/(2·b) = q rem r ⇒ a/b = (2·q) + 1 rem (r-b)
You also have the problem of exponentiating big integers. To raise
a to the power
k, all you have to do is multiply
a by itself
k times. The problem is that if
k = n-1, and
n is a 200-digit integer, this will take much longer than the age of the universe. But you can use the following identities:
k even ⇒ ak = (ak/2)2 k odd ⇒ ak = a · ak-1
But there's still a problem here. The integer
a is likely to be about 200 decimal digits, and if you take it to the
10200 power then you get an integer with more digits than atoms in the universe; this won't fit onto your computer. But you don't really have to compute
an-1; you have to compute
an-1 mod n. That is a much smaller integer, of about 200 decimal digits. You can use the following mathematical identity while doing the exponentiation:
(a · b) mod n = ((a mod n) · (b mod n)) mod n
Thus, you can keep all the intermediate results down to 400 digits (or 200, if you're clever during the multiply, but don't worry about that).
Create a program named
prime. The program should read a line from the standard input stream. The line should be a non-empty sequence of decimal digits, that is, ASCII characters in the range 0-9, terminated by an end-of-line mark. The program should convert the digit sequence to a big integer. To do that, the program can multiply values in the range 0..9 by powers of ten and sum them up.
The program should test the big integer for primality. Then the program should write to the standard output stream a line consisting of (1) the sequence of decimal digits, (2) a space, and (3) the message "probably prime" or "composite" as appropriate. The program should repeat until end-of-file of the standard input stream.
If the prime program reads 0 from the standard input stream, then the program should write to the standard output stream a line consisting of 0, a space, and the message "neither". Similarly, if the prime program reads 1 from the standard input stream, then the program should write to the standard output stream a line consisting of 1, a space, and the message "neither".
If the program reads a malformed line from the standard input stream, then it should write to the standard output stream a line consisting of (1) the sequence of decimal digits, (2) a space, and (3) the message "malformed". Then it should continue.
For example, if the program reads these four lines from the standard input stream:
204587985712111 000013 98457b458112 ⇐ blank line 55743769241161 0 1
then the program should write these four lines to the standard output stream:
204587985712111 probably prime 000013 probably prime 98457b458112 malformed malformed 55743769241161 composite 0 neither 1 neither
In addition to the
prime program, you also should write "testing programs," that is, programs that test your add, subtract, multiply, divide, etc. functions. You also might find it helpful to write a function that prints a given big integer in decimal. To convert a big integer to decimal format, you must start by calculating the last digit; just divide by ten and take the remainder. Your testing programs might find such a function handy.
Finally, you should develop a test data file, or perhaps several test data files, containing large non-negative integers that are known to be prime or composite. You then should run your
prime program on those files to verify that your program works properly.
Incidentally, the Linux
dc command might be helpful when developing your test data files. The Linux
bc command also might be helpful.
We will assign you to a software development team composed of a subset of students from your precept. Together with your teammates, your job is to write module interfaces. Then, as an individual, your job is to write module implementations (or partial module implementations) which, when combined with the module implementations (or partial module implementations) written by the other members of your team, define the
prime program and your testing programs.
Each team will have an adviser. Your team's adviser will be either your preceptor or some CS faculty member or student who has volunteered for the role. It is up to the team, with guidance from the adviser, to decide how to divide the programs into modules.
Each team also will have a grader. In many cases, the team's grader and adviser will be the same person.
Your team should choose three students to serve leadership roles:
The Manager should be the team's administrative leader. The Manager's job is to (1) determine which team members will write which (partial) module implementations, and (2) facilitate communication among members of the team and between the team, your adviser, and your grader.
The Architect should be the team's technical leader. The Architect's job is to be the ultimate authority on all technical decisions. The Architect should lead the technical discussions that occur during precepts.
The Builder has two jobs: (1) serve as the team's "repository consultant," that is, advise teammates on how to use the team's code repository (as described below), and (2) serve as the team's "makefile consultant," that is, advise teammates on how to create their makefiles (as described below).
The choice of Manager, Architect, and Builder are subject to the adviser's approval. The Architect must have done well (that is, must have scored in the high-90s) on the course's first three programming assignments.
The mapping of team members to (partial) module implementations also is subject to the adviser's approval. The mapping should be balanced: the Architect, Manager, and Builder should write one or more (partial) module implementations, and other team members should write two or more (partial) module implementations. The work assignments should be redundant: you should assign at least two team members to the development of each (partial) module implementation. Modules that are specific to testing programs count as "(partial) module implementations." So do test data files.
Your team will need to communicate to write module interfaces, and to decide which team members will write which (partial) module implementations. At least two precepts will be devoted to team meetings. We anticipate that your team will need to schedule additional meetings outside of precept time.
Your team also should communicate by e-mail. You should create an e-mail address list containing the addresses of all members of your team, your team's adviser, and your team's grader. Use that list for all communication between your teammates. (Your adviser needs to be part of your e-mail conversations to provide advice. Your grader needs to be part of your e-mail conversations to evaluate your work properly.)
Develop on hats. Use
emacs to create source code. Use
make to build. Use
gdb to debug. Use
meminfo to check for dynamic memory management errors, and to make sure that the program frees all dynamically allocated memory. Use
critTer to check for stylistic errors.
Your team will have a code repository. Your team code repository should contain the module interface (.h) files that you and your teammates write together. It also should contain the (partial) module implementation (.c) files that you and your teammates write individually. Each (partial) implementation file in the repository should have a name that is prefixed by the initials of the team member who wrote it. For example, your repository might contain files named AA_file1.c or BD_file2.c.
Use Subversion to implement your code repository. The supplementary document entitled Version Control with Subversion describes the code repository that we have created for your team and how to use Subversion (via the
svn command) to manipulate it.
The normal assignment collaboration rules, as described in the course "Policies" Web page, apply to this assignment. For this assignment these additional rules apply.
You are allowed to:
Read a teammate's source code for the purpose of editing your source code -- if your teammate's source code is for a (partial) module implementation that you are not writing.
Read a teammate's source code for the purpose of helping your teammate to edit his/her source code -- if your teammate's source code is for a (partial) module implementation that you are not writing.
Read, edit, and share makefiles with your teammates. However by the time you submit your makefile you are responsible for knowing how it works.
You are not allowed to:
Collaborate with a member of another team in any way.
Read a teammate's source code for a (partial) module implementation that you are writing. In fact it would be considered an act of academic dishonesty even to download a teammate's source code file for a (partial) module implementation that you are writing. If you accidentally download such a file, then delete it immediately and tell your preceptor what happened via e-mail.
Edit a teammate's source code.
Regrettably, we must bring violations of the collaboration rules to the attention of the University's Committee on Discipline. Please be careful!!!
The assignment covers a two-week (that is, a four-precept) period. These are the deliverables throughout that period:
The Manager should create a team e-mail list, and send a "startup" e-mail to that list. It should indicate the names of the Manager, Architect, and Builder. Remember that the team's adviser and grader should be members of the e-mail list.
The Architect should place in the team's code repository a complete set of interface (.h) files. You can change the interface files subsequently, with notification and justification sent to the team's e-mail list.
The Manager should create a document named
WorkAssignments in the team's code repository. The document should indicate which team members are writing each (partial) module implementation.
The Manager should create a document named
StatusReport in the team's code repository. The document should contain a description of work completed so far by each team member. The Manager may compose the descriptions, or may collect them from the team members. One or two sentences per team member is sufficient.
You, as an individual, should submit:
Your team's module interface (
The (partial) module implementation files that you personally wrote. It should be clear from the initials prefacing the file names which of the (partial) module implementation files you personally wrote. You also should state in your
readme file which (partial) module implementation files you personally wrote.
A subset of the (partial) implementation (
.c) files written by your teammates. The combination of (1) that subset, (2) your (partial) implementation files, and (3) your team's interface files should allow your grader to build a complete
prime program and your team's testing programs.
The first dependency rule in your
makefile should have the target "all". That rule should cause
make to build your
prime executable file and your team's testing executable binary files. The remaining dependency rules should build (1)
.o files using the
.c files that you submitted, and (2) executable files using the
.o files. Your
makefile should encode file dependencies to allow for partial builds.
Your readme file should contain:
A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" web page.
A one-paragraph description of the role that you played within your team and the work that you accomplished. In particular you should state which (partial) module implementation files you wrote.
(Optionally) An indication of how much time you spent doing the assignment.
(Optionally) Your assessment of the assignment.
(Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.
You should submit your individual work electronically on hats using these commands:
submit 4 implementationFilesThatYouWrote submit 4 complementaryImplementationFiles submit 4 interfaceFiles submit 4 makefile submit 4 readme
Remember that testing programs and test data files count as "implementation files."
Suppose a program contains a module named
m, consisting of interface file
m.h and implementation file
m.c. Furthermore, let's say that
m.c is large -- so large that it would be unreasonable for one programmer to write all of it. To facilitate multi-programmer development, the team could split the implementation into two files:
m2.c. Then some team members could work on
m1.c, and others independently could work on
Further suppose, however, that both
m2.c need to use the same data types. It would be stylistically bad to place the definitions of those types in
m.h; doing so would reveal the data types to clients, thus violating encapsulation. It would be stylistically horrible to define the data types redundantly within both
m2.c; redundancy introduces the possibility of inconsistency, especially over time. So where should the shared data types be defined?
One solution is to create another file named
mprivate.h is a "private" header file for the
m module which defines the data types used by
mprivate.h, and thus have access to the definitions of the data types that they need. But
mprivate.h is not advertised to clients, and so encapsulation is not violated.
For this assignment, your team should create "private" header files as required.
Your grade for the assignment won't be based primarily upon the efficiency of your code. Nevertheless, your code should be reasonably efficient. As general guidelines...
When built normally (that is, using
gcc217 commands with no flags), and when performing forty tests, our reference program consumes approximately 29 seconds of hats CPU time to determine that this integer (split into multiple lines for readability):
6864797660130609714981900799081393217269435300 1433054093944634591855431833976560521225596406 6145455497729631139148085803712198799971664381 2574028291115057151
is "probably prime".
Incidentally, it's possible to speed up your program by specifying flags to the
-O3 (that's uppercase "oh", followed by the number "3") flag commands
gcc to optimize the machine language code that it produces. When given the
-O3 flag, gcc spends more time compiling your code so, subsequently, the computer spends less time executing your code.
-D NDEBUG flag commands
gcc to define the
NDEBUG macro, just as if the preprocessor directive
#define NDEBUG appeared in the specified .c file(s). Defining the
NDEBUG macro disables the calls of the
assert macro within your code.
When built with the
-D NDEBUG flags, when given the aforementioned integer, and when performing forty tests, our reference program consumes approximately 20 seconds of hats CPU time to reach its "probably prime" conclusion.
Also incidentally, you can use the Linux
time command to determine how much CPU time your program consumes:
time prime < somefile
As described in lectures and precepts, you should use the opaque pointer type technique to make sure that clients of your objects cannot access the fields of your objects. If you don't use that technique, then you should discuss the matter ahead of time with your adviser and your grader, and should provide a thorough justification, perhaps based upon efficiency, in your
It couldn't hurt for you to understand the algorithms used by the modules that you don't have to implement -- if only because exam questions could be based upon those algorithms.
Your grade for the assignment will have a team component and an individual component. The team component will be based upon the quality of your team's interface files. The individual component will be based upon the quality of your individual (partial) implementation files.
We will grade your work on quality from the user's and programmer's points of view. From the user's point of view, your code has quality if it behaves as it should, that is, if it accurately classifies given integers as prime or composite, and does so with reasonable efficiency.
From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. In part, good style is defined by the
critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document. The more course-specific style rules listed in the previous assignment specifications also apply. Proper function-level and file-level modularity will be a prominent part of your grade. To encourage good coding practices, we will deduct points if
gcc217 generates warning messages.
The quality of your participation during the precepts devoted to team meetings will affect your class participation grade for the course substantially.
This assignment was written by Andrew Appel with contributions by Robert M. Dondero, Jr.