COS 226 Programming Assignment 5 - Checklist


Checklist 5: Repeat Search

The following is a checklist which provides a summary of the assignment. This is meant only as a supplement; please read the original assignment description.


Frequently Asked Questions:

If there are three or more equal substrings, do I have to print out the index of the first two?
No. Feel free to print out any two, so long as those two form an example of a longest matching pair.

Can the longest string include space characters?
Certainly. You should be working over a 27-character alphabet.

What exactly is a character access?
Assume that the array of characters is sitting in some room under lock and key, and that every time you need to know a single character, you pay to send someone into the room to retrieve it. More concretely, assume the raw data is in an array a[]. How many times in the execution of your program is some expression a[foo] evaluated?
So if you have a line if (a[i]==a[j]), you have two accesses. If you have a line if (a[i]==' '), you have made one access. If you have a line c = a[i] for a local variable c, you have made one access (and you can continue to use c without paying for accesses so long as the variable remains unchanged).

Note: the character access count is one way for us to profile your efficiency, but the real test will be your running time, so don't waste time just to artificially cut down the access count.

Ambiguities: One possible source of ambiguity is whether the two common strings are allowed to share characters. The answer is yes. That is, if the input string is "aaaaab", I would say the longest repeated string is "aaaa" beginning at either character 0 or character 1.

Input Format: The input is one long line which consists entirely of lower case letters and spaces. Since the line might be long, you should read it one character at a time, using the loop while ((a[N++] = getchar()) != EOF);. warning: The value N at the termination of the loop is actually one plus the true number of characters (the EOF got counted, but you probably don't want to consider it as an input character).

Output Format: As said in the assignment, your output should consist of 4 integers: length of match, starting position of first substring, starting position of second substring, and number of characters accessed.

readme: Make sure you explain what you do and why. Give estimates of the time/space costs which might be required, and what size problems you feel you are able to solve in a reasonable amount of time.

Performance and Sample Input We would like to provide you with several interesting inputs. Also, for each input, we will tell you how long a program written by 1999's head TA (Michael Goldwasser) takes to solve the problem (while running on his PC at home). It is certainly possible that you may be able to beat Michael's program, and it is also possible that you will get most/all of the credit even if your method is not as efficient. In this table we report both the running time, as well as the average number of character accesses done per input character.
Inputfile Original Source N M's time(sec) M's avg access
princeton A Packet article about the town 7527 0.21 37
amendments Constitutional Amendments 17835 0.53 92
y2kintro recent Senate report on Y2K 20450 0.42 44
baby How baby's learn language 21362 0.46 40
manifesto Communist Party Manifesto 70661 1.92 54
muchado Much Ado about Nothing 115489 2.74 44
aesop collection of Aesop fables 182188 4.55 46
starr The Starr Report narrative 221060 7.18 69 (warning: explicit language)
lilwomen Little Women 963127 27.20 48
mobydick Moby Dick 1137324 31.90 46
[Note: all of these files are also located in ~/cs226/prog5_files/ on the phoenix machines, so there is no need to download them if you are running on those machines]
cos226 Class Page
rs@cs.princeton.edu
Last modified: March 26, 2001