COS 226 Programming Assignment Checklist: Redundancy Detector

Frequently Asked Questions

Can the two substrings overlap? Yes. For example, issi is the longest repeated sequence in the string mississippi.

Can the longest string include space characters? Certainly.

What if there is a tie for the longest repeated substring? Print out whichever one you want. For example, if the string is abcdabcedab, print out either abc or dab.

How do I implement isspace() and isprint()? Don't. Just include <ctype.h>. There are many useful (and portable) functions in this library.

What's the maximum string length? 10 million characters. It's fine to declare a large static array and avoid dynamic memory management. Make it global (file scope) so that it is allocated "on the heap" instead of "on the function call stack."

#define MAXN 10000000
char text[MAXN + 1];

What does the strcmp() function return? It returns 0 if the two strings match, or a positive or negative integer depending on which string is lexicographically larger. This can be a fit counter-intuitive. Note also that strcmp() does not necessarily return 0, +1, or -1.

Input, Output, and Testing

Input and output. Here are a number of sample input files.

Reference solutions.   For reference, we provide executable code for our solution in Windows, Solaris, and OS X. We encourage you to develop a faster program! It is certainly possible (although, we expect, not easy) to beat our solution. You can still earn most/all of the credit even if your method is not quite as efficient. However, you will incur a performance penalty if your algorithm has a quadratic average-case running time.

Performance.   You should strive for fast and efficient algorithms (but there is no need or reward for over-optimizing with low-level hacks). Also, don't blindly throw away space in the quest for more speed. (For example, 256-way tries would be pretty wasteful here!)

Timing and testing.   Unix users may find the shell scripts verifyit.csh and timeit.csh useful for automating the testing and timing.

Submission and readme

We will compile your program with "gcc *.c" so you are free to organize your program as you see fit. As usual, we encourage and reward sound design principles.

Here is a template readme.txt file. Please answer all of the questions.

Getting started

  • Download the directories redundant and textfiles These directories are mirrored on arizona at /u/cs226/pub/. They contain reference solutions and sample input files.

  • Implement a brute-force approach.
  • Read in the string text[].
  • For all i < j, compare the substring starting at offset i (in C, this substring can be referenced by text + i) with the substring starting at offset j, and compute the longest common prefix of the two substrings with a variant of strcmp().
  • Print the longest substring from your exhaustive search.
  • Brainstorm for better algorithmic ideas, perhaps consulting your preceptor for advice. Hint: you might generate some promising ideas after reading Sedgewick 15.5.

  • COS 226 Home Page
    Kevin Wayne
    Last modified: March 20, 2002