Due midnight, Sunday, February 10
Sun Feb 10 20:18:57 EST 2013
This assignment is a warmup to make sure you're comfortable with C, that you know how to compile and run C programs, and that you can submit code via the COS 333 submission mechanism. And there's one other thing -- your code has to work. Since the class is so large, grading will be entirely automatic, your code won't likely be looked at by humans at all, and your grade will be largely determined by how well you do on our tests. However, we most strongly recommend that you remember and put into practice all that you learned about good style in previous courses. It will help you greatly when your code for this assignment doesn't work.
There's one exception to the hands-off approach. We will routinely run all submissions through MOSS, which is extremely effective in detecting programs that are similar. If your program is deemed interesting by MOSS, we will look at it very carefully, and if appropriate will send our opinions to the Committee on Discipline. Please do not copy anyone else's work or let anyone copy yours; it's simply not worth it.
Although it is not part of this assignment, Assignment 1 (coming up real soon) will require you to rewrite the regular expression code in Java, and to develop a set of good test cases and a program to run them. So the more thought and effort you put into this assignment, the easier the next one will be.
The assignment is to extend the regular expression code in Section 9.2 of The Practice of Programming; the code itself can be found here. Do not change anything else in the program beyond what is needed for this assignment.
To begin, read the description in the book carefully and be sure you understand it. You might find it helpful to read this paper too. Then download the code and modify match and the other functions so they match a different class of regular expressions.
Use ctype.h functions like isdigit to determine character types. The six shorthands listed above are the only special ones. If a \ is followed by any other character, that character is taken literally, so, for example, x\+\+ matches a literal x followed by two plus signs. If the last character of the regular expression is a bare \, the meaning is undefined, and we won't test that except to make sure you don't get a core dump there. But notice that \\ is a valid regular expression.
A ^ not at the beginning and a $ not at the end are also taken literally, and so are *, + and ? if they appear in a context where they are not metacharacters. Think about the regular expression ***: the first and third *'s are not metacharacters. The TPOP code works this way already; if you don't change this part, it will still work.
You have to submit a single C file regexp.c that contains only your function definitions. The only name that should be visible outside your regexp.c is match, so everything else must be declared static. Your code in regexp.c does not print anything ever, nor does it ever exit or abort; it is a grievous error to do any of these. Do not use or include eprintf in your submission.
We will test your code by compiling and linking your regexp.c with our own driver (a straightforward modification of grep.c), using this declaration only:
int match(const char *re, const char *s);
and this compile command:
gcc -pedantic -ansi -Wall [our]grep.c [your]regexp.c
Your code must compile and link without change and without warning messages. Note that match's arguments are const; your code must not change their contents.
For calibration, my regexp.c is about 100 lines long with ordinary formatting, blank lines, etc.; if your solution is a lot longer than this, you are probably off on the wrong track.
The file /u/cos333/asgn0/refgrep on penguins is a reference implementation of grep using our regexp.c; you can use it as a check to confirm that you're getting the same answers, but in cases of disagreement, the definition of regular expressions determines who wins, not the reference implementation. (We're human too.)
Don't get wrapped up in language lawyering; the task is to convert the original code to our definition, not to invent your own regular expresion definition. Follow the original code. Do not under any circumstances create your own matching algorithm or reinterpret what a regular expression is!
It is always best to attack such a task in stages, making sure each one works before the next one is started. I started with the + metacharacter, then ?, then added the shorthands, then isolated matching into regexp.c, testing after each stage.
Define suitable functions rather than writing everything in one routine. When you add code for + and ?, use separate functions but call on ones that already exist where possible. Process the shorthands first into a better representation before doing any matching; use a single function to handle shorthand matching. (Hint: this program only has to work with 7-bit ASCII so you can use the 8th bit to encode metacharacterness.) Keep it simple: this program does not need to be fast, or save memory, or be the slightest bit clever. As is true with all programming, trying to be fast and/or clever is a recipe for disaster. If things go wrong, use printf or master a debugger, or both. Here's my one-page cheat sheet on gdb, and the much more extensive GDB reference and tutorial from COS 217.
Talking over the precise meaning of the assignment with friends, the TAs, and Piazza is encouraged. You must write your code yourself, however, so once you start coding, you may no longer discuss programming issues with other students.
You are strongly advised to make up good test cases to make sure that your program works. Assignment 1 will require you to submit test cases, so if you do this now, life will be easier in the future.
Submit your regexp.c using the CS Dropbox. [updated 2/10:] You can submit multiple times; we'll grade the last one that arrives. The check script will give you some indication of how well your program appears to work when you submit so you can try again.
We'll probably grade this out of 100 somewhat along these lines:
Please follow the rules on what to submit. It will be a help if you get the filenames right and submit exactly what's asked for. Thanks.