CS 333 Assignment 1: s/ed/grep/

Due Thursday, February 10, 11:59 PM.


Tue Feb 1 07:32:58 EST 2000

By now you've heard (or would have, if you had been at the first lecture) the story of how Ken Thompson, best known as the father of Unix, created the original grep command in a single evening, starting with the Unix editor ed.

Your assignment is to do the same thing, although with three advantages and one disadvantage. The advantages are that you have a week, you know what grep is and does, and ed is now written in C, not PDP-11 assembler as it originally was. The disadvantage? You're not Ken Thompson.

The file ed.c contains the source for the 10th Edition Unix version of ed, which is 1700 lines long. It dates from about 1989, but is typical of Unix code from the mid 1970's: terse, tight, efficient, and largely uncommented. Your version of grep must use that code for regular expression processing. You will have to throw away a fair amount, since your grep.c will not need more than about 400 lines; you should not need to add more than about 30 lines of your own code.

It is almost always best to attack such a task in stages, making sure each one works before the next one is started. Recompile frequently as you remove code to make sure that what remains is syntactically correct. Be sure to keep a backup of the last known correct state, or learn how to use undo in your text editor. Make sure it handles stdin before handling filename arguments. Make sure the code works with its goto's before worrying about removing them. Try to modularize sensibly; in particular, it would be good if your regular expression code were easily separable so that it could be used in some other program later.

This is typical of what you might encounter in a new job -- you're asked to make a small change in a big program, and most of the challenge is finding the right thing to change, while making sure that nothing else breaks. Thus, most of this assignment is an exercise in figuring out what parts of the code are irrelevant and discarding them. If something has nothing to do with regular expressions, it can almost surely go. If you're unfamiliar with ed, see ed(1) or Appendix 1 of The Unix Programming Environment, which contains a brief tutorial.

Hints

It can be fun and instructive to puzzle this out entirely on your own. But if you need a hint or two, look here. There's no penalty for doing so, nor extra credit for not looking.

Submission

Submit your single source file grep.c using the command
	~cs333/bin/submit 1 grep.c
You may submit as often as you like; only the last one will be saved.

PLEASE follow the rules on what to submit -- get the filenames right and submit exactly what's asked for. This is a big class, and it will be hard to manage if we can't automate most of the processing. Remember that a happy TA gives better marks than an unhappy one. Keep your TA happy.

Historical note

I received this mail from Ken Thompson when this problem was previously assigned in CS333:
From ken Sun Jan 17 13:53:50 EST 1993
looks like a very good assignment.
it stresses reuse and cleanliness rather than grunt.