COS 333 Assignment 2: Feeping Creaturism (Spring 2008)

Due midnight, Friday, February 22

Thu Feb 21 20:25:39 EST 2008

[added 2/18] If you run Leopard, gcc may complain about the symbol isblank defined in b.c. You can fix this by changing the name of the function in b.c; it won't affect anything else.

Don't forget to look at the newsgroup if you have a question; it may already have been asked and answered. And if not, mail to cos333@lists will get the ball rolling.

One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything.

This assignment is an exercise in adding new features to an existing program that we have talked about in class, but whose innards are almost certainly unfamiliar. To get started, become familiar enough with AWK that you know in broad outline what it does; this awk help file may be useful. Then download the source from the AWK home page and skim it to see how AWK is implemented.

Your specific tasks are to add three features to AWK: an until statement, a built-in function fields(), and // comments.

For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; much of the job amounts to intelligent cut and paste, using grep to locate places that might be relevant.
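
As a sketch of what that search might look like, the shell function below lists every line in the C, header, and Yacc source files that mentions a given word. The name findword is made up for illustration, and the choice of file suffixes is an assumption about how the source is laid out.

```shell
#!/bin/bash
# findword WORD [DIR]: hypothetical helper, not part of the awk distribution.
# Prints file:line:text for each line in DIR's .c, .h, and .y files that
# contains WORD as a whole word, ignoring case. DIR defaults to ".".
findword() {
	dir=${2:-.}
	# A suffix with no matching files makes grep complain; hide that noise.
	grep -n -i -w -- "$1" "$dir"/*.c "$dir"/*.h "$dir"/*.y 2>/dev/null
}
```

Running findword while in the source directory shows where an existing loop keyword is handled, which is a reasonable first guess at where a new one needs to go.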

First, the AWK grammar needs a new rule that specifies the syntax of an until statement and creates the right kind of node in the parse tree. There's a new keyword that lexical analysis has to recognize and some new code must be added to run.c to provide the semantics, though it's mostly identical to old code.

For fields, the grammar does not change, but lexical analysis has to recognize the name of the new built-in function, and you have to write that function and include it with the other built-ins.

For the new comment syntax, you have to fiddle the lexical analysis in lex.c.

Automate your testing as much as possible. Create a shell file awk.test (for ksh and bash, not tcsh) containing at least a dozen tests that exercise your features thoroughly. The file awk.test should be self-contained, requiring no input from a user and generating its own test data. It should produce exactly one line per test when the test passes, plus one additional line for each test that fails, of the form

	Error: test failed [further text...]

It should assume that the program being tested is named a.out and is in the current directory. For example, this file contains a test of until and a test of fields():
	#!/bin/bash
	echo 'test 1: count down from 1 to 0; no braces'
	a.out '
        BEGIN {
                n = 1
                until (n == 0)
                        print n--
        }' >temp1
	echo '1' >temp2
	cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test failed: count down 1 to 0' 2>&1

	echo 'test 2: print fields 2 & 3, comma separated'
	echo 'a bc def ghij' | a.out '
	{
		print fields(2, 3, ",")	// a comment in the new syntax
	}' >temp1
	echo 'bc,def' >temp2
	cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test failed: fields(2,3)' 2>&1
This file is here; you might find it easiest to just edit it. Your awk.test should contain further tests as well, and the echo line for each one should state what the test does. Don't forget to test for syntax errors.
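
If the repeated cmp lines get tedious, one possible way to cut the boilerplate is a pair of shell functions, sketched below. The names expect_output and expect_error and the AWK variable are inventions for illustration, not part of the handout. AWK defaults to ./a.out so that the tests still run the program in the current directory even when "." is not in your PATH.

```shell
#!/bin/bash
# Sketch of helpers for awk.test; the function names are made up.
# AWK defaults to ./a.out, the name the assignment requires. The override
# exists only so the helpers can be tried against some other awk.
AWK=${AWK:-./a.out}

# expect_output DESCRIPTION PROGRAM INPUT EXPECTED
# Runs PROGRAM on INPUT and compares its standard output with EXPECTED.
# Trailing newlines are stripped from the output, so give EXPECTED without one.
expect_output() {
	echo "test: $1"
	actual=$(printf '%s\n' "$3" | "$AWK" "$2" 2>/dev/null)
	[ "$actual" = "$4" ] || echo "Error: test failed: $1"
}

# expect_error DESCRIPTION PROGRAM
# Passes only if awk rejects PROGRAM by exiting with a nonzero status.
expect_error() {
	echo "test: $1"
	if "$AWK" "$2" </dev/null >/dev/null 2>&1; then
		echo "Error: test failed: $1"
	fi
}
```

With these, the two tests above shrink to one line each, and a syntax-error test is just as short:

	expect_output 'count down from 1 to 0' 'BEGIN { n = 1; until (n == 0) print n-- }' '' '1'
	expect_output 'fields 2 & 3, comma separated' '{ print fields(2, 3, ",") }' 'a bc def ghij' 'bc,def'
	expect_error 'until with no condition' 'BEGIN { until print 1 }'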

Here is some other advice:

For calibration, I added about 10 brand-new lines for until, spread over 5 or 6 files. Adding // comments required 4-5 new lines. I added about 35 lines for fields in run.c, plus a couple of other lines in two other files. If you're doing a lot more than this, you are probably on the wrong track.

If you get stuck, here are some hints that might help. There is no penalty for using them and no reward for not using them, but you will probably find it most instructive to try hard before looking at the hints.

We will assess the quality of your test cases, so make sure that they are correct and meaningful; it would be nice if we could run your tests over everyone else's implementations. We will also make sure you didn't break something else, so be careful of that. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by CRLF (Windows), and do not reformat the code.

Submission

Updates to significant pieces of software are often distributed as "patch" files, that is, the set of changes necessary to convert the old version into the new version. For Unix and Linux source, this is usually done by running diff to create a file of editing commands on the system where the changes were made, and running patch with that file as input on the system where the changes are to be applied.

For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, and binary files. Then in the parent of these directories, say

	diff -ur old new >awk.patch

The recipient (the grader in this case) will say, on his/her system,

	cd old
	patch --verbose --backup <../awk.patch

to update the old version with your changes, in place.

Try the patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!! My patch file is about 180 lines long; if yours is a lot bigger, you are probably including something inappropriate like a Yacc output file, or you have managed to create CRLFs.
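
One way to rehearse this safely is in a scratch directory. The sketch below uses a made-up file demo.c in place of the real source, and it runs patch -p1 with its output discarded rather than the grader's exact flags; both are assumptions for illustration. It builds a patch, applies it to a pristine copy of old, and checks that the result matches new and contains no carriage returns.

```shell
#!/bin/bash
# Round-trip rehearsal in a scratch directory, so nothing real is touched.
# demo.c and its contents are stand-ins, not the real awk source.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir old new
printf 'line one\nline two\n' >old/demo.c
printf 'line one\nline 2\nline three\n' >new/demo.c

# diff exits with status 1 when the trees differ, which is expected here.
diff -ur old new >awk.patch || true

# Apply the patch to a pristine copy of old, much as the grader would.
# -p1 strips the leading "old/" or "new/" from the file names in the patch.
cp -r old check
(cd check && patch -p1 <../awk.patch >/dev/null)

# The patched copy should now be identical to new.
if diff -r check new >/dev/null; then
	echo 'patch round trip: ok'
else
	echo 'patch round trip: MISMATCH'
fi

# Sanity checks on patch size and on stray carriage returns.
echo "patch lines: $(wc -l <awk.patch | tr -d ' ')"
echo "lines with CR: $(grep -c $'\r' awk.patch || true)"
```

For the real thing, replace the two printf lines with copies of your actual old and new directories and keep the rest.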

When you're all done, submit using the command

   ~cos333/bin/submit  2  awk.patch  awk.test

PLEASE follow the rules on what to submit. Note that you are not submitting the whole awk distribution, just the patch file. It will be a help if you get the filenames right and submit exactly what's asked for. Thanks.