COS 333 Assignment 2: Feeping Creaturism

Due midnight, Friday, February 24

Tue Feb 14 09:01:37 EST 2017

One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything. In this assignment, you're going to work with the AWK programming language and with the current version of its original implementation.

AWK is an elderly program by most standards; it was first implemented in 1977, about the time your parents were in elementary school. It shows all the signs of having been maintained over decades -- grimy code, unhelpful comments, violated abstractions, utterly mysterious constructs, and written in a language that's not exactly trendy. That said, after literally 40 years AWK is still a core Unix tool, and in terms of payoff divided by time to learn, hard to beat.

Specification

This assignment is an exercise in adding some new features to an existing program that we have talked about in class, but whose innards are almost certainly unfamiliar. To get started, become familiar enough with AWK that you know in broad outline what it does; this awk help file might help. Then download the source from Github and skim the source code to see how it is implemented.

Your specific tasks are to

How to proceed

For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; most of the job amounts to intelligent copy and paste, using grep to locate places that might be relevant. (Blind copy and paste is not sufficient, however; pay attention to comments in the code.)

For until, the AWK grammar needs a new rule that specifies the syntax of an until statement and creates the right kind of node in the parse tree. Lexical analysis has to recognize the new keyword and a new function has to be written and added to the program to provide the semantics. Figure out how while works, then copy, paste and edit.

For the new for loop, the AWK grammar needs a new rule that specifies its syntax and creates the right kind of node in the parse tree. The cleanest way to do this is to make a minor change in the rule for the for with only one index, and then both rules can call the same semantic function. You will need to modify that function in a straightforward way as well.

For trim, there are no grammar changes but there is a new function name to be recognized, and you have to add new semantics in the right place in run.c. Figure out how some analogous one-argument built-in function works, then copy, paste and edit.

For the new comment syntax, you have to fiddle the lexical analysis in lex.c. Figure out how # comments work, then copy, paste and edit.

For the last part, automate testing as much as possible. Create a bash shell file awktests.sh. There should be at least ten tests that ensure that your features are properly tested. The file awktests.sh should be self-contained, requiring no input from a user and generating its own test data somehow. It should produce no output if the tests work, and one line per failure, of the form

	Error: test N failed
if the N-th test fails. It should assume that the program being tested is named a.out and is in the current directory. For example, this file contains one minimal test of until and the comment convention:
	#!/bin/bash
	# test 1: count down: 3, 2, 1, 0; no braces
	./a.out '
	BEGIN {
		n = 3	// a comment in the new syntax
		until (n < 0)
			print n--
	}' >temp1
	echo '3
	2
	1
	0' >temp2
	cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 1 failed' 2>&1

Take time to understand what this test does. It runs a.out with the test program enclosed in quotes, and puts the output in temp1. It echoes the correct output to temp2. It compares the two temp files, discarding output on both stdout and stderr, and, if the comparison fails (that is, the files have different contents) it displays the error message on the standard output of the shell script.

By the way, you can often use programs like grep, wc and diff with suitable arguments for approximate matching of results, rather than strict comparators like cmp. For instance, diff -w ignores all white space.

Your awktests.sh should contain at least ten tests, with comments to explain what each test does. Don't forget to test for syntax errors.

Advice

Talking over the precise meaning of the assignment with friends is encouraged. You must write your code yourself, however, so once you start coding, you may no longer discuss specific programming details with other students.

It is almost always best to attack such a task in stages, making sure each one works before the next one is started. First compile what you downloaded, before you make any changes. (In spite of our best efforts, you might have to fiddle the makefile to get the right arguments for Yacc. The line

	YACC = yacc -d 
works for me on CS and OIT computers and my own Macs, but your mileage may vary; try one of the other options and report serious problems.)

Then make a tiny change, recompile, and test:

	until (you're done) {tiny change; recompile; test}

I think it's easiest to do the new comment syntax first, since that requires only a simple change in one file. Then do trim, which requires about 20 new lines in 3 files. Then do until, which adds about 20 lines (most copied and pasted directly from existing ones) in 5 files. Finally, do the two-index for loop, which requires about 15 lines in 2 files.

If you're doing a lot more that this, you are off on the wrong track. If you get stuck, here are some hints that might help. No penalty for using them, no reward for not using them. You will probably find it most instructive to try hard before looking at the hints.

Prepare some good test cases ahead of time. Test early, test often, know what you are testing. Automate the testing as much as you can.

We will assess the quality of your test cases, so your tests should be good ones. We will also make sure you didn't break something else, so be careful of that. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by carriage returns (Mac) or CRLF (Windows), and do not reformat the code. And try to make your tests correct; it would be nice if we could run your tests over everyone else's implementations.

It may help to learn gdb, though I think that in most cases you will do much better by thinking hard and using printf. If you prefer gdb, this short gdb help file might help you get started or remind you of basic commands.

No core dumps or seg faults, please.

Submission

Updates to significant pieces of software are often distributed as "patch" files, that is, the set of changes necessary to convert the old version into the new version. For Unix and Linux source, a patch file is usually made by running diff to create a file of editing commands on the system where the changes were made, and running patch with that file as input on the system where the changes are to be applied.

For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in another directory new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, binary files like a.out and *.o, and editor backup files whose names end in ~ or .bak. (The make clean rule in the makefile doesn't get rid of all of ytab*.) Then in the parent of these directories, say

	diff -ur old new >awk.patch
The recipient (the grader in this case) will say, on his/her system,
	cd old
	patch --verbose --backup <../awk.patch
to update the old version with your changes, in place.

Try the patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!! My patch file is around 220 lines long; if yours is a lot bigger, you are probably including something like a Yacc output file or you have somehow changed line feeds or tabs in your source files. Fix those up and try again.

When you're all done, submit the two files awk.patch and awktests.sh, using the CS dropbox link dropbox.cs.princeton.edu/COS333_S2017/asgn2.