Christopher Moretti
Department of Computer Science
Princeton University

Assignment 2: Uncle Brian's Spaghetti

Due: 23:59 Friday 2/19/2016

Preface

One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything.

 

Specification

This assignment is an exercise in working with a program that we have talked about in class, but whose innards are almost certainly unfamiliar. To get started, become familiar enough with Awk to read, and perhaps write, basic programs; this Awk help file might help. Then download the source from the Awk Github and skim the source code to see how it is implemented.

Your first new feature is variant of the for loop that is loosely based on the indexed loops in languages like Fortran and Basic:

for (var from expr1 to expr2 by expr3) {
          statements
}

This repeats the statements in the body with values ofvar running from expr1 to expr2 inclusive in steps of expr3. (The klunky syntax is so that it will fit smoothly into the existing grammar, which is fragile.) The by clause is optional; if omitted, the step size is 1. Negative values of by are legal.

The values of the expressions are fixed when the statement begins; statements within the loop do not affect the limits or the step size, and changes to the index variable within the loop do not affect the iterations either. Other than that, the behavior should be the same as:

for (var = expr1; var <= expr2; var += expr3) {
          statements
}

(with >= instead if expr3 is negative). Thus, for example,

for (i from 1 to 10) { print i } 

prints 1 through 10, while

for (i from 1 to 10 by 2) { i = 0; print i }

prints 0 five times. (This is the simplest of several plausible designs; please implement exactly this even if you think something else is more sensible.)

 

The second new feature is // comments as in C++, Java, etc. This will require modifying the lexical analysis. Be sure to continue to allow the existing # comments as well.

Your third task is to fix a bug -- this Awk doesn't complain if you try to read a directory, and arguably it should. Modify the program so that trying to read a directory as an input file is an immediate fatal error. The error message must include exactly this string:
      filename is a directory
so we can test your code mechanically. For example, the program
      a.out '{print}' .
should produce the error message:
      a.out: . is a directory.

Create enough test cases to exercise your code thoroughly. Specifically, create a shell file awktests.sh that contains at least twenty tests that ensure that your features are thoroughly tested. The file awktests.sh must be self-contained, requiring no input from a user and generating its own test data somehow. Each test should produce no output if the test works, and one line if a test fails, of the exact form:

	    Error: test N failed [further text...]

It should assume that the program being tested is named a.out and is in the current directory. Each test should have a comment to explain what that test does. Don't forget to test for boundary conditions, and that your new features don't break the existing features they're build upon. Also don't forget to test syntax errors -- these tests will succeed if the code fails in the correct way, so you will need to be a bit deft in your testing file to accomplish this while maintaining the output requirements.

Here is a minimal example file to build from, with a single test checking that the standard awk comments (still) work:

#!/bin/bash

# test 1: test old comments
./a.out '
	BEGIN {
                m = 2   # end of line old comment
		#m = 3	# start of line old comment
                print m # // # // # # # // # # //
	}' >temp1
echo '2' >temp2
cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 1 failed' 2>&1
rm -f temp1 temp2

 

Advice

It is almost always best to attack such a task in stages, making sure each one works before the next one is started. First compile what you downloaded and test it. Then make a tiny change, recompile, and test again.

Prepare some good test cases ahead of time. Test early, test often, know what you are testing. Automate the testing as much as you can.

For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; most of the job amounts to intelligent copy and paste, using grep and your favorite text editor to locate places that might be relevant.

For the new kind of for statement, the Awk grammar needs two new rules (parallel to existing rules for for) that specify its syntax and create the right kind of node in the parse tree. Use the same syntactic category for expressions as in the existing for statement.

Lexical analysis has to recognize the three new keywords from, to and by, and their syntactic categories have to be added to the grammar. A new function has to be written to provide the semantics of the new for statement; it will be similar to the function that handles the existing for.

For the new comment syntax, you have to fiddle the lexical analysis in lex.c.

For detecting directory access, there are no grammar changes but you have to add one function and direct all file opens to it. Use the stat system call to determine whether a file is a directory.

For our version, there are approximately 50 additional lines for the for statement, spread over 6 source files. Testing directories added less than 10 lines, plus changing a handful of function calls in main.c, run.c and lib.c. Comment stripping added about 5 lines to lex.c. If you're doing a lot more, you are off on the wrong track. If you get stuck, here are some hints that might help. No penalty for using them, no reward for not using them, but you will probably find it most instructive to try hard before looking at the hints.

Make sure you don't break anything. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by carriage returns (Mac) or CRLF (Windows), and do not reformat the code. And try to make your tests correct ... for one, because the tests themselves will be automatically evaluated, but also because it would be nice if we could run your tests over everyone else's implementations.

 

Submission and Evaluation

Updates to significant pieces of software are often distributed as "patch" files, that is, the set of changes necessary to convert the old version into the new version. For Unix and Linux source, a patch file is usually made by running diff to create a file of editing commands on the system where the changes were made, and running patch with that file as input on the system where the changes are to be applied.

For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, and binary files like a.out and *.o. (The make clean rule in the makefile doesn't get rid of all of ytab*.) Then in the parent of these directories, say:

diff -ur old new >awk.patch 

To update the old version with your changes, in place, the recipient (the grader in this case) will say, on his/her system:

cd old
patch --verbose --backup <../awk.patch  

Try this end-to-end patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!!

The patch file for our implementation is just under 300 lines long; if yours is a lot bigger (say over 1000 lines), you are probably including something like proctab.c or a Yacc output file, or you have somehow changed line terminations or tabs in your source files. Fix those up and try again.

When you're all done, submit awk.patch and awktests.sh, using the dropbox link dropbox.cs.princeton.edu/COS333_S2016/Two.

We will run your implementation on a large number of tests, stressing the syntax and semantics of the new features, and making sure that they do not break existing features of the language. We will run your tests on a reference solution to evaluate their correctness, and we will also test your tests for coverage.