COS 333 Assignment 2: Feeping Creaturism

Due Thursday February 21 at 10:00 pm

Mon Feb 11 09:30:59 EST 2019

One of the most common experiences for a programmer coming into a new job or working with some open-source code is to have to fix a bug or make a small change in a large unfamiliar program. This requires the ability to quickly find the relevant parts of the program and change them in a minimal way, while ignoring irrelevant parts and being sure not to break anything.

This assignment is an exercise in working with a program that we have talked about in class, but whose innards are almost certainly unfamiliar. It also involves

To get started, become familiar enough with Awk that you know in broad outline what it does; this Awk help file might help. Then clone the source from Github at https://github.com/onetrueawk/awk, skim the source code to see how it is implemented, and be sure that you can build it and test it before you make any changes. (You may have to fiddle the makefile to use yacc instead of bison; it's a 1-character change.)

Warning: Most of the assignments in this course are best done on Linux or MacOSX, and that is especially true of this one. If you are a Windows user, it is strongly recommended that you use ssh, putty or MobaXterm to log into a CS cluster like cycles or an OIT cluster like nobel, or use a virtual Linux machine. The alternative is likely to be a great deal of beating your head against lots of walls, and it's just not worth it. (For general use, Cygwin is highly recommended; it makes Windows more convenient for programmers familiar with Unix. You might also try WSL, the Windows Subsystem for Linux, a compatibility layer that lets you run Linux binaries on Windows.)

1. What to do

Your specific tasks are to

2. How to do it

For this assignment, you can get a very long way merely by finding code that already does more or less what's needed and is already in the right place and right form; most of the job amounts to intelligent copy and paste, using grep and your favorite text editor to locate places that might be relevant.

For the new kind of for statement, the Awk grammar needs two new rules (parallel to existing rules for for) that specify its syntax and create the right kind of node in the parse tree. Use the same syntactic category for expressions as in the existing for statement.

Lexical analysis has to recognize the three new keywords from, to and by, and their syntactic categories have to be added to the grammar. A new function has to be written to provide the semantics of the new for statement; it will be similar to the functions that handle the two existing for loops.

You also have to recognize the name of the new built-in function trim as part of lexical analysis and add semantics for it.

For detecting directory access, there are no grammar changes but you have to add one function and direct all file opens to it. Use the stat system call to determine whether a file is a directory; it's fine to search online for how to use this system call. Note that stat itself can return an error; that should be treated the same as a failure to access the specified file.

For the last part, testing, create a shell file awktests.sh that contains at least a dozen tests that ensure that your features are thoroughly tested. The file awktests.sh must be self-contained, requiring no input from a user and generating its own test data somehow. It should produce only one line per test if the test works, and one additional line per failure, of the form

Error: test N failed [further text...]
if a test fails. It should assume that the program being tested is named a.out and is in the current directory. For example, this file contains two minimal tests of the new for:
#!/bin/bash

echo 'test 1: count from 1 to 10'
a.out '
BEGIN {
	for (i from 1 to 10)
		print i
}'>temp1
for i in `seq 1 10`; do echo $i; done >temp2
cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 1 failed: 1..10' 2>&1

echo 'test 2: count from 1 to 10 by 3'
a.out '
BEGIN {
	for (i from 1 to 10 by 3)
		print i
}' >temp1
for i in 1 4 7 10; do echo $i; done >temp2
cmp temp1 temp2 >/dev/null 2>&1 || echo 'Error: test 2 failed: 1..10 3' 2>&1
This file is here; you might find it easiest to just paste your tests into it. Your awktests.sh should contain at least ten further tests: the echo line should state what each test does. Don't forget to test for syntax errors as well as boundary conditions.

You also want to be sure that your changes don't break anything else. The Github distribution includes the Awk regression suite, which you can run with the single command REGRESS. A couple of tests in that will fail when you add new keywords like from and trim but otherwisd everything should work.

3. Advice

Start early.

For my version, I added about 50 lines for the for statement, spread over 6 source files. Testing directories added less than 10 lines, plus changing a handful of function calls in main.c, run.c and lib.c. For trim, there are no grammar changes but there is a new function name to be recognized, and you have to add new semantics in the right place in run.c. Figure out how some analogous one-argument built-in function works, then copy, paste and edit.

If you're doing a lot more, you are off on the wrong track. If you get stuck, here are some hints that might help. No penalty for using them, no reward for not using them, but you will probably find it most instructive to try hard before looking at the hints.

Make sure you don't break anything. Don't make unnecessary or irrelevant changes in any file. In particular, do not replace newlines by carriage returns (Mac) or CRLF (Windows), and do not reformat the code. And try to make your tests correct; it would be nice if we could run your tests over everyone else's implementations.

4. Submission

Updates to significant pieces of software are often distributed as "patch" files, that is, the set of changes necessary to convert the old version into the new version. For Unix and Linux source, a patch file is usually made by running diff to create a file of editing commands on the system where the changes were made, and running patch with that file as input on the system where the changes are to be applied.

For this assignment, you have to submit a patch file awk.patch that contains your changes. The easiest way to do this is to place the original program in one directory, say old, and the new version in new. Clean out all the junk like Yacc-generated files (ytab*), proctab.c, binary files like a.out and *.o, and editor temporaries like run.c~. (The make cleaner rule in the makefile doesn't get rid of all of ytab*, nor does it get rid of editor temporaries.) Then in the parent of these directories, say

diff -ur old new >awk.patch
The recipient (the grader in this case) will say, on his/her system,
cd old
patch --verbose --backup <../awk.patch
to update the old version with your changes, in place.

Try the patch process yourself to be sure it works right before you submit. Back up your work before you start experimenting!! My patch file is just under 300 lines long; if yours is a lot bigger (say over 1000 lines), you are probably including something like proctab.c or a Yacc output file, or you have somehow changed line terminations or tabs in your source files. Fix those up and try again.

When you're all done, submit awk.patch and awktests.sh, using the TigerFile link tigerfile.cs.princeton.edu/COS333_S2019/asgn2.