Princeton University
COS 217: Introduction to Programming Systems

Assignment 7: A Linux Shell

Purpose

The purpose of this assignment is to help you learn about Linux processes, low-level input/output, and signals. It also will give you ample opportunity to define software modules; in that sense the assignment is a capstone for the course.

Students from past semesters reported taking, on average, 26.5 hours to complete this assignment.

Rules

This assignment is an individual assignment, not a team assignment.

Signal handling (as described below) is the challenge part of this assignment. While doing the challenge part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional policy: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the challenge part of an assignment, except for clarification of requirements.

The challenge part is worth 5 percent of the assignment. So if you don't do any of the challenge part and all other parts of your assignment solution are perfect and submitted on time, then your grade for the assignment will be 95 percent.

Background

A Linux shell is a program that makes the facilities of the operating system available to interactive users. There are several popular Linux/Unix shells: sh (the Bourne shell), csh (the C shell), and bash (the Bourne Again shell) are a few.

Your Task

Your task in this assignment is to create a series of three related programs. The programs must be named ishlex, ishsyn, and ish. Your ish program must be a minimal but realistic interactive Linux shell. Your development of the simpler ishlex and ishsyn programs will help you to develop your ish program. A Supplementary Information page lists detailed requirements and recommendations.

The Procedure

Develop on armlab. Use emacs to create source code. Use make to automate the build process. Use gdb to debug.

Stage 0: Preliminaries

Read this entire assignment specification and the entire assignment Supplementary Information page. Review the lecture slides and precept material from the first half of the course on testing, building, debugging, style, and especially modularity. Study the lecture slides and precept material from the second half of the course on exceptions and processes, process management, I/O management, signals, and alarms. Complete the pertinent required reading, especially Chapter 8 of Computer Systems: A Programmer's Perspective (Bryant & O'Hallaron).

The armlab /u/cos217/Assignment7 directory contains files that you will find useful. Subsequent stages describe them. Create a project directory, and copy all files from the /u/cos217/Assignment7 directory to your project directory.

Create a first draft of a Makefile. Then refine your Makefile throughout all subsequent stages.

The first dependency rule in your Makefile must command make to build programs named ishlex, ishsyn, and ish. (Those programs are described in subsequent stages.) That is, the first dependency rule in your Makefile must be:

all: ishlex ishsyn ish

Your Makefile must:

Maintain object (.o) files to allow for partial builds of the three executable binary files.
Encode the dependencies among the files that comprise your programs.
Use only the fundamental features of make that are covered in the Building lecture. For example, your Makefile must not contain implicit dependency rules (as covered in the Appendix of the Building lecture) or pattern dependency rules (not covered in the Building lecture).

Stage 1: Lexical Analysis

Compose a lexical analyzer for your programs. Your lexical analyzer must be defined in a distinct module. Your lexical analyzer must accept an array of characters, and return a DynArray object containing tokens. (The DynArray ADT was described in precepts. The source code defining the DynArray ADT is available in the armlab /u/cos217/Assignment7 directory.) Compose additional modules that are used by your lexical analyzer, as appropriate.

From the user's point of view, a token is a word. (Your program may represent a token as a string, or as a richer data structure.) More formally, from the user's point of view a token consists of a sequence of non-white-space characters that is separated from other tokens by white-space characters. There are two exceptions:

The special characters '<' and '>' form separate special tokens: the stdin-redirect token and the stdout-redirect token, respectively.
Strings enclosed in double quotes (") form part or all of a single token.

Special characters inside of strings are not separate tokens. It is an error for an "opening" double quote within a line to be unmatched by a "closing" double quote.
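
The token rules above can be sketched as a small state machine. The sketch below is illustrative only and is not the required design: the names lexLine, struct Token, enum TokenKind, and MAX_TOKEN_LENGTH are hypothetical, and it stores tokens in a caller-supplied fixed-size array, whereas your lexical analyzer must return a DynArray object and must not limit line or token length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum TokenKind {TOKEN_ORDINARY, TOKEN_STDIN_REDIRECT, TOKEN_STDOUT_REDIRECT};

/* Sketch-only limit (an assumption made for brevity). */
enum {MAX_TOKEN_LENGTH = 255};

struct Token
{
   enum TokenKind eKind;
   char acText[MAX_TOKEN_LENGTH + 1];
};

/* Lexically analyze pcLine, storing at most iMaxTokens tokens in
   atTokens. Return the number of tokens stored, or -1 if pcLine
   contains an unmatched double quote or exceeds a sketch-only limit. */
int lexLine(const char *pcLine, struct Token atTokens[], int iMaxTokens)
{
   int iCount = 0;      /* number of completed tokens */
   size_t uLength = 0;  /* length of the token under construction */
   int iInToken = 0;    /* is an ordinary token under construction? */
   int iInQuotes = 0;   /* is the scan inside double quotes? */
   const char *pc;

   assert(pcLine != NULL);
   assert(atTokens != NULL);

   for (pc = pcLine; *pc != '\0'; pc++)
   {
      char c = *pc;
      int iIsSpecial = (c == '<' || c == '>');
      int iDelimits =
         !iInQuotes && (iIsSpecial || isspace((unsigned char)c));

      if (iDelimits)
      {
         /* White space or a special character ends an ordinary token. */
         if (iInToken)
         {
            atTokens[iCount].acText[uLength] = '\0';
            iCount++;
            iInToken = 0;
         }
         if (iIsSpecial)
         {
            if (iCount == iMaxTokens) return -1;
            atTokens[iCount].eKind = (c == '<')
               ? TOKEN_STDIN_REDIRECT : TOKEN_STDOUT_REDIRECT;
            atTokens[iCount].acText[0] = c;
            atTokens[iCount].acText[1] = '\0';
            iCount++;
         }
         continue;
      }

      /* c belongs to an ordinary token; begin one if necessary. */
      if (!iInToken)
      {
         if (iCount == iMaxTokens) return -1;
         atTokens[iCount].eKind = TOKEN_ORDINARY;
         uLength = 0;
         iInToken = 1;
      }
      if (c == '"')
         iInQuotes = !iInQuotes;  /* quotes delimit but are not stored */
      else
      {
         if (uLength == MAX_TOKEN_LENGTH) return -1;
         atTokens[iCount].acText[uLength++] = c;
      }
   }

   if (iInQuotes) return -1;  /* unmatched opening double quote */
   if (iInToken)
   {
      atTokens[iCount].acText[uLength] = '\0';
      iCount++;
   }
   return iCount;
}
```

With this sketch, the line cat <in "a b"c > out yields six tokens: cat, the stdin-redirect token, in, a bc, the stdout-redirect token, and out.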

Make no assumptions about the length of each line. Your lexical analyzer must work for lines of any length.
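
One way to read a line of any length is to grow a heap-allocated buffer as characters arrive. The following is a minimal sketch under that assumption; the function name readLine and the initial capacity are hypothetical choices, not requirements.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one line from psFile into a heap-allocated, NUL-terminated
   string that includes the trailing newline, if any. Return NULL if
   end-of-file is reached before any character is read, or if memory
   is exhausted. The caller owns the returned string. */
char *readLine(FILE *psFile)
{
   size_t uCapacity = 16;
   size_t uLength = 0;
   char *pcLine;
   int iChar;

   assert(psFile != NULL);

   pcLine = malloc(uCapacity);
   if (pcLine == NULL) return NULL;

   while ((iChar = fgetc(psFile)) != EOF)
   {
      /* Double the buffer when only the '\0' slot remains. */
      if (uLength + 1 == uCapacity)
      {
         char *pcBigger = realloc(pcLine, uCapacity * 2);
         if (pcBigger == NULL)
         {
            free(pcLine);
            return NULL;
         }
         pcLine = pcBigger;
         uCapacity *= 2;
      }
      pcLine[uLength++] = (char)iChar;
      if (iChar == '\n') break;
   }

   if (uLength == 0)
   {
      free(pcLine);
      return NULL;
   }
   pcLine[uLength] = '\0';
   return pcLine;
}
```

A client would call readLine(stdin) in a loop until it returns NULL, freeing each line after use.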

Then compose a client of your lexical analyzer. The client must be defined in a file named ishlex.c. Use the ishlex.c client, your lexical analyzer module, and other modules that you have composed to build a program named ishlex. Your ishlex must:

Write to stdout a prompt consisting of a percent sign and a space.
Read a line (that is, an array of characters) from stdin.
Write that line (array of characters) to stdout.
Flush the stdout buffer.
Pass the line (array of characters) to your lexical analyzer to create a DynArray object containing tokens.
Write the tokens to stdout, using precisely the format specified in the Supplementary Information page.

It must do that repeatedly until the program reaches end-of-file of stdin. Recall that typing Ctrl-d simulates end-of-file when stdin is bound to the terminal.

Test your ishlex thoroughly. These given files will help you with your testing:

sampleishlex: a sample correct program. Your ishlex must have the same behavior as sampleishlex does. That is, your ishlex must write exactly the same output to stdout as sampleishlex does. Your ishlex also must write exactly the same output to stderr as sampleishlex does, with one exception: whereas the error messages written by sampleishlex begin with "sampleishlex", the error messages written by your ishlex must begin with "ishlex", or more precisely with argv[0], the file name of the program (which normally would be "ishlex", but need not be).
testishlex and testishlexdiff: scripts that automate your testing. Comments at the beginning of the scripts describe how to use them.
commands_lex: a file containing example commands that your ishlex program must lexically analyze properly. The commands_lex file is not intended to be thorough. It would be appropriate to test your program using many commands in addition to those provided in the commands_lex file.

Stage 2: Syntactic Analysis

Compose a syntactic analyzer for your programs. Your syntactic analyzer must be defined in a distinct module. Your syntactic analyzer must accept a DynArray object containing tokens, and return a command. Compose additional modules as appropriate.

The DynArray object containing tokens must begin with an ordinary token, which is the command's name. It is an error for the DynArray object not to begin with an ordinary token. The command name token might be followed by tokens which are command-line arguments, tokens which indicate redirection of stdin, and/or tokens which indicate redirection of stdout.

Your syntactic analyzer must handle redirection in these ways:

The token following a stdin-redirect token must be an ordinary token. Your syntactic analyzer must interpret that ordinary token as the name of the file to which stdin is redirected. It is an error for a sequence of tokens to contain a stdin-redirect token that is not followed immediately by an ordinary token. It is an error for a sequence of tokens to contain multiple stdin-redirect tokens.
The token following a stdout-redirect token must be an ordinary token. Your syntactic analyzer must interpret that ordinary token as the name of the file to which stdout is redirected. It is an error for a sequence of tokens to contain a stdout-redirect token that is not followed immediately by an ordinary token. It is an error for a sequence of tokens to contain multiple stdout-redirect tokens.
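
The syntactic rules above can be sketched as a single pass over the tokens. This is a sketch under stated assumptions, not the required design: struct Token, struct Command, enum SynResult, parseTokens, and MAX_ARGS are hypothetical names; tokens arrive in a plain array rather than a DynArray object; and an empty token sequence is reported as a missing command name, which may not match what sampleishsyn does for a blank line.

```c
#include <assert.h>
#include <stddef.h>

enum TokenKind {TOKEN_ORDINARY, TOKEN_STDIN_REDIRECT, TOKEN_STDOUT_REDIRECT};

struct Token
{
   enum TokenKind eKind;
   const char *pcText;
};

/* Sketch-only limit (an assumption made for brevity). */
enum {MAX_ARGS = 32};

struct Command
{
   const char *pcName;
   const char *apcArgs[MAX_ARGS];
   int iArgCount;
   const char *pcStdinFile;   /* NULL if stdin is not redirected */
   const char *pcStdoutFile;  /* NULL if stdout is not redirected */
};

enum SynResult
{
   SYN_OK, SYN_MISSING_NAME, SYN_TOO_MANY_ARGS,
   SYN_MULTIPLE_STDIN, SYN_MULTIPLE_STDOUT,
   SYN_MISSING_STDIN_FILE, SYN_MISSING_STDOUT_FILE
};

/* Syntactically analyze the iCount tokens in atTokens, filling in
   *psCommand. Return SYN_OK on success, or a code naming the first
   error found. */
enum SynResult parseTokens(const struct Token atTokens[], int iCount,
                           struct Command *psCommand)
{
   int i;

   assert(atTokens != NULL);
   assert(psCommand != NULL);

   if (iCount == 0 || atTokens[0].eKind != TOKEN_ORDINARY)
      return SYN_MISSING_NAME;

   psCommand->pcName = atTokens[0].pcText;
   psCommand->iArgCount = 0;
   psCommand->pcStdinFile = NULL;
   psCommand->pcStdoutFile = NULL;

   for (i = 1; i < iCount; i++)
   {
      if (atTokens[i].eKind == TOKEN_ORDINARY)
      {
         if (psCommand->iArgCount == MAX_ARGS) return SYN_TOO_MANY_ARGS;
         psCommand->apcArgs[psCommand->iArgCount++] = atTokens[i].pcText;
      }
      else
      {
         int iIsStdin = (atTokens[i].eKind == TOKEN_STDIN_REDIRECT);
         const char **ppcFile = iIsStdin
            ? &psCommand->pcStdinFile : &psCommand->pcStdoutFile;

         /* At most one redirection of each kind is allowed. */
         if (*ppcFile != NULL)
            return iIsStdin ? SYN_MULTIPLE_STDIN : SYN_MULTIPLE_STDOUT;

         /* The very next token must be an ordinary token. */
         if (i + 1 == iCount || atTokens[i + 1].eKind != TOKEN_ORDINARY)
            return iIsStdin
               ? SYN_MISSING_STDIN_FILE : SYN_MISSING_STDOUT_FILE;

         *ppcFile = atTokens[++i].pcText;  /* consume the file name */
      }
   }
   return SYN_OK;
}
```

Because a line may contain several errors and any one of them may be reported, the order in which the sketch checks for errors is a free design choice.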

Then compose a client of your syntactic and lexical analyzer modules. The client must be defined in a file named ishsyn.c. Use the ishsyn.c client, your syntactic and lexical analyzer modules, and other modules that you have composed to build a program named ishsyn.

Your ishsyn must use the same lexical analyzer module as your ishlex does.

The behavior of your ishsyn must be a superset of the behavior of your ishlex, except that your ishsyn must not write tokens to stdout. More precisely, your ishsyn must:

Write to stdout a prompt consisting of a percent sign and a space.
Read a line (that is, an array of characters) from stdin.
Write that line (array of characters) to stdout.
Flush the stdout buffer.
Pass the array of characters to your lexical analyzer to create a DynArray object containing tokens.
Pass the DynArray object containing tokens to your syntactic analyzer to create a command.
Write the command to stdout, using precisely the format specified in the Supplementary Information page.

It must do that repeatedly until the program reaches end-of-file of stdin.

Test your ishsyn thoroughly. These given files will help you with your testing:

sampleishsyn: a sample correct program. Your ishsyn must have the same behavior as sampleishsyn does. That is, your ishsyn must write exactly the same output to stdout as sampleishsyn does. Your ishsyn also must write exactly the same output to stderr as sampleishsyn does, with one exception: whereas the error messages written by sampleishsyn begin with "sampleishsyn", the error messages written by your ishsyn must begin with "ishsyn" — more precisely, with argv[0], the file name of the program (which normally would be "ishsyn", but need not be).
testishsyn and testishsyndiff: scripts that automate your testing. Comments at the beginning of the scripts describe how to use them.
commands_syn: a file containing example commands that your ishsyn program must syntactically analyze properly. The commands_syn file is not intended to be thorough. It would be appropriate to test your program using many commands in addition to those provided in the commands_syn file.

Stage 3: Handling External Commands

Compose a "first draft" of ish. At this stage ish must handle simple external commands, that is, commands that contain no redirection (via < or >).

Specifically, compose a file named ish.c. Use ish.c, your lexical and syntactic analyzer modules, and other modules that you have composed to build a program named ish. Compose additional modules as appropriate.

The behavior of your ish must be a superset of the behavior of your ishsyn, except that your ish must not write commands to stdout. More precisely, your ish must:

Write to stdout a prompt consisting of a percent sign and a space.
Read a line (that is, an array of characters) from stdin.
Write the line (array of characters) to stdout.
Flush the stdout buffer.
Pass the line (array of characters) to your lexical analyzer to create a DynArray object containing tokens.
Pass the DynArray object containing tokens to your syntactic analyzer to create a command.
Execute the command.

It must do that repeatedly until the program reaches end-of-file of stdin.
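
Executing an external command typically follows the fork, exec, and wait pattern covered in the process management lectures. The sketch below shows one possible shape for that step. The function name runExternal and its return convention are hypothetical, and the error text produced by perror is an assumption that may not match what sampleish writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execute the program named apcArgv[0] in a child process, passing
   it the NULL-terminated argument vector apcArgv, and wait for it to
   finish. Return the child's exit status, or -1 if the child could
   not be created or did not exit normally. Error messages begin with
   pcProgName. */
int runExternal(const char *pcProgName, char *const apcArgv[])
{
   pid_t iPid;
   int iStatus;

   assert(pcProgName != NULL);
   assert(apcArgv != NULL);
   assert(apcArgv[0] != NULL);

   /* Flush first so the child does not inherit buffered output. */
   fflush(NULL);

   iPid = fork();
   if (iPid == -1)
   {
      perror(pcProgName);
      return -1;
   }

   if (iPid == 0)
   {
      /* Child: replace this process image with the command. */
      execvp(apcArgv[0], apcArgv);
      /* Control reaches this point only if execvp failed. */
      perror(pcProgName);
      exit(EXIT_FAILURE);
   }

   /* Parent: wait for that specific child to finish. */
   if (waitpid(iPid, &iStatus, 0) == -1)
   {
      perror(pcProgName);
      return -1;
   }
   return WIFEXITED(iStatus) ? WEXITSTATUS(iStatus) : -1;
}
```

In ish, pcProgName would be argv[0], and apcArgv would be built from the command name and arguments produced by the syntactic analyzer.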

Test your ish thoroughly. These given files will help you with your testing:

sampleish: a sample correct program. Your ish must have the same behavior as sampleish does. That is, your ish must write exactly the same output to stdout as sampleish does. Your ish also must write exactly the same output to stderr as sampleish does, with one exception: whereas the error messages written by sampleish begin with "sampleish", the error messages written by your ish must begin with "ish" — more precisely, with argv[0], the file name of the program (which normally would be "ish", but need not be). Note however that the output of some of the commands executed by your ish will differ from the output of some of the commands executed by sampleish. For example, the output of a date command executed by your ish will differ from the output of a date command executed by sampleish, unless your ish and sampleish happen to execute the date commands at exactly the same time.
testish and testishdiff: scripts that automate your testing. Comments at the beginning of the scripts describe how to use them.

Your ish must use the same lexical analyzer module and syntactic analyzer module as your ishsyn does.

Stage 4: Handling Shell Built-In Commands

Enhance ish so it handles shell built-in commands. Specifically, ish must interpret four shell built-in commands:

`setenv var [value]`	If environment variable `var` does not exist, then your `ish` must create it. Your `ish` must set the value of `var` to `value`, or to the empty string if `value` is omitted. Note: Initially, your `ish` inherits environment variables from its parent. Your `ish` must be able to modify the value of an existing environment variable or create a new environment variable via the `setenv` command. Your `ish` must be able to set the value of any environment variable; but the only environment variable that it explicitly uses is `HOME`. It is an error for a `setenv` command to have zero or more than two command-line arguments.
`unsetenv var`	Your `ish` must destroy the environment variable `var`. It is an error for an `unsetenv` command to have zero command-line arguments or more than one command-line argument.
`cd [dir]`	Your `ish` must change its working directory to `dir`, or to the HOME directory if `dir` is omitted. It is an error for a `cd` command to have more than one command-line argument. It is an error for a `cd` command to have zero command-line arguments if the `HOME` environment variable is not set.
`exit`	Your `ish` must exit with status 0. It is an error for an `exit` command to have any command-line arguments.
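
The built-in commands must run in the ish process itself, because a child process cannot change its parent's environment or working directory. A minimal sketch of three of them follows, built on the standard setenv, unsetenv, getenv, and chdir functions. The function names are hypothetical, the sketch assumes the caller has already checked the number of command-line arguments, and the "HOME is not set" message is a placeholder rather than the text that sampleish writes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* setenv var [value]: pcValue is NULL if value was omitted.
   Return 1 on success, 0 on failure. */
int builtinSetenv(const char *pcProgName, const char *pcVar,
                  const char *pcValue)
{
   assert(pcProgName != NULL);
   assert(pcVar != NULL);

   /* The final argument 1 asks setenv to overwrite an existing value. */
   if (setenv(pcVar, (pcValue != NULL) ? pcValue : "", 1) == -1)
   {
      perror(pcProgName);
      return 0;
   }
   return 1;
}

/* unsetenv var: return 1 on success, 0 on failure. */
int builtinUnsetenv(const char *pcProgName, const char *pcVar)
{
   assert(pcProgName != NULL);
   assert(pcVar != NULL);

   if (unsetenv(pcVar) == -1)
   {
      perror(pcProgName);
      return 0;
   }
   return 1;
}

/* cd [dir]: pcDir is NULL if dir was omitted.
   Return 1 on success, 0 on failure. */
int builtinCd(const char *pcProgName, const char *pcDir)
{
   assert(pcProgName != NULL);

   if (pcDir == NULL)
   {
      pcDir = getenv("HOME");
      if (pcDir == NULL)
      {
         /* Placeholder message text (an assumption). */
         fprintf(stderr, "%s: HOME is not set\n", pcProgName);
         return 0;
      }
   }
   if (chdir(pcDir) == -1)
   {
      perror(pcProgName);
      return 0;
   }
   return 1;
}
```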

Test your ish thoroughly. Your ish must have exactly the same behavior as sampleish does with respect to its handling of shell built-in commands. You will find the aforementioned testish and testishdiff scripts helpful.

Stage 5: Handling Redirection

Enhance your ish so it handles redirection of stdin and/or stdout.

It is erroneous for stdin to be redirected to a file that does not exist.

If stdout is redirected to a file that does not exist, then your ish must create it. If stdout is redirected to a file that already exists, then your ish must destroy the file's contents and rewrite the file from scratch. Your ish must set the permissions of the file to 0600.

It is erroneous for stdout to be redirected to a file whose name is invalid. For example, it is erroneous for stdout to be redirected to a file named "/" or ".", or for stdout to be redirected to a file in some directory whose contents the user cannot change.
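
Redirection is commonly implemented in the child process, between fork and exec, by opening the file and then using dup2 to make file descriptor 0 or 1 refer to it. The sketch below illustrates that approach. The function names are hypothetical, and taking the target file descriptor as a parameter is a choice made so that the sketch can be exercised without disturbing the real stdin and stdout. Note that the mode given to open applies only when the file is created.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

enum {PERMISSIONS = 0600};

/* Open pcFileName using iFlags, and make iTargetFd refer to it.
   Return 1 on success, or 0 on failure with errno set. */
static int redirect(const char *pcFileName, int iFlags, int iTargetFd)
{
   int iFd;

   assert(pcFileName != NULL);

   iFd = open(pcFileName, iFlags, PERMISSIONS);
   if (iFd == -1) return 0;

   if (iFd != iTargetFd)
   {
      if (dup2(iFd, iTargetFd) == -1)
      {
         close(iFd);
         return 0;
      }
      close(iFd);  /* the duplicate is the only descriptor needed */
   }
   return 1;
}

/* Redirect iTargetFd (STDIN_FILENO in ish) to read from the existing
   file pcFileName. Fail if the file does not exist. */
int redirectInput(const char *pcFileName, int iTargetFd)
{
   return redirect(pcFileName, O_RDONLY, iTargetFd);
}

/* Redirect iTargetFd (STDOUT_FILENO in ish) to write to pcFileName,
   creating the file if necessary and discarding any old contents. */
int redirectOutput(const char *pcFileName, int iTargetFd)
{
   return redirect(pcFileName, O_WRONLY | O_CREAT | O_TRUNC, iTargetFd);
}
```

If either function returns 0 in the child, the child would report the error (for example with perror) and exit without calling exec.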

Note that the four shell built-in commands neither read from stdin nor write to stdout. So it would be pointless (but not erroneous) for the user to redirect stdin or stdout within any of those commands. More precisely, when given a shell built-in command containing redirection of stdin or stdout, your ish must lexically and syntactically analyze the entire command, including the part that redirects stdin or stdout — just as your ishlex and your ishsyn do — and must report any lexical or syntactic errors that it encounters. However your ish must not implement the specified file redirection.

Test your ish thoroughly. Your ish must have exactly the same behavior as sampleish does with respect to handling of redirection. You will find the aforementioned testish and testishdiff scripts helpful.

Stage 6: Handling Signals

Enhance your ish to handle SIGINT signals.

When the user types Ctrl-c, Linux sends a SIGINT signal to your ish (parent) process and to its child process. Upon receiving a SIGINT signal:

If the ish parent process is handling a command — that is, if the user had entered a command and the ish parent process is executing it or is waiting for its child process to exit — then the ish parent process must ignore the SIGINT signal.
If the ish parent process is not handling a command, then the ish parent process must give no response to the user and must continue executing. However, if the ish parent process receives another SIGINT signal quickly (within three seconds, and before the user enters another command), then the ish parent process must exit.
The child process must exit. More precisely, unless the child process itself (beyond the control of the ish parent process) has installed a handler for SIGINT signals, the child process must exit.
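
One possible way to realize the parent-side rules above combines a SIGINT handler, a SIGALRM handler, and the alarm function. The sketch below is only an illustration of that combination: every function and variable name is hypothetical, and both the use of _exit and the exit status 0 on the second Ctrl-c are assumptions, so check sampleish for the required behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

enum {GRACE_SECONDS = 3};

/* 1 while a first SIGINT is still within its three-second window. */
static volatile sig_atomic_t iSigintPending = 0;

/* On the first SIGINT, open the window. On a second SIGINT that
   arrives while the window is open, exit. */
static void handleSigint(int iSignal)
{
   (void)iSignal;
   if (iSigintPending)
      _exit(0);  /* async-signal-safe; the status is an assumption */
   iSigintPending = 1;
   alarm(GRACE_SECONDS);
}

/* The window has elapsed: forget the first SIGINT. */
static void handleSigalrm(int iSignal)
{
   (void)iSignal;
   iSigintPending = 0;
}

/* Install the handlers used while the shell is not handling a
   command. Return 1 on success, 0 on failure. */
int installIdleHandlers(void)
{
   struct sigaction sAction;

   sAction.sa_flags = SA_RESTART;  /* resume an interrupted read */
   sigemptyset(&sAction.sa_mask);

   sAction.sa_handler = handleSigint;
   if (sigaction(SIGINT, &sAction, NULL) == -1) return 0;

   sAction.sa_handler = handleSigalrm;
   if (sigaction(SIGALRM, &sAction, NULL) == -1) return 0;
   return 1;
}

/* Call when the user enters a command: close any open window and
   ignore SIGINT until the command finishes. Return 1 on success. */
int beginCommand(void)
{
   alarm(0);
   iSigintPending = 0;
   return signal(SIGINT, SIG_IGN) != SIG_ERR;
}

/* Report whether a first SIGINT is still within its window. */
int isSigintPending(void)
{
   return iSigintPending != 0;
}
```

After each command finishes, the parent would call installIdleHandlers again. Because an ignored signal stays ignored across exec, a child forked while SIGINT is ignored would call signal(SIGINT, SIG_DFL) before calling exec so that Ctrl-c terminates it.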

Test your ish thoroughly. Your ish must have exactly the same behavior as sampleish does with respect to handling of signals.

Finishing Up

Critique your programs using the splint tool. Each time splint generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) copy the warning to your readme file and explain your disagreement with it.

Similarly, critique your programs using the critTer tool. Each time critTer generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) copy the warning to your readme file and explain your disagreement with it.

Edit your copy of the given readme file by answering each question that is expressed therein.

One of the sections of the readme file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.

Provide the instructors with your feedback on the assignment. To do that, issue this command:

FeedbackCOS217.py 7

and answer the questions that it asks. That command stores its questions and your answers in a file named feedback in your working directory.

Submit your work electronically on armlab using these commands:

submit 7 readme feedback Makefile ishlex.c ishsyn.c ish.c
submit 7 dynarray.h dynarray.c
submit 7 allOtherModuleFiles

Don't forget to submit both your .h files and your .c files.

To make sure that your submission is complete, use this approach: Create a temporary directory. Copy the files that comprise your submission to that directory. Build your programs in that directory to make sure that no files are missing. Delete from that directory all files that you do not wish to submit, for example, executable binary files and .o files. Finally, submit all of the files in that directory by issuing the command submit 7 *.

Handling Errors

Your programs must handle each erroneous line gracefully by writing a descriptive error message to stderr and rejecting the line. Any error message written by your programs must begin with "programName: " where programName is argv[0], that is, the name of your program's executable binary file. Note that argv[0] typically will be ishlex, ishsyn, or ish, but need not be so.
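
The required prefix can be produced by a small helper that every module calls, so that the format is defined in one place. This is a minimal sketch; the function name reportError is hypothetical, and any message text passed to it must be the text that the sample programs write.

```c
#include <assert.h>
#include <stdio.h>

/* Write "programName: message" followed by a newline to psStream,
   where programName is pcProgName (argv[0] in the client). */
void reportError(FILE *psStream, const char *pcProgName,
                 const char *pcMessage)
{
   assert(psStream != NULL);
   assert(pcProgName != NULL);
   assert(pcMessage != NULL);

   fprintf(psStream, "%s: %s\n", pcProgName, pcMessage);
}
```

A client would typically save argv[0] at startup and pass it, along with stderr, on each call.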

The error messages written by your programs must be identical to those written by sampleishlex, sampleishsyn, and sampleish. However, if your programs read a line that contains multiple errors, then your programs can report any one of the errors, not necessarily the same error that sampleishlex, sampleishsyn, and sampleish report.

It must be impossible for the user's input to cause your programs to terminate abnormally — via a failed assert, heap corruption, a segmentation fault, etc.

Memory Management

Your programs must contain no memory leaks. For every call of malloc or calloc, eventually there must be a corresponding call of free. More specifically, your programs must produce clean meminfo reports when the user terminates your programs by typing Ctrl-d. Your ish need not produce a clean meminfo report when the user terminates the program by issuing the exit command or by typing Ctrl-c twice within three seconds.

Program Style

In part, good program style is defined by the splint and critTer tools, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.

The more course-specific style rules listed in the previous assignment specifications also apply, as do these: your code must have proper file-level and function-level modularity.

Grading

To receive any credit for your ishlex, the program must build. To receive any credit for your ishsyn, the program must build. To receive any credit for your ish, the program must build.

We will grade your work on two kinds of quality:

Quality from the user's point of view. From the user's point of view, your code has quality if it behaves as it must. The correct behavior of your programs is defined by the previous sections of this assignment specification and by the given sampleishlex, sampleishsyn, and sampleish programs.
Quality from the programmer's point of view. From the programmer's point of view, your code has quality if it is well styled and thereby easy to maintain. Good program style is defined by the previous section of this assignment specification. The use of proper function-level and file-level modularity will be a prominent part of your grade.

To encourage good coding practices, we will deduct points if gcc217 generates warning messages.

Remember that the Supplementary Information page lists detailed implementation requirements and recommendations.

This assignment was written by Robert M. Dondero, Jr.
with contributions by many other faculty members and students.
