# Princeton University COS 217: Introduction to Programming Systems

## Assignment 1: A "De-Comment" Program

### Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/UNIX programming tools, especially bash, xemacs, and gcc.

### Background

The C preprocessor is an important part of the C programming system.  The C preprocessor performs two jobs. Its first job is to handle preprocessor directives (#define, #include, etc.) that reside in the given source code file. Its second job is to "de-comment" (that is, remove comments from) the given source code file. The first is the more difficult. Nevertheless, the second is substantial.

Your task is to compose a C program named "decomment" that performs the de-commenting job of the C preprocessor.

Your program should be structured as a UNIX filter. That is, your program should read characters from standard input, and write characters to standard output (and possibly to standard error). Specifically, your program should read text (which presumably comprises a C program) from standard input, write that same text -- devoid of comments -- to standard output, and write error and warning messages as appropriate to standard error. A typical command-line execution of your program might look like this:

decomment < somefile.c > somefilewithoutcomments.c 2> errorandwarningmessages
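The filter structure described above can be sketched as follows. This is only a minimal illustration, not the assignment solution: the helper name `copy_stream` is hypothetical, and it copies text unchanged instead of removing comments.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the assignment): copy every
   character from in to out unchanged and return how many characters
   were copied. */
long copy_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c;  /* int rather than char, so that EOF is representable */

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```

A filter's main function would then call `copy_stream(stdin, stdout)` and write any diagnostics with `fprintf(stderr, ...)`, so that the shell redirections shown above keep the three streams apart.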

### The Details

• Define "comment" as in the C89 standard. In particular, your program should consider text of the form (/* ... */) to be a comment. It should not consider text of the form (// ... ) to be a comment. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc/*def*/ghi` | `abcghi` | |
| `abc//def` | `abc//def` | |

• Allow a comment to span multiple lines. That is, your program should allow a comment to contain newline characters. Example:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc/*def`<br>`ghi*/jkl` | `abcjkl` | |

• Do not recognize nested comments. Example:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc/*def/*ghi*/jkl*/` | `abcjkl*/` | |

• Handle C string literals. In particular, your program should not consider text of the form (/* ... */) that occurs within a string literal ("...") to be a comment. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc"def/*ghi*/jkl"mno` | `abc"def/*ghi*/jkl"mno` | |
| `abc/*def"ghi"jkl*/mno` | `abcmno` | |
| `abc/*def"ghijkl*/mno` | `abcmno` | |

• Handle C character literals. In particular, your program should not consider text of the form (/* ... */) that occurs within a character literal ('...') to be a comment. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc'def/*ghi*/jkl'mno` | `abc'def/*ghi*/jkl'mno` | |
| `abc/*def'ghi'jkl*/mno` | `abcmno` | |
| `abc/*def'ghijkl*/mno` | `abcmno` | |

Note that the C compiler would consider the first of those examples to be erroneous. But the C preprocessor would not, and your program should not.

• (Revised 9/23/04) Handle escaped characters within string literals. That is, when your program reads a backslash ('\') while processing a string literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ("...\" ...") to be a valid string literal which happens to contain the double quote character. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc"def\"ghi"jkl` | `abc"def\"ghi"jkl` | |
| `abc"def\'ghi"jkl` | `abc"def\'ghi"jkl` | |

• (Revised 9/23/04) Handle escaped characters within character literals. That is, when your program reads a backslash ('\') while processing a character literal, your program should consider the next character to be an ordinary character that is devoid of any special meaning. In particular, your program should consider text of the form ('...\' ...') to be a valid character literal which happens to contain the quote character. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc'def\'ghi'jkl` | `abc'def\'ghi'jkl` | |
| `abc'def\"ghi'jkl` | `abc'def\"ghi'jkl` | |

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.

• Handle escaped newline characters. That is, when a newline character is immediately preceded by a backslash ('\') character, your program should write neither the backslash character nor the newline character to standard output. (Revised 9/23/04) Your program's handling of escaped newline characters should take precedence over its handling of escaped characters within string literals and within character literals. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc`<br>`def\`<br>`ghi`<br>`jkl` | `abc`<br>`defghi`<br>`jkl` | |
| `abc\def`<br>`ghi` | `abc\def`<br>`ghi` | |
| `abc"def\`<br>`ghi"jkl` | `abc"defghi"jkl` | |
| `abc'def\`<br>`ghi'jkl` | `abc'defghi'jkl` | |

Note that the C compiler would consider the last of those examples to be erroneous. But the C preprocessor would not, and your program should not.
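One possible way to handle the backslash-newline sequence is to hide it behind a small reading helper, so that the rest of the program never sees it. The sketch below assumes a hypothetical function name, `read_logical_char`, and takes a `FILE *` argument so that it can be tried on any stream; it is an illustration of the `ungetc` technique, not a required design.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the next character from in, silently
   skipping every backslash that is immediately followed by a newline.
   Returns EOF at end of input. */
int read_logical_char(FILE *in)
{
    int c;
    int next;

    assert(in != NULL);
    c = getc(in);
    while (c == '\\') {
        next = getc(in);
        if (next == '\n') {
            /* Escaped newline: discard both characters, read on. */
            c = getc(in);
            continue;
        }
        /* Ordinary backslash: push the lookahead character back. */
        if (next != EOF)
            ungetc(next, in);
        return '\\';
    }
    return c;
}
```

Because the helper removes escaped newlines before any other processing happens, that handling automatically takes precedence over the handling of escaped characters inside literals.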

• Allow multi-line C string literals, but generate warning messages when they occur. Specifically, your program should write the message "Warning: line X: newline in string literal" when a newline character occurs within a string literal. "X" should be the number of the line which contains the offending newline character. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc"def`<br>`ghi"jkl` | `abc"def`<br>`ghi"jkl` | `Warning: line 1: newline in string literal` |
| `abc"def`<br>`ghi`<br>`jkl"mno` | `abc"def`<br>`ghi`<br>`jkl"mno` | `Warning: line 1: newline in string literal`<br>`Warning: line 2: newline in string literal` |

• Allow multi-line C character literals, but generate warning messages when they occur. Specifically, your program should write the message "Warning: line X: newline in character literal" when a newline character occurs within a character literal. "X" should be the number of the line which contains the offending newline character. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc'def`<br>`ghi'jkl` | `abc'def`<br>`ghi'jkl` | `Warning: line 1: newline in character literal` |
| `abc'def`<br>`ghi`<br>`jkl'mno` | `abc'def`<br>`ghi`<br>`jkl'mno` | `Warning: line 1: newline in character literal`<br>`Warning: line 2: newline in character literal` |

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.
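The line-numbered warnings can be produced by keeping a running line counter. The following sketch assumes a hypothetical helper, `report_newlines`, that is handed only the text inside a single literal; a real solution would more likely maintain the counter while reading characters one at a time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: text is assumed to lie entirely inside one
   literal that starts on line start_line; kind is "string" or
   "character". Writes one warning to err for each newline in text and
   returns the line number reached at the end of text. */
int report_newlines(const char *text, int start_line,
                    const char *kind, FILE *err)
{
    int line = start_line;

    assert(text != NULL && kind != NULL && err != NULL);
    for (; *text != '\0'; text++) {
        if (*text == '\n') {
            fprintf(err, "Warning: line %d: newline in %s literal\n",
                    line, kind);
            line++;  /* later characters belong to the next line */
        }
    }
    return line;
}
```

Each warning carries the number of the line that contains the newline itself, which is why the counter is incremented only after the message has been written.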

• Detect an unterminated string literal. If your program detects end-of-file before a string literal is terminated, it should write the message "Error: line X: unterminated string literal" to standard error. "X" should be the number of the line on which the unterminated string literal begins. Example:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc"def`<br>`ghi`<br>`jkl` | `abc"def`<br>`ghi`<br>`jkl` | `Warning: line 1: newline in string literal`<br>`Warning: line 2: newline in string literal`<br>`Warning: line 3: newline in string literal`<br>`Error: line 1: unterminated string literal` |

• Detect an unterminated character literal. If your program detects end-of-file before a character literal is terminated, it should write the message "Error: line X: unterminated character literal" to standard error. "X" should be the number of the line on which the unterminated character literal begins. Example:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc'def`<br>`ghi`<br>`jkl` | `abc'def`<br>`ghi`<br>`jkl` | `Warning: line 1: newline in character literal`<br>`Warning: line 2: newline in character literal`<br>`Warning: line 3: newline in character literal`<br>`Error: line 1: unterminated character literal` |

Note that the C compiler would consider that example to be erroneous. But the C preprocessor would not, and your program should not.

• Detect an unterminated comment. If your program detects end-of-file before a comment is terminated, it should write the message "Error: line X: unterminated comment" to standard error. "X" should be the number of the line on which the unterminated comment begins. Examples:

| Standard Input | Standard Output | Standard Error |
| --- | --- | --- |
| `abc/*def`<br>`ghi` | `abc` | `Error: line 1: unterminated comment` |
| `abc/*def`<br>`ghi*` | `abc` | `Error: line 1: unterminated comment` |
| `abc/*def*ghi`<br>`jkl` | `abc` | `Error: line 1: unterminated comment` |
| `abc/*def/ghi`<br>`jkl` | `abc` | `Error: line 1: unterminated comment` |

You should make sure that the text that your program writes to standard output ends with a newline character. In particular, you should make sure that your program writes a newline character after an unterminated comment.

You should not make any assumptions about the maximum length of an input line.

Suggestion: Design your program as a deterministic finite state automaton. That concept is described in the COS 126 course.
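To make the automaton idea concrete, here is a deliberately simplified sketch. The state names and the functions `step` and `decomment_string` are assumptions made for illustration only. The sketch works on an in-memory string rather than standard input, and it omits line counting, warning and error messages, escaped newlines, and the final-newline requirement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative state names; the assignment does not prescribe them. */
enum State {
    NORMAL,      /* ordinary program text */
    SLASH,       /* saw a slash that might start a comment */
    IN_COMMENT,  /* inside a comment */
    STAR,        /* inside a comment, just saw an asterisk */
    IN_STRING,   /* inside a string literal */
    STRING_ESC,  /* inside a string literal, just saw a backslash */
    IN_CHAR,     /* inside a character literal */
    CHAR_ESC     /* inside a character literal, just saw a backslash */
};

/* Consume character c in state s, append any output to out (advancing
   *len), and return the next state. */
static enum State step(enum State s, int c, char *out, size_t *len)
{
    switch (s) {
    case NORMAL:
        if (c == '/') return SLASH;       /* hold the slash back */
        out[(*len)++] = (char)c;
        if (c == '"') return IN_STRING;
        if (c == '\'') return IN_CHAR;
        return NORMAL;
    case SLASH:
        if (c == '*') return IN_COMMENT;  /* held slash is dropped */
        out[(*len)++] = '/';              /* it was an ordinary slash */
        if (c == '/') return SLASH;       /* new slash may start one */
        out[(*len)++] = (char)c;
        if (c == '"') return IN_STRING;
        if (c == '\'') return IN_CHAR;
        return NORMAL;
    case IN_COMMENT:
        return c == '*' ? STAR : IN_COMMENT;
    case STAR:
        if (c == '/') return NORMAL;
        return c == '*' ? STAR : IN_COMMENT;
    case IN_STRING:
        out[(*len)++] = (char)c;
        if (c == '\\') return STRING_ESC;
        return c == '"' ? NORMAL : IN_STRING;
    case STRING_ESC:
        out[(*len)++] = (char)c;          /* escaped: no special role */
        return IN_STRING;
    case IN_CHAR:
        out[(*len)++] = (char)c;
        if (c == '\\') return CHAR_ESC;
        return c == '\'' ? NORMAL : IN_CHAR;
    case CHAR_ESC:
        out[(*len)++] = (char)c;          /* escaped: no special role */
        return IN_CHAR;
    }
    return s;
}

/* Run the automaton over in and store the result in out, which must
   have room for strlen(in) + 1 bytes. Returns the final state. */
enum State decomment_string(const char *in, char *out)
{
    enum State s = NORMAL;
    size_t len = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++)
        s = step(s, (unsigned char)*in, out, &len);
    if (s == SLASH)
        out[len++] = '/';  /* a trailing slash is ordinary text */
    out[len] = '\0';
    return s;
}
```

Returning the final state lets the caller notice at end of input that a comment or literal was never closed, for instance when the result is `IN_COMMENT`.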

Hints: You will find the standard C "getchar" function useful. You might find the standard C "ungetc" function useful, especially for handling the backslash-newline character sequence.

### Logistics

You should create your program on hats using xemacs and gcc.

#### Step 1: Create Source Code

Use xemacs to create source code in a file named decomment.c.

Limit line lengths in your source code to 78 characters. Doing so allows us to print your work in two columns, thus saving paper.

Note: For this assignment you may place all source code in the decomment.c file. You need not split your source code into multiple files. Subsequent assignments will ask you to write programs which should consist of multiple source code files.

#### Step 2: Compile, Assemble, and Link

Use the gcc command with the -Wall, -ansi, and -pedantic options to preprocess, compile, assemble, and link your program.

#### Step 3: Execute

Execute your program multiple times on various input files that test all logical paths through your code.

We have provided several files in hats directory /u/cos217/Assignment1. You should copy those files to your project directory, and use them to help you test your decomment program.

• sampledecomment is an executable version of a correct assignment solution. Your program should write exactly (character for character) the same data to standard output and standard error as does sampledecomment. You should test your program using commands similar to these:
sampledecomment < somefile.c > output1 2> errors1
decomment < somefile.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2

The UNIX "diff" command finds differences between two given files. The executions of the diff command shown above should produce no output. If the command "diff output1 output2" produces output, then sampledecomment and your program have written different characters to standard output. Similarly, if the command "diff errors1 errors2" produces output, then sampledecomment and your program have written different characters to standard error.

• Several .txt files (that is, files whose names end with ".txt") can serve as input files to your program.

#### Step 4: Create a readme File

Use xemacs to create a "readme" text file that contains:

• A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" web page.
• (Optionally) An indication of how much time you spent doing the assignment.
• (Optionally) Your assessment of the assignment: Did it help you to learn? What did it help you to learn? Do you have any suggestions for improvement? Etc.
• (Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

Comments describing your code should not be in the readme file. Rather they should be integrated into the code itself.

#### Step 5: Submit

Submit your work electronically on hats via the command:

/u/cos217/bin/i686/submit 1 decomment.c readme

If the directory /u/cos217/bin/i686 is in your PATH environment variable, then you can abbreviate that command as:

submit 1 decomment.c readme

If you are using the bash shell and have copied files .bashrc and .bash_profile to your HOME directory, then directory /u/cos217/bin/i686 indeed is in your PATH environment variable. You can examine the value of your PATH environment variable by executing the command "printenv PATH".