Princeton University
COS 217: Introduction to Programming Systems

Assignment 1: A "De-Comment" Program

Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/UNIX programming tools, especially bash, xemacs, and gcc.

Background

The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs:

  1. Merge "physical" lines of source code into "logical" lines. When the preprocessor detects a backslash character immediately followed by a newline character, it discards both of those characters.
  2. Handle preprocessor directives (#define, #include, etc.) that reside in the source code.
  3. "De-comment" (that is, remove comments from) the source code.

Certainly, the "handle preprocessor directives" job is the most difficult. But the "de-comment" job is substantial. For example, note that the C preprocessor must be sensitive to:

Your Task

Your task is to compose a C program named "decomment" that performs a subset of the de-comment job of the C preprocessor, as described below.

Your program should be structured as a UNIX filter. That is, your program should read characters from standard input, and write characters to standard output (and possibly to standard error). Specifically, your program should read text (which presumably comprises a C program) from standard input, write that same text -- with each comment replaced by a space -- to standard output, and write error and warning messages as appropriate to standard error. A typical command-line execution of your program might look like this:

decomment < somefile.c > somefilewithoutcomments.c 2> errorandwarningmessages

The Details

Your program should:

Standard Input          Standard Output         Standard Error
abc/*def*/ghi           abc ghi
abc/*def*/ ghi          abc  ghi
abc /*def*/ghi          abc  ghi

Standard Input          Standard Output         Standard Error
abc/*def*/ghi           abc ghi
abc//def                abc//def

Standard Input          Standard Output         Standard Error
abc/*def                abc jkl
ghi*/jkl                mno
mno

Standard Input          Standard Output         Standard Error
abc/*def/*ghi*/jkl*/    abc jkl*/

Standard Input          Standard Output         Standard Error
abc"def/*ghi*/jkl"mno   abc"def/*ghi*/jkl"mno
abc/*def"ghi"jkl*/mno   abc mno
abc/*def"ghijkl*/mno    abc mno

Standard Input          Standard Output         Standard Error
abc'def/*ghi*/jkl'mno   abc'def/*ghi*/jkl'mno
abc/*def'ghi'jkl*/mno   abc mno
abc/*def'ghijkl*/mno    abc mno

Note that the C compiler would consider the first of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input          Standard Output         Standard Error
abc"def\"ghi"jkl        abc"def\"ghi"jkl
abc"def\'ghi"jkl        abc"def\'ghi"jkl

Standard Input          Standard Output         Standard Error
abc'def\'ghi'jkl        abc'def\'ghi'jkl
abc'def\"ghi'jkl        abc'def\"ghi'jkl

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc"def           abc"def           Warning: line 1: newline in string literal
ghi"jkl           ghi"jkl

abc"def           abc"def           Warning: line 1: newline in string literal
ghi               ghi               Warning: line 2: newline in string literal
jkl"mno           jkl"mno

Standard Input    Standard Output   Standard Error
abc'def           abc'def           Warning: line 1: newline in character literal
ghi'jkl           ghi'jkl

abc'def           abc'def           Warning: line 1: newline in character literal
ghi               ghi               Warning: line 2: newline in character literal
jkl'mno           jkl'mno

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc"def           abc"def           Warning: line 1: newline in string literal
ghi               ghi               Warning: line 2: newline in string literal
jkl               jkl               Warning: line 3: newline in string literal
                                    Error: line 1: unterminated string literal

Standard Input    Standard Output   Standard Error
abc'def           abc'def           Warning: line 1: newline in character literal
ghi               ghi               Warning: line 2: newline in character literal
jkl               jkl               Warning: line 3: newline in character literal
                                    Error: line 1: unterminated character literal

Note that the C compiler would consider that example to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc/*def          abc               Error: line 1: unterminated comment
ghi

abc/*def          abc               Error: line 1: unterminated comment
ghi*

abc/*def*ghi      abc               Error: line 1: unterminated comment
jkl

abc/*def/ghi      abc               Error: line 1: unterminated comment
jkl

The text that your program writes to standard output should end with a newline character.

You should not make any assumptions about the maximum length of an input line.

Suggestion: Design your program as a deterministic finite state automaton, as described in lectures.

Suggestion: You will find the standard C "getchar" function useful.

Logistics

You should create your program on hats using xemacs and gcc.

Step 1: Create Source Code

Use xemacs to create source code in a file named decomment.c.

Limit line lengths in your source code to 78 characters. Doing so allows us to print your work in two columns, thus saving paper.

Note: For this assignment you may place all source code in the decomment.c file. You need not split your source code into multiple files. Subsequent assignments will ask you to write programs that consist of multiple source code files.

Step 2: Preprocess, Compile, Assemble, and Link

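Use the gcc command with the -Wall, -ansi, and -pedantic options to preprocess, compile, assemble, and link your program. For example, the following command builds an executable file named decomment from decomment.c:

gcc -Wall -ansi -pedantic -o decomment decomment.c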

Step 3: Execute

Execute your program multiple times on various input files that test all logical paths through your code.

We have provided several files in hats directory /u/cos217/Assignment1. You should copy those files to your project directory, and use them to help you test your decomment program.

sampledecomment < somefile.c > output1 2> errors1
decomment < somefile.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2

The UNIX "diff" command finds differences between two given files. The executions of the diff command shown above should produce no output. If the command "diff output1 output2" produces output, then sampledecomment and your program have written different characters to standard output. Similarly, if the command "diff errors1 errors2" produces output, then sampledecomment and your program have written different characters to standard error.

Step 4: Create a readme File

Use xemacs to create a "readme" text file that contains:

Comments describing your code should not be in the readme file. Rather, they should be integrated into the code itself.

Step 5: Submit

Submit your work electronically on hats via the command:

/u/cos217/bin/i686/submit 1 decomment.c readme

If the directory /u/cos217/bin/i686 is in your PATH environment variable, then you can abbreviate that command as:

submit 1 decomment.c readme

If you are using the bash shell and have copied files .bashrc and .bash_profile from the /u/cos217 directory to your HOME directory, then directory /u/cos217/bin/i686 indeed is in your PATH environment variable. You can examine the value of your PATH environment variable by executing the command "printenv PATH".

Grading

We will grade your work on correctness and design. We will consider understandability to be an important aspect of good design. See the next section for guidelines concerning program understandability. To encourage good coding practices, we will compile using "gcc -Wall -ansi -pedantic" and take off points based on warning messages during compilation.

Program Understandability

An understandable program:

(1) Uses a consistent and appropriate indentation scheme. All statements that are nested within a compound, if, switch, while, for, or do...while statement should be indented. Most programmers use either a 3- or 4-space indentation scheme. Note that the xemacs editor can automatically apply a consistent indentation scheme to your program.

(2) Contains descriptive identifiers. The names of variables, constants, structures, types, and functions should indicate their purpose. Remember: C can handle identifiers of any length, and the first 31 characters are significant. We encourage you to prefix each variable name with characters that indicate its type. For example, the prefix "c" might indicate that the variable is of type "char," "i" might indicate "int," "pc" might mean "pointer to char," "ui" might mean "unsigned int," etc.

(3) Contains carefully worded comments. You should begin each program file with a comment that includes your name, the number of the assignment, and the name of the file. Each function -- especially the main function -- should begin with a comment that describes what the computer does when it executes that function. That comment should explicitly state what (if anything) the computer reads from standard input (or any other stream), and what (if anything) the computer writes to standard output and standard error (or any other stream). The function's comment should also describe what the computer does when it executes that function by explicitly referring to the function's parameters and return value.