Princeton University
COS 217: Introduction to Programming Systems

Assignment 1: A "De-Comment" Program

Purpose

The purpose of this assignment is to help you learn or review (1) the fundamentals of the C programming language, (2) the details of the "de-commenting" task of the C preprocessor, and (3) how to use the GNU/UNIX programming tools, especially bash, xemacs, and gcc.

Background

The C preprocessor is an important part of the C programming system. Given a C source code file, the C preprocessor performs three jobs:

  1. Merge "physical" lines of source code into "logical" lines. That is, when the preprocessor detects a backslash character immediately followed by a newline character, it discards both of those characters.
  2. Remove comments from ("de-comment") the source code.
  3. Handle preprocessor directives (#define, #include, etc.) that reside in the source code.
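Job 1 above can be sketched in C as follows. This is only an illustration of the line-merging rule, not the preprocessor's actual implementation; the function name spliceLines and its string-based interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the string src into dst, discarding every backslash that is
   immediately followed by a newline, together with that newline.
   dst must have room for at least as many characters as src,
   including the terminating '\0'. Return the number of characters
   written, not counting the terminating '\0'.
   (spliceLines is a hypothetical name used only in this sketch.) */
size_t spliceLines(const char *src, char *dst)
{
   size_t in = 0;
   size_t out = 0;

   assert(src != NULL);
   assert(dst != NULL);

   while (src[in] != '\0')
   {
      if (src[in] == '\\' && src[in + 1] == '\n')
         in += 2;                  /* drop both characters */
      else
         dst[out++] = src[in++];   /* keep everything else */
   }
   dst[out] = '\0';
   return out;
}
```

A backslash that is not immediately followed by a newline is copied unchanged, so only the two-character sequence is removed.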

The "de-comment" job is substantial. For example, note that the C preprocessor must be sensitive to:

Your Task

Your task is to compose a C program named "decomment" that performs a subset of the de-comment job of the C preprocessor, as defined below.

Your program should be structured as a UNIX "filter." That is, your program should read characters from standard input, and write characters to standard output and possibly to standard error. Specifically, your program should (1) read text, presumably a C program, from standard input, (2) write that same text to standard output with each comment replaced by a space, and (3) write error and warning messages as appropriate to standard error. A typical command-line execution of your program might look like this:

decomment < somefile.c > somefilewithoutcomments.c 2> errorandwarningmessages
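The core loop of a filter can be sketched as below. This is a minimal sketch, not the required design: the name copyStream is hypothetical, and the function takes FILE pointers (rather than using standard input and standard output directly) only so that it can be exercised on any stream. It copies its input unchanged; a real filter would transform the characters along the way.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every character from in to out unchanged, and return the
   number of characters copied. (copyStream is a hypothetical name
   used only in this sketch.) */
long copyStream(FILE *in, FILE *out)
{
   int c;            /* int, not char, so EOF is distinguishable */
   long count = 0;

   assert(in != NULL);
   assert(out != NULL);

   while ((c = getc(in)) != EOF)
   {
      putc(c, out);
      count++;
   }
   return count;
}
```

A main function could call copyStream(stdin, stdout) and report problems with fprintf(stderr, ...). Note that getchar() is equivalent to getc(stdin), and putchar(c) is equivalent to putc(c, stdout).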

Functionality

In the following examples, a space is shown as "s" and a newline character as "n". Your program should behave as the examples show:

Standard Input            Standard Output           Standard Error
abc/*def*/ghin            abcsghin
abc/*def*/sghin           abcssghin
abcs/*def*/ghin           abcssghin

Standard Input            Standard Output           Standard Error
abc//defn                 abc//defn

Standard Input            Standard Output           Standard Error
abc/*defn                 abcn
ghi*/jkln                 sjkln
mnon                      mnon

abc/*defn                 abcn
ghijkln                   n
mno*/pqrn                 spqrn
stun                      stun

Standard Input            Standard Output           Standard Error
abc/*def/*ghi*/jkl*/mnon  abcsjkl*/mnon

Standard Input            Standard Output           Standard Error
abc"def/*ghi*/jkl"mnon    abc"def/*ghi*/jkl"mnon
abc/*def"ghi"jkl*/mnon    abcsmnon
abc/*def"ghijkl*/mnon     abcsmnon
abc"def'ghi'jkl"mnon      abc"def'ghi'jkl"mnon

Standard Input            Standard Output           Standard Error
abc'def/*ghi*/jkl'mnon    abc'def/*ghi*/jkl'mnon
abc/*def'ghi'jkl*/mnon    abcsmnon
abc/*def'ghijkl*/mnon     abcsmnon
abc'def"ghi"jkl'mnon      abc'def"ghi"jkl'mnon

Note that the C compiler would consider some of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input            Standard Output           Standard Error
abc"def\"ghi"jkln         abc"def\"ghi"jkln
abc"def\'ghi"jkln         abc"def\'ghi"jkln

Standard Input            Standard Output           Standard Error
abc'def\'ghi'jkln         abc'def\'ghi'jkln
abc'def\"ghi'jkln         abc'def\"ghi'jkln

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc"defn          abc"defn          Warning:slines1:snewlinesinsstringsliteraln
ghi"jkln          ghi"jkln

abc"defn          abc"defn          Warning:slines1:snewlinesinsstringsliteraln
ghin              ghin              Warning:slines2:snewlinesinsstringsliteraln
jkl"mnon          jkl"mnon

Standard Input    Standard Output   Standard Error
abc'defn          abc'defn          Warning:slines1:snewlinesinscharactersliteraln
ghi'jkln          ghi'jkln

abc'defn          abc'defn          Warning:slines1:snewlinesinscharactersliteraln
ghin              ghin              Warning:slines2:snewlinesinscharactersliteraln
jkl'mnon          jkl'mnon

Note that the C compiler would consider both of those examples to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc"defn          abc"defn          Warning:slines1:snewlinesinsstringsliteraln
ghin              ghin              Warning:slines2:snewlinesinsstringsliteraln
jkln              jkln              Warning:slines3:snewlinesinsstringsliteraln
                                    Error:slines1:sunterminatedsstringsliteraln

Standard Input    Standard Output   Standard Error
abc'defn          abc'defn          Warning:slines1:snewlinesinscharactersliteraln
ghin              ghin              Warning:slines2:snewlinesinscharactersliteraln
jkln              jkln              Warning:slines3:snewlinesinscharactersliteraln
                                    Error:slines1:sunterminatedscharactersliteraln

Note that the C compiler would consider that example to be erroneous. But the C preprocessor would not, and your program should not.

Standard Input    Standard Output   Standard Error
abc/*defn         abcn              Error:slines1:sunterminatedscommentn
ghin              n

abcdefn           abcdefn           Error:slines2:sunterminatedscommentn
ghi/*n            ghin

abc/*def/ghin     abcn              Error:slines1:sunterminatedscommentn
jkln              n

abc/*def*ghin     abcn              Error:slines1:sunterminatedscommentn
jkln              n

abc/*defn         abcn              Error:slines1:sunterminatedscommentn
ghi*n             n

abc/*defn         abcn              Error:slines1:sunterminatedscommentn
ghi/n             n

Your program should work for standard input lines of any length.

Your program may assume that the backslash-newline character sequence does not occur in standard input. That is, your program may assume that "logical" lines are identical to "physical" lines in standard input.

Design

We strongly suggest that you design your program as a deterministic finite state automaton (FSA), as described in lectures.

Your program should not consist of one large main function. Instead your program should consist of multiple small functions, each of which performs a single well-defined task. For example, you might create one function to implement each state of your FSA.
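The "one function per state" structure can be illustrated on a deliberately different, much smaller problem: capitalizing the first letter of each word. The sketch below is not a solution to this assignment; the state names, the function names, and the step dispatcher are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* The two states of this toy automaton. */
enum State {OUTSIDE_WORD, INSIDE_WORD};

/* Handle character c when the automaton is outside a word: write c
   to standard output (capitalized if it begins a word) and return
   the next state. */
enum State handleOutsideWord(int c)
{
   if (isalpha(c))
   {
      putchar(toupper(c));
      return INSIDE_WORD;
   }
   putchar(c);
   return OUTSIDE_WORD;
}

/* Handle character c when the automaton is inside a word: write c
   to standard output unchanged and return the next state. */
enum State handleInsideWord(int c)
{
   putchar(c);
   if (isalpha(c))
      return INSIDE_WORD;
   return OUTSIDE_WORD;
}

/* Dispatch character c to the handler for the current state, and
   return the next state. */
enum State step(enum State state, int c)
{
   assert(state == OUTSIDE_WORD || state == INSIDE_WORD);
   if (state == OUTSIDE_WORD)
      return handleOutsideWord(c);
   return handleInsideWord(c);
}
```

A main function would start in OUTSIDE_WORD and, for each character c read until EOF, assign state = step(state, c). Each handler stays small because it answers only one question: given this state and this character, what is written and which state comes next?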

Generally, all communication of data into and out of a function should occur via the function's parameters and its return value, and not via global variables. You should use ordinary "call-by-value" parameters to communicate data from a calling function to your function. You should use your function's return value to communicate data from your function back to its calling function. You should use "call-by-reference" parameters to communicate additional data from your function back to its calling function, or as bi-directional channels of communication. However, call-by-reference involves using pointer variables, which we have not discussed yet. So for this assignment you may use global variables instead of call-by-reference parameters. (But we encourage you to use call-by-reference parameters.)

In short, you should use ordinary call-by-value function parameters and function return values in your program as appropriate. But you need not use call-by-reference parameters; instead you may use global variables. In subsequent assignments you should use global variables sparingly, and only when there is no reasonable alternative.
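The three channels of communication described above can be seen together in one small function. This is an illustrative sketch only; the function name divide and its parameters are hypothetical and unrelated to the decomment program.

```c
#include <assert.h>
#include <stddef.h>

/* Divide dividend by divisor.
   - dividend and divisor are call-by-value parameters: data flows
     from the caller into the function.
   - The return value carries the quotient back to the caller.
   - remainder is a call-by-reference parameter: the function uses
     the pointer to store additional data in the caller's variable.
   (divide is a hypothetical name used only in this sketch.) */
int divide(int dividend, int divisor, int *remainder)
{
   assert(divisor != 0);
   assert(remainder != NULL);

   *remainder = dividend % divisor;
   return dividend / divisor;
}
```

A caller would write, for example, q = divide(17, 5, &r); afterward q holds the quotient and r holds the remainder. The global-variable alternative permitted for this assignment would replace the remainder parameter with a variable defined outside all functions, at the cost of making the data flow harder to see.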

Generally, a (large) C program should consist of multiple source code files. For this assignment, you need not split your source code into multiple files. Instead you may place all source code in a single source code file. Subsequent assignments will ask you to write programs consisting of multiple source code files.

Please limit line lengths in your source code to 78 characters. Doing so allows us to print your work in two columns, thus saving paper.

We suggest that your program read characters from standard input using the standard C getchar() function.

Logistics

You should create your program on hats using bash, xemacs and gcc.

Step 1: Create Source Code

Use xemacs to create source code in a file named decomment.c.

Step 2: Preprocess, Compile, Assemble, and Link

Use the gcc command with the -Wall, -ansi, and -pedantic options to preprocess, compile, assemble, and link your program.

Step 3: Execute

Execute your program multiple times on various input files that test all logical paths through your code.

We have provided several files in hats directory /u/cos217/Assignment1. You should copy those files to your project directory, and use them to help you test your decomment program.

sampledecomment < somefile.c > output1 2> errors1
decomment < somefile.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2

The UNIX diff command finds differences between two given files. The executions of the diff command shown above should produce no output. If the command "diff output1 output2" produces output, then sampledecomment and your program have written different characters to standard output. Similarly, if the command "diff errors1 errors2" produces output, then sampledecomment and your program have written different characters to standard error.

Step 4: Create a readme File

Use xemacs to create a text file named "readme" that contains:

Descriptions of your code should not be in the readme file. Instead they should be integrated into your code as comments.

Step 5: Submit

Submit your work electronically on hats via the command:

/u/cos217/bin/i686/submit 1 decomment.c readme

If the directory /u/cos217/bin/i686 is in your PATH environment variable, then you can abbreviate that command as:

submit 1 decomment.c readme

If you are using the bash shell and have copied files .bashrc and .bash_profile from the /u/cos217 directory to your HOME directory, then directory /u/cos217/bin/i686 indeed is in your PATH environment variable. You can examine your PATH environment variable by executing the command "printenv PATH".

Grading

We will grade your work on functionality and design. We will consider understandability to be an important aspect of good design. See the next section for guidelines concerning program understandability. To encourage good coding practices, we will build using "gcc -Wall -ansi -pedantic" and take off points based on warning messages.

Program Understandability

An understandable program: