Princeton University
COS 217: Introduction to Programming Systems
Assignment 1: A "De-Comment" Program
Purpose
The purpose of this assignment is to help you learn or review (1) the fundamentals
of the C programming language, (2) the details of the "de-commenting" task
of the C preprocessor, and (3) how to use the GNU/UNIX programming tools,
especially bash, xemacs, and gcc.
Background
The C preprocessor is an important part of the C programming system. Given
a C source code file, the C preprocessor performs three jobs:
- Merge "physical" lines of source code into "logical" lines. That is, when
the preprocessor detects a backslash character immediately followed by a
newline character, it discards both of those characters.
- Remove comments from ("de-comment") the source code.
- Handle preprocessor directives (#define, #include, etc.) that reside in
the source code.
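As a small illustration of the first job, the sketch below defines a macro on three physical lines. The names SUM_OF_THREE and sumOfThree are hypothetical and are not part of the assignment.

```c
/* Each line of the macro body except the last ends with a backslash
   that is immediately followed by a newline.  The preprocessor
   discards each such pair, so the three physical lines form one
   logical line before the #define directive is handled. */
#define SUM_OF_THREE (1 + \
                      2 + \
                      3)

/* Returns the value of the spliced macro. */
int sumOfThree(void)
{
   return SUM_OF_THREE;
}
```

Calling sumOfThree() evaluates the expression (1 + 2 + 3) that results from the merged logical line.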
The "de-comment" job is substantial. For example, note that the C
preprocessor must be sensitive to:
- The fact that a comment is a token delimiter. After removing a
comment, the C preprocessor must make sure that a whitespace character is in its place.
- Line numbers. After removing a comment, the C preprocessor sometimes must
insert blank lines in its place to preserve the original line numbering.
- String and character literal boundaries. The preprocessor must not
consider the character sequence (/*...*/) to be a comment if it occurs inside
a string literal ("...") or character literal ('...').
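The first and third points can be seen in a few lines of C. This is only an illustrative sketch; the identifiers iCount, acText, and literalKeptCommentText are hypothetical.

```c
#include <string.h>

/* The comment sits between the tokens "int" and "iCount".  Because the
   preprocessor replaces it with a space, the declaration reads
   "int iCount = 3;".  Deleting it outright would yield "intiCount". */
int/* token delimiter */iCount = 3;

/* Inside a string literal the same character sequence is ordinary
   text, so all of it remains part of the string. */
const char acText[] = "not /* a comment */ at all";

/* Returns 1 if the comment-like text survived in acText, else 0. */
int literalKeptCommentText(void)
{
   return strstr(acText, "/* a comment */") != NULL;
}
```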
Your Task
Your task is to compose a C program named "decomment" that performs a
subset of the de-comment job of the C preprocessor, as defined below.
Your program should be structured as a UNIX "filter." That is,
your program should read characters from standard input, and write characters to
standard output and possibly to standard error. Specifically, your
program should (1) read text, presumably a C program, from
standard input, (2) write that same text to standard output with each
comment replaced by a space, and (3) write error and warning messages as
appropriate to standard error. A typical
command-line execution of your program might look like this:
decomment < somefile.c > somefilewithoutcomments.c 2> errorandwarningmessages
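The overall shape of a filter can be sketched as a loop that copies one stream to another. The sketch below is an assumption-laden illustration, not a solution: the function name copyStream, its FILE parameters, and the write-failure message are hypothetical. The streams are passed as parameters only so that the loop can be tried on any stream.

```c
#include <stdio.h>

/* Copies every character from psIn to psOut and returns the number of
   characters copied.  Writes a (hypothetical) message to psErr if a
   write fails. */
long copyStream(FILE *psIn, FILE *psOut, FILE *psErr)
{
   int iChar;   /* int, not char, so that EOF is distinguishable */
   long lCount = 0;

   while ((iChar = getc(psIn)) != EOF)
   {
      if (putc(iChar, psOut) == EOF)
      {
         fprintf(psErr, "Error: cannot write to output\n");
         return lCount;
      }
      lCount++;
   }
   return lCount;
}
```

A main function could call copyStream(stdin, stdout, stderr). Since getchar() is equivalent to getc(stdin) and putchar(c) to putc(c, stdout), the same loop can be written without any FILE parameters.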
Functionality
In the following examples a space is shown as "s"
and a newline character as "n". Your program should:
- Replace each completed comment with a single space.
Examples:
Standard Input | Standard Output | Standard Error
abc/*def*/ghin | abcsghin |
abc/*def*/sghin | abcssghin |
abcs/*def*/ghin | abcssghin |
- Define "comment" as in the C89 standard. In particular,
your program should consider text of the form (/* ... */) to be a comment.
It should not consider text of the form (// ... ) to be a comment.
Example:
Standard Input | Standard Output | Standard Error
abc//defn | abc//defn |
- Allow a comment to span multiple lines. That is, your program should allow
a comment to contain newline characters. Your program should add blank
lines as necessary to preserve the original line numbering. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc/*defn
ghi*/jkln
mnon |
abcn
sjkln
mnon |
|
abc/*defn
ghijkln
mno*/pqrn
stun |
abcn
n
spqrn
stun |
|
- Not recognize nested comments. Example:
Standard Input | Standard Output | Standard Error
abc/*def/*ghi*/jkl*/mnon | abcsjkl*/mnon |
- Handle C string literals. In particular, your program should not
consider text of the form (/* ... */) that occurs within a string literal
("...") to be a comment. Examples:
Standard Input | Standard Output | Standard Error
abc"def/*ghi*/jkl"mnon | abc"def/*ghi*/jkl"mnon |
abc/*def"ghi"jkl*/mnon | abcsmnon |
abc/*def"ghijkl*/mnon | abcsmnon |
abc"def'ghi'jkl"mnon | abc"def'ghi'jkl"mnon |
- Handle C character literals. In particular, your program should not
consider text of the form (/* ... */) that occurs within a character literal
('...') to be a comment. Examples:
Standard Input | Standard Output | Standard Error
abc'def/*ghi*/jkl'mnon | abc'def/*ghi*/jkl'mnon |
abc/*def'ghi'jkl*/mnon | abcsmnon |
abc/*def'ghijkl*/mnon | abcsmnon |
abc'def"ghi"jkl'mnon | abc'def"ghi"jkl'mnon |
Note that the C compiler would consider some of those examples to be
erroneous. But the C preprocessor would not, and your program should not.
- Handle escaped characters within string literals. That is, when your
program reads a backslash (\) while processing a string literal, your
program should consider the next character to be an ordinary character that
is devoid of any special meaning. In particular, your program should consider text of
the form ("...\" ...") to be a valid string literal which happens to contain
the double quote character. Examples:
Standard Input | Standard Output | Standard Error
abc"def\"ghi"jkln | abc"def\"ghi"jkln |
abc"def\'ghi"jkln | abc"def\'ghi"jkln |
- Handle escaped characters within character literals. That is, when your
program reads a backslash (\) while processing a character literal, your
program should consider the next character to be an ordinary character that is
devoid of any special meaning. In particular, your program should
consider text of the form ('...\' ...') to be a valid character literal which
happens to contain the quote character. Examples:
Standard Input | Standard Output | Standard Error
abc'def\'ghi'jkln | abc'def\'ghi'jkln |
abc'def\"ghi'jkln | abc'def\"ghi'jkln |
Note that the C compiler would consider both of those examples to be
erroneous. But the C preprocessor would not, and your program should not.
- Allow multi-line C string literals, but generate warning messages when
they occur. Specifically, your program should write the message "Warning: line
X: newline in string literal" when a newline character occurs within a string
literal. "X" should be the number of the line which contains the
offending newline character. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc"defn
ghi"jkln |
abc"defn
ghi"jkln |
Warning:slines1:snewlinesinsstringsliteraln |
abc"defn
ghin
jkl"mnon |
abc"defn
ghin
jkl"mnon |
Warning:slines1:snewlinesinsstringsliteraln
Warning:slines2:snewlinesinsstringsliteraln |
- Allow multi-line C character literals, but generate warning messages
when they occur. Specifically, your program should write the message
"Warning: line X: newline in character literal" when a newline character
occurs within a character literal. "X" should be the number of the line
which contains the offending newline character. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc'defn
ghi'jkln |
abc'defn
ghi'jkln |
Warning:slines1:snewlinesinscharactersliteraln |
abc'defn
ghin
jkl'mnon |
abc'defn
ghin
jkl'mnon |
Warning:slines1:snewlinesinscharactersliteraln
Warning:slines2:snewlinesinscharactersliteraln |
Note that the C compiler would consider both of those examples to be
erroneous. But the C preprocessor would not, and your program should not.
- Detect an unterminated string literal. If your program detects
end-of-file before a string literal is terminated, it should write the
message "Error: line X: unterminated string literal" to standard
error. "X" should be the number of the line on which the unterminated
string literal begins. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc"defn
ghin
jkln |
abc"defn
ghin
jkln |
Warning:slines1:snewlinesinsstringsliteraln
Warning:slines2:snewlinesinsstringsliteraln
Warning:slines3:snewlinesinsstringsliteraln
Error:slines1:sunterminatedsstringsliteraln |
- Detect an unterminated character literal. If your program detects
end-of-file before a character literal is terminated, it should write the
message "Error: line X: unterminated character literal" to standard
error. "X" should be the number of the line on which the unterminated
character literal begins. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc'defn
ghin
jkln |
abc'defn
ghin
jkln |
Warning:slines1:snewlinesinscharactersliteraln
Warning:slines2:snewlinesinscharactersliteraln
Warning:slines3:snewlinesinscharactersliteraln
Error:slines1:sunterminatedscharactersliteraln |
Note that the C compiler would consider that example to be
erroneous. But the C preprocessor would not, and your program should not.
- Detect an unterminated comment. If your program detects end-of-file before a
comment is terminated, it should write the message "Error: line X: unterminated
comment" to standard
error. "X" should be the number of the line on which the
unterminated comment begins. Examples:
|
Standard Input |
Standard Output |
Standard Error |
abc/*defn
ghin |
abcn
n |
Error:slines1:sunterminatedscommentn |
abcdefn
ghi/*n |
abcdefn
ghin |
Error:slines2:sunterminatedscommentn |
abc/*def/ghin
jkln |
abcn
n |
Error:slines1:sunterminatedscommentn |
abc/*def*ghin
jkln |
abcn
n |
Error:slines1:sunterminatedscommentn |
abc/*defn
ghi*n |
abcn
n |
Error:slines1:sunterminatedscommentn |
abc/*defn
ghi/n |
abcn
n |
Error:slines1:sunterminatedscommentn |
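The string-literal rules above (escaped characters, newline warnings, and the unterminated-literal error) can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference solution: it tracks string literals only and ignores comments and character literals. The names scanStringLiterals and State, and the FILE parameters, are hypothetical; the streams are parameters only so that the sketch can be exercised on arbitrary input.

```c
#include <stdio.h>

enum State {NORMAL, IN_STRING, STRING_ESCAPE};

/* Copies psIn to psOut unchanged while tracking string literals only.
   Writes a warning to psErr for each newline inside a string literal
   and an error if end-of-file arrives inside one.  Returns the line
   on which an unterminated literal began, or 0 if there is none. */
int scanStringLiterals(FILE *psIn, FILE *psOut, FILE *psErr)
{
   enum State eState = NORMAL;
   int iChar;
   int iLine = 1;        /* line currently being read */
   int iStartLine = 0;   /* line on which the open literal began */

   while ((iChar = getc(psIn)) != EOF)
   {
      putc(iChar, psOut);
      switch (eState)
      {
         case NORMAL:
            if (iChar == '"')
            {
               eState = IN_STRING;
               iStartLine = iLine;
            }
            break;
         case IN_STRING:
            if (iChar == '\\')
               eState = STRING_ESCAPE;
            else if (iChar == '"')
               eState = NORMAL;
            break;
         case STRING_ESCAPE:
            /* The escaped character has no special meaning. */
            eState = IN_STRING;
            break;
      }
      if (iChar == '\n')
      {
         if (eState != NORMAL)
            fprintf(psErr,
               "Warning: line %d: newline in string literal\n", iLine);
         iLine++;
      }
   }
   if (eState == NORMAL)
      return 0;
   fprintf(psErr, "Error: line %d: unterminated string literal\n",
      iStartLine);
   return iStartLine;
}
```

The sketch keeps two line numbers because the warning reports the line that contains the newline, whereas the error reports the line on which the literal began.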
Your program should work for standard input lines of any length.
Your program may assume that the backslash-newline character sequence does
not occur in standard input. That is, your program may assume that "logical"
lines are identical to "physical" lines in standard input.
Design
We strongly suggest that you design your program as a deterministic finite
state automaton (FSA), as described in lectures.
Your program should not consist of one large main function. Instead your
program should consist of multiple small functions, each of which performs a
single well-defined task. For example, you might create one function to
implement each state of your FSA.
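One way to arrange "one function per state" is sketched below for comment handling only. It is an illustration under assumptions, not the reference design: the state names, the handler names, and the dispatcher step are hypothetical, and string and character literals are deliberately left out.

```c
#include <stdio.h>

enum State {NORMAL, SLASH, IN_COMMENT, STAR};

/* NORMAL: writes iChar to standard output unless it is a slash, which
   might begin a comment.  Returns the next state. */
enum State handleNormalState(int iChar)
{
   if (iChar == '/')
      return SLASH;
   putchar(iChar);
   return NORMAL;
}

/* SLASH: a slash has been read but not yet written.  Returns the next
   state. */
enum State handleSlashState(int iChar)
{
   if (iChar == '*')
      return IN_COMMENT;
   putchar('/');          /* the pending slash was ordinary text */
   if (iChar == '/')
      return SLASH;       /* this slash might begin a comment */
   putchar(iChar);
   return NORMAL;
}

/* IN_COMMENT: writes only newlines, which preserves line numbering.
   Returns the next state. */
enum State handleInCommentState(int iChar)
{
   if (iChar == '*')
      return STAR;
   if (iChar == '\n')
      putchar('\n');
   return IN_COMMENT;
}

/* STAR: a star has been read inside a comment.  Returns the next
   state. */
enum State handleStarState(int iChar)
{
   if (iChar == '/')
   {
      putchar(' ');       /* the completed comment becomes one space */
      return NORMAL;
   }
   if (iChar == '*')
      return STAR;
   if (iChar == '\n')
      putchar('\n');
   return IN_COMMENT;
}

/* Dispatches iChar to the handler for eState.  Returns the next
   state. */
enum State step(enum State eState, int iChar)
{
   switch (eState)
   {
      case NORMAL:     return handleNormalState(iChar);
      case SLASH:      return handleSlashState(iChar);
      case IN_COMMENT: return handleInCommentState(iChar);
      case STAR:       return handleStarState(iChar);
   }
   return eState;
}
```

A caller would invoke step once per input character. At end-of-file it would still need to write a pending slash if the final state is SLASH, and to report an unterminated comment if the final state is IN_COMMENT or STAR.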
Generally, all communication of data into and out of a function should occur
via the function's parameters and its return value, and not via global
variables. You should use ordinary "call-by-value" parameters to communicate
data from a calling function to your function. You should use your function's
return value to communicate data from your function back to its calling
function. You should use "call-by-reference" parameters to communicate
additional data from your function back to its calling function, or as
bi-directional channels of communication. However, call-by-reference involves
using pointer variables, which we have not discussed yet. So for this assignment
you may use global variables instead of call-by-reference parameters. (But we
encourage you to use call-by-reference parameters.)
In short, you should use ordinary call-by-value function parameters and
function return values in your program as appropriate. But you need not use
call-by-reference parameters; instead you may use global variables. In
subsequent assignments you should use global variables sparingly, and only when
there is no reasonable alternative.
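The three ways of communicating data described above can be compared on a single small task, keeping a line counter up to date. The function and variable names below are hypothetical and serve only to illustrate the options.

```c
/* Call-by-value in, return value out: returns the line number that
   applies after character iChar has been read on line iLine. */
int nextLineNumber(int iLine, int iChar)
{
   if (iChar == '\n')
      return iLine + 1;
   return iLine;
}

/* Call-by-reference: updates the caller's counter through piLine
   after character iChar has been read. */
void advanceLineNumber(int *piLine, int iChar)
{
   if (iChar == '\n')
      (*piLine)++;
}

/* Global-variable alternative, which this assignment permits. */
int iGlobalLine = 1;

/* Updates the global variable iGlobalLine after character iChar has
   been read. */
void advanceGlobalLineNumber(int iChar)
{
   if (iChar == '\n')
      iGlobalLine++;
}
```

The first form needs neither pointers nor global variables, so it fits whenever a function has only one value to send back to its caller.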
Generally, a (large) C program should consist of multiple source code
files. For this assignment, you need not split your source code into
multiple files. Instead you may place all source code in a single source code
file. Subsequent assignments will ask you to write programs consisting of
multiple source code files.
Please limit line lengths in your source code to 78 characters. Doing so allows us
to print your work in two columns, thus saving paper.
We suggest that your program read characters from standard input using the standard C getchar() function.
Logistics
You should create your program on hats using bash, xemacs and gcc.
Step 1: Create Source Code
Use xemacs to create source code in a file named decomment.c.
Step 2: Preprocess, Compile, Assemble, and Link
Use the gcc command with the -Wall, -ansi, and -pedantic options to
preprocess, compile, assemble, and link your program. For example:
gcc -Wall -ansi -pedantic decomment.c -o decomment
Step 3: Execute
Execute your program multiple times on various input files that test all logical
paths through your code.
We have provided several files in hats directory /u/cos217/Assignment1. You
should copy those files to your project directory, and use them to help you test
your decomment program.
- sampledecomment is an executable version of a correct assignment
solution. Your
program should write exactly (character for character) the same data to
standard output and standard error as does sampledecomment. You should test
your program using commands similar to these:
sampledecomment < somefile.c > output1 2> errors1
decomment < somefile.c > output2 2> errors2
diff output1 output2
diff errors1 errors2
rm output1 errors1 output2 errors2
The UNIX diff command finds differences between two given files. The
executions of the diff command shown above should produce no output. If
the command "diff output1 output2" produces output, then sampledecomment and your program have written different characters to standard output.
Similarly, if the command "diff errors1 errors2" produces output, then sampledecomment and your program have written different characters to standard
error.
- Several .txt files (that is, files whose names end with ".txt") can serve as
input files to your program.
- grade1 and grade1diff are bash scripts that automate the
testing process. Comments at the beginning of those files describe how to use
them. After copying the scripts to your project directory, you may need to
execute the commands "chmod 700 grade1" and "chmod 700 grade1diff" to make
them executable.
Step 4: Create a readme File
Use xemacs to create a text file named "readme" that contains:
- Your name and the assignment number.
- A description of whatever help (if any) you received from others while
doing the assignment, and the names of any individuals with whom you
collaborated, as prescribed by the course Policies web page.
- (Optionally) An indication of how much time you spent doing the assignment.
- (Optionally) Your assessment of the assignment: Did it help you to learn? What
did it help you to learn? Do you have any suggestions for improvement?
Etc.
- (Optionally) Any information that will help us to grade your work in the
most favorable light. In particular you should describe all known bugs.
Descriptions of your code should not be in the readme
file. Instead they should be integrated into your code as comments.
Step 5: Submit
Submit your work electronically on hats via the command:
/u/cos217/bin/i686/submit 1 decomment.c readme
If the directory /u/cos217/bin/i686 is in your PATH environment variable,
then you can abbreviate that command as:
submit 1 decomment.c readme
If you are using the bash shell and have copied files .bashrc and .bash_profile
from the /u/cos217 directory to your HOME directory, then directory
/u/cos217/bin/i686 indeed is in your PATH environment variable. You can examine your PATH environment
variable by executing the command "printenv PATH".
Grading
We will grade your work on functionality and design. We will consider
understandability to be an important aspect of good design. See the next section
for guidelines concerning program understandability. To encourage good coding
practices, we will build using "gcc -Wall -ansi -pedantic" and take off points
based on warning messages.
Program Understandability
An understandable program:
- Uses a consistent and appropriate indentation scheme. All
statements that are nested within a compound, if, switch, while, for, or do...while statement should be indented.
Please use spaces instead of tabs to indent. Please use at least a 3-space indentation scheme. Note that the xemacs editor can apply
a consistent indentation scheme to your program automatically.
- Contains descriptive identifiers. The names of variables, constants,
structures, types, and functions should indicate their purpose. Remember that C can
handle identifiers of any length, and that C89 guarantees at least the first 31 characters of an internal identifier to be significant.
We encourage you to prefix each variable name with characters that indicate its
type. For example, the prefix "c" might indicate that the variable is
of type char, "i" might indicate int, "pc" might mean pointer to char, "ui" might mean unsigned int, etc.
- Contains carefully worded comments. Each source code file should
begin with a comment that includes your name, the number of the assignment,
and the name of the file. Each function -- especially the main function --
should begin with a comment that describes what the computer does when it
executes that function. It should do so by explicitly referring to the
function's parameters and return value. The comment also should state what, if
anything, the computer reads from standard input or any other stream, and
what, if anything, the computer writes to standard output, standard error, or
any other stream while executing the function. Finally, the function's comment should
state which global variables the computer uses and affects when executing the
function.
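The guidelines above might look like this when applied to one small function. The file name linecount.c and the function countNewlines are hypothetical examples, not part of the assignment.

```c
/*--------------------------------------------------------------------*/
/* linecount.c (hypothetical file name)                               */
/* Author: (your name)                                                */
/* Assignment: 1                                                      */
/*--------------------------------------------------------------------*/

/* Returns the number of newline characters in the string to which
   pcText points.  Reads from no stream, writes to no stream, and uses
   no global variables. */
unsigned int countNewlines(const char *pcText)
{
   unsigned int uiCount = 0;

   while (*pcText != '\0')
   {
      if (*pcText == '\n')
         uiCount++;
      pcText++;
   }
   return uiCount;
}
```

The prefixes "pc" and "ui" follow the naming suggestion above, and the function comment refers explicitly to the parameter, the return value, streams, and global variables.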