Princeton University
COS 217: Introduction to Programming Systems

Assignment 2: A String Module and Client


Purpose

The purpose of this assignment is to help you learn (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the design by contract style of programming, and (4) how to use the GNU/Unix programming tools, especially bash, emacs, gcc217, and gdb.


Rules

Composing the two Str_search functions (as described below) is the "extra challenge" part of this assignment. While doing the "extra challenge" part of the assignment, you are bound to observe the course policies regarding assignment conduct as given in the course Policies web page, plus one additional constraint: you may not use any "human" sources of information. That is, you may not consult with the course's staff members, the lab teaching assistants, other current students via Piazza, or any other people while working on the "extra challenge" part of an assignment, except for clarification of requirements.

The "extra challenge" part is worth 16 percent of this assignment. So if you don't do any of the "extra challenge" part and all other parts of your assignment solution are perfect, then your grade for the assignment will be 84 percent.


Part 1: A Str Module

Background

As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in interface (alias header, alias .h) files. One of those interface files is string.h; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Chapter 13 and Appendix D of the King textbook and the Linux man pages describe the string functions.


The Task

Your task is to use C to compose a Str module that contains versions of the most commonly used standard string functions. Specifically, design your Str module so it contains these functions, each of which behaves the same as a corresponding standard C function:

Str Function Standard C Function
Str_getLength strlen
Str_copy strcpy
Str_concat strcat
Str_compare strcmp
Str_search strstr

The Details

Design each function definition so it calls the assert macro to validate the function's parameters. Specifically, design each function definition to assert that each parameter is not NULL.

Don't add any other calls of assert to your code. But consider whether it would be possible to do so. In particular, provide an answer to this question in your readme file:

Is it possible for Str_copy or Str_concat to call assert to verify that the destination memory area specified by the caller is large enough? Explain.

Create two implementations of your Str module. Place the first implementation in a file named stra.c. Design the function definitions in stra.c such that they use array notation as much as possible, and traverse each given string using an index relative to the beginning of the string. For example, in stra.c you might define the Str_getLength function like this:

size_t Str_getLength(const char pcSrc[])
{
   size_t uLength = 0;
   assert(pcSrc != NULL);
   while (pcSrc[uLength] != '\0')
      uLength++;
   return uLength;
}

Note that the type of uLength is size_t. The type size_t is defined in the standard interface file stddef.h. It is a system-dependent unsigned integer type that is large enough to hold the length of any string. Typically it is defined to be identical to either unsigned int or unsigned long. On FC010, it is identical to unsigned long. Several of the standard string functions use type size_t, and so several of your functions must use it too.

Place the second implementation in a file named strp.c. Design the function definitions in strp.c such that they use pointer notation as much as possible, and traverse each given string using an incremented pointer — not using an index relative to the beginning of the string. For example, in strp.c you might define the Str_getLength function like this:

size_t Str_getLength(const char *pcSrc)
{
   const char *pcEnd;
   assert(pcSrc != NULL);
   pcEnd = pcSrc;
   while (*pcEnd != '\0')
      pcEnd++;
   return (size_t)(pcEnd - pcSrc);
}

It is unacceptable to create the strp.c implementation from the stra.c implementation simply by replacing each occurrence of a[i] with *(a+i). For example, in strp.c this definition of Str_getLength is unacceptable:

size_t Str_getLength(const char *pcSrc)  /* unacceptable */
{                                        /* unacceptable */
   size_t uLength = 0;                   /* unacceptable */
   assert(pcSrc != NULL);                /* unacceptable */
   while (*(pcSrc + uLength) != '\0')    /* unacceptable */
      uLength++;                         /* unacceptable */
   return uLength;                       /* unacceptable */
}                                        /* unacceptable */

Note that the function traverses the given string by incrementing the integer uLength, not by incrementing a pointer. Also note that the function effectively uses uLength as an index relative to the beginning of the string pcStr.

Place the Str module's interface in a file named str.h. You may use either array or pointer notation in the interface. The two notations are equivalent to the compiler, so both implementations match the one interface. Use the #ifndef...#define...#endif construct to protect your str.h file against accidental multiple inclusion into the same .c file.

This assignment does not focus on efficiency. Nevertheless, your functions must not be grossly inefficient. For example, a function is grossly inefficient if it traverses a (potentially long) string multiple times when a single traversal would suffice.

Design your Str functions so they do not call any of the standard string functions. In the context of this assignment, pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.

Beware of type mismatches. In particular, beware of the difference between type size_t and type int: a variable of type size_t can store larger integers than a variable of type int can. Your functions must (in principle) be able to handle strings whose lengths exceed the capacity of type int. Also beware of type mismatches related to the use of the const keyword. Using the splint tool, as described below, will help you detect type mismatches.

In your assignment solution you may use any of the definitions of the Str_getLength function given in this assignment specification.

A client that you can use to test your Str module is available on FC010 in the file /u/cos217/Assignment2/teststr.c.


Part 2: A Str Client

Background

As you know, the C preprocessor performs three jobs:

For Assignment 1 you composed a program that does the second of those jobs. For this assignment you must compose a program that mimics (a small subset of) the third of those jobs. More specifically...

One of the preprocessor directives that the preprocessor handles is #define. In its simplest form, a #define preprocessor directive maps a macro to a replacement string. The preprocessor replaces the macro with the replacement string in all code that follows the #define directive.

For example, this #define directive:

#define MAX 1000

maps the macro MAX to the replacement string 1000. When the preprocessor subsequently encounters an occurrence of the macro MAX, the preprocessor replaces MAX with 1000. (Precepts and our King book describe the #define directive in more detail.)


The Task

Your job in this part of the assignment is to compose a C program named replace that mimics the preprocessor's handling of simple #define directives. More specifically, you are given a partial program named replace.c in the /u/cos217/Assignment2 directory on FC010. Your job is to copy the file to your working directory, and edit the file to make it complete.

Doing so, in part, involves completing the definition of the replaceAndWrite function. A comment in replace.c describes what that function must do. Define that function so it calls the assert macro to make sure that each parameter is not NULL. Don't be concerned about the context of the replacements; for example, define your function such that it performs replacements even if the text to be replaced occurs within quotes, comments, etc.

replace.c must be a client of your Str module. That is, replace.c must call functions declared in your str.h file and defined in either your stra.c file or your strp.c file in meaningful ways, whenever reasonably possible, to perform its task. It's fine for replace.c to do low-level string manipulation, of the kind found within the definitions of your Str functions, in places where it would be awkward to call your Str functions. In such low-level string-manipulation code you may use either array notation or pointer notation.

If and only if your Str functions are not working, then design your replace.c so it calls the standard C string-handling functions declared in string.h instead of your Str functions. In that case don't forget to change #include "str.h" to #include <string.h> near the beginning of the replace.c file.

The FC010 /u/cos217/Assignment2 directory contains an executable binary file named samplereplace. Your replace program must have the same behavior as samplereplace. That is, when given the same input as samplereplace, your program must write exactly the same data to stdout and stderr as samplereplace does.

Test your replace program thoroughly by running it on several input files with a variety of "from string" and "to string" command-line arguments. The /u/cos217/Assignment2 directory contains some .txt files that you can use as input to replace. That directory also contains scripts named testreplace and testreplacediff that automate the testing process.


Logistics

Do your work on the FC010 cluster using bash, emacs, gcc217, and gdb.

Critique your code using the splint tool, as described in the next section of this document. Each time splint generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.

Similarly, critique your code using the critTer tool, as described in the next section of this document. Each time critTer generates a warning on your code, you must either (1) edit your code to eliminate the warning, or (2) explain your disagreement with the warning in your readme file.

Create a readme file by copying the file /u/cos217/Assignment2/readme to your project directory, and editing the copy by replacing each area marked <Complete this section.> as appropriate.

One of the sections of the readme file requires you to list the authorized sources of information that you used to complete the assignment. Another section requires you to list the unauthorized sources of information that you used to complete the assignment. Your grader will not grade your submission unless you have completed those sections. To complete the "authorized sources" section of your readme file, copy the list of authorized sources given in the "Policies" web page to that section, and edit it as appropriate.

Use the gcc217 command to preprocess, compile, assemble, and link your programs. Perform each step individually, and examine the intermediate results to the extent possible. Tip: It would be fine to use "shortcut" gcc217 commands during development, and perform the four steps individually only once, immediately before you submit your work.

Submit all source code files, the files that gcc217 generated from them, and your readme file electronically on FC010 using these commands:

submit 2 readme
submit 2 str.h
submit 2 stra.c stra.i stra.s stra.o
submit 2 strp.c strp.i strp.s strp.o
submit 2 teststr.i teststr.s teststr.o
submit 2 teststra teststrp
submit 2 replace.c replace.i replace.s replace.o
submit 2 replace

where:


Checking Your Code with splint and critTer

splint is a high-powered code analysis tool developed by researchers at the University of Virginia. critTer (critique from the terminal) is a more COS 217-specific code analysis tool. The first version of critTer was developed by Erin Rosenbaum '11; the current version was developed by Alice Kroutikova '15. (Erin and Alice are former COS 217 students.) splint and critTer are available on the FC010 cluster.

Execute splint and critTer after your programs build cleanly. splint and critTer write a list of stylistic flaws that they find in your source code. Some of the warnings may be a little cryptic; feel free to ask your preceptor about them as the need arises.

Assuming that you've copied the .splintrc file from the /u/cos217 directory to your HOME directory (as described in the document entitled A Minimal COS 217 Computing Environment from the first precept), using splint is easy. At the bash prompt, issue this command:

splint all-.c-files-comprising-your-program

For example, issue these commands to use splint for this assignment:

splint teststr.c stra.c
splint teststr.c strp.c
splint replace.c stra.c
splint replace.c strp.c

Using critTer is even easier. At the bash prompt, issue this command:

critTer any-.c-file

For example, issue these commands to use critTer for this assignment:

critTer stra.c
critTer strp.c
critTer replace.c

Think of splint (with the given .splintrc file) and critTer as defining coding standards for the COS 217 course. Critique your code using those tools, and edit your code to eliminate all splint and critTer warnings before you submit.

splint and critTer are aggressive; some of the warnings that they generate might not indicate true stylistic errors in your code. If you disagree with a splint or critTer warning, then copy the warning to your readme file, and explain your disagreement with it in your readme file. Also feel free to discuss the matter with your preceptor before you submit.

If you would like to learn more about splint, see www.splint.org.


Grading

Minimal requirement to receive credit for the stra.c and strp.c implementations:

Minimal requirement to receive credit for the replace.c client:

We will grade your work on quality from the user's and programmer's points of view. From the user's point of view, a program has quality if it behaves as it must. The correct behavior of the Str module is defined by the previous sections of this assignment specification, and by the C90 specifications of the corresponding string.h functions. The correct behavior of the replace program is defined by the previous sections of this assignment specification, and by the given samplereplace program.

From the programmer's point of view, a program has quality if it is well styled and thereby easy to maintain. In part, good style is defined by the splint and critTer tools as noted above, and by the rules given in The Practice of Programming (Kernighan and Pike) as summarized by the Rules of Programming Style document.

The more course-specific style rules listed in the previous assignment specification also apply, as do these:

To encourage good coding practices, we will deduct points if gcc217 generates warning messages.


Note: Using Idioms

C programmers sometimes use idioms that rely on the fact that the null character ('\0'), the NULL address, the integer 0, and the logical concept FALSE have the same representation. You may use those idioms. For example, you may define Str_getLength like this:

size_t Str_getLength(const char pcSrc[])
{
   size_t uLength = 0;
   assert((int)pcSrc); /* NULL address, 0, and FALSE are identical. */
   while (pcSrc[uLength]) /* null character and FALSE are identical. */
      uLength++;
   return uLength;
}

or like this:

size_t Str_getLength(const char *pcSrc)
{
   const char *pcEnd;
   assert((int)pcSrc); /* NULL address, 0, and FALSE are identical. */
   pcEnd = pcSrc;
   while (*pcEnd) /* null character and FALSE are identical. */
      pcEnd++;
   return (size_t)(pcEnd - pcSrc);
}

(Note the use of the cast operator within the calls of assert to suppress splint warnings.)

But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.


Note: The const Keyword and Str_search

The use of the const keyword within Str_search is tricky, as this question-and-answer sequence indicates.

Question

According to the man pages, the parameters of strstr are of type const char*. That implies that the parameters of Str_search also must be of type const char*. Why are they not of type char*?

Answer

Suppose you were to define Str_search like this:

char *Str_search(char *pcHaystack, char *pcNeedle)
{
   ...
}

Further suppose the client then calls Str_search like this:

const char *pcBig = "hello";
const char *pcSmall = "lo";
 ...
 ... Str_search(pcBig, pcSmall) ...
 ...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcBig is of type const char* and that pcHaystack is of type char*, would generate a warning on the function call. Thus pcHaystack (and pcNeedle) must be of type const char*.

Question

According to the man pages, the return type of strstr is char*. That implies that the return type of Str_search also must be of type char*. Why is the return type not const char*?

Answer

Suppose you were to define Str_search like this:

const char *Str_search(const char *pcHaystack, const char *pcNeedle)
{
   ...
}

Further suppose the client then calls Str_search like this:

char *pcBig = "hello";
char *pcSmall = "lo";
char *pc;
 ...
pc = Str_search(pcBig, pcSmall);
 ...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc is of type char* and that the value returned by Str_search is of type const char*, would generate a warning on the assignment statement. Thus the return type of Str_search must be char* and must not be const char*.

Question

Within the definition of Str_search, I decided to define a local variable named pc that "points into" the first of the two given strings, as indicated by this code:

char *Str_search(const char *pcHaystack, const char *pcNeedle)
{
   ...
   pc = pcHaystack;
   ...
   /* Increment pc so it points to the appropriate character. */
   ...
   return pc;
}

If I define pc to be of type char*, then the assignment statement generates a warning. If I define pc to be of type const char*, then the return statement generates a warning. How can I resolve that problem?

Answer

Unfortunately, C provides no elegant solution. We recommend that you define pc to be of type const char* so the assignment statement is warningless. Then use a cast operator in the return statement:

return (char*)pc;

to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, often that inelegant technique is unavoidable.


Note: Function Return Types

The assignment requires that you use pointer notation as much as possible in your strp.c file. So, for example, you must define Str_copy like this:

char *Str_copy(char *pcDest, const char *pcSrc)
{
   ...
}

The assignment requires that you use array notation as much as possible in your stra.c file. So it would be logical to try to define your Str_copy function something like this:

char[] Str_copy(char pcDest[], const char pcSrc[])
{
   ...
}

Sadly, C disallows char[] as a function return type. So in your stra.c file, you must define Str_copy in this hybrid way:

char *Str_copy(char pcDest[], const char pcSrc[])
{
   ...
}

That same point applies to the definitions of other functions as well.


Note: critTer Warnings

When given the teststr.c file, critTer generates warnings about the use of "magic numbers," the length of some functions, and the length of the file. Don't be concerned about those warnings. You need not address them in your readme file. We judged that using magic numbers and defining long functions in teststr.c was clearer than the alternative, and that splitting teststr.c into multiple files would have complicated the build process unnecessarily.

When given the replace.c file, critTer generates a warning about the use of 3 in the main function, claiming that it is a "magic number." Don't be concerned about that warning. You need not address it in your readme file. There is no name that could be given to that number in that context that would make the program clearer.


This assignment was written
by Robert M. Dondero, Jr.
with input from other faculty members