Princeton University
COS 217: Introduction to Programming Systems

Assignment 2: A String Module

Purpose

The purpose of this assignment is to help you learn/review (1) arrays and pointers in the C programming language, (2) how to create and use stateless modules in C, (3) the "design by contract" style of programming, and (4) how to use the GNU/UNIX programming tools, especially bash, xemacs, gcc, and gdb.

Background

As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in header files. One of those header files is string.h; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Appendix D of the King textbook, Appendix B3 of the Kernighan and Ritchie textbook, Chapter 13 of the Harbison and Steele textbook, and the UNIX "man" pages describe the string functions. The string functions are used heavily in programming systems; certainly any editor, compiler, assembler, or operating system created with the C programming language would use them.

Your Task

Your task in this assignment is to use C to create a MyString module that contains versions of the most commonly used standard string functions. Specifically, design your MyString module so it contains these functions, each of which behaves the same as a corresponding standard C function:

   MyString Function

   Standard C Function

   MyString_length()    strlen()
   MyString_copy()    strcpy()
   MyString_ncopy()    strncpy()
   MyString_concat()    strcat()
   MyString_nconcat()    strncat()
   MyString_compare()    strcmp()
   MyString_ncompare()    strncmp()
   MyString_search()    strstr()

The Details

Use "design by contract." Design each function comment so it describes that function's "checked runtime errors." Design each function definition so it calls the assert() macro to enforce those checked runtime errors. (In that way your MyString functions should differ from the standard string functions.) Specifically, design each function definition so it calls assert() to make sure that none of its pointer/array formal parameters is NULL.

Do not add any other calls to assert() to your code. But consider whether it would be possible to do so. In particular, provide answers to these two questions in your readme file:

  1. Is it possible for MyString_copy(), MyString_ncopy(), MyString_concat(), or MyString_nconcat() to call assert() to verify that the specified destination memory area is large enough? Explain.
  2. Is it possible for MyString_ncopy(), MyString_nconcat(), or MyString_ncompare() to call assert() to verify that the specified length parameter is non-negative? Explain.

Place the MyString module's interface in a file named mystring.h. You may use either array or pointer notation in the interface.

Create two implementations of your MyString module. Place the first implementation in a file named mystringa.c. Design the function definitions in mystringa.c such that they use array notation, and not pointer notation. For example, in mystringa.c you might define the MyString_length() function like this:

size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (pcStr[uiLength] != '\0')
      uiLength++;
   return uiLength;
}

(The type size_t is defined in the standard header file stddef.h. It is a system-dependent unsigned integral type that is large enough to hold the length of any string. Typically it is defined to be identical to either "unsigned int" or "unsigned long." Several of the standard string functions use type size_t, and so several of your functions should use it too.)

Place the second implementation in a file named mystringp.c. Design the function definitions in mystringp.c such that they use pointer notation, and not array notation. For example, in mystringp.c you might define the MyString_length() function like this:

size_t MyString_length(const char *pcStr)
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (*(pcStr + uiLength) != '\0')
      uiLength++;
   return uiLength;
}

We encourage you to define the functions in mystringp.c more efficiently, by moving beyond a simple translation of "a[i]" to "*(a + i)". For example:

size_t MyString_length(const char *pcStr)
{
   const char *pcStrEnd = pcStr;
   assert(pcStr != NULL);
   while (*pcStrEnd != '\0')
      pcStrEnd++;
   return (size_t)(pcStrEnd - pcStr);
}

Design your MyString functions such that they do not call any of the standard string functions. In the context of this assignment, pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.

Pay special attention to boundary cases. In particular, make sure that your functions work when given empty strings as arguments. For example, make sure that the function call MyString_length("") returns 0.

Beware of type mismatches. In particular, beware of the difference between type size_t and type int: a variable of type size_t can store larger numbers than a variable of type int can. Also beware of type mismatches related to the use of the const keyword.

In your assignment solution you may use any of the definitions of the MyString_length() function given in this assignment statement.

Using Idioms

C programmers sometimes use idioms that rely on the fact that the end-of-string character, the NULL pointer, and FALSE have the same representation. You may use those idioms. For example, you may define your MyString_length() function like this:

size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr); /* Works because NULL and FALSE are identical. */
   while (pcStr[uiLength]) /* Works because end-of-string and FALSE are identical. */
      uiLength++;
   return uiLength;
}

or like this:

size_t MyString_length(const char *pcStr)
{
   const char *pcStrEnd = pcStr;
   assert(pcStr); /* Works because NULL and FALSE are identical. */
   while (*pcStrEnd) /* Works because end-of-string and FALSE are identical. */
      pcStrEnd++;
   return (size_t)(pcStrEnd - pcStr);
}

But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.

Logistics

Create your MyString module on hats using the bash shell, xemacs, gcc, and gdb.

Limit line lengths in your source code to 78 characters. Doing so allows us to print your work in two columns, thus saving paper.

Code that you can use to test your MyString module is available in the file /u/cos217/Assignment2/testmystring.c.

Create a "readme" text file that contains:

Submit your work electronically on hats via the command:

/u/cos217/bin/i686/submit 2 mystring.h mystringa.c mystringp.c readme

Grading

We will grade your work on correctness and design. We will consider understandability to be an important aspect of design. See the next section of this document for guidelines concerning program understandability. We also will consider absence of gross inefficiencies to be an important aspect of design. For example, a function is grossly inefficient if it traverses a string multiple times when a single traversal would suffice. To encourage good coding practices, we will compile using "gcc -Wall -ansi -pedantic" and take off points based on warning messages.

Program Understandability

An understandable program:

For example, this is an appropriate way to comment the MyString_length() function:

In file mystring.h:

...
size_t MyString_length(const char pcStr[]);
/* Return the length of string pcStr.
   It is a checked runtime error for pcStr to be NULL. */
...
In file mystringp.c:
...
size_t MyString_length(const char *pcStr)

/* Return the length of string pcStr.
   It is a checked runtime error for pcStr to be NULL. */

{
   const char *pcStrEnd = pcStr;
   assert(pcStr != NULL);
   while (*pcStrEnd != '\0')
      pcStrEnd++;
   return (size_t)(pcStrEnd - pcStr);
}
...
Note that the comment explicitly states what the function returns, explicitly refers to the function's parameter (pcStr), and explicitly describes the function's checked runtime error.

A Hint: "Const" and the MyString_search Function

The use of the "const" keyword within the MyString_search() function is tricky, as this question/answer sequence indicates.

Question

According to the man pages, the formal parameters of the strstr() function are of type const char*. That implies that the formal parameters of MyString_search() also should be of type const char*. Why aren't they of type char*?

Answer

Suppose you were to define your MyString_search() function like this:

char *MyString_search(char *pc1, char *pc2) { ... }

Further suppose the client then calls MyString_search() like this:

const char *pcString1 = "hello";
const char *pcString2 = "lo";
...
... MyString_search(pcString1, pcString2) ...
...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pcString1 is of type const char* and that pc1 is of type char*, would generate a warning on the function call. Thus pc1 (and pc2) should be of type const char*.

Question

According to the man pages, the return type of strstr() is char*. That implies that the return type of MyString_search() also should be of type char*. Why isn't the return type const char*?

Answer

Suppose you were to define MyString_search() like this:

const char *MyString_search(const char *pc1, const char *pc2) { ... }

Further suppose the client then calls MyString_search() like this:

char *pcString1 = "hello";
char *pcString2 = "lo";
char *pc;
...
pc = MyString_search(pcString1, pcString2);
...

(Note that's a perfectly reasonable way to call the function.) In that case the compiler, noting that pc is of type char* and that the value returned by MyString_search() is of type const char*, would generate a warning on the assignment statement. Thus the return type of MyString_search() should be char* and should not be const char*.

Question

Within the definition of MyString_search(), I decided to define a local variable named pc that "points into" the first of the two given strings, as indicated by this code:

char *MyString_search(const char *pc1, const char *pc2)
{
   ...
   pc = pc1;
   ...
   /* Increment pc so it points to the appropriate character. */
   ...
   return pc;
}

If I define pc to be of type char*, then the assignment statement generates a warning. If I define pc to be of type const char*, then the return statement generates a warning. How can I resolve that problem?

Answer

Unfortunately, C provides no elegant solution. We recommend that you define pc to be of type const char* so the assignment statement is warningless. Then use a cast operator in the return statement:

return (char*)pc;

to explicitly inform the compiler to not generate a warning. C programmers refer to that solution as "casting away the constness" of the variable. Sadly, that inelegant technique often is unavoidable.