Princeton University
COS 217: Introduction to Programming Systems

Assignment 2: A String Module

Purpose

The purpose of this assignment is to help you learn/review (1) arrays and pointers in the C programming language, (2) how to create and use (stateless) modules in C, and (3) how to use the GNU/UNIX programming tools, especially bash, xemacs, gcc, and gdb.

Background

As you know, the C programming environment contains a standard library. The facilities provided in the standard library are declared in header files. One of those header files is string.h; it contains the declarations of "string functions," that is, functions that perform operations on character strings. Appendix D of the King textbook, Chapter 13 of the Harbison and Steele textbook, and the UNIX "man" pages describe the string functions thoroughly. The string functions are used heavily in programming systems; certainly any editor, compiler, assembler, or operating system created with the C programming language would use them.

Your Task

Your task in this assignment is to use C to create a MyString module that contains versions of the most commonly used standard string functions. Specifically, your MyString module should contain six functions, each of which behaves the same as a corresponding standard C function. This table lists the functions that your module should contain, and indicates the corresponding standard C functions:

MyString Function     Corresponding Standard C Function
MyString_length       strlen
MyString_copy         strcpy
MyString_ncopy        strncpy
MyString_concat       strcat
MyString_compare      strcmp
MyString_search       strstr

The Details

You should use "design by contract." Each function comment should describe that function's "checked runtime errors." Each function definition should call the assert macro to enforce those checked runtime errors. (In that way your MyString functions should differ from the standard string functions.) You should consider carefully what runtime errors your functions can, and cannot, check.
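
As a minimal sketch of that style, consider the function below. Example_countChar is a hypothetical function invented for illustration; it is not part of the MyString interface. The sketch assumes that the only error the function can detect at run time is a NULL argument. It has no way to check that the caller's array really ends with '\0'.

```c
#include <assert.h>
#include <stddef.h>

size_t Example_countChar(const char pcStr[], char c)

/* Hypothetical function, not part of the MyString interface.
   Return the number of occurrences of character c in string pcStr.
   It is a checked runtime error for pcStr to be NULL. */

{
   size_t uiIndex = 0U;
   size_t uiCount = 0U;
   /* Checkable: a NULL argument can be detected at run time. */
   assert(pcStr != NULL);
   /* Not checkable: whether pcStr really ends with '\0'. */
   while (pcStr[uiIndex] != '\0')
   {
      if (pcStr[uiIndex] == c)
         uiCount++;
      uiIndex++;
   }
   return uiCount;
}
```

With this definition, the call Example_countChar("banana", 'a') returns 3, and the call Example_countChar(NULL, 'a') causes the assert macro to halt the program.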

You should define the MyString module's interface in a file named mystring.h. You may use either array or pointer notation in the interface.

You should define two implementations of your MyString module. The first implementation should reside in a file named mystringa.c. It should contain definitions of your MyString functions that use array notation, and not pointer notation. For example, in mystringa.c you might define the MyString_length function like this:

size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (pcStr[uiLength] != '\0')
      uiLength++;
   return uiLength;
}

(As you may know, the type size_t is defined in the standard header file stddef.h. It is a system-dependent unsigned integral type that is large enough to hold the length of any string. It is typically defined to be identical to either "unsigned int" or "unsigned long." Several of the standard string functions use type size_t, and so several of your functions should use it too.)

The second implementation should reside in a file named mystringp.c. It should contain definitions of your MyString functions that use pointer notation, and not array notation. For example, in mystringp.c you might define the MyString_length function like this:

size_t MyString_length(const char *pcStr)
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (*(pcStr + uiLength) != '\0')
      uiLength++;
   return uiLength;
}

We encourage you to define the functions in mystringp.c more efficiently, by moving beyond a simple translation of "a[i]" to "*(a+i)". For example:

size_t MyString_length(const char *pcStr)
{
   const char *pcStrEnd = pcStr;
   assert(pcStr != NULL);
   while (*pcStrEnd != '\0')
      pcStrEnd++;
   return pcStrEnd - pcStr;
}

Your MyString functions should not call any of the standard string functions. In the context of this assignment, you should pretend that the standard string functions do not exist. However your functions may call each other, and you may define additional (non-interface) functions.
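
The following sketch shows one way that a function can call another function of the module and a non-interface helper. The names Example_min and Example_shorterLength are hypothetical and are not part of the assignment. Declaring the helper "static" is an assumption about how you might keep it private to its .c file; the assignment does not require it.

```c
#include <assert.h>
#include <stddef.h>

/* MyString_length as given in this assignment statement. */
size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (pcStr[uiLength] != '\0')
      uiLength++;
   return uiLength;
}

/* Hypothetical non-interface helper. The "static" keyword keeps it
   private to this .c file. Return the smaller of uiFirst and
   uiSecond. */
static size_t Example_min(size_t uiFirst, size_t uiSecond)
{
   return (uiFirst < uiSecond) ? uiFirst : uiSecond;
}

/* Hypothetical function that calls another function of the module.
   Return the length of the shorter of strings pcStr1 and pcStr2.
   It is a checked runtime error for pcStr1 or pcStr2 to be NULL. */
size_t Example_shorterLength(const char pcStr1[], const char pcStr2[])
{
   assert(pcStr1 != NULL);
   assert(pcStr2 != NULL);
   return Example_min(MyString_length(pcStr1), MyString_length(pcStr2));
}
```

Because Example_min does not appear in the interface file, clients of the module cannot call it.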

Pay special attention to boundary cases. In particular, make sure that your functions work when given empty strings as arguments. For example, make sure that the function call MyString_length("") returns 0.
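
A boundary-case check might be sketched as follows. Example_testLengthBoundaries is a hypothetical helper invented for illustration; it is not the provided testmystring.c, and the three cases are only a starting point.

```c
#include <assert.h>
#include <stddef.h>

/* MyString_length as given in this assignment statement. */
size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr != NULL);
   while (pcStr[uiLength] != '\0')
      uiLength++;
   return uiLength;
}

/* Hypothetical test helper, not part of the MyString interface.
   Return the number of boundary-case checks that fail. */
int Example_testLengthBoundaries(void)
{
   int iFailures = 0;
   /* The empty string has length 0. */
   if (MyString_length("") != 0U)
      iFailures++;
   /* A one-character string has length 1. */
   if (MyString_length("a") != 1U)
      iFailures++;
   /* Counting stops at the first '\0', even if characters follow. */
   if (MyString_length("a\0b") != 1U)
      iFailures++;
   return iFailures;
}
```

A return value of 0 means that every check passed.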

You should beware of type mismatches. In particular, beware of the difference between type size_t and type int: size_t is an unsigned type, so a variable of type size_t cannot store negative numbers, and it can store larger numbers than a variable of type int can. Also beware of type mismatches related to the use of the "const" keyword.
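
The sketch below illustrates one such mismatch. Both functions are hypothetical and are not part of the assignment. The first writes out explicitly the conversion that, on typical systems, the compiler applies implicitly when an int is compared with a size_t; the second shows one possible way to guard against it.

```c
#include <stddef.h>

/* Hypothetical function, for illustration only.
   Return 1 (TRUE) if iIndex, converted to size_t, is less than
   uiLength, and 0 (FALSE) otherwise. The cast mirrors the implicit
   conversion in a mixed int/size_t comparison: a negative iIndex
   becomes a very large unsigned number. */
int Example_naiveIsLess(int iIndex, size_t uiLength)
{
   return (size_t)iIndex < uiLength;
}

/* Hypothetical function, for illustration only.
   Return 1 (TRUE) if iIndex is mathematically less than uiLength,
   and 0 (FALSE) otherwise. */
int Example_safeIsLess(int iIndex, size_t uiLength)
{
   /* Any negative number is less than any size_t value. */
   if (iIndex < 0)
      return 1;
   /* iIndex is non-negative, so the cast preserves its value. */
   return (size_t)iIndex < uiLength;
}
```

For the arguments -1 and 5, Example_naiveIsLess returns 0 but Example_safeIsLess returns 1.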

In your assignment solution you may use any of the definitions of the MyString_length function given in this assignment statement.

Using Idioms

C programmers sometimes use idioms that rely on the fact that the end-of-string character ('\0'), the NULL pointer, and FALSE all compare equal to zero, and so are all treated as false in a condition. You may use those idioms. For example, you may define your MyString_length functions like this:

size_t MyString_length(const char pcStr[])
{
   size_t uiLength = 0U;
   assert(pcStr);
   while (pcStr[uiLength])
      uiLength++;
   return uiLength;
}

size_t MyString_length(const char *pcStr)
{
   const char *pcStrEnd = pcStr;
   assert(pcStr);
   while (*pcStrEnd)
      pcStrEnd++;
   return pcStrEnd - pcStr;
}

But you are not required to use those idioms. In fact, we recommend that you avoid the use of idioms that adversely affect understandability.

Logistics

Create your MyString module on hats using the bash shell, xemacs, gcc, and gdb.

Limit line lengths in your source code to 78 characters. Doing so allows us to print your work in two columns, thus saving paper.

Code that you can use to test your MyString module is available in the file /u/cos217/Assignment2/testmystring.c.

Create a "readme" text file that contains:

Submit your work electronically on hats via the command:

/u/cos217/bin/i686/submit 2 mystring.h mystringa.c mystringp.c readme

Grading

We will grade your work on correctness and design. We will consider understandability to be an important aspect of good design. See the last section of this document for guidelines concerning program understandability. To encourage good coding practices, we will compile using "gcc -Wall -ansi -pedantic" and take off points based on warning messages during compilation.

Extra Credit

For extra credit (up to 10%), enhance your MyString module so it also contains functions MyString_nconcat and MyString_ncompare.  Those functions should behave the same as the standard strncat and strncmp functions, respectively. The program /u/cos217/Assignment2/testmystringextra.c will help you to test those functions. Extra credit will be awarded only if you implement both array and pointer versions of the functions, and only if the functions pass all of the tests in testmystringextra.c. 

Program Understandability

An understandable program:

(1) Uses a consistent and appropriate indentation scheme. All statements that are nested within a compound, if, switch, while, for, or do...while statement should be indented. Most programmers use either a 3- or 4-space indentation scheme. Note that the xemacs editor can automatically apply a consistent indentation scheme to your program.

(2) Uses descriptive identifiers. The names of variables, constants, structures, types, and functions should indicate their purpose. Remember: C can handle identifiers of any length, and at least the first 31 characters of an internal identifier are significant. We encourage you to prefix each variable name with characters that indicate its type. For example, the prefix "c" might indicate that the variable is of type "char," "i" might indicate "int," "pc" might mean "pointer to char," "ui" might mean "unsigned int," etc.

(3) Contains carefully worded comments. You should begin each program file with a comment that includes your name, the number of the assignment, and the name of the file. Each function should begin with a comment that describes what the computer does when it executes that function. That comment should explicitly state what (if anything) the computer reads from stdin (or any other stream), and what (if anything) the computer writes to stdout (or any other stream). The function's comment should also describe what the computer does when it executes that function by explicitly referring to the function's parameters and return value. The comment should also explicitly state the function's checked runtime errors.  The comment should appear in both the interface (.h) file for the sake of the clients of the function and the implementation (.c) file for the sake of the maintainers of the function.

For example, here is an appropriate way to comment the MyString_length function:

In file mystring.h:

...
size_t MyString_length(const char pcStr[]);
/* Return the length of string pcStr.
   It is a checked runtime error for pcStr to be NULL. */
...

In file mystringp.c:

...
size_t MyString_length(const char *pcStr)

/* Return the length of string pcStr.
   It is a checked runtime error for pcStr to be NULL. */

{
   const char *pcStrEnd = pcStr;
   assert(pcStr != NULL);
   while (*pcStrEnd != '\0')
      pcStrEnd++;
   return pcStrEnd - pcStr;
}
...

Note that the comment explicitly states what the function returns, explicitly refers to the function's parameter (pcStr), and explicitly describes the function's checked runtime error.