Princeton University
COS 217:  Introduction to Programming Systems

Assignment 1:  String Functions

Purpose

The purpose of this assignment is to help you learn/review (1) the fundamentals of the C programming language, (2) how to create and use interfaces and implementations in C, and (3) how to use the GNU/UNIX programming tools, especially bash, Emacs, and gcc.

Background

As you know, the C programming environment contains a standard library.  The facilities provided in the standard library are declared in header files.  One of those header files is string.h; it contains the declarations of "string functions," that is, functions that perform operations on character strings.  Chapter 13 of the Harbison and Steele textbook describes the string functions thoroughly.  The string functions are used heavily in programming systems; certainly any editor, compiler, assembler, or operating system created with the C programming language would use them.

Your Task

Your task in this assignment is to use C to create your own versions of many of the standard string functions.  Specifically, you should create these functions: mystrlen, mystrcpy, mystrncpy, mystrcat, mystrncat, mystrcmp, mystrncmp, mystrchr, mystrrchr, mystrspn, mystrcspn, mystrpbrk, and mystrstr.  Each of your functions should have the same behavior as the corresponding ISO C standard string function.  For example, your mystrlen function should have the same behavior as the ISO C standard strlen function. 

Your functions should not call any of the standard string functions.  In the context of this assignment, you should pretend that the standard string functions do not exist.  However, your functions may call each other as appropriate.

You should pay special attention to boundary cases.  In particular, make sure that your functions work when given empty strings as arguments.  For example, make sure that the function call mystrlen("") returns 0.

You should create your functions on arizona using the bash shell, Emacs, and gcc.

Step 1: Create Source Code

Use Emacs to create source code.  First create a C header file named mystring.h containing the interface to your functions.  The interface should consist of a set of function declarations.  Make sure you enclose the function declarations in "#ifndef ... #define ... #endif" preprocessor directives to prevent double inclusion into a compilation unit.  Then create a file named mystring.c containing the implementation of your functions, that is, a set of function definitions.  It should "#include" the interface file to ensure that each function definition is consistent with its declaration.
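As a rough sketch of the guarded interface described above, a header might be laid out as follows.  The guard macro name MYSTRING_INCLUDED and the parameter names are assumptions chosen for illustration; any unique macro name works.  Only two of the required declarations are shown.

```c
/* mystring.h (sketch; the guard macro name is an assumption) */

#ifndef MYSTRING_INCLUDED
#define MYSTRING_INCLUDED

#include <stddef.h>   /* defines size_t */

/* Only two of the required declarations are shown.  The parameter
   names are illustrative; the types follow the ISO C prototypes
   of strlen and strcpy. */
size_t mystrlen(const char *pcString);
char *mystrcpy(char *pcDest, const char *pcSrc);

#endif
```

If the header is included a second time in the same compilation unit, MYSTRING_INCLUDED is already defined, so the preprocessor skips everything between #ifndef and #endif.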

You may use array notation to define your functions.  For example, this is an acceptable version of the mystrlen function:

size_t mystrlen(const char pcString[])
{
   size_t uiLength = 0;
   while (pcString[uiLength] != '\0')
      ++uiLength;
   return uiLength;
}
(As you may know, the type "size_t" is defined in the standard header file stddef.h. It is a system-dependent integral type that is large enough to hold the length of any string. It is typically defined to be identical to either "unsigned int" or "unsigned long."  Several of the standard string functions use type size_t, and so several of your functions should use it too.)

However, we encourage you to use pointer notation instead of array notation to define your functions; pointer notation is used heavily throughout the course, and it would be wise to use this assignment to ensure that you are comfortable with it.  For example, we encourage you to define your mystrlen function in a manner similar to this:

size_t mystrlen(const char *pcString)
{
   size_t uiLength = 0;
   while (*(pcString + uiLength) != '\0')
      ++uiLength;
   return uiLength;
}
or, more efficiently, like this:
size_t mystrlen(const char *pcString)
{
   const char *pcStringEnd = pcString;
   while (*pcStringEnd != '\0')
      ++pcStringEnd;
   return pcStringEnd - pcString;
}
Because the end-of-string character '\0' has the value 0, which C treats as false in a condition, you may also define your mystrlen function more concisely in this idiomatic way:
size_t mystrlen(const char *pcString)
{
   const char *pcStringEnd = pcString;
   while (*pcStringEnd)
      ++pcStringEnd;
   return pcStringEnd - pcString;
}

But we do not encourage the use of idioms that adversely affect understandability.

Step 2: Compile, Assemble, and Link

Code that you can use to test your mystring functions is available in the file /u/cs217/Assignment1/testmystring.c.  Use the gcc command to compile, assemble, and link mystring.c and testmystring.c.  Repeat step 1 if necessary.

Step 3: Execute

Execute the main function in testmystring.c to test your mystring functions. Make sure that your functions pass all of the tests defined in testmystring.c. Create and execute additional tests if necessary.  Repeat steps 1 and 2 if necessary.
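Additional tests might be organized along the lines of the following sketch, which compares mystrlen against the standard strlen on a few inputs, including the empty string.  The helper name countMystrlenFailures and the test inputs are hypothetical.  Test code, unlike mystring.c, may call the standard string functions as a reference.  The sketch carries its own copy of mystrlen so that it compiles by itself; real tests would #include "mystring.h" and link against mystring.c instead.

```c
#include <stddef.h>
#include <string.h>   /* test code may use the standard functions as a reference */

/* Stand-in copy of mystrlen so that this sketch compiles by itself. */
size_t mystrlen(const char *pcString)
{
   const char *pcStringEnd = pcString;
   while (*pcStringEnd != '\0')
      ++pcStringEnd;
   return (size_t)(pcStringEnd - pcString);
}

/* Return the number of test strings for which mystrlen and strlen
   disagree.  A return value of 0 means that every case passed. */
int countMystrlenFailures(void)
{
   /* Hypothetical inputs; the first is the empty-string boundary case. */
   const char *apcCases[] = { "", "a", "hello", "two words", "tab\there" };
   size_t uiCaseCount = sizeof(apcCases) / sizeof(apcCases[0]);
   size_t uiIndex;
   int iFailures = 0;

   for (uiIndex = 0; uiIndex < uiCaseCount; ++uiIndex)
      if (mystrlen(apcCases[uiIndex]) != strlen(apcCases[uiIndex]))
         ++iFailures;
   return iFailures;
}
```

A main function could call countMystrlenFailures and report any nonzero result.  The same pattern extends to the other functions by pairing each with its standard counterpart.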

Step 4: Submit

Create a "readme" text file.  Your readme file should contain information that will help us to grade your work in the most favorable light.  For example, the file should contain thorough descriptions of any known bugs. Submit your work electronically by executing the command "/u/cs217/bin/submit 1 mystring.h mystring.c readme" (without the quotes) on arizona.

Grading

We will grade your work on correctness and understandability.  Guidelines concerning program understandability are listed below.  To encourage good coding practices, you will lose points for any warning messages generated by gcc during the compilation of your work.

Program Understandability

An understandable program:

(1) Uses a consistent and appropriate indentation scheme.  All statements that are nested within a compound, if, switch, while, for, or do...while statement should be indented.  Most programmers use either a 3- or 4-space indentation scheme. Note that the Emacs editor can automatically apply a consistent indentation scheme to your program.

(2) Uses descriptive identifiers.  The names of variables, constants, structures, types, and functions should indicate their purpose.  Remember: C can handle identifiers of any length, and at least the first 31 characters are significant.  We encourage you to prefix each variable name with characters that indicate its type.  For example, the prefix "c" might indicate that the variable is of type "char," "i" might indicate "int," "pc" might mean "pointer to char," "ui" might mean "unsigned int," etc.

(3) Contains carefully worded comments.  You should begin each program file with a comment that includes your name, the number of the assignment, and the name of the file.  Each function should begin with a comment that describes what the computer does when it executes that function.  That comment should explicitly state what (if anything) the computer reads from stdin (or any other stream), and what (if anything) the computer writes to stdout (or any other stream).  The function's comment should also describe what the computer does when it executes that function by explicitly referring to the function's parameters and return value.  The comment should appear in both the .h file (for the sake of the users of the function) and the .c file (for the sake of the maintainers of the function).

For example, here is an appropriate way to comment a mystrlen function:

In file mystring.h:

...
size_t mystrlen(const char pcString[]);
/* Return the length of string pcString.  */
...
In file mystring.c:
...
size_t mystrlen(const char pcString[])

/* Return the length of string pcString.  */

{
   size_t uiLength = 0;
   while (pcString[uiLength] != '\0')
      ++uiLength;
   return uiLength;
}
...
Note that the comment explicitly states what the function returns, and explicitly refers to the function's parameter (pcString).
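
The same guidelines carry over to functions of your own design.  As a sketch, the hypothetical helper below (countChar is not one of the assigned functions) combines 3-space indentation, type-prefixed names, and a comment that refers to the parameters and the return value:

```c
#include <stddef.h>

size_t countChar(const char *pcString, char cTarget)

/* Return the number of times character cTarget occurs in string
   pcString, not counting the terminating null character.  Read
   nothing from stdin and write nothing to stdout. */

{
   size_t uiCount = 0;
   const char *pcCurrent;

   for (pcCurrent = pcString; *pcCurrent != '\0'; ++pcCurrent)
      if (*pcCurrent == cTarget)
         ++uiCount;
   return uiCount;
}
```

Here the prefixes "pc", "c", and "ui" mark a pointer to char, a char, and an unsigned integral count, respectively.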