Princeton University
COS 217:  Introduction to Programming Systems

Assignment 4:  UNIX Commands in SPARC Assembly Language

Purpose

The purpose of this assignment is to help you learn about SPARC architecture and assembly language programming.  It will also give you the opportunity to learn more about the GNU/UNIX programming tools, especially bash, Emacs, gcc, gdb (for assembly language programs), and make.

Background

The UNIX operating system has a command named echo.  Its job is to print its command-line arguments to standard output.  Each argument is separated from the next by a single space.  For example, the command
$ echo one two three    four five
prints this line to standard output:
one two three four five
and the command:
$ echo one "two   three" four   five
prints this line to standard output:
one two   three four five
Another UNIX command is wc (word count). In its simplest form, the wc command reads characters from standard input until end-of-file, and prints to standard output a count of how many lines, words, and characters it has read.  It prints the three counts on the same line, each in a field of width 8.  A "word" is a sequence of characters that is delimited by one or more "whitespace" characters, as defined by the C standard function isspace.  For example, if the file myfile.txt contains these lines:
Learning is a
treasure which
accompanies its
owner everywhere.
-- Chinese proverb

then the command 

$ wc < myfile.txt

prints

       5      12      82

Another commonly used UNIX command is sort.  In its simplest form, it reads lines from standard input, sorts them into ascending (i.e. alphabetical, i.e. lexicographic) order as defined by the strcmp function, and prints them to standard output.  For example, the command

$ sort < myfile.txt

prints these lines to standard output:

-- Chinese proverb
Learning is a
accompanies its
owner everywhere.
treasure which

Your Task

This assignment asks you to create your own versions of the echo, wc, and sort commands in C and in SPARC assembly language, as specified below.

echo1

Create a C program in a file named echo1.c, from which gcc can produce an executable file named echo1.  The echo1 program should have the same functionality as the standard echo command.

echo2

Create a SPARC assembly language program in a file named echo2.S, from which gcc can produce an executable file named echo2.  The echo2 program should have the same functionality as the standard echo command.

wc1

Create a C program in a file named wc1.c, from which gcc can produce an executable file named wc1.  The functionality of the wc1 program should should be a subset of the functionality of the UNIX standard wc program.  The wc1 program should read characters from standard input, and write its three counts to standard output.  It need not process command-line arguments as the UNIX wc program does.

wc2

Create a SPARC assembly language program in a file named wc2.S, from which gcc can produce an executable file named wc2.  The functionality of the wc2 program should should be the same as that of the wc1 program.

Hint:  The program should read one character at a time from standard input.  The C standard library provides several functions/macros that can read a single character:  getchar, getc, fgetc, and scanf (with the "%c" format string).  On arizona getchar and getc are implemented as macros; thus it is impossible to call them from an assembly language program.  On arizona fgetc is implemented as a function, and so it is possible to call it from assembly language.  But fgets demands that you pass stdin as an actual parameter, and that is awkward to implement in assembly language.  Thus, the best option is to call the scanf function. 

sort1

Create a C program in a file named sort1.c, from which gcc can produce an executable file named sort1.  The functionality of the sort1 program should should be a subset of the functionality of the UNIX standard sort program.  The sort1 program should read lines from standard input, sort them in ascending order, and print them to standard output.  It need not process command-line arguments as the UNIX sort program does.

For simplicity, your sort1 program should assume that standard input contains no more than 2048 lines.  It should ignore all lines beyond the 2048th.  Your program should also assume that no line of standard input contains more than 1024 characters.  It need not work properly when a line of standard input contains more than 1024 characters, but it should not crash.  Hint:  You will find the fgets and fputs functions useful.

The program's primary data structure should be an array of pointers.  Each pointer should point to the zero'th character of a null-terminated array of characters (i.e., a string).  Each string should contain the characters of one line of standard input.  The program should dynamically allocate the memory occupied by each string.  The amount of memory allocated for each string should be less than or equal to 1024 bytes, depending upon the length of the string.

The program should be modular.  Each function should be relatively small, and should have a single well-defined purpose.  Please review the course lecture notes for suggestions concerning modularity and the sort1 program.

The program should use these functions to sort the array of strings:

void swap(char *ppcArray[], int i1, int i2)

/* Swap ppcArray[i1] and ppcArray[i2]. */
 
{
   char *pcTemp;
   pcTemp = ppcArray[i1];
   ppcArray[i1] = ppcArray[i2];
   ppcArray[i2] = pcTemp;
}

int partition(char *ppcArray[], int iLeft, int iRight)

/* Divide ppcArray[iLeft...iRight] into two partitions so elements 
   in the first partition are <= elements in the second partition.
   Return the index of the element that marks the partition 
   boundary. */

{
   int i1 = iLeft-1;
   int i2 = iRight;

   while (1)
   {
      while (strcmp(ppcArray[++i1], ppcArray[iRight]) < 0)
         ;
      while (strcmp(ppcArray[iRight], ppcArray[--i2]) < 0)
         if (i2 == iLeft)
            break;
      if (i1 >= i2)
         break;
      swap(ppcArray, i1, i2);
   }
   swap(ppcArray, i1, iRight);
   return i1;
}

void quickSort(char *ppcArray[], int iLeft, int iRight)

/* Sort ppcArray[iLeft...iRight] in ascending order. */

{
   if (iRight > iLeft)
   {
      int iMid = partition(ppcArray, iLeft, iRight);
      quickSort(ppcArray, iLeft, iMid - 1);
      quickSort(ppcArray, iMid + 1, iRight);
   }
}
Those functions implement the quicksort algorithm as presented in Princeton's COS 126 course.  

sort2

Suppose your sort1 program is unacceptably slow.  Further suppose that a performance analysis indicates that sort1 spends most of its time executing the quickSort, partition, and swap functions...

Create a file named sort2.c that is the same as sort1.c except that the definitions of the quickSort, partition, and swap functions have been removed.  Then create a file named sort2.S containing SPARC assembly language versions of quickSort, partition, and swap.  When linked together, sort2.c and sort2.S thus create sort2 -- a (presumably) faster version of your sort1 program.  The functionality of sort2 should be the same as that of sort1.

You should implement swap as a leaf subroutine, as defined in your Paul textbook in Section 7.9.  Note that, from the calling subroutine's point of view, a properly designed leaf subroutine is indistinguishable from a non-leaf subroutine.  You should design your swap subroutine so its caller (partition) does not know that swap is a leaf subroutine.

Suggestion:  Create sort2.S by translating functions from C to assembly language one at a time.  Here is how you might proceed:

  1. Copy file sort1.c to sort2.c.
  2. Remove a function from sort2.c.
  3. Add an assembly language version of that function to sort2.S
  4. Test the hybrid C/assembly language program by executing the commands gcc -o sort2 sort2.c sort2.S and sort2 < someinputfile.
  5. Repeat steps 2 through 4 for each of the three functions.

Logistics

You should develop on arizona using the bash shell.  Use Emacs to create source code.  Use the "make" tool to automate the build process.  Use gdb to debug.

In accord with the purpose of the assignment, you should not use a C compiler to produce your assembly language programs.  Rather you should produce them manually.  

We suggest that you not use the m4 preprocessor.  We suggest that you use the C preprocessor to define symbolic constants.

You need not optimize your assembly language programs, but we encourage you to do so.  In particular, we encourage you to use registers instead of memory whenever possible.  We will give you extra credit -- up to 5% -- if you minimize the use of "nop" instructions and optimize "if," "while," and "for" constructs.

You should create a makefile containing commands to the "make" program indicating how it should build your programs.  The first rule in the makefile should command the "make" program to build all six executable files.  You should also create a readme text file that contains your name and any information that will help us to grade your work in the most favorable light.  In particular, your should describe what (if anything) you have done to optimize your assembly language programs.

Submit your work electronically via the command:

/u/cs217/bin/submit 4 echo1.c echo2.S wc1.c wc2.S sort1.c sort2.c sort2.S makefile readme

Grading

We will grade your work on correctness and understandability/design.  We will place particular emphasis on the comments in your programs, especially the assembly language ones.