Princeton University
COS 217: Introduction to Programming Systems

Assignment 6: The UNIX "sort" Command in SPARC Assembly Language

Purpose

The purpose of this assignment is to help you learn about the advanced aspects of SPARC architecture and assembly language programming. In particular, it will give you experience with using arrays and subroutines in SPARC assembly language. It will also give you further opportunity to learn about the GNU/UNIX programming tools, especially bash, Emacs, gcc, gdb (for assembly language programs), and make.

Background

A commonly used UNIX command is sort. In its simplest form, it reads lines from standard input, sorts them into ascending (i.e. alphabetical, i.e. lexicographic) order as defined by the strcmp function, and prints them to standard output. For example, if the file proverb contains these lines:

Learning is a
treasure which
accompanies its
owner everywhere.
-- Chinese proverb

then the command

--> sort < proverb

prints these lines to standard output:

-- Chinese proverb
Learning is a
accompanies its
owner everywhere.
treasure which

Your Task

This assignment asks you to create your own version of the sort command in C and in SPARC assembly language, as specified below.

sort1

Create a C program in a file named sort1.c, from which gcc can produce an executable file named sort1. The functionality of the sort1 program should be a subset of the functionality of the UNIX standard sort program. The sort1 program should read lines from standard input, sort them in ascending order, and print them to standard output. It need not process command-line arguments as the UNIX sort program does. It may assume that the final line of standard input ends with a newline character.

For simplicity, your sort1 program should assume that standard input contains no more than 2048 lines. If standard input contains more than 2048 lines, then sort1 should ignore all lines beyond the 2048th. 

Your sort1 program should assume that no line of standard input contains more than 1023 characters. (The terminating newline character is included in that count.) It need not work properly when a line of standard input contains more than 1023 characters, but it should not corrupt memory. Hint: You will find the fgets and fputs functions useful.

The program's primary data structure should be an array of pointers. Each pointer should point to the zero'th character of a null-terminated array of characters (i.e., a string). Each string should contain the characters of one line of standard input. The program should dynamically allocate the memory occupied by each string. The amount of memory allocated for each string should be less than or equal to 1024 bytes, depending upon the length of the string.

The program should be modular. Each function should be relatively small, and should have a single well-defined purpose.

The program should use these functions to sort the array of strings:

void swap(char *ppcArray[], int iOne, int iTwo)

/* Swap ppcArray[iOne] and ppcArray[iTwo]. */
 
{
   char *pcTemp;
   pcTemp = ppcArray[iOne];
   ppcArray[iOne] = ppcArray[iTwo];
   ppcArray[iTwo] = pcTemp;
}

int partition(char *ppcArray[], int iLeft, int iRight)

/* Divide ppcArray[iLeft...iRight] into two partitions so elements 
   in the first partition are <= elements in the second partition.
   Return the index of the element that marks the partition 
   boundary. */

{
   int iFirst = iLeft-1;
   int iLast = iRight;

   while (1)
   {
      while (strcmp(ppcArray[++iFirst ], ppcArray[iRight]) < 0)
         ;
      while (strcmp(ppcArray[iRight], ppcArray[--iLast]) < 0)
         if (iLast == iLeft)
            break;
      if (iFirst >= iLast)
         break;
      swap(ppcArray, iFirst, iLast);
   }
   swap(ppcArray, iFirst, iRight);
   return iFirst;
}

void quicksort(char *ppcArray[], int iLeft, int iRight)

/* Sort ppcArray[iLeft...iRight] in ascending order. */

{
   if (iRight > iLeft)
   {
      int iMid = partition(ppcArray, iLeft, iRight);
      quicksort(ppcArray, iLeft, iMid - 1);
      quicksort(ppcArray, iMid + 1, iRight);
   }
}
Those functions implement the quicksort algorithm as presented in Princeton's COS 126 course.

sort2

Suppose your sort1 program is unacceptably slow. Further suppose that a performance analysis indicates that sort1 spends most of its time executing the quicksort, partition, and swap functions...

Create a file named sort2.c that is the same as sort1.c except that the definitions of the quicksort, partition, and swap functions have been removed. Then create a file named sort2.S containing SPARC assembly language versions of quicksort, partition, and swap. When linked together, sort2.c and sort2.S thus create sort2 -- a (presumably) faster version of your sort1 program. The functionality of sort2 should be the same as that of sort1.

You should implement swap as a leaf subroutine, as defined in your Paul textbook in Section 7.9. Note that, from the calling subroutine's point of view, a properly designed leaf subroutine is indistinguishable from a non-leaf subroutine. You should design your swap subroutine so its caller (partition) does not know that swap is a leaf subroutine.

Suggestion: Create sort2.S by translating functions from C to assembly language one at a time. Here is how you might proceed:

  1. Copy file sort1.c to sort2.c.
  2. Remove a function from sort2.c.
  3. Add an assembly language version of that function to sort2.S
  4. Test the hybrid C/assembly language program by executing the commands:
    gcc -Wall -ansi -pedantic -o sort2 sort2.c sort2.S
    sort2 < someinputfile
    .
  5. Repeat steps 2 through 4 for each of the three functions.

Logistics

You should develop on arizona using the bash shell. Use Emacs to create source code. Use the "make" tool to automate the build process. Use gdb to debug.

In accord with the purpose of the assignment, you should not use a C compiler to produce your assembly language functions. Rather you should produce them manually.

We suggest that you not use the m4 preprocessor. We suggest that you use the C preprocessor to define symbolic constants.

You need not optimize your assembly language functions, but we encourage you to do so. In particular, we encourage you to use registers instead of memory whenever possible. We will give you extra credit -- up to 5% -- if you minimize the use of "nop" instructions and optimize "if," "while," and "for" constructs.

You should submit:

Your readme file should contain:

Submit your work electronically via the command:

/u/cs217/bin/submit 6 sort1.c sort2.c sort2.S makefile readme

Grading

We will grade your work on functionality and design. As always, we will consider understandability to be an important aspect of good design. We will place particular emphasis on the comments in your assembly language code. To encourage good coding practices, we will compile your C program using "gcc -Wall -ansi -pedantic" and take off points based on warning messages during compilation.