Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: A SPARC Assembler

Purpose

The purpose of this assignment is to help you understand SPARC assembly language, SPARC machine language, and the assembly process.

Background

A SPARC assembler reads a source code (.s) file containing SPARC assembly language code and writes an object (.o) file containing SPARC machine language. The object file is expressed in the UNIX Executable and Linking Format (ELF). An ELF file consists of sections; the most common are data, bss, and text.

An assembler makes two passes over the source code. During pass1 the assembler creates a symbol table. Each binding of the table relates a label to an offset within a particular section. During pass2 the assembler uses that symbol table to generate the sections that comprise the object code.

The SPARC assembler that underlies the gcc command is named "as".

Your Task

Your task in this assignment is to use the C programming language to create an assembler that can process a substantial subset of SPARC assembly language. More precisely...

You are given code that reads a source code file and creates an in-memory representation of the assembly language program.  Your job is to create code that accepts the in-memory representation, performs two passes over it (as described above), and thereby produces an in-memory representation of the program's data, bss, and text sections. Finally, you are given code that accepts those in-memory sections and writes the object file.

Important note: You must compare the performance of your assembler with that of the "as" assembler whose full pathname is /usr/ccs/bin/as. You can verify that you are using that assembler by typing the command "which as"; the reply should be /usr/ccs/bin/as. If not, you should change the value of your PATH environment variable accordingly.

The Assembly Language Subset

Your assembler should be able to process the subset of SPARC assembly language that is described in A Subset of SPARC Assembly Language for the Assembler Assignment.  Note in particular these features of the subset:

.skip  4 * 100
.word  (18 + 5) * 3
save   %sp, (-92 - 8) & -8, %sp
sethi  %hi(label), %o0
or     %o0, %lo(label), %o0
cmp    %l0, 'a'
add    %l0, 0x20, %l1
set 12345, %l0	
set 0, %l0
set label + 2, %l0
bg label + 4
call label - 8
.word label

The Modules

You are given modules that accomplish the task of reading the assembly language program and creating an in-memory representation:

You also are given modules that comprise the in-memory representation:

Finally, you are given these modules:

You should also use your SymTable ADT from previous assignments. If your SymTable ADT is not working, we will provide you with working symtable.h and symtable.o files at your request. 

You should create modules named Pass1 and Pass2, as described below.

The Pass1 Module

Pass1 should be an abstract object. The Assembler object will give the Pass1 object (1) a SymTable object, (2) three Section objects: one for the data section, one for the bss section, and one for the text section, and (3) the address of a "label sequence number" integer.

The Pass1 module should:

Your Pass1 object may assume that the given assembly language program is semantically valid.

Your Pass1 object should contain no memory leaks. Your Pass1 must free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory contained within the Program, Instr, Arg, Section, and SymTable objects. Your Pass1 object must free any other memory that it dynamically allocated. It is likely that there will be no such memory.

Testing the Pass1 Module

You can test your Pass1 module using the given example assembly language programs (01p1data.s, 02p1all.s, and 03p1global.s), the standard UNIX elfdump command, and this procedure:

  1. Temporarily comment-out all calls to Pass2 functions in the assembler.c file.
  2. Run the standard UNIX assembler via the command "as -o one.o 01p1data.s", thus producing the file one.o.
  3. Run your assembler via the command "assembler -o two.o 01p1data.s", thus producing the file two.o.
  4. Execute the command "elfdump -s one.o" to print the symbol table of one.o to standard output.
  5. Execute the command "elfdump -s two.o" to print the symbol table of two.o to standard output.
  6. Compare the symbol tables. For each label, the section index (shndx), offset (value), type (bind), and name fields should be identical.
  7. Repeat steps 2 to 6 for each of the three assembly language programs

Don't forget to comment-in the calls to Pass2 functions in the assembler.c file before proceeding with the development and testing of your Pass2 module.

The Pass2 Module

Pass2 should be an abstract object. The Assembler object will give the Pass2 object a SymTable object, three Section objects, and an integer indicating the next label sequence number. 

The primary job of the Pass2 object is to fetch data from the SymTable, Program, Instr, and Arg objects to add code to the three Section objects. It should dynamically allocate a block of memory of the appropriate size for the data section, fill that memory with the appropriate data, and install it into the data Section object. Similarly it should dynamically allocate a block of memory of the appropriate size for the text section, fill that memory with the appropriate data, and install it into the text Section object. The Pass2 object should not allocate a block of memory for the bss section, and should not install a block of memory into the bss Section object.

Another job of the Pass2 object is to add a new binding to the SymTable object whenever it encounters a label that is referenced but not defined. It should use the given next label sequence number to do that.

The final task of the Pass2 object is to add relocation records to the text Section object. Each relocation record should be a RelRecord object.

Your Pass2 object may assume that the given assembly language program is semantically valid.

Your Pass2 object should contain no memory leaks. Your Pass2 must free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory contained within the Program, Instr, Arg, Section, and SymTable objects. Your Pass2 object must free any other memory that it dynamically allocated. It is likely that there will be no such memory.

Testing the Pass2 Module

You can test your Pass2 module using the given example assembly language programs (04p2directives.s, 05p2arith.s, 06p2log.s, 07p2shift.s, 08p2load.s, 09p2store.s, 10p2control.s, 11p2branch.s, 12p2call.s, 13p2syntheticarith.s, 14p2syntheticlog.s, 15p2syntheticcontrol.s, 16p2relbranch.s, 17p2relcall.s, 18p2relset.s, 19fibonacci.s, and 20bubblesort.s), the standard UNIX elfdump command, the given objdump program, and this procedure:

  1. Execute the command "as -o one.o 04p2directives.s", thus producing the file one.o.
  2. Execute the command "assembler -o two.o 04p2directives.s", thus producing the file two.o.
  3. Execute the command "elfdump -s one.o" to display the symbol table of one.o to standard output.
  4. Execute the command "elfdump -s two.o" to display the symbol table of two.o to standard output.
  5. Compare the symbol tables. For each label, the "shndx," "value," "bind," and "name" fields should be identical.
  6. Execute the command "objdump -d one.o" to display the contents of the data section of one.o to standard output.
  7. Execute the command "objdump -d two.o" to display the contents of the data section of two.o to standard output.
  8. Compare the data sections. They should be identical.
  9. Execute the command "objdump -t one.o" to display the contents of the text section of one.o to standard output.
  10. Execute the command "objdump -t two.o" to display the contents of the text section of two.o to standard output.
  11. Compare the text sections. Then should be identical.
  12. Execute the command "elfdump -r one.o" to display the relocation records of one.o to standard output.
  13. Execute the command "elfdump -r two.o" to display the relocation records of two.o to standard output.
  14. Compare the relocation records. All of the "type" and "offset" fields should be identical. Many of the "with respect to" fields should be identical.
  15. Repeat steps 1 to 14 for each example assembly language program.
  16. Use you assembler to assemble the 19fibonacci.s and 20bubblesort.s assembly language programs. Then use gcc to link them, thus producing executable files. Finally, execute those files. They should execute properly

You will find the given UNIX shell script objdiff useful. Given an assembly language program, it assembles the program using as and using your assembler, and then uses the standard UNIX diff command to compare the symbol table, data section, text section, and relocation records of the two resulting object files. Thus it implements steps 1-14 as described above.

You will also find the given UNIX shell script grade5 useful. It executes objdiff for each example assembly language program. It also uses your assembler to assemble 19fibonacci.s and 20bubblesort.s, uses gcc to link them, and executes them.

Logistics

You should develop on arizona. Use Emacs to create source code. Use gdb to debug.

The directory /u/cs217/Assignment5 contains files that are pertinent to the assignment:

You should submit:

Your readme file should contain:

You should not submit the given source code files.

Submit your work electronically via the command:

/u/cs217/bin/submit 5 pass1.h pass1.c pass2.h pass2.c 
   symtable.h symtable.c othersourcecodefiles makefile readme

Grading

As always, we will grade your work on functionality and design, and will consider understandability to be an important aspect of good design. Defining each function so it does a single small well-defined task is critical to understandability, as are function-level comments in both .h and .c files. To encourage good coding practices, we will take off points based on warning messages during compilation.

Your grade for the assignment will be based upon these criteria:

Criterion Points
readme file 3
makefile 3
Properly translating the 01p1data.s program 3
Properly translating the 02p1all.s program 4
Properly translating the 03p1global.s program 4
Properly translating the 04p2directives.s program 4
Properly translating the 05p2arith.s program 3
Properly translating the 06p2log.s program 3
Properly translating the 07p2shift.s program 3
Properly translating the 08p2load.s program 4
Properly translating the 09p2store.s program 4
Properly translating the 10p2control.s program 4
Properly translating the 11p2branch.s program 4
Properly translating the 12p2call.s program 4
Properly translating the 13p2syntheticarith.s program 3
Properly translating the 14p2syntheticlog.s program 3
Properly translating the 15p2syntheticcontrol.s program 3
Properly translating the 16p2relbranch.s program 4
Properly translating the 17p2relcall.s program 4
Properly translating the 18p2relset.s program 3
Properly translating the 19fibonacci.s program 5
Properly translating the 20bubblesort.s program 5
Design and understandability 20
TOTAL 100

Note that it will not be possible to give partial credit for the "Properly translating..." criteria.