Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: A SPARC Assembler

Purpose

The purpose of this assignment is to help you understand SPARC assembly language, SPARC machine language, and the assembly process.

Background

A SPARC assembler reads a source code (.s) file containing SPARC assembly language code and writes an object (.o) file containing SPARC machine language. The object file is expressed in the UNIX Executable and Linking Format (ELF). An ELF file consists of sections; the most common are data, bss, and text.

An assembler makes two passes over the source code. During pass1 the assembler creates a symbol table. Each binding of the table relates a label to an offset within a particular section. During pass2 the assembler uses that symbol table to generate the sections that comprise the object code.

The SPARC assembler that underlies the gcc command is named "as".

Your Task

Your task in this assignment is to use the C programming language to create an assembler that can process a substantial subset of SPARC assembly language. More precisely...

You are given code that reads a source code file and creates an in-memory representation of the assembly language program. Your job is to create code that accepts the in-memory representation, performs two passes over it (as described above), and thereby produces an in-memory representation of the program's data, bss, and text sections. Finally, you are given code that accepts those in-memory sections and writes the object file.

Important note: You must compare the performance of your assembler with that of the "as" assembler whose full pathname is /usr/ccs/bin/as. You can verify that you are using that assembler by typing the command "which as"; the reply should be /usr/ccs/bin/as. If not, you should change the value of your PATH environment variable accordingly.

The Assembly Language Subset

Your assembler should be able to process the subset of SPARC assembly language that is described in A Subset of SPARC Assembly Language for the Assembler Assignment. Note in particular these features of the subset:

Only data, bss, and text sections are allowed. In particular, the rodata section is excluded.
Assembly-time expressions must be simple decimal literals. For example, these instructions contain assembly-time expressions (highlighted) that are excluded:

.skip  4 * 100
.word  (18 + 5) * 3
save   %sp, (-92 - 8) & -8, %sp
sethi  %hi(label), %o0
or     %o0, %lo(label), %o0
cmp    %l0, 'a'
add    %l0, 0x20, %l1

The first argument of a "set" instruction must be a label. For example, these instructions are excluded:

set 12345, %l0	
set 0, %l0

Labels can be referenced only within set, branch, and call instructions, and must be unadorned. For example, these instructions are excluded:

set label + 2, %l0
bg label + 4
call label - 8
.word label

The Modules

You are given modules that accomplish the task of reading the assembly language program and creating an in-memory representation:

Lexer (abstract object). The Lexer object is a lexical analyzer. It reads the sequence of characters that comprises an assembly language source code program, and converts it into a sequence of tokens.
Parser (abstract object). The Parser object is a syntactic analyzer. It accepts the sequence of tokens generated by the Lexer object, and verifies that the sequence is a syntactically valid assembly language program. While doing so it calls functions of other modules to translate the assembly language source code into an in-memory representation.

You also are given modules that comprise the in-memory representation:

Program (abstract object). Program is the in-memory representation of the assembly language program. It contains Instr objects.
Instr (abstract data type). An Instr object is an in-memory representation of an assembly language instruction. An Instr object has a name (e.g. add, .global), a type (directive, mnemonic, or label), and Arg objects.
Arg (abstract data type). An Arg object is an in-memory representation of an assembly language argument. An Arg object is either a register (e.g. %i4, %g0), a literal (e.g. 0, 1, 555), a symbol (alias label) (e.g. label1), or a string (e.g. "hello", ".text").

Finally, you are given these modules:

LabelInfo (abstract data type). A LabelInfo object contains information about a label: its section, its offset within its section, its type (local or global), and its label sequence number. The SymTable object that your pass1 code creates should contain values that are LabelInfo objects.
Section (abstract data type). A Section object contains an in-memory representation of a section. Your pass1 and pass 2 code should create three Section objects: one for the data section, one for the bss section, and one for the text section. A Section object contains the byte length of the section and the code that comprises the section. The text Section object also may contain RelRecord objects.
RelRecord (abstract data type). A RelRecord object is a relocation record through which your assembler can tell the linker about "patches" that the linker must perform. A RelRecord object contains the offset of the patch, and a label sequence number that identifies a label, and the type of the patch. The type is either a 22-bit displacement patch (appropriate for patching a branch instruction), a 30-bit displacement patch (appropriate for patching a "call" instruction), a high-order 22-bit patch (appropriate for patching a "sethi" instruction that is generated from a "set" instruction), or a low-order 10-bit patch (appropriate for patching an "or" instruction that is generated from a "set" instruction).
ElfGen (abstract object). The ElfGen object accepts the SymTable object and the three Section objects, and writes the object file.
Assembler. The Assembler module contains a main function that orchestrates the assembly process by calling functions of the other modules.

You should also use your SymTable ADT from previous assignments. If your SymTable ADT is not working, we will provide you with working symtable.h and symtable.o files at your request.

You should create modules named Pass1 and Pass2, as described below.

The Pass1 Module

Pass1 should be an abstract object. The Assembler object will give the Pass1 object (1) a SymTable object, (2) three Section objects: one for the data section, one for the bss section, and one for the text section, and (3) the address of a "label sequence number" integer.

The Pass1 module should:

Fetch data from the Program, Instr, and Arg objects to create bindings in the given SymTable object. The Pass1 object should create one binding for each label that is defined in the program. Each key should be a label. Each value should be a LabelInfo object.
Process .global directives, thus affecting the given SymTable object.
Set the size of each Section object.
Tell the Assembler object the next label sequence number to be used. It should do that via the "label sequence number" address that the Assembler object passes to it. The Assembler object will pass that integer to your Pass2 object -- in case it needs to add additional bindings to the SymTable object.

Your Pass1 object may assume that the given assembly language program is semantically valid.

Your Pass1 object should contain no memory leaks. Your Pass1 must free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory contained within the Program, Instr, Arg, Section, and SymTable objects. Your Pass1 object must free any other memory that it dynamically allocated. It is likely that there will be no such memory.

Testing the Pass1 Module

You can test your Pass1 module using the given example assembly language programs (01p1data.s, 02p1all.s, and 03p1global.s), the standard UNIX elfdump command, and this procedure:

Temporarily comment-out all calls to Pass2 functions in the assembler.c file.
Run the standard UNIX assembler via the command "as -o one.o 01p1data.s", thus producing the file one.o.
Run your assembler via the command "assembler -o two.o 01p1data.s", thus producing the file two.o.
Execute the command "elfdump -s one.o" to print the symbol table of one.o to standard output.
Execute the command "elfdump -s two.o" to print the symbol table of two.o to standard output.
Compare the symbol tables. For each label, the section index (shndx), offset (value), type (bind), and name fields should be identical.
Repeat steps 2 to 6 for each of the three assembly language programs

Don't forget to comment-in the calls to Pass2 functions in the assembler.c file before proceeding with the development and testing of your Pass2 module.

The Pass2 Module

Pass2 should be an abstract object. The Assembler object will give the Pass2 object a SymTable object, three Section objects, and an integer indicating the next label sequence number.

The primary job of the Pass2 object is to fetch data from the SymTable, Program, Instr, and Arg objects to add code to the three Section objects. It should dynamically allocate a block of memory of the appropriate size for the data section, fill that memory with the appropriate data, and install it into the data Section object. Similarly it should dynamically allocate a block of memory of the appropriate size for the text section, fill that memory with the appropriate data, and install it into the text Section object. The Pass2 object should not allocate a block of memory for the bss section, and should not install a block of memory into the bss Section object.

Another job of the Pass2 object is to add a new binding to the SymTable object whenever it encounters a label that is referenced but not defined. It should use the given next label sequence number to do that.

The final task of the Pass2 object is to add relocation records to the text Section object. Each relocation record should be a RelRecord object.

Your Pass2 object may assume that the given assembly language program is semantically valid.

Your Pass2 object should contain no memory leaks. Your Pass2 must free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory contained within the Program, Instr, Arg, Section, and SymTable objects. Your Pass2 object must free any other memory that it dynamically allocated. It is likely that there will be no such memory.

Testing the Pass2 Module

You can test your Pass2 module using the given example assembly language programs (04p2directives.s, 05p2arith.s, 06p2log.s, 07p2shift.s, 08p2load.s, 09p2store.s, 10p2control.s, 11p2branch.s, 12p2call.s, 13p2syntheticarith.s, 14p2syntheticlog.s, 15p2syntheticcontrol.s, 16p2relbranch.s, 17p2relcall.s, 18p2relset.s, 19fibonacci.s, and 20bubblesort.s), the standard UNIX elfdump command, the given objdump program, and this procedure:

Execute the command "as -o one.o 04p2directives.s", thus producing the file one.o.
Execute the command "assembler -o two.o 04p2directives.s", thus producing the file two.o.
Execute the command "elfdump -s one.o" to display the symbol table of one.o to standard output.
Execute the command "elfdump -s two.o" to display the symbol table of two.o to standard output.
Compare the symbol tables. For each label, the "shndx," "value," "bind," and "name" fields should be identical.
Execute the command "objdump -d one.o" to display the contents of the data section of one.o to standard output.
Execute the command "objdump -d two.o" to display the contents of the data section of two.o to standard output.
Compare the data sections. They should be identical.
Execute the command "objdump -t one.o" to display the contents of the text section of one.o to standard output.
Execute the command "objdump -t two.o" to display the contents of the text section of two.o to standard output.
Compare the text sections. Then should be identical.
Execute the command "elfdump -r one.o" to display the relocation records of one.o to standard output.
Execute the command "elfdump -r two.o" to display the relocation records of two.o to standard output.
Compare the relocation records. All of the "type" and "offset" fields should be identical. Many of the "with respect to" fields should be identical.
Repeat steps 1 to 14 for each example assembly language program.
Use you assembler to assemble the 19fibonacci.s and 20bubblesort.s assembly language programs. Then use gcc to link them, thus producing executable files. Finally, execute those files. They should execute properly

You will find the given UNIX shell script objdiff useful. Given an assembly language program, it assembles the program using as and using your assembler, and then uses the standard UNIX diff command to compare the symbol table, data section, text section, and relocation records of the two resulting object files. Thus it implements steps 1-14 as described above.

You will also find the given UNIX shell script grade5 useful. It executes objdiff for each example assembly language program. It also uses your assembler to assemble 19fibonacci.s and 20bubblesort.s, uses gcc to link them, and executes them.

Logistics

You should develop on arizona. Use Emacs to create source code. Use gdb to debug.

The directory /u/cs217/Assignment5 contains files that are pertinent to the assignment:

The source code files for all given modules.
A makefile. You must add the dependency rules for your Pass1 and Pass2 modules, and for any additional modules that you create. (It is likely that you will create no additional modules.)
An executable file for the objdump program.
The objdiff UNIX shell script.
Assembly language programs that you can use to test your assembler.
A UNIX shell script (grade5) that automates the process of testing your assembler.
A partial makefile that you can use as the basis for defining your complete makefile.

You should submit:

The source code files for your Pass1 and Pass2 modules (pass1.h, pass1.c, pass2.h, pass2.c).
The source code files for your SymTable module (symtable.h, symtable.c).
The source code files for any additional modules that you have created.
A makefile. The first dependency rule should build your entire program, compiling with the -Wall, -ansi, and -pedantic options.
A readme file.

Your readme file should contain:

Your name.
A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" web page.
If you are submitting your work late, an indication of how many "free late days" you wish to apply to this assignment.
Optionally, an indication of how much time you spent doing the assignment.
Optionally, your assessment of the assignment.
Optionally, any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

You should not submit the given source code files.

Submit your work electronically via the command:

/u/cs217/bin/submit 5 pass1.h pass1.c pass2.h pass2.c 
   symtable.h symtable.c othersourcecodefiles makefile readme

Grading

As always, we will grade your work on functionality and design, and will consider understandability to be an important aspect of good design. Defining each function so it does a single small well-defined task is critical to understandability, as are function-level comments in both .h and .c files. To encourage good coding practices, we will take off points based on warning messages during compilation.

Your grade for the assignment will be based upon these criteria:

Criterion	Points
readme file	3
makefile	3
Properly translating the 01p1data.s program	3
Properly translating the 02p1all.s program	4
Properly translating the 03p1global.s program	4
Properly translating the 04p2directives.s program	4
Properly translating the 05p2arith.s program	3
Properly translating the 06p2log.s program	3
Properly translating the 07p2shift.s program	3
Properly translating the 08p2load.s program	4
Properly translating the 09p2store.s program	4
Properly translating the 10p2control.s program	4
Properly translating the 11p2branch.s program	4
Properly translating the 12p2call.s program	4
Properly translating the 13p2syntheticarith.s program	3
Properly translating the 14p2syntheticlog.s program	3
Properly translating the 15p2syntheticcontrol.s program	3
Properly translating the 16p2relbranch.s program	4
Properly translating the 17p2relcall.s program	4
Properly translating the 18p2relset.s program	3
Properly translating the 19fibonacci.s program	5
Properly translating the 20bubblesort.s program	5
Design and understandability	20
TOTAL	100

Note that it will not be possible to give partial credit for the "Properly translating..." criteria.

Princeton University COS 217: Introduction to Programming Systems