Princeton University
COS 217: Introduction to Programming Systems

Assignment 5: An IA-32 Assembler

Purpose

The purpose of this assignment is to help you understand IA-32 assembly language, IA-32 machine language, and the assembly process.

Background

An IA-32 assembler reads a source code (.s) file containing IA-32 assembly language code and writes an object (.o) file containing IA-32 machine language code. The object file is expressed in the UNIX Executable and Linking Format (ELF). An ELF file consists of sections. Three of the most important sections are the data section, the bss section, and the text section.

An assembler makes two passes. During pass 1 the assembler traverses the assembly language code to generate the sections that comprise the machine language code.  It also generates a symbol table section; each binding of the symbol table relates a label to an offset within a particular section. Finally, it generates a relocation record section; each relocation record indicates an area of the generated code that must be patched.  Then during pass 2 the assembler traverses the relocation records to patch the machine language code in the data and text sections.
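The two-pass idea can be sketched in miniature as follows. This is only an illustration under stated assumptions: the ToySection, ToyLabel, and ToyReloc types and every Toy_ function are hypothetical names invented for this sketch, not the interfaces of the given Section, RelRecord, or SymTable modules. It also patches each placeholder with the plain offset of its label, which is simpler than real IA-32 patching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types for illustration only; the assignment's
   Section, RelRecord, and SymTable modules define the real ones. */
enum { TOY_MAX_ITEMS = 16, TOY_MAX_BYTES = 64 };

struct ToyLabel { const char *pcName; size_t uOffset; };
struct ToyReloc { size_t uPatchOffset; const char *pcTarget; };

struct ToySection
{
   unsigned char aucBytes[TOY_MAX_BYTES];
   size_t uLength;
   struct ToyLabel aLabels[TOY_MAX_ITEMS];
   size_t uLabelCount;
   struct ToyReloc aRelocs[TOY_MAX_ITEMS];
   size_t uRelocCount;
};

/* Pass 1: append one byte of generated code to psSection. */
void Toy_emitByte(struct ToySection *psSection, unsigned char ucByte)
{
   assert(psSection != NULL);
   assert(psSection->uLength < TOY_MAX_BYTES);
   psSection->aucBytes[psSection->uLength++] = ucByte;
}

/* Pass 1: bind pcName to the current offset within psSection. */
void Toy_defineLabel(struct ToySection *psSection, const char *pcName)
{
   struct ToyLabel *psLabel;
   assert(psSection != NULL);
   assert(pcName != NULL);
   assert(psSection->uLabelCount < TOY_MAX_ITEMS);
   psLabel = &psSection->aLabels[psSection->uLabelCount++];
   psLabel->pcName = pcName;
   psLabel->uOffset = psSection->uLength;
}

/* Pass 1: emit a 4-byte placeholder and record that it must later be
   patched with the offset of pcTarget. */
void Toy_emitReference(struct ToySection *psSection, const char *pcTarget)
{
   struct ToyReloc *psReloc;
   int i;
   assert(psSection != NULL);
   assert(pcTarget != NULL);
   assert(psSection->uRelocCount < TOY_MAX_ITEMS);
   psReloc = &psSection->aRelocs[psSection->uRelocCount++];
   psReloc->uPatchOffset = psSection->uLength;
   psReloc->pcTarget = pcTarget;
   for (i = 0; i < 4; i++)
      Toy_emitByte(psSection, 0);
}

/* Pass 2: patch each recorded placeholder with its target's offset,
   stored little-endian. Return the number of unresolved targets. */
size_t Toy_patch(struct ToySection *psSection)
{
   size_t uReloc, uLabel, uUnresolved = 0;
   int i;
   assert(psSection != NULL);
   for (uReloc = 0; uReloc < psSection->uRelocCount; uReloc++)
   {
      const struct ToyReloc *psReloc = &psSection->aRelocs[uReloc];
      for (uLabel = 0; uLabel < psSection->uLabelCount; uLabel++)
         if (strcmp(psSection->aLabels[uLabel].pcName,
                    psReloc->pcTarget) == 0)
            break;
      if (uLabel == psSection->uLabelCount)
      {
         uUnresolved++;   /* label never defined in this section */
         continue;
      }
      for (i = 0; i < 4; i++)
         psSection->aucBytes[psReloc->uPatchOffset + i] = (unsigned char)
            ((psSection->aLabels[uLabel].uOffset >> (8 * i)) & 0xFF);
   }
   return uUnresolved;
}
```

The sketch shows why a single pass is not enough: a reference to a label that is defined later in the file cannot be filled in when it is first seen, so pass 1 leaves a placeholder and pass 2 returns to it once every label has an offset.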

Your Task

Your task in this assignment is to use the C programming language to create an assembler that can process a subset of IA-32 assembly language. More precisely...

You are given code that reads an assembly language source code file and creates an in-memory representation of the assembly language program.  Your job is to create code that accepts the in-memory representation, performs two passes (as described above), and thereby produces an in-memory representation of the program's data, bss, text, symbol table, and relocation record sections. Finally, you are given code that accepts those in-memory sections and writes the object file.

The Assembly Language Subset

Your assembler should accept the subset of IA-32 assembly language that is described in A Subset of IA-32 Assembly Language for the Assembler Assignment.  Note in particular these features of the subset:

The Modules

[Some of the modules are defined as abstract data types (ADTs), as described previously in our course.  Other modules are defined as "abstract objects." An abstract object is a simpler alternative to an ADT. An ADT is appropriate when your program needs to create many objects of the same type; an abstract object is appropriate only when your program uses exactly one object of its type. See Sections 19.1 and 19.2 of our King textbook for more information about abstract objects.]
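The difference between the two styles can be sketched with a deliberately trivial example. The Counter and Tally names below are hypothetical and do not correspond to any given module; they only contrast an ADT, whose clients create and pass around objects, with an abstract object, whose single instance lives in file-scope static variables.

```c
#include <assert.h>
#include <stdlib.h>

/* ADT style: clients may create many Counter objects, and each
   function receives the object on which it should operate. */
struct Counter { int iCount; };
typedef struct Counter *Counter_T;

Counter_T Counter_new(void)
{
   Counter_T oCounter = (Counter_T)malloc(sizeof(struct Counter));
   if (oCounter != NULL)
      oCounter->iCount = 0;
   return oCounter;
}

void Counter_free(Counter_T oCounter)
{
   free(oCounter);
}

void Counter_increment(Counter_T oCounter)
{
   assert(oCounter != NULL);
   oCounter->iCount++;
}

int Counter_get(Counter_T oCounter)
{
   assert(oCounter != NULL);
   return oCounter->iCount;
}

/* Abstract object style: exactly one Tally exists. Its state is a
   static variable that is hidden inside this file, so the functions
   take no object parameter. */
static int iTallyCount = 0;

void Tally_reset(void)
{
   iTallyCount = 0;
}

void Tally_increment(void)
{
   iTallyCount++;
}

int Tally_get(void)
{
   return iTallyCount;
}
```

A client of the ADT writes Counter_increment(oCounter) and must eventually call Counter_free, whereas a client of the abstract object simply writes Tally_increment() and has nothing to allocate or free.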

You are given modules that accomplish the task of reading the assembly language program and creating an in-memory representation:

You also are given modules that comprise the in-memory representation:

Finally, you are given these modules:

You also should use your SymTable ADT from Assignment 3. You may use either the linked list or the hash table implementation of your SymTable ADT. If your SymTable ADT is not working, we will provide you with working symtable.h and symtable.o files at your request.

You should create modules named Pass1 and Pass2, as described below.

The Pass1 Module

Pass1 should be an abstract object. The Assembler object will give the Pass1 object an empty SymTable object, and three empty Section objects: one for the data section, one for the bss section, and one for the text section.

The Pass1 module should:

Your Pass1 object may assume that the Parser object checks the given assembly language program for syntax errors. Your Pass1 object should detect and report only one semantic error: a duplicate definition of a label.
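Duplicate-label detection amounts to checking whether a label is already bound before binding it. The sketch below uses a small fixed-size array and a hypothetical Labels_define function so that it is self-contained. In your assembler the lookup would go through your SymTable ADT instead, and the wording of the error message here is an assumption, not the required format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a symbol table: a fixed-size array of
   label names that have been defined so far. */
enum { MAX_LABELS = 32 };
static const char *apcLabels[MAX_LABELS];
static size_t uLabelCount = 0;

/* Record a definition of label pcName found on line iLine. Return 1
   if the label is new. If the label is already defined, write a
   message (assumed format) to stderr and return 0. */
int Labels_define(const char *pcName, int iLine)
{
   size_t i;
   assert(pcName != NULL);
   for (i = 0; i < uLabelCount; i++)
      if (strcmp(apcLabels[i], pcName) == 0)
      {
         fprintf(stderr, "line %d: duplicate definition of label %s\n",
                 iLine, pcName);
         return 0;
      }
   assert(uLabelCount < MAX_LABELS);
   apcLabels[uLabelCount++] = pcName;
   return 1;
}
```

For example, defining "main" on line 1 and "loop" on line 2 succeeds, and a second definition of "main" on line 3 is rejected with a message.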

Your Pass1 object should contain no memory leaks; it should free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory consumed by the Program, Instr, Operand, Section, RelRecord, SymTable, and LabelInfo objects. Your Pass1 object should free any other memory that it dynamically allocates. It is likely that there will be no such memory.

Testing the Pass1 Module

You can test your Pass1 module using:

and this procedure:

  1. Temporarily comment out all calls to Pass2 functions in the assembler.c file.
  2. Execute the command "sampleassembler -o one.o 01testdata.s", thus producing the file one.o.
  3. Execute the command "assembler -o two.o 01testdata.s", thus producing the file two.o.
  4. Execute the command "readelf -s one.o" to display the symbol table of one.o to standard output.
  5. Execute the command "readelf -s two.o" to display the symbol table of two.o to standard output.
  6. Compare the symbol tables. They should be identical.
  7. Execute the command "objdump -s --section=.data one.o" to display the contents of the data section of one.o to standard output.
  8. Execute the command "objdump -s --section=.data two.o" to display the contents of the data section of two.o to standard output.
  9. Compare the data sections. They should be identical.
  10. Execute the command "objdump -d one.o" to display the contents of the text section of one.o to standard output.
  11. Execute the command "objdump -d two.o" to display the contents of the text section of two.o to standard output.
  12. Compare the text sections. They should be identical.
  13. Execute the command "readelf -r one.o" to display the relocation records of one.o to standard output.
  14. Execute the command "readelf -r two.o" to display the relocation records of two.o to standard output.
  15. Compare the relocation records. They should be identical.
  16. Repeat steps 2 to 15 for each example assembly language program.

You also should test your Pass1 module by using it to assemble the 16hello.s and 17powerfunction.s assembly language programs, thus producing 16hello.o and 17powerfunction.o. Then use gcc to link those files, thus producing executable files. Finally, execute those files. They should execute properly.

You will find the given bash shell script objdiff useful. Given an assembly language program, it assembles the program using sampleassembler and using your assembler, and then uses the standard UNIX diff command to compare the symbol table, data section, text section, and relocation records of the two resulting object files. Thus it implements steps 2-15 as described above.

You will also find the given bash shell script grade5 useful. It executes objdiff for each example assembly language program. It also uses your assembler to assemble 16hello.s and 17powerfunction.s, uses gcc to link them, and executes them.

The Pass2 Module

Pass2 should be an abstract object. The Assembler object will give the Pass2 object the filled SymTable object (with its LabelInfo objects), the filled text Section object (with its RelRecord objects), and the filled data Section object (with its RelRecord objects). The Pass2 object should traverse each Section's RelRecord objects. For each RelRecord object:

Your Pass2 object should contain no memory leaks; it should free all memory that it has dynamically allocated. More precisely, the given modules take care of freeing memory consumed by the Section, RelRecord, SymTable, and LabelInfo objects. Your Pass2 object should free any other memory that it dynamically allocates.
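As one illustration of the kind of patch a second pass may apply, the sketch below computes the 32-bit displacement of an IA-32 relative call or jump whose target lies in the same section, and stores it in little-endian byte order. On IA-32 the displacement is measured from the address of the next instruction, that is, from the byte just past the 4-byte displacement field. The Patch_rel32 and Patch_storeLE32 functions are hypothetical helpers, not part of the given modules, and the sketch makes no claim about which RelRecord objects your Pass2 object should patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the displacement to store in a 4-byte relative field that
   begins at uFieldOffset, so that control transfers to uTargetOffset.
   Both offsets are within the same section (an assumption of this
   sketch). The displacement is relative to the end of the field. */
int32_t Patch_rel32(size_t uFieldOffset, size_t uTargetOffset)
{
   return (int32_t)((int64_t)uTargetOffset
                    - (int64_t)(uFieldOffset + 4));
}

/* Store uValue into the 4 bytes at pucField, least significant byte
   first, as IA-32 requires. */
void Patch_storeLE32(unsigned char *pucField, uint32_t uValue)
{
   int i;
   assert(pucField != NULL);
   for (i = 0; i < 4; i++)
      pucField[i] = (unsigned char)((uValue >> (8 * i)) & 0xFF);
}
```

For instance, a call instruction at offset 0 has its displacement field at offset 1. If it calls a label at offset 0, the displacement is 0 - (1 + 4) = -5, which is stored as the bytes FB FF FF FF.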

Testing the Pass2 Module

You can test your Pass2 module using the procedure described above, adding the 13testpass2call.s, 14testpass2jump.s, and 15testpass2movl.s files.

Logistics

You should develop on hats. Use xemacs to create source code. Use gdb to debug.

The directory /u/cos217/Assignment5 contains files that are pertinent to the assignment:

You should submit:

Your readme file should contain:

You should not submit the given source code files.

Submit your work electronically via the command:

/u/cos217/bin/i686/submit 5 pass1.h pass1.c pass2.h pass2.c 
   symtable.h symtable{list|hash}.c othersourcecodefilesthatyoucreated makefile readme

Grading

As always, we will grade your work on functionality and design, and will consider understandability to be an important aspect of good design. Defining each function so it does a single small well-defined task is critical to understandability, as are function-level comments in both .h and .c files. To encourage good coding practices, we will take off points based on warning messages during compilation.