Programming Assignment 5: Garbage Collection
In this assignment you will create a copying garbage collector for a modified
version of the Fun language interpreter.
- Copy /u/cos320/chap5/* and /u/cos320/chap5/*/* into a new directory
as5. (The code has been rearranged to put a big chunk of
code in the subdirectory called fun and the subdirectory called tests. Make sure you copy
- If you have your own working copies of the
parser, lexer, etc from assignments 2, 3 and 4 you may use them for this
assignment. If you don't have
working versions, then use the ones we provide.
- Your main job is to implement memory management for the Fun language
interpreter. The interpreter is substantially more complicated than what
you have seen before because it now operates using an explicit machine memory.
The machine memory has several parts, each of which corresponds to the real
memory you would find in a machine. You will want to look through the
interfaces to these files and understand how they work. Part of the
assignment is to understand the new interpreter well enough so that you will be
able to extend it with garbage collection.
- word.sig/sml: Every location in the stack or the heap
contains a memory word. A word is an integer with two extra bits for
encoding type information used by the garbage collector. A bit is
represented as a data type and is either ON or OFF. The bit
encoding indicates whether or not the word is a pointer to data (ON, ON), an integer
(OFF, ON) , a tuple header (ON,OFF), or a code pointer (OFF,OFF).
- representation of heap data: all data structures in memory
should be represented as a tuple header, followed by the actual data.
The tuple header gives the number of fields in the tuple. For
instance, to represent the pair (7,12), we would use three memory words in
total. The first memory word would be (2,ON,OFF) indicating that it is
a header word and its data has 2 fields. The second memory word would
be (7,OFF,ON), indicating it is a simple integer with value 7. The
third memory word would be (12,OFF,ON), indicating it is also a simple
integer with value 12. Your garbage collector will need the headers
and bits to be able to properly traverse the heap and copy it from
from-space to to-space.
- codeseg.sig/sml: the memory area where executable code is
stored. And there are no pointers directly from code into the dynamic
heap, so your garbage collector does not need to "traverse" the code.
It can stop traversal of that portion of the heap as soon as it sees a code
- stack.sig/sml: the run-time stack where activation records
or "frames" for each function are stored. In the interpreter, each
function has its own stack frame that contains the local variables that the
function can use. The stack is a list of these frames.
- heap.sig/sml: this is the dynamic memory allocation area
where tuples and references are allocated and stored. Memory has a
fixed size given by the constant memsize. memsize is
currently set to 64. If programs use too much deep space, you can run
out! We use objects with type address (= int) to index
into memory. Addresses between 0 and memsize-1 are
valid. Other addresses are not. Two important data structures in
this module are the current allocation pointer and the limit pointer.
- allocptr: this reference will hold a pointer into memory at
the place you should allocate the next new data.
- limitptr: this reference will hold a pointer into memory at
the end of the current allocation space.
- store.sig/sml: this is where the garbage collector
(function gc) and allocator (function alloc, which will call
the garbage collector when it runs out of memory) are implemented. The
gc should be a copying collector with equal-sized to-space and from-space.
Most of your task involves filling in this part of the garbage collector.
- alloc function: Within the current space, all addresses below
the allocptr contain data structures in use by the program.
In order to allocate a new data structure, it is necessary to check
whether or not there is enough space left, by adding the size of the
object to be allocated (including the header word) to the allocptr
and comparing with the limitptr. If and there is enough
space, add the appropriate amount to the allocptr and return a
pointer to the allocated space to the interpreter, which can initialize
it properly. If there is not enough space, alloc should call the
function gc. (The interpreter does not call gc directly, only when
alloc finds it is out of space.)
- gc function: This function will implement a copying
garbage collection. It will copy data from the currently-used half
of memory (now called the from-space) to the currently-unused space (now
called the to-space). For your reference, copying collection is
described in Chapter 13.3 of Appel's textbook. You should use a
variant of Cheny's algorithm to implement your garbage collector.
To perform garbage collection beginning from the roots, you will need to
use the stackmap function from the stack module. This
function walks down the stack for you and calls the function that you
choose on each word in the stack. It replaces the current word on
the stack with the word you return.
- interpret.sig/sml: this is the main body of the
interpreter. It is your job to implement the commands for allocating
and operating on tuples (expressions Fun.Tuple and Fun.Proj).
Fun.Tuple allocates a new block of memory in the heap. Fun.Proj takes
the ith element of a tuple (starting the count from 0).
- notes.txt: contains a few more useful notes about the code
- tests/*.fun: contains a number of test files you might use
to debug your code. Of course, and as always, you may want to test
your code on other files as well.
- As usual, compile by invoking CM.make();.
- You can test your code by running Interp.test "filename.fun".
- Submit by running submit 5 <all files>.
- Additional challenge: a garbage collector operates when there
is no space left to use. Therefore, in principle, your gc function should
run in a constant amount of space. ML is not a particularly good language
when it comes to writing constant-space programs (which is why garbage
collectors are always written in C --- ah ha, so C is good for something!).
Try to write your garbage collector so it operates in constant space. I
will not give you extra credit for this one because it is not too hard.
Extra credit: instead of a simple to-space/from-space collector, write
a generational collector with two generations, each of which does copying
collection. Make sure to implement the write barrier properly. This
may take substantial modification to the basic infrastructure. I have not
done it yet. I don't know! Note: We will record any extra
credit separately from the rest of the grades. In the end, it may help
boost your grade a notch if you are on the borderline, or it might not.