COS 320, Spring 2000. Programming Assignment
Programming Assignment 5: Garbage Collection
In this assignment you will create a copying garbage collector for a modified 
version of the Fun language interpreter. 
Type checking
- Copy /u/cos320/chap5/* and /u/cos320/chap5/*/*  into a new directory
as5.  (The code has been rearranged to put a big chunk of 
code in the subdirectory called fun and the subdirectory called tests.  Make sure you copy 
everything.)
- If you have your own working copies of the 
parser, lexer, etc from assignments 2, 3 and 4 you may use them for this 
assignment.  If you don't have
working versions, then use the ones we provide.
- Your main job is to implement memory management for the Fun language 
interpreter.  The interpreter is substantially more complicated than what 
you have seen before because it now operates using an explicit machine memory.  
The machine memory has several parts, each of which corresponds to the real 
memory you would find in a machine.  You will want to look through the 
interfaces to these files and understand how they work.  Part of the 
assignment is to understand the new interpreter well enough so that you will be 
able to extend it with garbage collection.
	- word.sig/sml:  Every location in the stack or the heap 
	contains a memory word.  A word is an integer with two extra bits for 
	encoding type information used by the garbage collector.  A bit is 
	represented as a data type and is either ON or OFF.  The bit 
	encoding indicates whether or not the word is a pointer to data (ON, ON), an integer 
	(OFF, ON) , a tuple header (ON,OFF), or a code pointer (OFF,OFF).
	- representation of heap data:  all data structures in memory 
	should be represented as a tuple header, followed by the actual data.  
	The tuple header gives the number of fields in the tuple.  For 
	instance, to represent the pair (7,12), we would use three memory words in 
	total.  The first memory word would be (2,ON,OFF) indicating that it is 
	a header word and its data has 2 fields.  The second memory word would 
	be (7,OFF,ON), indicating it is a simple integer with value 7.  The 
	third memory word would be (12,OFF,ON), indicating it is also a simple 
	integer with value 12.  Your garbage collector will need the headers 
	and bits to be able to properly traverse the heap and copy it from 
	from-space to to-space.
 
- codeseg.sig/sml:  the memory area where executable code is 
	stored.  And there are no pointers directly from code into the dynamic 
	heap, so your garbage collector does not need to "traverse" the code.  
	It can stop traversal of that portion of the heap as soon as it sees a code 
	pointer.
- stack.sig/sml:  the run-time stack where activation records 
	or "frames" for each function are stored.  In the interpreter, each 
	function has its own stack frame that contains the local variables that the 
	function can use.  The stack is a list of these frames.
- heap.sig/sml:  this is the dynamic memory allocation area 
	where tuples and references are allocated and stored.  Memory has a 
	fixed size given by the constant memsize.  memsize is 
	currently set to 64.  If programs use too much deep space, you can run 
	out!  We use objects with type address (= int) to index 
	into memory.  Addresses between 0 and memsize-1 are 
	valid.  Other addresses are not.  Two important data structures in 
	this module are the current allocation pointer and the limit pointer.
	- allocptr:  this reference will hold a pointer into memory at 
	the place you should allocate the next new data.  
- limitptr:  this reference will hold a pointer into memory at 
	the end of the current allocation space.
 
- store.sig/sml:  this is where the garbage collector 
	(function gc) and allocator (function alloc, which will call 
	the garbage collector when it runs out of memory) are implemented.  The 
	gc should be a copying collector with equal-sized to-space and from-space.   
	Most of your task involves filling in this part of the garbage collector. 
		- alloc function: Within the current space, all addresses below 
		the allocptr contain data structures in use by the program.  
		In order to allocate a new data structure, it is necessary to check 
		whether or not there is enough space left, by adding the size of the 
		object to be allocated (including the header word) to the allocptr
		and comparing with the limitptr.  If and there is enough 
		space, add the appropriate amount to the allocptr and return a 
		pointer to the allocated space to the interpreter, which can initialize 
		it properly.  If there is not enough space, alloc should call the 
		function gc.  (The interpreter does not call gc directly, only when 
		alloc finds it is out of space.)
- gc function:  This function will implement a copying 
		garbage collection.  It will copy data from the currently-used half 
		of memory (now called the from-space) to the currently-unused space (now 
		called the to-space).  For your reference, copying collection is 
		described in Chapter 13.3 of Appel's textbook.  You should use a 
		variant of Cheny's algorithm to implement your garbage collector.  
		To perform garbage collection beginning from the roots, you will need to 
		use the stackmap function from the stack module.  This 
		function walks down the stack for you and calls the function that you 
		choose on each word in the stack.  It replaces the current word on 
		the stack with the word you return. 
 
- interpret.sig/sml:  this is the main body of the 
	interpreter.  It is your job to implement the commands for allocating 
	and operating on tuples (expressions Fun.Tuple and Fun.Proj).  
	Fun.Tuple allocates a new block of memory in the heap.  Fun.Proj takes 
	the ith element of a tuple (starting the count from 0).
- notes.txt:  contains a few more useful notes about the code 
	set up.
- tests/*.fun:  contains a number of test files you might use 
	to debug your code.  Of course, and as always, you may want to test 
	your code on other files as well.
 
 
- As usual, compile by invoking CM.make();.
- You can test your code by running Interp.test "filename.fun".
- Submit by running submit 5 <all files>.
- Additional challenge:  a garbage collector operates when there 
is no space left to use.  Therefore, in principle, your gc function should 
run in a constant amount of space.  ML is not a particularly good language 
when it comes to writing constant-space programs (which is why garbage 
collectors are always written in C --- ah ha, so C is good for something!).  
Try to write your garbage collector so it operates in constant space.  I 
will not give you extra credit for this one because it is not too hard.
- 
Extra credit:  instead of a simple to-space/from-space collector, write 
a generational collector with two generations, each of which does copying 
collection.  Make sure to implement the write barrier properly.  This 
may take substantial modification to the basic infrastructure.  I have not 
done it yet.  I don't know!  Note:  We will record any extra 
credit separately from the rest of the grades.  In the end, it may help 
boost your grade a notch if you are on the border line, or it might not.