The heap is a collection of records that represent environments,
continuations, and data. Data includes closures, pairs, boxes, etc. The heap
can be represented as a directed graph whose nodes are records and whose edges are
pointers. Many heap records are not needed after program execution passes
a certain point. Consider (map add1 (map sub1 '(1 2 3))). The
intermediate list created by the inner map (over sub1) is not needed after
the outer map (over add1) has executed. This example leads us to the question:
when do we want to reclaim heap records that will never be used again? Put
another way: when is data garbage?
At one end of the spectrum, we never collect garbage. This is a simple answer, but impractical because it requires an arbitrarily large memory. At the other end of the spectrum, we reclaim data immediately after its last use. Unfortunately, this is impossible to implement in general because identifying the last use is undecidable. Consider
A: (use x) ... B: (if P x Q)
Point A could be the last use of x if P evaluates to false. But predicting the value of P is undecidable in general, hence whether A is the last use is
undecidable as well. For this reason, we must pick a decidable point on the
spectrum.
Consider the state of the CPS, first-order, registerized interpreter when
stopped at some point during execution. To resume execution, we need to
know the contents of the registers, namely e, env, and k, and
everything to which they refer. Nothing else.
e refers to parts of the abstract syntax tree (i.e., the program), which
refer only to other parts of the program. These are static: allocated
once before program execution begins, and never again. Let's forget about
them.
env and k hold records that refer to other environment
records, continuations, and values (i.e., heap records). We only need
those heap records that are reachable from env and k. Heap
records that are not reachable will never again be used.
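The reachability argument above can be sketched in a few lines of Python. This is only an illustration, not the interpreter's own code: it assumes a hypothetical heap modeled as a dictionary from record addresses to the addresses each record points to. The names `heap` and `reachable`, and the sample addresses, are invented for the example.

```python
def reachable(heap, roots):
    """Return the set of record addresses reachable from the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue  # already visited; also keeps cycles from looping forever
        seen.add(addr)
        stack.extend(heap[addr])  # follow every pointer out of this record
    return seen


# Hypothetical heap: address -> addresses this record points to.
heap = {
    "env1": ["pair1"],
    "k1": ["env1", "k0"],
    "k0": [],
    "pair1": ["box1"],
    "box1": [],
    "pair2": ["pair3"],  # not reachable from env or k
    "pair3": ["pair2"],  # unreachable records may still point at each other
}

# The roots stand in for the contents of the env and k registers.
live = reachable(heap, roots=["env1", "k1"])
garbage = set(heap) - live
```

Note that `pair2` and `pair3` refer to each other but are still garbage, because no path leads to them from the roots.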
We now need some definitions.
The idea behind Mark/Sweep collection is to mark each record as live or dead and place the dead records on a "free list" for reuse. We allocate a mark bit in each record.
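A minimal sketch of the mark and sweep phases, in Python, may make the idea concrete. It assumes a hypothetical `Record` class with a `marked` field playing the role of the mark bit, and it models the heap as a plain list of records. The names `Record`, `mark`, `sweep`, and `collect` are invented for this illustration; a real collector would operate on raw memory rather than Python objects.

```python
class Record:
    def __init__(self, name, fields=()):
        self.name = name
        self.fields = list(fields)  # pointers to other records
        self.marked = False         # the mark bit


def mark(root):
    """Set the mark bit on every record reachable from root."""
    stack = [root]
    while stack:
        record = stack.pop()
        if record.marked:
            continue  # already visited
        record.marked = True
        stack.extend(record.fields)


def sweep(heap):
    """Collect unmarked records into a free list; clear marks on the rest."""
    free_list = []
    for record in heap:
        if record.marked:
            record.marked = False  # reset for the next collection
        else:
            record.fields.clear()  # a dead record holds on to nothing
            free_list.append(record)
    return free_list


def collect(heap, roots):
    for root in roots:
        mark(root)
    return sweep(heap)


# Hypothetical heap contents; env and k stand in for the register roots.
box = Record("box")
pair = Record("pair", [box])
env = Record("env", [pair])
k = Record("k", [env])
tmp = Record("tmp-list", [box])  # points into live data, but nothing points to it
heap = [box, pair, env, k, tmp]

free_list = collect(heap, roots=[env, k])
free_names = [record.name for record in free_list]
```

In this sketch the sweep rebuilds the free list on every collection, which keeps the example short. An allocator would then take records from `free_list` before asking for fresh memory.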