Unboxing analysis

Implement an unboxing optimization for tuples and refs for the Fun compiler.

What is unboxing

Data that is represented by a pointer to a (typically heap-allocated) record is called boxed. Data represented directly in registers is unboxed. Consider this C program:
struct foo {int i,j;}
struct foo * f(int x, int y, struct foo *p) {
   int r=0;
   r=p->i;
   r++;
   return makeNewFoo(r,p->y);
}
We say that x and y are unboxed; r is unboxed; p is boxed; and the return result of the function is boxed.

Now consider this Fun program:

fun f(arg: <int,int,<int,int>>) : <int,int> =
 let x = #0 arg in
 let y = #1 arg in
 let p = #2 arg in
 let r = ref 0 in
 (r := #0 p; r := !r + 1; )
Here, in a naive implementation of Fun, arg is boxed, r is boxed, p is boxed, bug x and y are unboxed.

The naive implementation will run more slowly than the C program because it is heap-allocating so many tuples, and there are so many loads and stores.

But it need not be that way. There's nothing that prevents a more sophisticated compiler from generating unboxed code for Fun programs. Here's how to do it.

Type annotations

The first step is to modify the type-checker to record information about the types of program variables. This could be done by making a Symbol.table that maps variable names to types—to do this it would help to rename all local variables so that no two ever have the same name; but let's not do it that way.

Modify your typecheck.sml to match the following signature:

signature TYPECHECK =
sig
  val tc : Absyn.prog -> Absyn.prog
  val exp_type: Absyn.exp -> Absyn.tp
  val sub: Absyn.tp * Absyn.tp -> bool
  val join: (string->unit) -> Absyn.tp * Absyn.tp -> Absyn.tp
end
The tc function now produces a new program annotated with extra Constrain constructors. That is, as before, it checks the program for well-typedness, calling ErrorMsg.error if it finds problems. But in addition it new produces a new program with the following characteristics:
  1. If all the Constrain constructors were stripped out of both programs, they would be identical.
  2. Every Id is wrapped in a Constrain
  3. Every If is wrapped in a Constrain
  4. The argument and result of every Call is wrapped in a Constrain, that is, Constrain(Call(e1,Constrain(e2,t1)),t3). Make sure that t1 matches the formal-parameter type of the function, not necessarily the actual-parameter type of e2; because of subtyping they may be different.
  5. Both children of every Let are wrapped in a Constrain, that is, Let(id,Constrain(e1,t1),Constrain(e2,t2)).

The exp_type function tells the type of an expression. It may assume that this expression comes from the output of some call to tc, that is, every identifier is wrapped in a Constrain. Therefore, exp_type does not need to be given an environment mapping identifers to types. Furthermore, it may assume that the expression is already well typed. Therefore, if it sees (Op(Add,[e1,e2])) then it can just return Inttp without even looking inside e1 and e2.

Modify your compile.sml so that it calls codegen on the annotated absyn that comes from calling Typecheck.tc.

Unboxing references

First I will explain unboxing refs, then unboxing tuples.

Read the explanation of Frame-resident variables on pages 131-132 of the textbook. Read Calculating escapes on page 138-139. Now consider both the function f shown above, and the function g right here:

fun h(p: int ref) : int =
  !p
fun g(x: int) : int ref =
  let e = ref x in
      (e := h(e); 
       e)
The reference r in function f does not escape, so it can be kept in a register when f is compiled to machine code. But the reference e does escape, twice: it escapes downward when it is passed to h, and it escapes upward when it is returned from g. Either of these means "its address is taken", so it had better have an address, so it can't be kept in a register.

Escape analysis. In your codegen.sml, write a function ref_escape (id: Absyn.id) (e: Absyn.exp) that returns true iff the only uses of id within e are of the form, Op(Get, [Id id]) or Op(Set,[Id id,_]). You'll have to wade through all the Constrain constructors; in this version of codegen you cannot simply strip all the Constrain constructors out of the expression before generating code.

However, the following auxiliary function is useful:

 fun strip_head(A.Pos(_,e))     = strip_head e
   | strip_head(A.Constrain(e,_)) = strip_head e
   | strip_head e          = e        
Matching the pattern. To implement unboxing of refs, your gen_exp function needs new rules for the following three patterns:
Let(id,Ref(e1),e2)
Op(Get, [Id id])
Op(Set,[Id id,_]) Of course, it's not quite so simple, because those pesky Constrain constructors may get in the way, but the appropriate use of strip_head will be helpful.

To handle Let(id,Ref(e1),e2), if the id does not ref_escape within the expression e2, then you can simply move the result of e1 into a new temporary instead of calling alloc. If the id does ref_escape, compile it the old (boxed) way.

To handle Op(Get, [Id id]), if the id has been compiled as an unboxed reference, then you simply move its register into a new temporary instead of doing a load; and similarly for Set.

But when you find Op(Get, [Id id]), how do you know whether that id has been compiled as unboxed or boxed? The answer is that you record this information in the environment. Modify the value datatype to look like this,

datatype value = Reg of M.reg | Lab of M.lab | RefVar of M.reg

where RefVar means that the variable has been compiled as an unboxed reference. (Boxed references will be same as before, they will be Reg).

That's all there is to it. The rest of your compiler should not need any changes.

Unboxing function arguments

I will explain this one in much less detail, and you should be able to figure out how to implement it.

Let f be a function whose argument is an n-tuple, where n is between 0 and 4 inclusive. Instead of passing f a single argument that's a pointer to a boxed record, pass it n seperate arguments in registers $a0,$a1, etc.

To translate (Call(f,Constrain(Tuple[e0,e1,e2],Tupletp[t0,t1]))), don't allocate anything on the heap. Just generate e0 into a temporary (such as $x0), generate e1 into a temporary (such as $x1), generate e2 into a temporary, then move $x0,$x1 into $a0,$a1. Note that it's the Tupletp[t0,t1] that tells you how many actual parameters there are, not the Tuple[e0,e1,e2]; this is because of subtyping. But you still need to generate the code for e2, in case it has side effects.

To translate (Call(f,Constrain(e,Tupletp[t0,t1]))), where e is not necessarily a Tuple expression, you'll have to generate e into a temporary $x and then generate load instructions to fetch the fields of that $x into $a0,$a1.

To translate the entry into a procedure whose formal parameter type is a tuple with 4 or fewer fields: Let p be the formal-parameter variable. First you have to see whether p escapes. If every occurence of p is directly inside a Proj constructor, it does not escape; if there is some occurrence that is not in a Proj, then it does escape.

If it does not escape, then you just copy the first n $a registers to fresh temporaries. Record the names of those temporaries in a new value constructor in the environment. Whenever you compile Proj(i,Id(p)), you look up p to see if its ith projection is just a register.

If it does escape, then you make a heap-allocated record by calling alloc, then storing the first n $a registers into that record. Now you can compile the function in the normal way.

If you're lucky, the remaining phases of your compiler didn't make the assumption that every function has exactly one parameter, and everything should work as it.