COS 441 - Type Inference - April 11, 1996

Type Inference

To see how to build an algorithm for inferring types when they are omitted from the language, one can reformulate the application rule to allow any types for the subexpressions and generate a constraint on the side, where a is a fresh type variable:

A |- e[1] : t[1]     A |- e[2] : t[2]
-------------------------------------  t[1] = t[2] -> a
A |- (e[1] e[2]): a                   
For the lambda rule we make a guess, a, at the type of the argument.
A[x |-> a] |- e : t
----------------------------
A |- (lambda (x) e) : a -> t
Applications in the lambda's body may place constraints on the guess. Here is an example:
0[x |-> a] |- succ : num -> num   0[x |-> a] |- x : a
-----------------------------------------------------  (num -> num = a -> b)
0[x |-> a] |- (succ x) : b
-----------------------------------
0 |- (lambda (x) (succ x)) : a -> b
Solving this constraint gives a = num and b = num, so the lambda has type num -> num.

We can solve constraints at any time. We might construct the entire tree first, then solve the collected set of constraints afterwards. Or we can solve the constraints as we generate them, by substituting for variables in the tree. This has the advantage that we never have to manipulate a collection of constraints (because they are always solved), at the cost of requiring us to be able to substitute for type variables wherever they appear in the tree.
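The first strategy, collecting constraints and solving them afterwards, might be sketched as follows. This is only a sketch under assumptions that are not in the notes: type variables are plain symbols other than num and bool, a substitution is an association list, and the names tvar?, apply-subst and solve are hypothetical.

```scheme
;; Assumption: any symbol other than num, bool and -> is a type variable.
(define (tvar? t) (and (symbol? t) (not (memq t '(num bool ->)))))
(define (arrow? t) (and (pair? t) (pair? (cdr t)) (eq? (cadr t) '->)))

;; Apply substitution s (an association list: variable . type) to type t.
(define (apply-subst s t)
  (cond ((tvar? t)
         (let ((hit (assq t s)))
           (if hit (apply-subst s (cdr hit)) t)))
        ((arrow? t)
         (list (apply-subst s (car t)) '-> (apply-subst s (caddr t))))
        (else t)))

;; Does variable v appear in type t?
(define (occurs? v t)
  (cond ((eq? v t) #t)
        ((arrow? t) (or (occurs? v (car t)) (occurs? v (caddr t))))
        (else #f)))

;; Solve a list of constraints (lhs . rhs), extending substitution s.
(define (solve constraints s)
  (if (null? constraints)
      s
      (let ((lhs (apply-subst s (caar constraints)))
            (rhs (apply-subst s (cdar constraints)))
            (rest (cdr constraints)))
        (cond ((equal? lhs rhs) (solve rest s))
              ((tvar? lhs)
               (if (occurs? lhs rhs)
                   (error "incorrectly typed")
                   (solve rest (cons (cons lhs rhs) s))))
              ((tvar? rhs) (solve (cons (cons rhs lhs) rest) s))
              ((and (arrow? lhs) (arrow? rhs))
               ;; split an arrow constraint into domain and range constraints
               (solve (cons (cons (car lhs) (car rhs))
                            (cons (cons (caddr lhs) (caddr rhs)) rest))
                      s))
              (else (error "not typable"))))))

;; The single constraint from the (lambda (x) (succ x)) example:
(define s (solve (list (cons '(num -> num) '(a -> b))) '()))
(apply-subst s '(a -> b))   ; => (num -> num)
```

Here the whole constraint set is available before solving starts, so no type in the derivation tree needs to be mutable.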

To substitute for type variables wherever they appear in the tree, we use a box to represent a type variable. If the box is empty, it is a variable. If it contains a type, then this type has been substituted for the variable (and the variable no longer exists). To aid in printing types, each empty box will contain a number.

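;; Create a fresh type variable: an "empty" box holding a unique number.
(define make-type-var
  (let ((n 0))
    (lambda ()
      (set! n (+ n 1))
      (make-box n))))

;; Substitute type t for variable v by filling its box.
(define subst-type-var!
  (lambda (v t)
    (set-box! v t)))

;; A type variable is a box that still holds its number.
(define type-var?
  (lambda (t)
    (and (box? t) (number? (unbox t)))))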
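To see why a box makes substitution "wherever the variable appears" cheap, here is a small sketch. It assumes boxes are modeled as tagged vectors so that it runs in a Scheme without built-in boxes, and it introduces a hypothetical helper, resolve, that reads a type with filled boxes removed.

```scheme
;; Assumption: a box is a tagged two-slot vector; the notes' make-box,
;; unbox, set-box! and box? may instead be built into your Scheme.
(define (make-box x) (vector 'box x))
(define (box? x)
  (and (vector? x) (= (vector-length x) 2) (eq? (vector-ref x 0) 'box)))
(define (unbox b) (vector-ref b 1))
(define (set-box! b x) (vector-set! b 1 x))

(define make-type-var
  (let ((n 0))
    (lambda ()
      (set! n (+ n 1))
      (make-box n))))

(define (type-var? t)
  (and (box? t) (number? (unbox t))))

;; Hypothetical helper: strip filled boxes, leaving empty boxes in place.
(define (resolve t)
  (cond ((type-var? t) t)
        ((box? t) (resolve (unbox t)))
        ((pair? t) (map resolve t))
        (else t)))

;; One variable shared by two positions in a type.
(define a (make-type-var))
(define t (list a '-> a))   ; the type a -> a
(set-box! a 'num)           ; substitute num for a, once
(resolve t)                 ; => (num -> num)
```

A single set-box! updates both occurrences because both positions hold the same box.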
The representations for types are now:
t ::= num | bool | (t -> t) | empty-box | (box t)
where we consider (box t) = t. Now let's write the type checker.
(define type-check
  (lambda (e A)
    (variant-case e
      (Const (value) ... )
      (Var (name) (lookup A name))
      ;; Lambda rule: guess a fresh variable v for the argument type.
      (Lam (name body) (let* ((v (make-type-var))
                              (bt (type-check body (extend A name v))))
                         (list v '-> bt)))
      ;; Application rule: the constraint ft = at -> v is solved at once by unify.
      (Ap (fun arg) (let ((ft (type-check fun A))
                          (at (type-check arg A))
                          (v (make-type-var)))
                      (unify ft (list at '-> v))
                      v)))))

(define unify
  (lambda (a b)
    (cond ((eq? a b) #t)                  ; identical types or the same variable
          ((and (arrow? a) (arrow? b))    ; unify domains, then ranges
           (unify (dom a) (dom b))
           (unify (ran a) (ran b)))
          ((type-var? a)                  ; bind a, unless a occurs in b
           (if (occurs? a b)
               (error "incorrectly typed")
               (subst-type-var! a b)))
          ((type-var? b)                  ; symmetric case
           (unify b a))
          ((box? a)                       ; filled box: look inside
           (unify (unbox a) b))
          ((box? b)
           (unify (unbox b) a))
          (else (error "not typable")))))
This algorithm can build long chains of boxes for simple types, such as (box (box (box 'bool))). To avoid this as much as possible, move the last two clauses up before the first type-var? clause, but be careful: a variable also answers true to box?, so the moved clauses must apply only to filled boxes.
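The reordering suggested here might look like the sketch below. The helpers arrow?, dom, ran and occurs? are not defined in these notes, so the definitions given are assumptions (an arrow type is taken to be the list (dom -> ran)), as are the tagged-vector boxes and the hypothetical predicate filled?.

```scheme
;; Assumption: a box is a tagged two-slot vector.
(define (make-box x) (vector 'box x))
(define (box? x)
  (and (vector? x) (= (vector-length x) 2) (eq? (vector-ref x 0) 'box)))
(define (unbox b) (vector-ref b 1))
(define (set-box! b x) (vector-set! b 1 x))

(define make-type-var
  (let ((n 0))
    (lambda () (set! n (+ n 1)) (make-box n))))
(define (type-var? t) (and (box? t) (number? (unbox t))))
;; Hypothetical name: a box that has already been substituted for.
(define (filled? t) (and (box? t) (not (type-var? t))))

;; Assumed representation: an arrow type is the list (dom -> ran).
(define (arrow? t) (and (pair? t) (pair? (cdr t)) (eq? (cadr t) '->)))
(define (dom t) (car t))
(define (ran t) (caddr t))

;; Occurs check: does variable v appear anywhere inside type t?
(define (occurs? v t)
  (cond ((eq? v t) #t)
        ((filled? t) (occurs? v (unbox t)))
        ((arrow? t) (or (occurs? v (dom t)) (occurs? v (ran t))))
        (else #f)))

(define (unify a b)
  (cond ((eq? a b) #t)
        ((filled? a) (unify (unbox a) b))   ; look through filled boxes first
        ((filled? b) (unify a (unbox b)))
        ((type-var? a)
         (if (occurs? a b)
             (error "incorrectly typed")
             (set-box! a b)))
        ((type-var? b) (unify b a))
        ((and (arrow? a) (arrow? b))
         (unify (dom a) (dom b))
         (unify (ran a) (ran b)))
        (else (error "not typable"))))

(define x (make-type-var))
(define y (make-type-var))
(unify x 'bool)   ; x := bool
(unify y x)       ; y := bool directly, not a box holding a box
(unbox y)         ; => bool
```

Because filled boxes are opened before a variable is bound, y is bound to bool itself rather than to the box for x.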

The unification algorithm is due to Robinson (67). It is used in many other contexts, including PROLOG.

Lists

Suppose we want to add cons, car, cdr, (), pair?, null? to the typed version of Scheme. First let's recall what a list is:

list-of-thing ::= () | (cons thing list-of-thing)
Since both () and the result of cons are lists, we give them the same type. This type is (t list), where t is the type of the elements in the list:
   () : (t list)
 cons : (t -> ((t list) -> (t list)))    (cons = (lambda (x) (lambda (y) ...)))
  car : ((t list) -> t)
  cdr : ((t list) -> (t list))
pair? : ((t list) -> bool)
null? : ((t list) -> bool)
for any type t. These operations are said to be polymorphic because they work with values of many different types: "poly" = many and "morph" = form.

It is easy to add polymorphic primitives to our type checker. When encountering a primitive, construct a type for it using new type variables appropriately. For example, for each occurrence of car build the type ((a list) -> a) using a fresh variable a. The use of car will place constraints on a so that car is used at a specific type. So (car '(1)) will force a = num.
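A minimal sketch of this idea follows. The helper prim-type and the name nil for () are hypothetical, and boxes are again assumed to be tagged vectors.

```scheme
;; Assumption: a box is a tagged vector; the notes' make-box may be built in.
(define (make-box x) (vector 'box x))
(define make-type-var
  (let ((n 0))
    (lambda () (set! n (+ n 1)) (make-box n))))

;; Hypothetical helper: build the type of a primitive using a fresh
;; variable a, so that every occurrence gets its own copy of the type.
(define (prim-type name)
  (let* ((a (make-type-var))
         (a-list (list a 'list)))
    (case name
      ((nil)         a-list)
      ((cons)        (list a '-> (list a-list '-> a-list)))
      ((car)         (list a-list '-> a))
      ((cdr)         (list a-list '-> a-list))
      ((pair? null?) (list a-list '-> 'bool))
      (else (error "unknown primitive" name)))))

(define car1 (prim-type 'car))
(define car2 (prim-type 'car))
(eq? (car (car car1)) (caddr car1))   ; => #t, one variable within an occurrence
(eq? (caddr car1) (caddr car2))       ; => #f, distinct variables across occurrences
```

Unifying the variable of car1 with num therefore leaves car2 free to be used at a different type.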

Since there is only one element type for a list, all elements of a given list must have the same type. ((cons 1) ((cons #t) '())) will be rejected because ((cons #t) '()) builds a (bool list), and the first use of cons then requires 1 to have type bool.

Now it is clear why static type systems reject some "correct" programs. In our typed language, we can only build lists whose elements are all the same type, but in Scheme we can put anything in any list so long as it is used appropriately.

What about user-defined polymorphic operations, like sort? We use the same idea as for primitives: each individual use of a user-defined operation must have a single type. To identify uses, we require polymorphic procedures be bound by let-expressions. Then each bound occurrence of a let variable may have a distinct type. This leads to the following type inference rule:

A |- e[1] : t[1]     A |- e[2][x |-> e[1]] : t
----------------------------------------------
A |- (let ((x e[1])) e[2]) : t
The second antecedent of this rule allows a different type for each occurrence of x in e[2] by substituting a copy of e[1] at each x. Each copy of e[1] can be typed differently. For example:
(let ((p (lambda (x) ((cons x) '()))))
  (p 1)     ; p : (num -> (num list))
  (p #t)    ; p : (bool -> (bool list))
  (p p))    ; p : (t -> (t list)) -> ((t -> (t list)) list)
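The substitution e[2][x |-> e[1]] used by the let rule could be sketched as below. The S-expression syntax (symbols, constants, quote, one-argument lambda, application) and the name subst-expr are assumptions made for illustration, and e[1] is assumed to be closed so that variable capture cannot occur.

```scheme
;; Replace each free occurrence of variable x in expression e with e1.
;; Assumed syntax: symbols, constants, (quote d), (lambda (y) body), (f arg).
(define (subst-expr e x e1)
  (cond ((symbol? e) (if (eq? e x) e1 e))
        ((not (pair? e)) e)                    ; numbers, booleans
        ((eq? (car e) 'quote) e)               ; quoted data is left alone
        ((eq? (car e) 'lambda)
         (if (eq? (caadr e) x)
             e                                 ; x is rebound here: stop
             (list 'lambda (cadr e) (subst-expr (caddr e) x e1))))
        (else (map (lambda (sub) (subst-expr sub x e1)) e))))

(define p-def '(lambda (x) ((cons x) (quote ()))))

;; (p p) becomes an application of one copy of the lambda to another copy.
(subst-expr '(p p) 'p p-def)
;; => ((lambda (x) ((cons x) '())) (lambda (x) ((cons x) '())))
```

Each copy is then type checked on its own, so each receives its own fresh type variables.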
Now let's try a more interesting example. But first we need to add letrec because Y is not typable.
A[x |-> t[1]] |- f : t[1]    A[x |-> t[1]] |- e[2] : t[2]
---------------------------------------------------------
A |- (letrec ((x f)) e[2]) : t[2]
In this kind of type checker, we guess a fresh variable a for x to construct the type environment A[x |-> a], use this environment to find a type t[1] for f, then use unify to ensure that our guess a matches t[1]. It is tempting to think that we can use the type system to force f to be a lambda, but we cannot: we can only force it to have a procedure type, and a procedure may be the result of a computation.
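The guess-then-unify treatment of letrec could be sketched as a small self-contained checker. Several things here are assumptions rather than the notes' own code: expressions are S-expressions instead of variant-case records, the environment is an association list, boxes are tagged vectors, and unify looks through filled boxes first by way of a hypothetical helper walk.

```scheme
;; Assumption: a box is a tagged two-slot vector.
(define (make-box x) (vector 'box x))
(define (box? x)
  (and (vector? x) (= (vector-length x) 2) (eq? (vector-ref x 0) 'box)))
(define (unbox b) (vector-ref b 1))
(define (set-box! b x) (vector-set! b 1 x))

(define make-type-var
  (let ((n 0))
    (lambda () (set! n (+ n 1)) (make-box n))))
(define (type-var? t) (and (box? t) (number? (unbox t))))

;; Hypothetical helper: strip filled boxes from the outside of a type.
(define (walk t)
  (if (and (box? t) (not (type-var? t))) (walk (unbox t)) t))
(define (arrow? t) (and (pair? t) (pair? (cdr t)) (eq? (cadr t) '->)))

(define (occurs? v t)
  (let ((t (walk t)))
    (or (eq? v t)
        (and (arrow? t)
             (or (occurs? v (car t)) (occurs? v (caddr t)))))))

(define (unify t1 t2)
  (let ((t1 (walk t1)) (t2 (walk t2)))
    (cond ((eq? t1 t2) #t)
          ((type-var? t1)
           (if (occurs? t1 t2)
               (error "incorrectly typed")
               (set-box! t1 t2)))
          ((type-var? t2) (unify t2 t1))
          ((and (arrow? t1) (arrow? t2))
           (unify (car t1) (car t2))
           (unify (caddr t1) (caddr t2)))
          (else (error "not typable")))))

(define (type-check e env)
  (cond ((number? e) 'num)
        ((boolean? e) 'bool)
        ((symbol? e) (cdr (assq e env)))
        ((eq? (car e) 'lambda)                ; (lambda (x) body)
         (let* ((v (make-type-var))
                (bt (type-check (caddr e) (cons (cons (caadr e) v) env))))
           (list v '-> bt)))
        ((eq? (car e) 'letrec)                ; (letrec ((x f)) e2)
         (let* ((x (caar (cadr e)))
                (f (cadr (car (cadr e))))
                (guess (make-type-var))       ; the guess a for x
                (env2 (cons (cons x guess) env))
                (t1 (type-check f env2)))
           (unify guess t1)                   ; the guess must match t[1]
           (type-check (caddr e) env2)))
        (else                                 ; application (e1 e2)
         (let* ((ft (type-check (car e) env))
                (at (type-check (cadr e) env))
                (v (make-type-var)))
           (unify ft (list at '-> v))
           v))))

(define result
  (type-check '(letrec ((f (lambda (n) (succ (f n))))) (f 1))
              (list (cons 'succ '(num -> num)))))
(walk result)   ; => num
```

The recursive use of f inside its own body is checked against the guess, and the final unify ties the guess to the type actually found for the lambda.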

Now we are ready to do map.

(let ((map (letrec ((map (lambda (f)
                           (lambda (l)
                             (if (null? l)
                                 '()
                                 ((cons (f (car l))) ((map f) (cdr l))))))))
             map)))
  ((map succ) ((cons 1) ((cons 2) '())))
  ((map not) ((cons #t) ((cons #f) '()))))
Final note: real type systems for ML don't work by substituting e[2][x |-> e[1]] when type checking let. They assign polymorphic types like cdr: (for-all a) (a list) -> (a list) to let variables. But the effect is exactly the same.
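As a rough illustration of how such a polymorphic type is used, the sketch below copies the type at each use, replacing the quantified variables with fresh ones. The representation (for-all (vars ...) type) with symbols for quantified variables, and the name instantiate-scheme, are assumptions made for this sketch and are not how any particular ML compiler represents types.

```scheme
;; Assumption: a box is a tagged vector; the notes' make-box may be built in.
(define (make-box x) (vector 'box x))
(define make-type-var
  (let ((n 0))
    (lambda () (set! n (+ n 1)) (make-box n))))

;; Hypothetical representation: (for-all (vars ...) type), with the
;; quantified variables written as symbols inside the type.
(define (instantiate-scheme scheme)
  (let ((table (map (lambda (v) (cons v (make-type-var))) (cadr scheme))))
    (let copy ((t (caddr scheme)))
      (cond ((and (symbol? t) (assq t table)) => cdr)   ; quantified variable
            ((pair? t) (map copy t))
            (else t)))))

(define cdr-scheme '(for-all (a) ((a list) -> (a list))))
(define use1 (instantiate-scheme cdr-scheme))
(define use2 (instantiate-scheme cdr-scheme))
(eq? (car (car use1)) (car (caddr use1)))   ; => #t, same variable within a use
(eq? (car (car use1)) (car (car use2)))     ; => #f, fresh variable for each use
```

Copying the type at each use plays the role that copying the expression e[1] plays in the substitution-based let rule.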

Exercise