Programming Assignment 4: Type Checker

Implement a type-checker for the Fun language.
  1. Copy all your as3 files into a new directory as4, then unpack as4.zip to get some new files: typecheck.sml, sources.cm, compile.sml. Note that sources.cm and compile.cm are new, and will overwrite your as3 versions of these files.
  2. If your own fun.lex and fun.grm work well, use them. Otherwise you may ask the Teaching Assistant for a working version.
  3. Your main job is to edit the file typecheck.sml to implement the functions tc, sub and join. The typing rules for the Fun language are described in the Fun language definition. You need to implement these rules. The main catch is that these rules are declarative. In other words, they are nondeterministic and do not directly specify an algorithm for type checking. You need to implement a (deterministic) algorithm for type checking that is sound and complete with respect to the nondeterministic rules. Your algorithm will be sound if whenever it says a program type checks, there exists some typing derivation in the nondeterministic, declarative type system for the program. Your algorithm will be complete if whenever there exists some typing derivation, your algorithm will find the typing derivation and say the program in question type checks.

    It turns out that this problem really only bites hard in the rule for if-then-else. First I'll describe how to handle everything besides if-then-else.
    The first part of defining the type checking algorithm is defining the subtyping function sub. When you call sub(t1,t2), the function should return true if t1 is a subtype of t2 and return false otherwise. The subtyping rules in the Fun language definition are nondeterministic and cannot be implemented directly primarily because of the transitivity rule. The transitivity rule says:

    t1 < t2         t2 < t3
    ------------------------ (transitivity)
    t1 < t3

    If we were trying to read this rule bottom-up like a function then it would say t1 is a subtype of t3 if we can magically conjure up some type t2 such that t1 is a subtype of t2 and t2 is a subtype of t3. Fortunately, if we restructure our other subtyping rules a little bit to make sure that overall our subtype in relation is transitive, we don't really need to implement the transitivity rule directly. In particular, this means it will be necessary to combine the rules for tuples and check all of the conditions for subtyping on tuples at once.

    The next part of the algorithm involves implementing the type checking algorithm for expressions themselves. This algorithm will invoke sub when necessary. Notice that in the declarative rules there is a troublesome rule called subsumption:

    G |-- e : t         t < t'
    -------------------------
    G |-- e : t'

    Implementing this rule directly in the function tc_exp is problematic because if we try to read this rule as a function bottom-up in which the arguments to the function are the context G, the expression e and the type t' then we will somehow have to "guess" what type t to recursively invoke tc_exp with. Moreover, the subsumption rule can always be used. A stupid type checker might try to check subsumption over and over again and go into an infinite loop.

    Instead, what we will do is call tc_exp ctxt e and expect it either to fail (by reporting an ErrorMsg.error) or to succeed and return the least type t that e can possibly have in the context ctxt.

    If the type-checker reports an error, it shouldn't necessarily give up. It should try to recover so that it can report other errors to the user. One way to recover is just pretend that the erroneous expression has type int and hope for the best.

    The only places that we will use subtyping in our algorithm is when we absolutely have to (or else type checking will fail). For instance, one place that we might need to use subtyping is when we have a function call f (e). If f has type t1 -> t2 and e has type t1' then our type checker should succeed if t1' is a subtype of t1. Then, the type of f(e) can be any supertype of t2; in order to report the least supertype of t2, then just return t2 as the result of tc_exp ("f(e)").

    Note: The type-checker needs to traverse expressions and maintain a context at the same time. Another program that also traverses expressions and maintains a context is eval.sml that you have already been given. That program does different things with its context, so you can't follow it exactly, but you could learn a lot from a careful reading of that program.

    The most difficult problem is the type-checking of if-then-else. Consider this Fun expression, in a context where f has type int->int:
    if x then <0, f> else <1,2,3>
    This expression is guaranteed to return a record whose first field is an int. That is, the least supertype of all the values that could be returned by this if-expression is <int>. Below I will describe the high-tech solution to this problem. Until you get everything else working, make the following approximation: the then-clause and the else-clause must have exactly the same type, that is,
    join complain (t1,t2) = if t1=t2 then t1 else (complain "t1 and t2 do not join"; t1)
    This is not faithful to the Fun semantics; it will reject some legal Fun programs. However, it will not accept any illegal Fun programs. (This algorithm is sound but not complete.)

  4. As usual, compile by invoking CM.make "sources.cm"; You can test your code by running Compile.compile "file".
  5. Checklist:
    • Did you check that no two functions have the same name? (Use efficient Symbol.table lookups, don't use a quadratic algorithm.)
    • Did you check that main is int->int?
    • Did you make sure to allow a parameter name the same as a global function name?
    • You should allow such silliness as (fun f(x:int):<int>=let x=3 in let x=<x> in x). That is, a "let" binding is permitted to hide an outer binding of the same name in a surrounding context.
    • You should even permit (fun g(int:<int>): <int> = int). That is, the name space of types (which in basic Fun has only "int" in it) is orthogonal to the name space of values.
    • Don't just give up at the first type error. Try to recover so that other errors can be found. The typical way to do this is to call ErrorMsg.error, then return the integer type, or the Tuple[] type, or whatever.
  6. After you have gotten everything else working technically, implement accurate type-checking for if-then-else. See the note below.
  7. After you have gotten everything working technically, and as time permits, improve the informativeness of your error messages. 10% of the points will be given on the basis of the subjective usefulness of the type-error messages.
  8. After you have done all the above, make incremental changes that increase the elegance, cleanliness, and readability of your program.
  9. Write some test cases. Collect 2 cases where typechecking succeeds (and should succeed) in files tc_success1.fun and tc_success2.fun, and include a brief description of the 2 programs (and their expected type) in your README. Collect 2 cases where typechcking fails in files tc_failure1.fun and tc_failure2.fun, and describe what the type violations consist of in README.
  10. Submit typecheck.sml, README, and the four test cases to dropbox here. In case you want to submit modified versions of fun.lex or fun.grm, you can use the "Upload optional file" feature.

Least common supertype

To properly implement the type-checking of if-then-else, you need to implement an algorithm that computes the least common supertype of two types, also called the join of two types.

When you call join complain (t1,t2) it should return a third type t3 such that t1 is a subtype of t3 and t2 is also a subtype of t3. If no such type exists, use complain to report an error (you'll want to pass join a complain-function that calls ErrorMsg.error with an appropriate position). Furthermore, join should return the "least" type t3. For example,

tc_join (<int, (int->int)>) (<int>) = <int>

Even though <> is a supertype of both <int,(int->int)> and <int>, it is also a supertype of <int>. Therefore, the join function should return <int> not <>. <int> is the least supertype of both <int,bool> and <int> and therefore it is a "join." Hint: there is a substantial twist when implementing the join function for function types. You'll need to implement a second, mutually recursive helper function in this case, to compute the greatest common subtype, called the "meet".

Due date: Monday, 28 March 2016, 9:00 PM