Caml
Power

Problem Set 4: Interpreters and Proofs

Quick Links:

Objectives

In this assignment, you will develop an environment-based interpreter to replace the substitution-based interpreter defined in class. You will do this assignment individually.

Getting Started

We have used a parser generator for OCaml called menhir and a few extra libraries from a package called "batteries." In order to compile the code we hand out, you'll need to have those things installed on your machine. Be sure to do this before you start:

opam install batteries menhir
If you already have those packages installed, nothing goes wrong if you try to install them again. Once you have done that, download the code in a4.zip. Unzip it. Go in to the directory and look at its contents. You should see:

  • A collection of .ml files. Do not be overwhelmed by the number---you will only need to look at a few of them most likely. The others are infrastructure. We will talk more about the infrastructure and how it is organized in the COS 326 JRW in case folks want to build an interpreter like this, or extend this one, as an IW project.
  • A collection of .mli files. These are interfaces that describe what functions the corresponding .ml file must have. You can take a look but you can also ignore them if you want. We will discuss modules and interfaces in the 2nd half of the course.
  • A Makefile to simplify building the main program from its many sources.
  • A .merlin file to configure merlin for you. Before merlin will be particularly helpful, you will have to compile so that Merlin can find definitions and dependencies.
  • An .ocamlinit file to load the appropriate files in to the OCaml toplevel if you use it. As with merlin, before starting the toplevel, you will have to compile your binaries.
  • proofs.txt, which contains part 2 of the assignment.
  • Tests, A directory containing a few programs in our new little language (highly reminiscent of OCaml!) in .txt files for testing.

A few important things to remember before you start:

  • This assignment is to be done individually.
  • Make sure that the functions you are asked to write have the correct names and the number and type of arguments specified in the assignment. Programs that do not compile will be subject to penalties on top of regular deductions.
  • In this problem set, it is important to use good style (style will factor in to your grade). Style is naturally subjective, but the COS 326 style guide provides some good rules of thumb. As always, think first; write second; test and revise third. Aim for elegance.

Part 1: Environment-based Evaluator

In class, we developed a substitution-based interpreter in OCaml. While substitution makes for a good theoretical model of how OCaml programs evaluate, practical implementations use an environment-based model of execution. Fortunately, the two models coincide, so you can safely use substitution to reason about your programs while your interpreter uses environments to implement them.

In this part of the assignment, you will implement an environment-based interpreter for a simplified OCaml-like language (we will call it MiniML). Here are the key .ml files you need to look at---browse them first before starting to program:

  • Syntax.ml: This file defines the abstract syntax of the language. It includes the type operator (for things like addition and subtraction), the type constant (for integers 1, 2, 3 and booleans true and false), and the type variable for identifiers. It also contains the type exp, which tells all the different kinds of things that count as expressions in the language: integers, pairs, lists, and recursive functions. This file also contains a definition of environments and a few small functions you must complete. It is really important that you understand the intended meaning of each different element of the type exp. We have given the data type constructors suggestive names but if you have questions, one place to look is in EvalSubst.ml (see below).
  • EvalSubst.ml: This file defines a call-by-value, substitution-based interpreter. It gives you a reference implementation to ensure that you understand the semantics of recursive functions, variables, let expressions, etc. You do not have to change this file but looking at it is probably a good idea
  • EvalEnv.ml: A call-by-value, environment-based interpreter. In this interpreter, variable bindings are stored in an environment. When we encounter a variable in our code, we look up the value of the variable in the environment. In addition, to evaluate a function definition, we create a closure, which is a pair of the recursive function definition and its environment. Your job will be to implement this interpreter. When it generates results that do not include functions, those results should be identical to the results generated by the substitution-based interpreter. In other words, the substitution-based interpreter is your specification.

Part 1.1

Your main task is to implement the environment-based interpreter. To do this, you will initially have to modify 2 files:

  • In Syntax.ml, fill in the small definitions of lookup and update. (See Syntax.ml for a description of what they should do.)
  • In EvalEnv.ml, examine is_value. This defines the elements of the language that are values. Next, look at eval_body, which is the bulk of the interpreter. We have given you 1 case of the interpreter: how to evaluate variables. That case simply looks up a variable in the current environment. You should not change this case. The other key cases are:
    • The case for Rec. Rec is the definition of a function, which might be recursive. In this case, you will have to return not a Rec (like in the subsitution-based interpreter), but instead a Closure, which packages the components of the function definition up with an environment.
    • The case for function application. We will let you figure out what to do there.
    Of course, you will also have to fill in cases involving pairs, lists and match statements.

Testing

There are a few different ways to run and test the interpreter you have constructed:
  • Quickcheck.ml: We recommend you start your testing by looking at this file. In fact, you can do all your testing using this file. At the top, what it does is define a collection of expressions using the abstract syntax (found in Syntax.ml). For example, it defines the abstract syntax for the expression 0:
    let zero = Constant (Int 0) 
    
    Then it goes on to define a few more expressions. Define any expressions you would like to test your work on. Further down in the file, you will see the following declaration.
    let examples = [
      zero;    (* 0 *)
      one;     (* 1 *)
      p1;      (* (1,2) *)
      let1;
      let2;
      let3;
      swap_p1; (* swap (1,2) *)
      fact4;   (* fact 4 *)
      sl4;     (* sumlist [1;2;3;4] *)
    ]
    
    This declaration generates the list of examples that will be used to test your interpreter. To run the tests, type the following at the prompt $:
    $ make qc
    $ ./Quickcheck.byte
    

    Quickcheck.byte will run EvalSubst.eval as well as EvalEnv.eval on both expressions and compare the answers. If the answers are the same, you win---your interpreter is doing the same thing as the substitution-based interpreter, which is the goal. Note, if the test is successful, nothing is printed. If a test fails then some debugging information is printed (the expression evaluated and the results given by the two interpreters. If your evaluator throws an exception, the exception will be caught and that will be reported. At the end, it prints out the number of successful tests.

    Probably the best way to test things is to add more expressions of your own creation to the list of examples. As always, test your functions on small examples first. You do not have to hand in Quickcheck.ml so you can do what you like with it.

  • Tests: This folder contains some small example programs (the .mml indicates a "MiniML" file). You can try your interpreter on these programs by compiling with
    $ make main
    
    And then running the program by supplying the name of a MiniML program file:
    $ ./Main.byte Tests/math.mml
    
    You can also try running several of the files and comparing outputs between env-based and subst-based interpreters by doing the following. (You can edit Debug.ml in the obvious way to change the set of programs it runs in case you want to write your own. Be forewarned that we have not tested the parser much and its error messages are unlikely to be "amazing". Note that our little language does not support type inference or polymorphism. You must put type annotations on your functions. In addition, our pattern matching statements only recognize [] and hd::tl as patterns. Note also that we have not implemented the greater-than symbol.)
    $ make debug
    $./Debug.byte
    

Part 1.2

Note: This part is only worth a relatively small amount of the total grade (see at the bottom of this page for grading information). Give it a try but if you do not have time to finish it, do not stress too much---it is not such a big deal.

In functional programming language implementations, it is important to optimize the size of the closures that are created. The most naïve thing one can do when creating a closure is to put ALL variables in the current execution environment into the closure's environment. However, any given function only requires access to its free variables. For a refresher on free and bound variables read this note.

Some key examples:

  • expression: x + 2
    free variables: x
  • expression: let x = 3 in x + 2
    free variables: none (x is bound by the let statement)
  • expression: rec f x = x + y
    free variables: only y (x is bound as a function parameter)
  • expression: closure [y=3] f x = x + y
    free variables: none (x is bound as function parameter, y is found in environment)

The environment that a Rec is evaluated with to form the closure could have many superfluous bindings that can be pruned, including duplicate bindings of the same variable or variables that do not appear in the function. Revise your implementation of the Rec case of the interpreter to create minimal closures — closures where the environment contains only the variables that are free variables in that function. To do so, you'll need to write a function that analyzes an arbitrary function (and consequently, you will need to analyze an arbitrary expression as well) to extract the free variables of the function, and include only these bindings in the environment.

Part 2: Program Correctness

NEW: Please SKIP Part 2

Handin Instructions and Grading Information

This problem set is to be done individually.

You must hand in these files to TigerFile (see link on assignment page):

  1. Syntax.ml
  2. EvalEnv.ml

There are 30 total points in this assignment, broken down as

  • 25 for part 1.1
  • 5 for part 1.2