COS 320 - Fun Language Definition

Compiling Techniques, Spring 2012, Princeton University.

The Definition of Fun

In this document, we use ordinary font for nonterminal symbols in the grammar (eg: id). We use bold for keywords (eg: let, in).

Lexical Issues

  1. Identifiers (id). An identifier is a sequence of letters, digits, and underscores, starting with a letter. Uppercase letters are distinguished from lowercase.
  2. Comments. Comments may appear between any two tokens. Comments start with /* and end with */ and may be nested.
  3. White space. White space (newline, carriage-return, tab, space characters) may appear between any two tokens.
  4. Numbers (num) are 30-bit positive integers. (numbers may be preceded by a minus sign, but the minus sign should be lexed as a separate token.)
  5. Fun reserves the following keywords: fun in let while do if then else ref not type

Program Structure

A program is a sequence of (mutually recursive) function and type declarations.

prog ::= decl ... decl

decl ::= fundecl

Function Declaration Structure

A function declaration specifies the name of the function, its formal parameter, parameter type, return type and expression body. Functions are interpreted in call-by-value evaluation order. All functions must have different names (different functions may have the same formal parameter names). In this document we use the Greek letter τ for types.

fundecl ::= fun id ( id : τ ) : τ = exp

A program must contain a declaration of the function main,

fun main ( id : τ1 ) : τ2 = exp

where τ1 and τ2 are both int.

There is one predeclared (library) function printint of type int → ⟨ ⟩.

Type Structure

The types are int, n-ary tuples, single-argument/single-result function and references (ref).

τ ::= int | ⟨τ, ..., τ⟩ | ττ | τ ref | (τ)

The name "int" is not a reserved word, it is an ordinary identifier. The ref constructor has higher precedence (binds tighter) than the function type constructor (→). The function type constructor is right associative and the ref type constructor is left associative.

Expression Structure

Expressions are evaluated in left-to-right order. Expression structure is given by the following BNF.

un ::= - | not | ! | #i

bin ::= + | - | * | & |  ||  | = | < | :=

exp ::= (exp) | id | num | exp; exp | un exp | exp bin exp | ⟨exp, ..., exp⟩ | exp ( exp ) | exp : τ

| if exp then exp else exp | if exp then exp | while exp do exp | let id = exp in exp | ref exp

  1. Any expression may be parenthesized: (exp)
  2. Basic values include identifiers, ⟨ ⟩ (the unit value) and numbers. Sometimes numbers are interpreted as Boolean values (eg: in logical operations; if statements; while loops). In this case, 0 should be interpreted as "false" and non-zero should be interpreted as "true."
  3. Binary arithmetic and logical operations include &, ||, =, <, +, - , and *. The & operator is short-circuit conjunction, and || is short-circuit disjunction. Equality (=) operates exclusively over integers. The assignment operator is :=
  4. Unary operations include - (unary minus) and not (logical negation: 0 ⇒ 1; non-zero ⇒ 0). The unary operator #i extracts the ith field from a tuple. Tuples are indexed starting with 0. The i in #i must a non-negative integer and #-0 is not legal.
  5. The operators ref, :=, and ! creates a reference, assigns a value to reference and dereferences a reference respectively.
  6. If expressions (if exp then exp else exp) execute the "then" branch if the first expression evaluates to a non-zero integer and execute the "else" branch if the first expression evaluates to 0. The else branch extends as far as possible. The else branch may be omitted, in which case it is equivalent to "else ⟨ ⟩".
  7. N-ary tuples (N >= 0) are comma-separated sequences separated by angle-brackets: ⟨exp,...,exp⟩.
  8. Precedence levels are as follows (1 is tightest binding):
    1. #i, ref, function call, !, unary minus
    2. *
    3. +, binary minus
    4. =, <
    5. not
    6. &, ||
    7. :
    8. :=
    9. if-then-else, do-while
    10. ;
    11. let-in
  9. All binary operators are left-associative.
  10. In a call expression, the function argument is surrounded by parens. Function call is left-associative so f (x) y is (f (x)) y.
  11. It is legal to use the name int as a function name, parameter name, or variable name; this does not interfere with its use as a type name.
(This informal definition of Fun-language syntax is hard to use and imprecise. Much better, in a good real-life language definition, would be to have a formal LR(1) grammar for the language. But that's just what you have to build in the Parser assignment.)

Typing

Fun has a strong type system with subtyping. The following sections define the typing rules for the language.

Subtyping

The judgment (τ1 < τ2) states that τ1 is a subtype of τ2.


-------

τ < τ


τ1 < τ2   τ2 < τ3

------------------------

τ1 < τ3


τ0 < τ0' ... τn < τn'

---------------------------------

τ0, ..., τn⟩ < ⟨τ0', ..., τn'⟩


(0 <= n < k)

-------------------------------------------

τ0, ..., τn, ..., τk⟩ < ⟨τ0, ..., τn⟩


τ1' < τ1    τ2 < τ2'

---------------------------

τ1 → τ2 < τ1' → τ2'


τ < τ'    τ' < τ

--------------------

τ ref < τ' ref

Expression Typing

The judgment (G |-- exp : τ) states that expression exp has type τ in context G. A context G is a finite partial map from identifiers to types. The context "nil" is the empty context. The context G [x : τ] is the same as G except it maps identifier x to τ. The following rules define the expression typing judgement.

------------------

G |-- id : G(id)


----------------

G |-- num : int


G |-- exp1 : τ1     G |-- exp2 : τ2

---------------------------------------

G |-- exp1 ; exp2 : τ2


optype (un) = τ1→ τ2    G |-- exp : τ1

-----------------------------------------------

G |-- un exp : τ2


optype (bin) = τ1→ τ2 → τ3    G |-- exp1 : τ1    G |-- exp2 : τ2

-------------------------------------------------------------------------------

G |-- exp1 bin exp2 : τ3


G |-- exp0 : τ0   ...  G |-- expn : τn

--------------------------------------------

G |-- ⟨exp0, ..., expn⟩ : ⟨τ0, ..., τn⟩


G |-- exp : ⟨τ0, ..., τn⟩

------------------------------- (0 <= i <= n)

G |-- #i exp : τi


G |-- exp1 : τ1 → τ2    G |-- exp2 : τ1

-----------------------------------------------

G |-- exp1 exp2 : τ2


G |-- exp1 : int    G |-- exp2 : τ    G |-- exp3 : τ

--------------------------------------------------------

G |-- if exp1 then exp2 else exp3 : τ


G |-- exp1 : int    G |-- exp2 : ⟨ ⟩

--------------------------------------------------------

G |-- if exp1 then exp2 : ⟨ ⟩


G |-- exp1 : int    G |-- exp2 : ⟨ ⟩

--------------------------------------

G |-- while exp1 do exp2 : ⟨ ⟩


G |-- exp1 : τ1    G [x : τ1] |-- exp2 : τ2

-------------------------------------------------

G |-- let x = exp1 in exp2 : τ2


G |-- exp : τ

----------------------

G |-- ref (exp : τ) : τ ref


G |-- exp : τ ref

----------------------

G |-- ! exp : τ


G |-- exp1 : τ ref    G |-- exp2 : τ

---------------------------------------

G |-- exp1 := exp2 : ⟨ ⟩


τ=τ'    G |-- exp : τ'

--------------------------- (type constraint)

G |-- exp : τ : τ

(Note that in the rule above, the first colon is part of program syntax and the second colon is part of our description of typing rules.)

G |-- exp : τ' τ' < τ

--------------------------- (subsumption rule)

G |-- exp : τ


The optype function is defined by the following table.

operator typeof (operator)
- (unary) , not int → int
+ , - (binary) , * , & , || , = , < int → int → int


Function Declaration Typing


G [id2 : τ1] |-- exp : τ2

----------------------------------------------------------

G |-- (fun id1 ( id2 : τ1 ) : τ2 = exp) ok


Program Typing


contextof(decl1 ... decln) = G

G |-- decl1 ok   ...  G |-- decln ok

----------------------------------------------------

|-- decl1 ... decln ok


where

  1. None of the fundecls declare the same function name (argument names may be repeated).
  2. contextof(decls) = G means,
    • G(x)=τ1→τ2 if and only if x=printint, τ1=int, τ2=⟨ ⟩, or if one of the decls is
      (fun x ( id : τ1 ) : τ2 = exp)