# **Typed Machine Language and its Semantics**

Kedar N. Swadi Princeton University kswadi@cs.princeton.edu Andrew W. Appel Princeton University appel@cs.princeton.edu

## ABSTRACT

We present TML, a new low-level typed intermediate language for the proof-carrying code framework. The type system of TML is expressive enough to serve as a compilation target for high-level languages such as core ML, and it can be guaranteed sound. It is also flexible enough to give compilers considerable freedom in low-level data representation. We can model real machine instructions in TML, and thus avoid high-level opaque operations like memory allocation and perform provably safe optimisations like array-bounds-check elimination. Most importantly, TML has a semantic model.

## 1. INTRODUCTION

Proof-carrying code (PCC) [11] is a framework for the generation of provably safe programs. In this framework, an untrusted program is accompanied by a proof of its compliance with some predefined safety policy (resource access, type safety, or memory safety). The host mechanically checks the proof for correctness to ensure safety of execution of the untrusted code. Though this technique is general enough to encode a very wide range of safety properties, existing PCC systems [10] suffer from a lack of flexibility with respect to the high-level source languages they translate from and the target machine architectures they compile to.

Building on a semantic model of machine-level types [4], we design TML, a new low-level language that is general enough so that a wide variety of high-level languages may be compiled to it, and is also retargetable to different machines.

TML makes the following contributions:

- We have a semantic model for TML types based on a very small set of axioms. Our trusted computing base comprises the rules of higher-order logic augmented by some elementary facts about number arithmetic. All TML types are modelled as (defined) predicates, and type inference rules are lemmas for which we provide machine-checkable proofs. Previous approaches to low-level semantics [9, 11] did not provide models, but only syntactic consistency results.
- Low-level type constructors for type intersections and address arithmetic give front-end compilers more freedom in data layout than high-level (TAL-style) records or objects would.

## Preliminary version, July 2001

- TML is really a machine language, with instructions specified by integer opcodes; thus we do not need to trust an assembler.
- Unlike TAL [9] or DTAL [16], we can model each machine instruction in TML, even those occurring in sequences allocating a heap record. While TAL and DTAL have an atomic "malloc" instruction that expands into several real machine instructions, TML is expressive enough to reason about the intermediate states in the malloc subroutine; similarly for multi-instruction case-discrimination sequences. Each TML instruction corresponds to a single instruction on a real machine like SPARC. This allows a compiler to perform provably safe optimisations, and it also makes the trusted computing base smaller.
- We can encode complex dataflow facts within the type system. Tracking dataflow allows us to reason about intermediate machine states and thus make operations like sum-type discrimination nonatomic in a safe way.
- Like other PCC systems, TML has no decision procedure: we assume that a type-preserving compiler from a safe source language produces hints and invariants useful in constructing machine-checkable proofs. However, because each typing rule is proved separately as a derived lemma (not through a global syntactic metatheorem), we find it easy to add ad-hoc rules for the convenience of the prover or compiler (in the case where no change is necessary to the semantic model).

## 2. TYPED MACHINE LANGUAGE

TML is stratified into the levels of *kinds*, *types* and *values*. To reduce complexity of the system we have only two kinds in TML. There are no other kind constructors in TML like functions or pairing. This simplifies the semantic model. The type constructors are sufficiently low-level to allow translation from a wide range of source languages into TML. We cannot model quantification over higher-order kinds, which are required by the polymorphic typed lambda calculus  $F_{\omega}$ . We can extend TML to higher kinds such as  $\Omega \rightarrow \Omega \rightarrow \Omega$ , but we can't quantify over these higher kinds; still, this is enough to do type constructors in core ML.

### 2.1 TML Kinds

To reason about machine-level programs with machine-level types, in TML we have kinds for *instructions* and *type maps*. *Type maps* are judgements on a vector of values associating them with their types, and have the kind  $\Omega$ . In our semantic model, we can use type maps also to represent scalar types (judgements on a single value, not a vector) and also to represent numbers. The main motivation to unify the three kinds into one was to have a smaller set of quantification operators and also fewer rules for specifying static semantics.

### 2.2 Types $(\tau : \Omega)$

In TML, the state comprises a register bank R and a memory M. All accessible values live in registers of R. These values may point to data in memory M. If x is a number then R(x) is the value of the xth register; informally, we write  $r_x$ . Unlike TAL, which distinguishes address values from integer values, our machine-level calculus uses numbers to index memory.

As in the Appel-Felty semantic model of types [4], types are encoded as predicates on values. Then, $M \vdash r_x : \tau$ if M and R(x) satisfy the predicate $\tau$. TML has the following primitive types:

- $\top$ ,  $\perp$ : Every value has type  $\top$ ; no value has type  $\perp$ .
- id : The identity type-constructor i.e.,  $id(\tau) = \tau$ .
- box(τ): The box type constructor is for memory references.
   v : box(τ) if the content of M(v) satisfies the predicate τ. box makes immutable references. TML can accommodate mutable references using the semantic model of Ahmed, Appel, and Virga [2], but we will not show details here.
- $rec(\tau)$ : This constructor allows us to define recursive types.
- offset(i)(τ) : The value v has this type if v + i satisfies τ. This constructor is useful wherever address arithmetic is required. For example, if location R(x) contains a pointer to a record with two fields [0 : τ<sub>1</sub>, 1 : τ<sub>2</sub>], we could say r<sub>x</sub> : offset(1) (box τ<sub>2</sub>). A convenient abbreviation, field c τ is the same as offset(c) (box τ).
- τ<sub>1</sub> ∩ τ<sub>2</sub> is the type of a value satisfying the predicates for both τ<sub>1</sub> and τ<sub>2</sub>, while a value of type τ<sub>1</sub> ∪ τ<sub>2</sub> satisfies the predicate for at least one of the two types.
- ∀,∃: These are the universal and existential quantifiers. We allow quantification over Ω but not over *INSTR*. We use de Bruijn indices [6] (see Section 3.1) rather than variables, so ∀ or ∃ implicitly binds a de Bruijn variable. Informally, however, we will often show variables with the quantifiers.
- rec : General (covariant, contravariant) recursive types are written as rec τ, where rec implicitly binds a de Bruijn index that may be free in τ. For appropriate ("contractive") expressions τ, we have rec τ is a fixed point of τ, which is written as rec τ = τ[rec τ · id] in the calculus of explicit substitutions.
- codeptr : The judgement v : codeptr φ holds if v is a code pointer with formal parameters φ. That is, it is safe to jump to location v if the registers satisfy φ. The formal parameters are modelled as a type map (Section 2.4).
- at(ι, s): If memory location l holds a TML instruction ι of size s, then l : at(ι, s).
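The reading of types as predicates can be made concrete with a small executable sketch. The Python below is only an illustration under simplifying assumptions: memory is a dictionary from addresses to words, a type is a function of (M, v), and quantifiers, rec, codeptr and at are omitted. None of the function names come from TML's actual formalisation.

```python
# Illustrative sketch only: TML-style types as Python predicates on (M, v).
# Names and encodings are assumptions made for this example.
from typing import Callable, Dict

Memory = Dict[int, int]                 # address -> stored word
Type = Callable[[Memory, int], bool]    # a type is a predicate on (M, v)


def top(m: Memory, v: int) -> bool:
    return True                         # every value has type top


def bot(m: Memory, v: int) -> bool:
    return False                        # no value has type bot


def int_eq(c: int) -> Type:
    return lambda m, v: v == c


def int_ge(c: int) -> Type:
    return lambda m, v: v >= c


def int_lt(c: int) -> Type:
    return lambda m, v: v < c


def inter(t1: Type, t2: Type) -> Type:
    return lambda m, v: t1(m, v) and t2(m, v)


def union(t1: Type, t2: Type) -> Type:
    return lambda m, v: t1(m, v) or t2(m, v)


def box(t: Type) -> Type:
    # v : box(t) if v is a mapped address and M(v) satisfies t
    return lambda m, v: v in m and t(m, m[v])


def offset(i: int, t: Type) -> Type:
    # v : offset(i)(t) if v + i satisfies t
    return lambda m, v: t(m, v + i)


def field(c: int, t: Type) -> Type:
    return offset(c, box(t))            # the abbreviation from the text


int32 = inter(int_ge(0), int_lt(2**32))

# A two-field record at address 200: field 0 holds 5, field 1 holds 9.
memory = {200: 5, 201: 9}
print(field(1, int_eq(9))(memory, 200))
print(box(int32)(memory, 999))
```

Because box checks that the address is mapped before dereferencing, an out-of-range pointer simply fails the predicate instead of raising an error.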

### 2.3 Integers $(n:\Omega)$

In TML, we can specify the types of integers, and constant and bounded integers.

- $\text{int}_{\pi}(i)$: The type of integers x such that x π i. For example, $\text{int}_{<}100$ is the type of integers less than 100, and $\text{int}_{=}(c)$ is the singleton type of integers equal to c. The type of machine integers is $\text{int}_{32} = \text{int}_{\geq}0 \cap \text{int}_{<}2^{32}$.
- +: This type captures the result of additions. After executing $r_i \leftarrow r_j + r_k$ where $r_j : \text{int}_{=}n_1$ and $r_k : \text{int}_{=}n_2$, we infer $r_i : \text{int}_{=}(n_1+n_2)$. Other arithmetic operators like subtraction or multiplication on integers can also be easily defined, as can modular arithmetic.

These constructors allow us a great deal of flexibility for data representation. Consider, for example, a list-of-integers datatype. List cells could have untagged or tagged representations.

|                      |     | Kinds                                  |
|----------------------|-----|----------------------------------------|
| κ                    | ::= | Ω                                      |
|                      | \|  | INSTR                                  |
|                      |     | **Types** $(\Omega)$                   |
| $\tau, \phi, \Gamma$ | ::= | $\top \mid \perp$                      |
|                      | \|  | codeptr φ                              |
|                      | \|  | offset $(n, \tau)$                     |
|                      | \|  | id τ                                   |
|                      | \|  | box τ                                  |
|                      | \|  | rec τ                                  |
|                      | \|  | $\tau \cap \tau' \mid \tau \cup \tau'$ |
|                      | \|  | $\forall \tau \mid \exists \tau$       |
|                      | \|  | $\{n: \tau\}$                          |
|                      | \|  | $\phi \setminus n$                     |
|                      | \|  | $\operatorname{int}_{\pi} n$           |
|                      | \|  | $n_1 + n_2$                            |
|                      | \|  | -+()                                   |
|                      | \|  | $\operatorname{at}(\iota,n)$           |
|                      | \|  | $\underline{n}$                        |
|                      |     | **Instructions** (INSTR)               |
| ι                    | ::= | $\operatorname{instr}(\Gamma, \phi, \phi')$ |
|                      | \|  | $\forall_{\iota}\, \iota$              |
|                      | \|  | $\exists_{\iota}\, \iota$              |
| π                    | ::= | $> \mid \geq \mid < \mid \leq \mid = \mid \neq$ |

Figure 1: TML Syntax : Kinds and Types

In an untagged scheme (Figure 2a), if $r_1$ is a pointer to a list, assuming a 4-byte word, we represent this as

 $r_1: \operatorname{int}_{=}0 \cup (\operatorname{int}_{\neq}0 \cap (\operatorname{field} 0 \operatorname{int}_{32}) \cap (\operatorname{field} 4 \tau))$ 

where the left union type is the nil case and the right is the cons case. If  $r_1$  contains a non-zero value it points to a record of two fields, the first being the data and the second being the pointer to the next cell. This makes pointers non-abstract in TML. (This example is a simplification which avoids dealing with the recursive nature of the list data type; see section 3.1 for a discussion of recursive types.)



Figure 2: Data Representation

To get a tagged representation (Figure 2b) we write

$$\begin{aligned} r_1 :\quad & (\mathsf{field} \ 0 \ \mathsf{int}_{=}0) \ \cup \\ & (\mathsf{field} \ 0 \ \mathsf{int}_{=}1 \ \cap \ \mathsf{field} \ 4 \ \mathsf{int}_{32} \ \cap \ \mathsf{field} \ 8 \ \tau) \end{aligned}$$

In this case, a nil cell has a tag of 0, and the cons cell has a nonzero tag field followed by the data and the next-cell pointer.
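To see how the two layouts differ, the sketch below checks both representations against small hand-built memories. It is a hedged illustration, not TML itself: types are modelled as Python predicates on (M, v), the tail type τ is passed in as a parameter (as in the simplified, non-recursive presentation above), and the cons tag is assumed to be exactly 1. The addresses 300, 400 and 500 are arbitrary.

```python
# Sketch of the untagged and tagged list-cell layouts as predicates on (M, v).
# Assumptions: 4-byte words, cons tag == 1, tail type supplied by the caller.
from typing import Callable, Dict

Memory = Dict[int, int]
Type = Callable[[Memory, int], bool]


def top(m: Memory, v: int) -> bool:
    return True


def int_eq(c: int) -> Type:
    return lambda m, v: v == c


def int_ne(c: int) -> Type:
    return lambda m, v: v != c


def int32(m: Memory, v: int) -> bool:
    return 0 <= v < 2**32


def inter(*ts: Type) -> Type:
    return lambda m, v: all(t(m, v) for t in ts)


def union(*ts: Type) -> Type:
    return lambda m, v: any(t(m, v) for t in ts)


def field(c: int, t: Type) -> Type:
    # field c t = offset(c)(box t): address v + c is mapped and holds a t
    return lambda m, v: (v + c) in m and t(m, m[v + c])


def untagged(tail: Type) -> Type:
    # nil is the value 0; cons is a non-zero pointer to (data, next)
    return union(int_eq(0),
                 inter(int_ne(0), field(0, int32), field(4, tail)))


def tagged(tail: Type) -> Type:
    # every cell is a pointer; word 0 is the tag (0 = nil, 1 = cons)
    return union(field(0, int_eq(0)),
                 inter(field(0, int_eq(1)), field(4, int32), field(8, tail)))


untagged_mem = {300: 10, 304: 0}                  # one cons cell, tail = nil
tagged_mem = {400: 0, 500: 1, 504: 10, 508: 400}  # nil cell and one cons cell

print(untagged(top)(untagged_mem, 300))
print(tagged(top)(tagged_mem, 500))
```

In the untagged scheme the value 0 is itself a list, whereas in the tagged scheme 0 is rejected because nil must be a pointer to a cell whose tag word is 0.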

### 2.4 Type Maps $(\phi: \Omega)$

A type map is a collection of typing judgements for registers, associating each register  $r_x$  to some type  $\tau_x$ . For most purposes, type maps serve the function of environments. If any register is not explicitly referenced in the environment, it is assumed unconstrained.

- The empty type map is  $\top$ .
- {*i*: τ}: This is a *singleton* type map, where the only judgement is that register *r<sub>i</sub>* has type τ.
- φ<sub>1</sub> ∩ φ<sub>2</sub> : If register bank R : φ<sub>1</sub> and R : φ<sub>2</sub>, we have R : φ<sub>1</sub> ∩ φ<sub>2</sub>. We also write {*i* : τ<sub>1</sub>, *j* : τ<sub>2</sub>} for {*i* : τ<sub>1</sub>} ∩ {*j* : τ<sub>2</sub>}.
- φ\i contains the type judgements for all registers in φ except r<sub>i</sub>.

Existential and singleton types allow us to capture dataflow information which can be used to relate the contents of two registers. For example, if R(i) = R(j), we can express this fact as

$$\exists n. \ \{i : \mathsf{int}_{=}n, \ j : \mathsf{int}_{=}n\}$$

A more general type map, relate, captures a wider class of such register relations.

$$\begin{array}{l} \mathsf{relate}_{\pi}(F,G)(i,j) \equiv \\ \exists n. \\ \{r_i : F \; (\mathsf{int}_{\pi}(n))\} \cap \{r_j : G \; (\mathsf{int}_{=}(n))\} \end{array}$$

We use the following example to illustrate the use of relate. Let $\phi_1 = \{r_i : \mathsf{box} \ \tau\}$. We fetch the contents of the memory (M) at R(i) into register $r_j$, $(r_j \leftarrow M[r_i])$. This operation results in a new environment, $\phi_2$. Trivially, R(j) = M[R(i)] in $\phi_2$. This fact is expressed using the relate constructor as $\mathsf{relate}_{=}(\mathsf{box}, \mathsf{id})(i, j)$. In the definition above, the existential is instantiated with M[R(i)]. This instantiation can be shown to satisfy the two sub-environments. We now have

 $\phi_2 = \{i : \mathsf{box} \ \tau\} \cap \mathsf{relate}_{=}(\mathsf{box}, \mathsf{id})(i, j)$. From this environment, we have lemmas which can infer that $r_j : \tau \in \phi_2$. We can also use relate to reason about the safety of certain optimisations. For example, Section 7.3 explains how to perform provably safe sum-type discriminations using relate.
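The load example can be replayed in a few lines of code. The sketch below models type maps as predicates on (M, R) and implements only the π = "=" instance of relate. It is an illustration under explicit assumptions: the existential is approximated by a finite search over the values present in memory and registers (enough for this example, but not a general decision procedure), and the names `singleton`, `ident` and `relate_eq` are made up here.

```python
# Sketch: type maps as predicates on (M, R), and relate_=(F, G)(i, j).
# The finite witness search is an assumption made so the example can run.
from typing import Callable, Dict

Memory = Dict[int, int]
Regs = Dict[int, int]
Type = Callable[[Memory, int], bool]
TypeMap = Callable[[Memory, Regs], bool]


def int_eq(n: int) -> Type:
    return lambda m, v: v == n


def box(t: Type) -> Type:
    return lambda m, v: v in m and t(m, m[v])


def ident(t: Type) -> Type:
    return t                                  # the id type constructor


def singleton(i: int, t: Type) -> TypeMap:
    # {i : t} constrains only register i
    return lambda m, r: i in r and t(m, r[i])


def relate_eq(F, G, i: int, j: int) -> TypeMap:
    # exists n. {i : F(int_= n)} and {j : G(int_= n)}
    def holds(m: Memory, r: Regs) -> bool:
        candidates = set(m.values()) | set(r.values())
        return any(singleton(i, F(int_eq(n)))(m, r)
                   and singleton(j, G(int_eq(n)))(m, r)
                   for n in candidates)
    return holds


# After r2 <- M[r1], register 2 holds the word that register 1 points to.
memory = {300: 42}
regs = {1: 300, 2: 42}
print(relate_eq(box, ident, 1, 2)(memory, regs))
```

Instantiating both constructors with `ident` gives the register-equality fact R(i) = R(j) from the earlier example.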



Figure 3: relate Example

### 2.5 Code Context $(\Gamma : \Omega)$

Program safety can be proved from type safety. Thus, we ensure that certain typing judgements hold before and after the execution of every instruction. Consider an instruction at location l which requires its arguments to be of certain types. For example, a load instruction might require its source register to be of a field type. In terms of Hoare logic, if the register bank R satisfies some precondition, it is safe to jump to location l. This precondition on R is represented as the type map  $\phi$ . Thus we have l being of type codeptr( $\phi$ ). This fact itself can be encoded as the type map  $\{l: codeptr(\phi)\}$ , though l is an address indexing into the code and not the register bank. This type map describes the local typing invariants at location l. Using intersection  $(\cap)$ , we collect the local invariants at all program locations. This resultant map is  $\Gamma$ .

As a running example, consider the SPARC program in Figure 5 which adds up the elements of a list.



Figure 4: List Representation in Memory

In this program, register o1 is a list value, which is either 0 or a pointer to a two-word record containing an integer and a list value. A length-3 list containing [1,2,3] is shown in Figure 4. Register o2 is a temporary that holds the data until it is added to the accumulator in register o3. SPARC uses delay slots after branch instructions. Instructions which go in delay slots are shown prefixed with a "\*", and they are executed whether or not the branch is taken.

For this program to be safe, it must start with register o1 having the type of a pointer to a list, and register o7 should have the return address, i.e. it should have the type of a code pointer which expects register o3 to have an integer. Assuming definitions for high-level types like list\_ty for lists (as shown at the end of Section 3.1), we have the precondition  $\phi_0$  for address 0 to be

$$\phi_0 = \{o1 : \mathsf{list\_ty}\} \cap \{o7 : \mathsf{codeptr}\{o3 : \mathsf{int}_{32}\}\}$$

If this condition holds, then it is safe to jump to location 0. The compiler provides invariants  $\phi_l$  for each location *l*, which are combined into

$$\begin{split} \Gamma = & \{0: \mathsf{codeptr}(\phi_0)\} \cap \\ & \{4: \mathsf{codeptr}(\phi_4)\} \cap \\ & \vdots \\ & \{36: \mathsf{codeptr}(\phi_{36})\} \end{split}$$

One of the goals of TML is to be able to work with multiple target machines. Unlike TAL or PCC, which assume the correctness of the assembler, we wish to have evidence that the code was assembled correctly. Hence, proofs of safety of programs must also include proofs that each word in the code part of the memory matches its semantic specification. For each machine, we need to know the instruction semantics, or how instructions in that machine change the state. Assuming that program location *l* has value *v*, we must prove that this value decodes to some instruction ι (described in Section 2.6), of size *s*, which has the required semantics. Informally, we must prove that the contents of the "Opcode" column in Figure 5 implement the contents of the "Pseudocode" column.  $\Delta$  encodes the contents of the part of the memory containing the program as intersections of singleton type maps  $\{l : \mathsf{box}(\mathsf{int}_{=}v)\}$  for each address l in the program.

| Address | Label  | Program           | Pseudocode                       | Opcode     |
|---------|--------|-------------------|----------------------------------|------------|
| 0       |        | tst %o1           | R[o1] == 0 ?                     | 0x80900009 |
| 4       |        | mov 0, %o2        | $R[o2] \leftarrow 0$             | 0x94102000 |
| 8       |        | ba entry          | goto entry                       | 0x10800005 |
| 12      | *      | mov 0, %o3        | $R[o3] \leftarrow 0$             | 0x96102000 |
| 16      | loop:  | ld [%o1],%o2      | $R[o2] \leftarrow M[R[o1]]$      | 0xd4024000 |
| 20      |        | ld [%o1+4],%o1    | $R[o1] \leftarrow M[R[o1] + 4]$  | 0xd2026004 |
| 24      |        | tst %o1           | if R[o1] <> 0 ?                  | 0x80900009 |
| 28      | entry: | bne loop          | goto loop                        | 0x12bffffd |
| 32      | *      | add %o3, %o2, %o3 | $R[o3] \leftarrow R[o2] + R[o3]$ | 0x9602c00a |
| 36      |        | retl              | return                           | 0x81c3e008 |
| 40      | *      | nop               |                                  | 0x01000000 |

Figure 5: SPARC program to add the contents of an integer list

For our example program,

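$$\begin{aligned} \Delta = {} & \{0 : \mathsf{box}(\mathsf{int}_{=}80900009_{16})\} \ \cap \\ & \{4 : \mathsf{box}(\mathsf{int}_{=}94102000_{16})\} \ \cap \\ & \vdots \\ & \{40 : \mathsf{box}(\mathsf{int}_{=}01000000_{16})\} \end{aligned}$$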

If we shuffle  $\Delta \cap \Gamma$  to interleave the contents of  $\Gamma$  and  $\Delta$  to look like

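$$\begin{aligned} & \{0 : \mathsf{codeptr}(\phi_{0})\} \ \cap \\ & \{0 : \mathsf{box}(\mathsf{int}_{=}80900009_{16})\} \ \cap \\ & \{4 : \mathsf{codeptr}(\phi_{4})\} \ \cap \\ & \{4 : \mathsf{box}(\mathsf{int}_{=}94102000_{16})\} \ \cap \\ & \vdots \\ & \{40 : \mathsf{box}(\mathsf{int}_{=}01000000_{16})\} \end{aligned}$$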

the connection to traditional Hoare logic style presentation becomes obvious. The  $\phi$ 's are the program annotations, while the other singletons form the program instructions.
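The requirement that each code word decodes to the intended instruction can be illustrated by splitting the Figure 5 words into their bit fields. The sketch below assumes the standard SPARC V8 instruction formats (op in bits 31-30; format 2 with cond, op2 and disp22; format 3 with rd, op3, rs1 and either rs2 or simm13). It is an informal check written for this example, not the decode relation used in the machine-checked proofs, and the dictionary keys are our own naming.

```python
# Sketch: field-level decoding of the Figure 5 code words (SPARC V8 formats).
PROGRAM = {
    0: 0x80900009, 4: 0x94102000, 8: 0x10800005, 12: 0x96102000,
    16: 0xD4024000, 20: 0xD2026004, 24: 0x80900009, 28: 0x12BFFFFD,
    32: 0x9602C00A, 36: 0x81C3E008, 40: 0x01000000,
}


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode(addr: int, word: int) -> dict:
    op = bits(word, 31, 30)
    if op == 0:                                   # format 2
        op2 = bits(word, 24, 22)
        if op2 == 0b010:                          # integer conditional branch
            disp = sign_extend(bits(word, 21, 0), 22)
            return {"kind": "branch", "cond": bits(word, 28, 25),
                    "annul": bits(word, 29, 29), "target": addr + 4 * disp}
        if op2 == 0b100:                          # sethi (nop is sethi 0, %g0)
            return {"kind": "sethi", "rd": bits(word, 29, 25),
                    "imm22": bits(word, 21, 0)}
        return {"kind": "unknown"}
    if op == 1:                                   # format 1: call
        disp = sign_extend(bits(word, 29, 0), 30)
        return {"kind": "call", "target": addr + 4 * disp}
    fields = {"kind": "format3", "op": op, "rd": bits(word, 29, 25),
              "op3": bits(word, 24, 19), "rs1": bits(word, 18, 14)}
    if bits(word, 13, 13):                        # immediate mode
        fields["simm13"] = sign_extend(bits(word, 12, 0), 13)
    else:                                         # register mode
        fields["rs2"] = bits(word, 4, 0)
    return fields


for addr in sorted(PROGRAM):
    print(addr, decode(addr, PROGRAM[addr]))
```

With the usual register numbering (%o1 = 9, %o2 = 10, %o3 = 11, %o7 = 15), the branch at address 8 decodes to target 28 (entry) and the branch at address 28 to target 16 (loop), matching the labels in Figure 5.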

### 2.6 Instructions

An instruction in TML is a relation on  $\phi_1$ , the type map which holds before its execution, and  $\phi_2$ , the one that holds after its execution. The tuple  $(\phi_1, \phi_2)$  is the Hoare-logic style specification of an instruction with  $\phi_1$  being the precondition and  $\phi_2$  being the postcondition for that instruction.

Unlike Hoare logic, we have explicit branches and jumps. For branches we need to know the preconditions of all possible branch targets. This is provided by the code context  $\Gamma$ . We therefore have the instruction constructor instr depending on three type maps,  $(\Gamma, \phi_1, \phi_2)$ .

For example, consider the add  $(r_i \leftarrow r_j + r_k)$  instruction in code context  $\Gamma$ . Addition requires the source values to be integers. So we have the precondition  $\phi_1 = (\phi \cap \{j : \operatorname{int}_{32}\} \cap \{k : \operatorname{int}_{32}\})$ . After addition, all previous typing judgements about  $r_i$  are invalidated. Additionally, in  $\phi_2$ ,  $r_i$  gets the integer type, encoded as  $\{r_i : \operatorname{int}_{32}\}$ . Therefore,  $\phi_2 = (\phi_1 \setminus i) \cap \{i : \operatorname{int}_{32}\}$ .

We can actually have a stronger formulation for add. If we know that  $\phi_1$  contains  $\{j : \operatorname{int}_{=}n, k : \operatorname{int}_{=}m\}$ , we have the postcondition  $\phi_2 = (\phi_1 \setminus i) \cap \{i : \operatorname{int}_{=}(n+m)\}$ . In TML, we use the instr instruction constructor to encode add for  $\phi_1$  and  $\phi_2$  under context  $\Gamma$  as add =  $\operatorname{instr}(\Gamma, \phi_1, \phi_2)$ . We see that add is polymorphic over all environments  $\phi$  and contexts  $\Gamma$ . Therefore, in writing TML-instruction constructors we need quantifications over  $\Omega$ , using operators  $\forall_{\iota}, \exists_{\iota}$ . We usually wish to prove that an instruction at some location satisfies some semantic specifications. Subtyping on instructions, " $\subset_{\iota}$ ", gives us a convenient way of expressing this. Therefore, we have

$$\begin{array}{rl} \operatorname{add}(i,j,k) & \subset_{\iota} \forall \Gamma, \phi, n, m. \\ & \operatorname{instr}(\Gamma, (\phi \cap \{j: \operatorname{int}_{=}n\} \cap \{k: \operatorname{int}_{=}m\}), \\ & ((\phi \cap \{j: \operatorname{int}_{=}n\} \cap \{k: \operatorname{int}_{=}m\}) \backslash i \\ & \cap \{i: n+m\})) \end{array}$$
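The effect of this specification on a type map can be sketched as a function from precondition to postcondition. In the Python below a type map is a dictionary from register numbers to symbolic types, which is a representation chosen for this example only; the function `add_post` and the tuple encoding `("int_eq", n)` are hypothetical names. The sketch ignores 32-bit wraparound, which a modular-arithmetic variant would have to handle.

```python
# Sketch: the singleton-integer typing rule for add as a type-map transformer.
# phi maps register numbers to symbolic types such as ("int_eq", 3).
from typing import Dict, Tuple

TypeMap = Dict[int, Tuple]


def add_post(phi: TypeMap, i: int, j: int, k: int) -> TypeMap:
    """Postcondition of r_i <- r_j + r_k when r_j and r_k are known constants."""
    tj, tk = phi.get(j), phi.get(k)
    if not (tj and tk and tj[0] == "int_eq" and tk[0] == "int_eq"):
        raise ValueError("precondition needs {j : int_= n} and {k : int_= m}")
    post = {r: t for r, t in phi.items() if r != i}   # phi_1 \ i
    post[i] = ("int_eq", tj[1] + tk[1])               # ... intersected with {i : n + m}
    return post


phi1 = {1: ("int_eq", 3), 2: ("int_eq", 4), 3: ("box", "tau")}
print(add_post(phi1, 3, 1, 2))
```

Everything previously known about the destination register is discarded before the new judgement is added, which is why the old `("box", "tau")` entry for register 3 does not survive.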

The form of instruction for addition given above is the most general form. However, add may be called in other contexts, most importantly for address arithmetic needed for subsequent loads and stores. This form of add is significantly different in its semantic specification and we must prove that the most general form is a subtype of the form below.

$$\begin{array}{rl} \operatorname{add}(i,j,k) & \subset_{\iota} \forall \Gamma, \phi, \tau, c. \\ & \operatorname{instr}(\Gamma, (\phi \cap \{j : \operatorname{field} c \ \tau\} \cap \{k : \operatorname{int}_{=}c\}), \\ & ((\phi \cap \{j : \operatorname{field} c \ \tau\} \cap \{k : \operatorname{int}_{=}c\}) \backslash i \\ & \cap \{i : \operatorname{box}(\tau)\})) \end{array}$$

Unlike add, which made no reference to the code context ( $\Gamma$ ), given the instruction jmp  $r_1$ , we need to make sure that it is type-safe to jump to the branch target (given by R(1), the contents of  $r_1$ ) with the current register environment,  $\phi$ . Hence,  $\phi$  should satisfy the precondition for code address R(1). Thus the TML instruction characterising a jump should have a code context  $\Gamma \cap \{R(1) : \mathsf{codeptr}(\phi_1)\}$ . The instruction subtyping rule for a jump is written as

$$\begin{split} \mathtt{jmp}(i) &\subset_{\iota} \forall \Gamma, \phi, n. \\ & \mathsf{instr}(\Gamma \cap \{n: \mathsf{codeptr}(\phi)\}, \\ & \phi \cap \{i: \mathsf{int}_{=}n\}, \\ & \phi \cap \{i: \mathsf{int}_{=}n\}) \end{split}$$

## 3. TML STATIC SEMANTICS

TML static semantics are given in terms of rules for the well-formedness of types, type maps and instructions (Figure 10), subtyping rules on types in all these kinds (Figure 12), and well-formedness of programs.

### 3.1 Well-Formedness of Types

Some types like  $\forall$  and  $\exists$  take type arguments and could be written as  $\forall (\lambda x. F x)$ . But type functions using  $\lambda$  would require higher kinds, complicating the semantic model. We avoid this complication by having quantifiers implicitly bind de Bruijn indices [6], represented as  $\underline{n}$ , instead of named variables in the type terms. The above term, for example, looks like  $\forall (F \ \underline{0})$ . We use explicit substitution [1] rules given in Figure 11 to manipulate the terms.

For reasoning about recursive types, we need to know which types are contractive (as in the ideal model [7] or the indexed model [5]). The type  $\alpha = \text{offset}(3, \alpha)$  is not meaningful because the operator offset is not contractive, but

$$list = int_{=}0 \cup (int_{\neq}0 \cap (field \ 0 \ int_{32}) \cap (field \ 4 \ list))$$

is meaningful because field is contractive.

To define well-formedness, we use a context formed by W, a list mapping indices in the type term to their contractiveness, and – in effect – we reason about type-functions, not types. If the  $n^{th}$  element in W is  $\mathbf{Y}$ , it means that  $\underline{n}$  may be assumed to be contractive for determining the contractiveness of the term containing it. If it is  $\mathbf{N}$ , the contractiveness of the term is not dependent on the contractiveness of  $\underline{n}$ . Whenever we introduce any new variable, it gets the de Bruijn index 0. We shift all other indices by 1. Therefore, we add the  $\mathbf{Y}$  or  $\mathbf{N}$  to the head of the context W. We use the notation  $W^{\mathbf{Y}}$  and  $W^{\mathbf{N}}$  to force all entries in W to  $\mathbf{Y}$  and  $\mathbf{N}$  respectively. Figure 10 gives the complete set of rules to determine the well-formedness of type terms. We list a few of these below.

$$\begin{array}{c} \dfrac{W \vdash \tau : \Omega \quad W \vdash n : \Omega}{W \vdash \text{offset}(n, \tau) : \Omega} \ \text{WF\_OFFSET} \\[2ex] \dfrac{\mathbf{N}, W \vdash \tau : \Omega}{W \vdash \text{rec} \ \tau : \Omega} \ \text{WF\_REC} \\[2ex] \dfrac{\mathbf{Y}, W \vdash \tau : \Omega}{W \vdash \forall \tau : \Omega} \ \text{WF\_}\forall \qquad \dfrac{\mathbf{Y}, W \vdash \tau : \Omega}{W \vdash \exists \tau : \Omega} \ \text{WF\_}\exists \\[2ex] \dfrac{W[n] = \mathbf{Y}}{W \vdash \underline{n} : \Omega} \ \text{WF\_INDEX} \qquad \dfrac{}{W \vdash \text{int}_{=}(i) : \Omega} \ \text{WF\_CONST} \end{array}$$

For example, consider the term  $rec(\lambda\alpha(offset 3 \alpha))$ , written as  $rec(offset 3 \underline{0})$  using de Bruijn indices. This term is not well-formed since offset is not a contractive constructor. The derivation tree for the well-formedness of the term under context *W* is

$$\frac{\frac{???}{\mathbf{N}, W \vdash \underline{0} : \Omega} \qquad \overline{\mathbf{N}, W \vdash 3 : \Omega}}{\frac{\mathbf{N}, W \vdash \text{ offset } 3 \underline{0} : \Omega}{W \vdash \text{ rec(offset } 3 \underline{0}) : \Omega}} WF\_\text{REC}$$

After using the WF\_OFFSET rule, we use the WF\_CONST rule to show well-formedness of 3. Since  $\{\mathbf{N}, W\}[0] = \mathbf{N}$ , we cannot use the WF\_INDEX rule to show the well-formedness of the <u>0</u> subterm, and hence the derivation fails.

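The integer-list data type of Figure 2a is

$$\mathsf{list\_ty} = \mathsf{rec}(\mathsf{int}_{=}0 \cup (\mathsf{int}_{\neq}0 \cap (\mathsf{field} \ 0 \ \mathsf{int}_{32}) \cap (\mathsf{field} \ 4 \ \underline{0})))$$

and this can be proved well-formed with a syntax-directed derivation.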
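The syntax-directed nature of these rules makes them easy to prototype. The checker below implements the rules quoted above (WF\_OFFSET, WF\_REC, the two quantifier rules, WF\_INDEX and WF\_CONST) over terms encoded as nested tuples. Two cases are assumptions on our part, since the corresponding rules are in Figure 10 and not reproduced here: box is taken to check its argument under $W^{\mathbf{Y}}$ (it is the contractive constructor), and ∩/∪ are taken to check both operands under the unchanged context.

```python
# Sketch of the well-formedness (contractiveness) check with de Bruijn indices.
# Terms are nested tuples; ctx is the list W with index 0 at the head.
# The "box", "inter" and "union" cases are assumed rules (see lead-in).
from typing import List, Tuple


def wf(ctx: List[str], term: Tuple) -> bool:
    tag = term[0]
    if tag == "idx":                              # WF_INDEX: W[n] = Y
        n = term[1]
        return n < len(ctx) and ctx[n] == "Y"
    if tag in ("const", "int"):                   # WF_CONST: closed leaves
        return True
    if tag == "offset":                           # WF_OFFSET: same context
        return wf(ctx, term[1]) and wf(ctx, term[2])
    if tag == "rec":                              # WF_REC: bind with N
        return wf(["N"] + ctx, term[1])
    if tag in ("forall", "exists"):               # quantifiers: bind with Y
        return wf(["Y"] + ctx, term[1])
    if tag == "box":                              # assumed: force W to W^Y
        return wf(["Y"] * len(ctx), term[1])
    if tag in ("inter", "union"):                 # assumed: componentwise
        return wf(ctx, term[1]) and wf(ctx, term[2])
    raise ValueError(f"unknown constructor: {tag}")


def field(c: int, t: Tuple) -> Tuple:
    return ("offset", ("const", c), ("box", t))   # field c t = offset(c)(box t)


int32 = ("inter", ("int", ">=", 0), ("int", "<", 2**32))

bad = ("rec", ("offset", ("const", 3), ("idx", 0)))
list_ty = ("rec", ("union", ("int", "=", 0),
                   ("inter", ("int", "!=", 0),
                    ("inter", field(0, int32), field(4, ("idx", 0))))))

print(wf([], bad), wf([], list_ty))
```

The first term fails at the index lookup because the recursion variable is still marked **N** when it is reached, while in the list type the variable sits under a box and is therefore looked up in a context forced to **Y**.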

### 3.2 Subtyping

In our proofs for derivations of program safety, we are often required to prove that any value of type  $\phi$  is also a value of type  $\phi'$ . Subtyping provides a convenient way of expressing this, and we express the above condition as  $\phi \subset \phi'$ . For instructions, we often wish to say that an instruction  $\iota$  implements another instruction  $\iota'$  (like add implementing pointer arithmetic in Section 2.6); we express this syntactically as  $\iota \subset_{\iota} \iota'$ . A few illustrative rules are given below. A larger set of rules is in Figure 12.

$$\begin{array}{c} \dfrac{}{\tau \subset \top} \ {\subset}\_\top \\[2ex] \dfrac{}{\mathsf{int}_{=}i \subset \mathsf{int}_{\geq}(i)} \ {\subset}\_\text{INT}\_\pi \qquad \dfrac{}{\mathsf{int}_{=}i \subset \mathsf{int}_{\leq}(i)} \ {\subset}\_\text{INT}\_\pi \\[2ex] \dfrac{\tau_{1} \subset \tau_{2}}{\mathsf{box} \ \tau_{1} \subset \mathsf{box} \ \tau_{2}} \ {\subset}\_\text{BOXED} \\[2ex] \dfrac{\tau \subset \tau_{3} \quad \tau \subset \tau_{4}}{\tau \subset \tau_{3} \cap \tau_{4}} \ {\subset}\_{\cap}\text{R} \\[2ex] \dfrac{\tau_{1} \subset \tau_{2}}{\{i:\tau_{1}\} \subset \{i:\tau_{2}\}} \ {\subset}\_\text{SINGLE} \\[2ex] \dfrac{}{\tau \subset \tau \backslash i} \ {\subset}\_\backslash \\[2ex] \dfrac{\tau_{1}' \subset \tau_{1} \quad \tau_{2}' \subset \tau_{2} \quad \tau_{3} \subset \tau_{3}'}{\mathsf{instr}(\tau_{1},\tau_{2},\tau_{3}) \subset_{\iota} \mathsf{instr}(\tau_{1}',\tau_{2}',\tau_{3}')} \ {\subset}\_\text{INSTR} \end{array}$$
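As a rough illustration of how such rules compose, the function below applies a handful of them to terms encoded as tuples. It is a sound-but-incomplete toy: it covers only the ⊤, int, box, ∩-right, singleton and instr rules shown above, and it adds reflexivity, which is an assumption of this sketch and is not among the quoted rules. Since TML has no decision procedure, this is in no way a stand-in for the proofs a compiler would emit.

```python
# Toy syntactic subtype check over tuple-encoded types (incomplete by design).
from typing import Tuple

TOP = ("top",)


def subtype(a: Tuple, b: Tuple) -> bool:
    if a == b:                                    # reflexivity (assumed)
        return True
    if b == TOP:                                  # tau <: top
        return True
    if b[0] == "inter":                           # intersection on the right
        return subtype(a, b[1]) and subtype(a, b[2])
    if a[0] == "int" and b[0] == "int":           # int_= i <: int_>= i, int_<= i
        return a[1] == "=" and b[1] in (">=", "<=") and a[2] == b[2]
    if a[0] == "box" and b[0] == "box":           # box is covariant
        return subtype(a[1], b[1])
    if a[0] == "single" and b[0] == "single":     # {i : t1} <: {i : t2}
        return a[1] == b[1] and subtype(a[2], b[2])
    if a[0] == "instr" and b[0] == "instr":
        # contravariant in context and precondition, covariant in postcondition
        return (subtype(b[1], a[1]) and subtype(b[2], a[2])
                and subtype(a[3], b[3]))
    return False


five = ("int", "=", 5)
between = ("inter", ("int", ">=", 5), ("int", "<=", 5))
general = ("instr", TOP, TOP, ("int", "=", 1))
specific = ("instr", TOP, ("int", "=", 0), TOP)

print(subtype(("box", five), ("box", between)))
print(subtype(general, specific), subtype(specific, general))
```

The instr case shows the usual Hoare-style variance: an instruction that demands less and guarantees more can stand in for one that demands more and guarantees less, but not the other way round.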

## 4. REAL MACHINE INSTRUCTIONS

We want to run real programs on real machines; we will take a usable subset of the SPARC architecture as our prototype example. We omit floating-point instructions, register windows, and privileged instructions; this means that we can still type-check (prove safe) integer-only programs generated by compilers that don't shift register windows, of which the FLINT compiler [13] is an example. But for this paper we illustrate only a subset of our subset.

For each machine, we must augment our type system with types and typing rules that are necessary to describe that machine. For SPARC, we need to add kinds and types necessitated by the architecture details.

### 4.1 SPARC Instruction Syntax

For the subset of SPARC instructions we consider, there are three classes of instructions: ALU instructions, branch instructions, and memory access instructions. In TML, we model these instructions as subtypes of instructions formed by the instr construct in TML, as shown in Figure 15. Well-formedness rules for SPARC instructions are given in Figure 14.

SPARC arithmetic instructions take the form  $s_1 \oplus s_2 \rightarrow d$ , where  $s_1$  must be one of the 32 registers,  $s_2$  may be a register or an immediate sign-extended 13-bit constant, and *d* must be a register. To describe the second operand we have a kind *RegImm*, which allows us to discriminate between register-mode (RMode) arguments and immediate-mode (IMode) arguments.

Figure 13 shows the kinds and constructors for operators. A general ALU instruction is specified by alu(x, c, oper), where *oper* (of kind *Oper*) is the operation to be performed, x (0 or 1) specifies whether carry-in is to be performed as part of the operation, and c (0 or 1) specifies whether the condition codes are to be modified by the operation. Thus, for example, the standard add instruction is  $\mathsf{alu}(0, 1, \mathsf{add\_oper}) : INSTR$ .

Condition codes on architectures such as SPARC and Pentium are modelled as if they live somewhere in the register bank, perhaps in one or more processor-status registers. The fact that the condition codes in some state (R, M) match the condition "not equal" is written  $M \vdash R$  : cc\_ne, and we see that cc\_ne occupies the position of a type map. The SPARC has sixteen predicates on the condition codes, of which "not equal" is one; thus, we have sixteen primitive type maps, of which cc\_ne is one. We also have a type map operator  $\backslash^{cc}$ , which is analogous to our type map operator  $\backslash$ ; that is,  $\phi \backslash^{cc}$  is a type map similar to  $\phi$  but with any information about the condition-code settings removed, e.g.,  $\phi \subset \phi \backslash^{cc}$ .

We define the complement operator on condition-code type maps as follows:

Some arithmetic and logical operations on the SPARC yield condition-code settings, and these settings depend not only on the result but in some cases on how the result is computed (i.e. on the arguments). We model this with an operator setcc(*oper*,  $n_1, n_2$ ) which computes the appropriate condition-code type maps. For example,

$$\mathsf{setcc}(\mathsf{add\_oper}, 3, -7) = \mathsf{cc\_a} \cap \mathsf{cc\_ne} \cap \mathsf{cc\_le} \cap \mathsf{cc\_l} \cap \mathsf{cc\_gu} \cap \mathsf{cc\_cc} \cap \mathsf{cc\_neg} \cap \mathsf{cc\_vc}$$
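To make the dependence on the operands concrete, the following sketch computes which of the sixteen SPARC branch conditions hold after a 32-bit add that sets the integer condition codes. It relies on the standard SPARC V8 definitions of the N, Z, V and C flags and of the branch conditions. Only some of the `cc_` names appear in the text; the remaining ones (for example `cc_neg`, `cc_pos`, `cc_leu`) are our assumption, following the SPARC branch mnemonics.

```python
# Sketch: the condition-code predicates that hold after a 32-bit add.
# Flag and condition definitions follow SPARC V8; cc_* naming is partly assumed.
MASK = 0xFFFFFFFF


def add_flags(a: int, b: int):
    """Return (N, Z, V, C) for the 32-bit two's-complement sum a + b."""
    ua, ub = a & MASK, b & MASK
    total = ua + ub
    res = total & MASK
    n = bool(res >> 31)                            # result is negative
    z = res == 0                                   # result is zero
    v = bool(((ua ^ res) & (ub ^ res)) >> 31)      # signed overflow
    c = total > MASK                               # carry out of bit 31
    return n, z, v, c


def setcc_add(a: int, b: int) -> set:
    n, z, v, c = add_flags(a, b)
    conds = {
        "cc_a": True,            "cc_n": False,
        "cc_ne": not z,          "cc_e": z,
        "cc_g": not (z or (n != v)), "cc_le": z or (n != v),
        "cc_ge": n == v,         "cc_l": n != v,
        "cc_gu": not (c or z),   "cc_leu": c or z,
        "cc_cc": not c,          "cc_cs": c,
        "cc_pos": not n,         "cc_neg": n,
        "cc_vc": not v,          "cc_vs": v,
    }
    return {name for name, holds in conds.items() if holds}


print(sorted(setcc_add(3, -7)))
```

For 3 + (-7) the result is negative and non-zero with no carry and no overflow, so exactly eight of the sixteen conditions hold, and their intersection is the type map describing the condition codes after the instruction.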

A conditional branch instruction is specified by ibranch(cond, a, i), where *cond* is a condition-code type map, *a* specifies whether the delay-slot instruction should be annulled if the branch is not taken, and *i* is the displacement, an integer to be added to the program counter if the comparison yields true.

For SPARC load instructions of the form  $d_1 \leftarrow M[s_1 + c]$  or store instructions of the form  $M[s_1 + c] \leftarrow s_2$ , the *c* argument is formed with the IMode constructor. The store instruction has involved semantics which are tied to our allocation model. We describe this model and the store instruction in detail in Section 7.1.

## 5. SEMANTIC MODEL OF TYPES

In this section, we briefly describe the semantic model we use for types. Our model is built upon the model described by Appel and McAllester [5].

In this model, a *ty-pred* is a predicate on tuples (k, v). Informally, a value v belongs to a type  $\tau$  to index k, written  $(v :_k \tau)$, if it is safe to run a program expecting a value of type  $\tau$  on argument v for k instructions. Since  $\tau$  is a predicate on such tuples,  $\tau k v$  means that (k, v) belongs to the type  $\tau$.

A value v is a tuple (s, x). The first component, s, describes the state of the machine. The state s is itself structured: it contains relations such as readable, writable, and allocated, which describe the sets of readable, writable, and allocated parts of the machine memory, respectively. We need the state as a component of the value because types involving memory references, for example, must know which parts of memory are currently readable.

The Appel-McAllester model is too weak to specify dependent types (e.g., relate). In our new model, the second component x of a value (s,x) is a vector of integers (i.e., a finite function on integers). For example, Figure 2a illustrates a value (s,r) where the memory component of s contains  $\{300 \mapsto 10, 304 \mapsto 130\}$  and the "root vector" r contains  $\{1 \mapsto 0, 2 \mapsto 300\}$ .

Ty-preds are written as predicates on the tuple (k, v). To continue our example, the value (s, r(2)) might or might not be a list, because in this illustration we don't know the contents of memory location 130; but to approximation 1 it's a list, meaning that if we execute for at most 1 instruction from (s, r(2)) the program can't notice that it fails to be a list. We write this as list\_ty(1, (s, r(2))).
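The indexed reading of list\_ty can be sketched in Python. The simplifications here are ours and not part of the model: the state is reduced to a memory dictionary, 0 encodes the empty list, a cell keeps its head at offset 0 and its tail at offset 4, and each dereference uses up one unit of the index k.

```python
def list_ty(k, memory, v):
    """Is v a list to approximation k? (simplified illustration)"""
    if k == 0:
        return True    # every value has every type to approximation 0
    if v == 0:
        return True    # assumed encoding: the null pointer is the empty list
    if v not in memory or v + 4 not in memory:
        return False   # the head or tail word cannot be read
    # The tail only needs to look like a list for the remaining k - 1 steps.
    return list_ty(k - 1, memory, memory[v + 4])

memory = {300: 10, 304: 130}   # memory component of s, as in the example
root = {1: 0, 2: 300}          # the root vector r

print(list_ty(1, memory, root[2]))   # a list to approximation 1
print(list_ty(2, memory, root[2]))   # location 130 is unknown, so this fails
```

With more index available, the predicate has to look one cell further and runs into the unknown location 130, which is why the value is a list only to approximation 1.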

The type expressions of TML are not simply modelled by ty-preds, because we must also represent open de Bruijn indices: an open term is (implicitly) parameterized by a substitution providing values for all the unbound variables. So a type map of TML has the form

$$(N \rightarrow ty\text{-}pred) \rightarrow ty\text{-}pred$$

For example, a singleton type map constructor (call it  $\{1:\tau\}$ ) is defined in the following way:

$$\{1 : \tau\} = \lambda \sigma.\ \lambda(k, (s_v, x_v)).\ \tau(\sigma)\ k\ (s_v, \lambda i.\, x_v(1))$$

Since  $\tau$  may contain de Bruijn indices, we provide an environment  $\sigma$  to interpret them. In this type map, we wish the first vector entry to satisfy the type  $\tau$ . Hence, we extract the first component of v, before again putting it into a tuple with the state. Then  $\tau$  is applied to the new tuple with index *k*.

As mentioned before, type scalars are also modelled as type maps. For example, the definition of the character type "char" is given below.

$$\mathsf{char} = \lambda \sigma.\ \lambda(k, (s_v, x_v)).\ 0 \leq x_v(0) < 256$$

We apply  $x_v$  to 0 to convert it into a scalar (call it  $c_v$). The set defining "char" should only allow values from 0 up to, but not including, 256. Therefore, we check that  $c_v$  satisfies this condition.
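The two definitions can be mirrored in Python as a rough sketch. The encoding is an assumption made for illustration: a ty-pred is a function of (k, value), a type map is a function from an environment to a ty-pred, and a vector is a function from indices to integers. The names `singleton_1` and `empty_env` are hypothetical.

```python
def char(sigma):
    """Scalar type map: the vector, read at index 0, must lie in [0, 256)."""
    def ty_pred(k, value):
        state, x = value
        return 0 <= x(0) < 256
    return ty_pred

def singleton_1(tau):
    """Type map {1 : tau}: entry 1 of the vector must satisfy tau."""
    def type_map(sigma):
        def ty_pred(k, value):
            state, x = value
            # Repackage entry 1 as a constant vector before applying tau.
            return tau(sigma)(k, (state, lambda i: x(1)))
        return ty_pred
    return type_map

# Environment for de Bruijn indices; unused here because char is closed.
empty_env = lambda n: (lambda k, value: True)

vector = {1: 65, 2: 300}
value = ({}, lambda i: vector[i])
print(singleton_1(char)(empty_env)(3, value))
```

Because `singleton_1` hands `char` a constant vector, `char` can read index 0 without knowing which entry of the original vector it is checking.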

A number  $n : \Omega$  can be represented by the type  $int_{=}(n) : \Omega$ .

In this model, we think of programs both as sequences of opcodes  $\Delta$  and as collections of code pointers  $\Gamma$ . Assuming a program starts at location 100 with an environment  $\phi$  describing the types of values in registers, we must show that location 100 has type codeptr( $\phi$ ) for any index *k*. We prove this by induction on *k*.

The base case is easy enough; every value has type  $\tau$  for any  $\tau$  to approximation 0. For the induction step, we prove that

$$\forall k, v. \ (\Delta \cap \Gamma)(k, v) \Rightarrow \Gamma(k+1, v).$$

That is, if the program satisfies predicate  $\Delta \cap \Gamma$  to approximation *k*, then it also satisfies  $\Gamma$  to degree k + 1. Appel and McAllester [5] explain in detail what this means.
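The induction can be spelled out as a short derivation. It relies on an assumption that we state explicitly for this sketch:  $\Delta$  only describes the loaded program, so  $\Delta(k, v)$  holds at every index k.

$$\begin{aligned}
&\Gamma(0, v) && \text{(base case: approximation 0)} \\
&\Delta(k, v) \wedge \Gamma(k, v) \;\Rightarrow\; (\Delta \cap \Gamma)(k, v) \;\Rightarrow\; \Gamma(k+1, v) && \text{(induction step)} \\
&\text{hence } \forall k.\ \Gamma(k, v)
\end{aligned}$$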

To avoid mentioning indices k in the TML system, we abstract  $\forall k, v. \phi(k, v) \Rightarrow \phi'(k+1, v)$  as  $\phi \in \phi'$ . Thus, the previous formula is written as  $\Delta \cap \Gamma \in \Gamma$ , and we give syntactic rules for introducing and eliminating  $\in$  in Figure 6:

$$\frac{\tau \in \tau' \quad \tau \in \tau''}{\tau \in (\tau' \cap \tau'')} \qquad \frac{\tau \subset \{l : \mathsf{at}(\mathsf{instr}(\tau', \tau_1, \tau''), s)\} \quad \tau \subset \{l : \mathsf{codeptr}(\tau'')\}}{\tau \in \{l : \mathsf{codeptr}(\tau_1)\}} \qquad \frac{\tau \cap \tau' \in \tau'}{\tau \subset \tau'}$$

Figure 6:  $\in$-Rules

## 6. PROVING PROGRAM SAFETY

Let A be the allocated set, which includes the part of the memory that has the program code and any allocated data. A is not primitive; it can be expressed as a predicate on the memory M. Registers  $r_a$  and  $r_l$  will point to the boundaries of the allocation area, that is,  $\mathsf{avail}(R) = \{R(a), R(a) + 1, \ldots, R(l) - 1\}$  are the locations available for future allocation. To check the safety of the program under memory M and register bank R, we have the following rule:

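$$\frac{\begin{array}{ll} (1) & A; M \vdash R : \phi \\ (2) & A \subset \mathsf{readable} \\ (3) & \mathsf{avail}(R) \subset (\mathsf{readable} \cap \mathsf{writable}) \\ (4) & A \cap \mathsf{avail}(R) = \emptyset \\ (5) & A; M \vdash R(pc) : \mathsf{codeptr}(\phi) \end{array}}{\mathsf{safe}(R, M)}$$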

This program could be just a function which is called by another piece of code with some parameters. Hence, the initial program location is a continuation with formal parameters  $\phi$ . Therefore, the first condition says that we start out with the environment  $\phi$ . We do not wish to have any assumptions about the type system of the calling program. We might like the called function to use any complex type system which the caller might not know about. Hence, we expect  $\phi$  to be very primitive in nature and easily provable. This condition ensures that we start out in the correct state.

Premise (2) says that code and allocated data should be readable, (3) says that free space should be both readable and writable, and (4) ensures that the allocated set is disjoint from the free space.

For the example program in Figure 5, let the readable set be  $\{0...1000\}$ , the writable set be  $\{200...1000\}$ , and the allocated set be  $\{0...300\}$ . Proving (2),(3), and (4) is fairly easy.
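Premises (2), (3), and (4) for these sets can be checked mechanically. The sketch below assumes that the ranges are inclusive and picks hypothetical register contents R(a) = 301 and R(l) = 1001, which the text does not specify.

```python
readable = set(range(0, 1001))     # {0...1000}
writable = set(range(200, 1001))   # {200...1000}
allocated = set(range(0, 301))     # {0...300}

def avail(R):
    """Locations R(a), R(a)+1, ..., R(l)-1 available for allocation."""
    return set(range(R["a"], R["l"]))

R = {"a": 301, "l": 1001}          # hypothetical register contents

premise_2 = allocated <= readable                  # A is readable
premise_3 = avail(R) <= (readable & writable)      # free space is usable
premise_4 = allocated.isdisjoint(avail(R))         # no overlap with A

print(premise_2, premise_3, premise_4)
```

Choosing R(a) = 300 instead would break premise (4), since location 300 is already allocated.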

The last condition says that we start at a location which is a valid code pointer of type  $\phi$ . It is this condition which captures the main proof of safety of the program.

$$\frac{\mathsf{loaded}(\Delta, \mathsf{M}) \quad \Delta \subset \Gamma \quad \Gamma \subset \{\mathsf{R}(i) : \tau\}}{A; \mathsf{M} \vdash \mathsf{R}(i) : \tau}$$

$$\frac{\Delta = \bigcap_{i \in \mathrm{dom}\, p} \{i : \mathsf{box}(\mathsf{int}_{=}\, p(i))\} \quad \forall i \in \mathrm{dom}\, p.\ \mathsf{M}(i) = p(i)}{\mathsf{loaded}(\Delta, \mathsf{M})}$$



Figure 7 lists the inference rules on values. The first rule shows that proving (5) involves proving three facts:

- We first show that the program loaded in memory is described by  $\Delta$.
- The second condition involves decoding  $\Delta$  to get TML instructions. Then we must show that these instructions respect the invariants mentioned in the code context  $\Gamma$. The proof of this fact requires us to traverse the program and check that type safety is preserved at each point in the execution of the program. This proof captures progress by showing that there is always another instruction (the next instruction for non-branches, and the branch target instruction otherwise) that may be safely executed.
- Finally, for the program counter ($r_{pc}$), we must show that the  $\Gamma$  from the second condition satisfies  $\{r_{pc} : \mathsf{codeptr}(\phi)\}$. This can be proved using the subtyping rules shown in Figure 12. For example, if

$$\begin{split} \Gamma &= \{0: \mathsf{codeptr}(\phi_0)\} \,\cap\, \{4: \mathsf{codeptr}(\phi_4)\} \,\cap \ldots \\ &\cap\, \{40: \mathsf{codeptr}(\phi_{40})\} \end{split}$$

then  $\Gamma \subset \{0 : \mathsf{codeptr}(\phi_0)\}$  is trivial.
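The first and third facts have a simple operational reading, sketched below. The instruction words are hypothetical placeholders and not real SPARC encodings, and  $\Gamma$  is represented as a dictionary from locations to environment names, which only captures the syntactic case where the wanted conjunct appears literally in  $\Gamma$.

```python
def loaded(p, M):
    """loaded(Delta, M): memory agrees with program p on every address of p."""
    return all(M.get(addr) == word for addr, word in p.items())

def gamma_gives(gamma, loc, phi):
    """Gamma is a subtype of {loc : codeptr(phi)} when that conjunct is in Gamma."""
    return gamma.get(loc) == phi

program = {0: 0x1111, 4: 0x2222}           # hypothetical instruction words
memory = dict(program)
memory[300] = 10                           # unrelated data elsewhere in memory

gamma = {loc: f"phi_{loc}" for loc in range(0, 44, 4)}   # locations 0, 4, ..., 40

print(loaded(program, memory))
print(gamma_gives(gamma, 0, "phi_0"))
```

The second fact,  $\Delta \subset \Gamma$, is the one that needs a real proof.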

Proving  $\Delta \subset \Gamma$  is nontrivial, and we give a schematic description below. Figure 8 gives part of the proof tree for the second condition. The two most interesting stages in the proof are labelled S1 and S2. At stage S1, we need to prove facts like  $\Delta \cap \Gamma \in \{l : \mathsf{codeptr}(\phi_l)\}$  for every location *l* in the program. This means that we can always take another well-typed execution step at any point in the program. These are the typing judgements which implement the induction proof (mentioned in the earlier section) in our underlying model.

In stage S2, proving these facts for each location l is reduced to

1. proving that some instruction is present at location *l* whose precondition is  $\phi_l$, specified by the environment at *l*, and whose postcondition is some  $\phi'$, and
2. proving that the next instruction is a code pointer which expects the environment to satisfy  $\phi'$.

The connection to traditional Hoare-logic proofs is again evident. Below is the Hoare-logic style proof for our list sum example.

We start out with the  $\Delta$  and  $\Gamma$  as described in Section 2.5.

To prove the first fact, we begin with decoding the contents of  $\Delta$ . Each program code location, *l*, contains some number *n*. The first step involves proving that *n* decodes into a TML instruction  $\iota = instr(\Gamma, \phi, \phi')$ . We prove this using instruction decoding techniques described by Michael and Appel [8, 3]. After this, we prove that  $\phi_l \subset \phi$ . This allows us to execute  $\iota$  under the environment  $\phi_l$  safely.

Proving the second part involves showing that the resultant environment  $\phi'$  is compatible with the given environment at the next location, l + 4 (for a non-branch instruction). This is done by showing that  $\phi' \subset \phi_{l+4}$.
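The two checks for a non-branch instruction can be sketched as follows. The representation is an assumption made for illustration: an environment is a set of (register, type) facts, an intersection of type maps is the union of their facts, and  $\phi \subset \phi'$  is approximated by  $\phi$  carrying every fact of  $\phi'$. This ignores genuinely semantic subtyping, such as an equality type being a subtype of int32.

```python
def subtype(phi, phi_prime):
    """phi is a subtype of phi_prime: phi has at least the facts of phi_prime."""
    return phi_prime <= phi

def step_ok(env_at, loc, pre, post):
    """Checks for a non-branch instruction at loc with given pre/postcondition."""
    can_execute = subtype(env_at[loc], pre)           # phi_l below precondition
    next_is_happy = subtype(post, env_at[loc + 4])    # postcondition below phi_{l+4}
    return can_execute and next_is_happy

phi_4 = frozenset({("o1", "list")})
phi_8 = phi_4 | {("o2", "int32")}
env_at = {4: phi_4, 8: phi_8}

# A mov into o2: no source-register constraints, result typed int32.
mov_pre = frozenset()
mov_post = phi_4 | {("o2", "int32")}

print(step_ok(env_at, 4, mov_pre, mov_post))
```

If the postcondition failed to mention o2, the second check would fail, because the environment at location 8 expects o2 to be an integer.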

We list the environment supplied at each point in the program along with the decoded instructions.

| Loc | Environment / decoded instruction |
|----|----------------------------------------------------------------------------------------------------------------------------------|
|    | $\phi_0 = \{ \mathsf{o1} : \mathsf{list} \}$ |
| 0  | SPARC_TST |
|    | $\phi_4 = \{ \mathsf{o1} : \mathsf{list} \}$ |
| 4  | SPARC_MOV |
|    | $\phi_8 = \phi_4 \cap \{ \mathsf{o2} : \mathsf{int}_{32} \}$ |
| 8  | SPARC_BRANCH |
|    | $\phi_{12} = \phi_8 \cap \{ \mathsf{o3} : \mathsf{int}_{32} \}$ |
| 12 | SPARC_MOV |
|    | $\phi_{16} = \phi_{12} \cap \{ \mathsf{o1} : \mathsf{int}_{\neq} 0 \}$ |
| 16 | SPARC_LOAD |
|    | $\phi_{20} = \phi_{12} \cap \{ \mathsf{o1} : \mathsf{int}_{\neq} 0 \} \cap \{ \mathsf{o2} : \mathsf{int}_{32} \}$ |
| 20 | SPARC_LOAD |
|    | $\phi_{24} = \phi_{12} \cap \{ \mathsf{o2} : \mathsf{int}_{32} \}$ |
| 24 | SPARC_TST |
|    | $\phi_{28} = \phi_{24}$ |
| 28 | SPARC_BRANCH |
|    | $\phi_{32} = \phi_{28}$ |
| 32 | SPARC_ALU |
|    | $\phi_{36} = \{ \mathsf{o1} : \mathsf{list} \} \cap \{ \mathsf{o2} : \mathsf{int}_{32} \} \cap \{ \mathsf{o3} : \mathsf{int}_{32} \}$ |

We can always execute the instruction at location 0, since the precondition for SPARC\_TST has no constraints. From subtyping rule  $\subset$  REFL, we can prove that  $\phi_0 \subset \phi$ . The postcondition

$$\phi' = \phi_0 \cap \left( (\mathsf{cc\_ne} \cap \{\mathsf{o1}:\mathsf{int}_{\neq}0\}) \cup (\mathsf{cc\_e} \cap \{\mathsf{o1}:\mathsf{int}_{=}0\}) \right)$$

$\phi'$ is stronger than  $\phi_4$, and step (2) in stage S2 can be easily proven by subtyping rules. This postcondition helps us relate the value of o1 to the condition code when we take a branch depending on it. We can always execute the two mov instructions (at location 4 and at the delay slot at location 12) as there are no source register constraints. After the branch instruction at location 8, we have postcondition

$$\phi_{ba} = \phi_4 \cap \{ \mathsf{o2} : \mathsf{int}_{=} 0 \} \cap \{ \mathsf{o3} : \mathsf{int}_{=} 0 \}$$

This environment can be shown to be stronger than  $\phi_{28}$, the target of the unconditional branch, i.e.,  $\phi_{ba} \subset \phi_{28}$. At this point, depending on the condition codes, we either go to location 12 or location 36 after executing the add instruction at location 32. Since the types of o2 and o3 are int<sub>32</sub>, we can show that this SPARC\_ALU addition instruction in the delay slot can be executed.

Figure 8: Part of Syntactic Proof Tree for Program Safety

The postcondition of the conditional branch, if taken, is

$$\phi_{v} = \phi_{0} \cap (\mathsf{cc\_ne} \cap \{\mathsf{o1} : \mathsf{int}_{\neq} 0\}) \cap \{\mathsf{o2} : \mathsf{int}_{32}\} \cap \{\mathsf{o3} : \mathsf{int}_{32}\}$$

This environment satisfies the condition of the pointer in o1 being non-null. We can prove that it is stronger than  $\phi_{16}$, the target of the conditional branch, using the rules in Figure 12.

Since we are using the untagged representation of a list, and we have  $\{\mathsf{o1} : \mathsf{list\_ty}\}$  and  $\{\mathsf{o1} : \mathsf{int}_{\neq} 0\}$, we can infer  $\{\mathsf{o1} : \mathsf{field}\ 0\ \mathsf{int}_{32}\} \cap \{\mathsf{o1} : \mathsf{field}\ 4\ \mathsf{list\_ty}\}$.

This would satisfy the precondition for the ld instruction at locations 20 and 24, since the types of o2 and o3 are the same, but the type of o1 is stronger than just list. The next load can be similarly shown to be safe.

If the branch at location 28 is not taken, we have the postcondition

$$\phi_n = \phi_0 \cap (\mathsf{cc\_e} \cap \{\mathsf{o1} : \mathsf{int}_{=} 0\}) \cap \{\mathsf{o2} : \mathsf{int}_{=} 0\} \cap \{\mathsf{o3} : \mathsf{int}_{32}\}$$

This is stronger than the requirement that we return with  $\{o3 : int_{32}\}$ . Thus the entire function can be shown to be safe.

## 7. NO ATOMIC OPERATIONS

In TML, we do not use opaque high-level instructions for allocation or array accesses which expand into multiple instructions on real machines. Our type system allows us to argue about intermediate machine states within these instructions. Below we explain how to perform safe memory allocation, array bounds check elimination, and sum type discriminations in TML.

#### 7.1 Memory Allocation

We assume that memory is allocated from a contiguous region in order, i.e., every location is allocated after all preceding locations have been allocated. As shown in Figure 9, the region begins at "Start" and ends at "Limit", where all locations up to "Boundary" are allocated.

Most typed assembly languages treat memory allocation as an atomic operation. This increases the trusted computing base, since we must trust the safety of the allocation subroutine. It also prevents the compiler from optimising or rearranging instructions in code that involves memory allocation. In TML, we use bookkeeping registers to get rid of the atomicity of allocation. The contiguous allocation space starts at the address pointed to by the register  $r_{ap}$.  $r_b$  points to the beginning of the unallocated set of locations, i.e., all locations from  $r_{ap}$  to  $r_b - 1$  are allocated.  $r_l$  points to the end of the memory allocation region. All of these are virtual registers, and may not be used as register arguments to any instructions.

**Figure 9: Memory Allocation**

We wish to allocate a tuple of two integers. Let these integers be stored in  $r_1$  and  $r_2$  and let  $r_3$  point to the resultant tuple. Before allocation, we must start out with a guarantee that some space is available. Assume  $r_{ap}$  points to location 100,  $r_b$  points to location 120, and  $r_l$  points to location 200. The atomic malloc program would be split into the following steps:

- Check for the availability of space: the condition  $\mathsf{R}(b) + 4 \leq \mathsf{R}(l)$  checks that we have space for the integer about to be stored.
- Store the contents of  $r_1$  at address  $\mathsf{R}(b)$.
- Update  $\mathsf{R}(b)$  to  $\mathsf{R}(b) + 4$.

For SPARC, the store instruction sto i, j, c allows us to split malloc. It uses the first type map to encode the initial checks by having  $\mathsf{relate}_{\leq}((\mathsf{offset}\ 4), \mathsf{id})(b, l)$. This ensures space for one integer taking up one machine word. By having  $\mathsf{relate}_{=}((\mathsf{offset}\ c), \mathsf{id})(i, b)$, we know the address for storing is exactly the same as the beginning of the unallocated space. The postcondition correctly assigns the type to i. It also updates  $r_b$  to point to the next unallocated location, and maintains the " $\leq$ " relation between  $r_b$  and  $r_l$. Thus, it guarantees that the store is safe. The second store is similarly guaranteed safe. At the end of the two stores, we have  $\{i: (\mathsf{field}\ 4\ \mathsf{int}_{32})\}$  from the first sto and  $\{i: (\mathsf{field}\ 8\ \mathsf{int}_{32})\}$  from the second. Taking the intersection of these two types, we have  $r_i$  having exactly the type for a tuple of two integers.
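The effect of splitting allocation into individual stores can be simulated in a few lines. This is only a dynamic illustration: in TML the space check is discharged statically by the type of sto, whereas the sketch performs it at run time. The class `Machine` and the function `alloc_pair` are hypothetical names.

```python
class Machine:
    """Memory plus the bookkeeping registers ap (start), b (boundary), l (limit)."""

    def __init__(self, ap, b, l):
        self.M = {}
        self.ap, self.b, self.l = ap, b, l

    def sto(self, value):
        """Store one word at R(b), then bump R(b) by one word."""
        if self.b + 4 > self.l:            # precondition: R(b) + 4 <= R(l)
            raise MemoryError("no space left in the allocation region")
        addr = self.b
        self.M[addr] = value
        self.b += 4                        # R(b) <= R(l) still holds afterwards
        return addr

def alloc_pair(m, r1, r2):
    """Allocate a two-word tuple with two separate stores."""
    r3 = m.b                               # the tuple starts at the old boundary
    m.sto(r1)
    m.sto(r2)
    return r3

m = Machine(ap=100, b=120, l=200)
r3 = alloc_pair(m, 7, 9)
print(r3, m.M, m.b)
```

Between the two stores the machine is in a perfectly ordinary state, which is what lets a compiler interleave other instructions with the allocation sequence.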

Our allocation method is not completely general. For example, we still have the restriction of making allocations in a linear order without holes.

#### 7.2 Array bounds check elimination

We can define *n*-length arrays in TML as

$$\operatorname{array}(n, \tau) = \forall i. \operatorname{field}(((4 * i) \cap \operatorname{int}_{>} 0 \cap \operatorname{int}_{<} n), \tau)$$

The allocation for arrays is similar to the example above. We assume a register  $r_i$  which has type  $\tau$ . We make up an array of type  $\tau$  and size *n* by initialising *n* locations using the scheme above to the value in  $r_i$  within a loop. Due to lack of space, we cannot go through a complete example.

We have a rich set of constructors over integer values and singleton types in TML which allow us to perform safe array bounds check eliminations.

| Loc | Program                   | Pseudo code                      |
|-----|---------------------------|----------------------------------|
| 100 | loop: subcc %o2, %o4, %g0 | $R[o2] == R[o4]$ ?               |
| 104 | be done; nop              | goto done                        |
| 112 | add %o1, %o4, %o5         | $R[o5] \leftarrow R[o1] + R[o4]$ |
| 116 | ld [%o5], %o6             | $R[o6] \leftarrow M[R[o5]]$      |
| 120 | add %o6, %o3, %o3         | $R[o3] \leftarrow R[o3] + R[o6]$ |
| 124 | add %o4, 4, %o4           | $R[o4] \leftarrow R[o4] + 4$     |
| 128 | ba loop; nop              | goto loop                        |
| 136 | done:                     |                                  |

Consider the program above, which adds the elements of an integer array. Register o1 contains the pointer to the array, and o2 contains the length of the array in bytes. o3 is the accumulator for the sum of elements. o4 contains an integer used to index into the array. We start with the environment

$$\phi_{100} = \{ \mathsf{o1} : \mathsf{array}\ n\ \mathsf{int}_{32} \cap \mathsf{int}_{=} 4l \} \cap \{ \mathsf{o2} : \mathsf{int}_{=} 4n \} \cap \{ \mathsf{o4} : \mathsf{int}_{\geq} 0 \cap \mathsf{int}_{\leq} 4n \cap \mathsf{int}_{=} 4m \}$$

From the types in  $\phi_{100}$, the starting point for the program, we can infer the fact that  $R(o4) \leq R(o2)$  and that R(o4) and R(o1) are divisible by 4. After the subcc instruction at 100, using SPARC\_ALU\_2 (which considers condition codes), instantiated with op = sub, i = o2, j = o4 and k = g0, we have the environment  $\phi_{104}$  corresponding to the postcondition having  $\mathsf{setcc}(sub, R(o2), R(o4))$. The branch instruction SP\_BRANCH requires the postcondition that the "equal" flag is not set for the fall-through instruction. Therefore, we can determine the precondition of the instruction at 112 to include  $R(o2) \neq R(o4)$. These two facts allow us to further infer that R(o4) < 4n. The add instruction at address 112 computes the address of the array element to be accessed. This access is safe only if the address is between the array base R(o1) and the array limit (= R(o1) + 4n), and if it is on a word boundary. Since  $0 \leq \mathsf{R}(\mathsf{o4}) < 4n$, the effect of the add instruction is that  $\mathsf{R}(\mathsf{o1}) \leq \mathsf{R}(\mathsf{o5}) < \mathsf{R}(\mathsf{o1}) + 4n$. The sum is also divisible by 4. We expect a medium-sized set of arithmetic lemmas to be sufficient for deriving most of the inferences required for such reasoning. All these conditions can be shown to be implied by the postcondition we get for the add instruction. Therefore, the next ld instruction at location 116 can be shown to load from an address on a word boundary which is within the array limits. Hence the access is safe. After the add instruction at location 124, we can easily prove that R(o4) remains divisible by 4.
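The arithmetic behind this argument can be checked exhaustively for small arrays. The sketch assumes the loop invariant that R(o4) is a multiple of 4 between 0 and 4n inclusive, and that R(o1) is word aligned; the function names and the sample base addresses are ours.

```python
def access_is_safe(base, n, offset):
    """Address base + offset lies inside an n-word array and is word aligned."""
    addr = base + offset
    return base <= addr < base + 4 * n and addr % 4 == 0

def lemma_holds(max_n=8, bases=(0, 4, 400)):
    """On the fall-through path of `be done`, the load at 116 is always safe."""
    for n in range(1, max_n + 1):
        o2 = 4 * n                             # array length in bytes
        for base in bases:                     # word-aligned array bases
            for o4 in range(0, 4 * n + 1, 4):  # invariant: 0 <= o4 <= 4n, 4 | o4
                if o4 != o2:                   # the branch was not taken
                    if not access_is_safe(base, n, o4):
                        return False
    return True

print(lemma_holds())
```

Dropping the `o4 != o2` test makes the check fail, because an offset of exactly 4n points one word past the end of the array.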

#### 7.3 Sum type discrimination

Consider the following ML datatype.

This has a machine-level representation where the A and B cases are differentiated on the basis of a tag. A zero tag implies case A, in which the next two words contain integers. A nonzero tag implies case B, in which it is safe to access the third word after the tag.

| Loc | Code                     | Pseudocode                      |
|-----|--------------------------|---------------------------------|
| 100 | ld [%o1], %o2            | $R[o2] \leftarrow M(R(o1))$     |
| 104 | tst %o2                  | if $o2 \neq 0$                  |
| 108 | bne b_case; nop          | B tag case                      |
| 116 | ba a_case; nop           | else A tag case                 |
| 124 | b_case: ld [%o1+12], %o3 | $R[o3] \leftarrow M(R(o1)+12)$  |
| ⋮   |                          |                                 |
| 140 | a_case: ld [%o1+4], %o3  | $R[o3] \leftarrow M(R(o1)+4)$   |
| ⋮   |                          |                                 |

Assume that register o1 points to a foo cell. Instruction 100 loads the tag into register o2. If o2 is not zero (B case), then we branch to the b\_case label. Otherwise (A case), we go to the a\_case label. In instruction 124, we access the third field. This access may be ensured to be safe in the following way. The load instruction relates o1 and o2 such that M[o1] = o2 (Fact 1). On jumping to b\_case after the test, we have another fact,  $o2 \neq 0$  (Fact 2), due to the TML instruction SPARC\_TST. At location 124, both facts 1 and 2 continue to be true. From these two facts, we use simple arithmetic lemmas to conclude that  $M[o1] \neq 0$. This, combined with the fact  $\{o1 : \mathsf{box}(\mathsf{foo})\}$, allows us to conclude that o1 has the B variant of foo. Therefore, the access in location 124 can be proven safe.
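The same reasoning can be traced in a small simulation. The table of safe offsets is an assumption reconstructed from the prose (tag at offset 0, two integer words for case A, the third word after the tag for case B), and the checks that TML performs statically are done here at run time.

```python
# Assumed layout: byte offsets that may be read for each variant.
SAFE_OFFSETS = {"A": {0, 4, 8}, "B": {0, 12}}

def variant(M, ptr):
    """A zero tag means case A; any nonzero tag means case B."""
    return "A" if M[ptr] == 0 else "B"

def checked_load(M, ptr, offset):
    """Load M[ptr + offset], refusing offsets the variant does not allow."""
    if offset not in SAFE_OFFSETS[variant(M, ptr)]:
        raise ValueError("unsafe field access")
    return M[ptr + offset]

def discriminate(M, o1):
    o2 = M[o1]                          # 100: ld [%o1], %o2   (Fact 1: M[o1] = o2)
    if o2 != 0:                         # 104/108: tst, bne    (Fact 2: o2 != 0)
        return checked_load(M, o1, 12)  # 124: b_case
    return checked_load(M, o1, 4)       # 140: a_case

cell_b = {500: 1, 504: 0, 508: 0, 512: 42}
cell_a = {600: 0, 604: 5, 608: 6}
print(discriminate(cell_b, 500), discriminate(cell_a, 600))
```

Inside the `if` branch both facts are available, so the check in `checked_load` can never fail there; that is the run-time counterpart of the static argument above.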

## 8. RELATED WORK

The PCC system described by Necula [11] lays the foundation for this research. However, it has a very large trusted computing base in terms of the type inference rules and the verification condition generator ("VCgen"). By giving a semantic model to types and machine semantics, we have a much smaller trusted computing base. Also, their implementations are specially geared towards C or Java.

TAL [9] uses opaque high-level operations to handle allocations and array accesses. TAL is also specialised for compilation to x86 architectures. DTAL [16] seems to be more expressive than TAL, though it also handles allocation in an opaque way. DTAL also requires an extension to the system to handle sum type discrimination. However, due to the presence of dependent types, DTAL makes array accesses transparent and allows us to perform safe array-bounds check eliminations.

Finally, PCC and TAL have had no semantic models, only syntactic metatheorems.

## 9. FUTURE WORK

We are using Twelf [12] to encode TML. Though most of our model for types has been encoded, we still have to encode the model for instructions and many lemmas which are required to complete the proofs of program safety. Our allocation model is also not fully general and we hope to extend it to handle regions [14]. This could allow us to use provably safe garbage collection schemes as shown by Wang and Appel [15].

We are also using TML to construct a semantic model for a TAL-like calculus, completing the end-to-end path from ML source code to a foundational safety proof.

## 10. ACKNOWLEDGEMENTS

We would like to thank Adriana Compagnoni, Amy Felty, Zhong Shao, Roberto Virga, David Walker and Dan Wang for many helpful comments and suggestions. We would also like to thank Roberto Virga for helping us design the well-formedness judgements.

## 11. REFERENCES

- [1] M. Abadi, L. Cardelli, P.-L. Curien, and J.-J. Levy. Explicit substitutions. In Seventeenth Annual ACM Symp. on Principles of Prog. Languages, pages 31-46. ACM Press, Jan 1990.
- [2] Amal Ahmed, Andrew W. Appel, and Roberto Virga. Semantics of general references by a hierarchy of Gödel numberings. July 2001.
- [3] Andrew W. Appel. Foundational proof-carrying code. In Symposium on Logic in Computer Science (LICS '01), pages 247-258. IEEE, 2001.
- [4] Andrew W. Appel and Amy P. Felty. A semantic model of types and machine instructions for proof-carrying code. In POPL '00: The 27th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 243-253. ACM Press, January 2000.
- [5] Andrew W. Appel and David McAllester. An indexed model of recursive types for foundational proof-carrying code. Technical Report TR-629-00, Princeton University, October 2000.
- [6] N. G. deBruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation. Indag. Math., 34:381-92, 1972.
- [7] David MacQueen, Gordon Plotkin, and Ravi Sethi. An ideal model for recursive polymorphic types. Information and Computation, 71(1/2):95-130, 1986.
- [8] Neophytos G. Michael and Andrew W. Appel. Machine instruction syntax and semantics in higher-order logic. In 17th International Conference on Automated Deduction, June 2000.
- [9] Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In POPL '98: 25th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 85-97. ACM Press, January 1998.
- [10] George Necula. Proof-carrying code. In 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 106-119, New York, January 1997. ACM Press.
- [11] George Ciprian Necula. Compiling with Proofs. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, September 1998.
- [12] Frank Pfenning and Carsten Schürmann. System description: Twelf - a meta-logical framework for deductive systems. In The 16th International Conference on Automated Deduction. Springer-Verlag, July 1999.
- [13] Zhong Shao. An overview of the FLINT/ML compiler. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation, June 1997.
- [14] Mads Tofte and Jean-Pierre Talpin. Implementation of the typed call-by-value  $\lambda$ -calculus using a stack of regions. In Twenty-first ACM Symposium on Principles of Programming Languages, pages 188-201. ACM Press, January 1994.
- [15] Daniel C. Wang and Andrew W. Appel. Type-preserving garbage collectors. In POPL 2001: The 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 166-178. ACM Press, January 2001.
- [16] Hongwei Xi and Robert Harper. A dependently typed assembly language. Technical Report OGI-CSE-99-008, Computer Science and Engineering Department, Oregon Graduate Institute, July 1999.

## APPENDIX

$$\begin{gathered}
\frac{}{W \vdash \top : \Omega}\ \mathrm{WF\_}\top \qquad
\frac{}{W \vdash \bot : \Omega}\ \mathrm{WF\_}\bot \qquad
\frac{}{W \vdash c(i) : \Omega}\ \mathrm{WF\_CONST} \qquad
\frac{W \vdash i : \Omega}{W \vdash \mathsf{int}_{\pi}\, i : \Omega}\ \mathrm{WF\_INT}_{\pi} \\[1ex]
\frac{W \vdash n_1 : \Omega \quad W \vdash n_2 : \Omega}{W \vdash n_1 + n_2 : \Omega}\ \mathrm{WF\_}{+} \qquad
\frac{W \vdash \tau : \Omega}{W \vdash \mathsf{id}\ \tau : \Omega}\ \mathrm{WF\_ID} \qquad
\frac{W \vdash \tau : \Omega}{W^{N} \vdash \mathsf{codeptr}\ \tau : \Omega}\ \mathrm{WF\_CPTR} \\[1ex]
\frac{W \vdash \tau : \Omega \quad W \vdash n : \Omega}{W \vdash \mathsf{offset}(n, \tau) : \Omega}\ \mathrm{WF\_OFFSET} \qquad
\frac{W \vdash \tau : \Omega}{W^{N} \vdash \mathsf{box}\ \tau : \Omega}\ \mathrm{WF\_BOXED} \qquad
\frac{N, W \vdash \tau : \Omega}{W \vdash \mathsf{rec}\ \tau : \Omega}\ \mathrm{WF\_REC} \\[1ex]
\frac{W_1, W_2^{Y} \vdash \tau_1 : \Omega \quad W_1^{Y}, W_2 \vdash \tau_2 : \Omega}{(W_1, W_2) \vdash \tau_1 \cap \tau_2 : \Omega}\ \mathrm{WF\_}\cap \qquad
\frac{W \vdash \tau_1 : \Omega \quad W \vdash \tau_2 : \Omega}{W \vdash \tau_1 \cup \tau_2 : \Omega}\ \mathrm{WF\_}\cup \\[1ex]
\frac{Y, W \vdash \tau : \Omega}{W \vdash \forall \tau : \Omega}\ \mathrm{WF\_}\forall \qquad
\frac{Y, W \vdash \tau : \Omega}{W \vdash \exists \tau : \Omega}\ \mathrm{WF\_}\exists \qquad
\frac{Y, W \vdash \iota : \mathrm{INSTR}}{W \vdash \forall_t \iota : \mathrm{INSTR}}\ \mathrm{WF\_}\forall_t \qquad
\frac{Y, W \vdash \iota : \mathrm{INSTR}}{W \vdash \exists_t \iota : \mathrm{INSTR}}\ \mathrm{WF\_}\exists_t
\end{gathered}$$



$$\begin{array}{rcl}
\textit{Terms}\;\tau,\iota & ::= & \underline{n} \mid \top \mid \bot \mid \mathsf{codeptr}\;\tau \mid \mathsf{offset}(n,\tau) \mid \mathsf{id}\;\tau \mid \mathsf{box}\;\tau \mid \\
& & \tau \cap \tau' \mid \tau \cup \tau' \mid \forall \tau \mid \exists \tau \mid \mathsf{rec}\;\tau \mid \{n:\tau\} \mid \tau \mid n \mid \\
& & \mathsf{relate}_{\pi}(\tau_{1},\tau_{2})(n_{1},n_{2}) \mid c\;n \mid \mathsf{int}_{32} \mid \mathsf{int}_{\pi}\,n \mid n_{1}+n_{2} \mid \\
& & \mathsf{at}(t,n) \mid \mathsf{instr}(\tau,\tau',\tau'') \mid \forall_{t}\,\iota \mid \exists_{t}\,\iota \mid \tau[s] \\
\textit{Substitutions}\;r,s & ::= & \mathsf{id} \mid \uparrow \mid \tau \cdot s \mid r \circ s
\end{array}$$

$$\overline{\underline{0}[\mathsf{id}] =_{\tau} \underline{0}}\;\mathsf{es\_VarId} \quad \overline{\underline{0}[\tau \cdot s] =_{\tau} \tau}\;\mathsf{es\_VarCons} \quad \overline{(\tau_{1}\,\tau_{2})[s] =_{\tau} (\tau_{1}[s])(\tau_{2}[s])}\;\mathsf{es\_App} \quad \overline{(\forall \tau)[s] =_{\tau} \forall(\tau[\underline{0} \cdot (s \circ \uparrow)])}\;\mathsf{es\_\forall}$$

$$\overline{\tau[s_{1}][s_{2}] =_{\tau} \tau[s_{1} \circ s_{2}]}\;\mathsf{es\_Clos} \quad \overline{\tau[\mathsf{id} \circ s] =_{\tau} \tau[s]}\;\mathsf{es\_IdL} \quad \overline{\tau[\uparrow \circ\, \mathsf{id}] =_{\tau} \tau[\uparrow]}\;\mathsf{es\_ShiftId}$$

$$\overline{\tau[\uparrow \circ\, (\tau' \cdot s)] =_{\tau} \tau[s]}\;\mathsf{es\_ShiftCons} \quad \overline{(\tau_{1} \cap \tau_{2})[s] =_{\tau} \tau_{1}[s] \cap \tau_{2}[s]}\;\mathsf{es\_\cap} \quad \overline{(\tau_{1} \cup \tau_{2})[s] =_{\tau} \tau_{1}[s] \cup \tau_{2}[s]}\;\mathsf{es\_\cup}$$

$$\overline{\tau[(\tau' \cdot s) \circ s'] =_{\tau} \tau[\tau'[s'] \cdot (s \circ s')]}\;\mathsf{es\_Map} \quad \overline{\tau[s \circ (s' \circ s'')] =_{\tau} \tau[(s \circ s') \circ s'']}\;\mathsf{es\_Ass}$$

$$\overline{(\mathsf{int}_{\pi}\,\tau)[s] =_{\tau} \mathsf{int}_{\pi}(\tau[s])}\;\mathsf{es\_int}_{\pi} \quad \overline{\mathsf{int}_{32}[s] =_{\tau} \mathsf{int}_{32}}\;\mathsf{es\_int}_{32}$$

Figure 11: Static Semantics : Explicit Substitution Rules
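The explicit substitution rules of Figure 11 can be read as left-to-right rewrite rules. The following is a minimal Python sketch under that assumption, not the paper's implementation: the tuple encoding, the names `step` and `normalize`, and the right-nesting orientation chosen for es_Ass are all illustrative choices, and only a fragment of the term language is covered.

```python
# Hypothetical tuple encoding (not from the paper):
#   terms:         ("var0",), ("app", t, u), ("inter", t, u), ("union", t, u),
#                  ("forall", t), ("int32",), ("clos", t, s)   # clos is t[s]
#   substitutions: ("id",), ("shift",), ("cons", t, s), ("comp", r, s)
VAR0, INT32 = ("var0",), ("int32",)
ID, SHIFT = ("id",), ("shift",)


def step(x):
    """Apply one rule of Figure 11 at the root of x, or return None."""
    if x[0] == "clos":
        t, s = x[1], x[2]
        if t == VAR0 and s == ID:
            return VAR0                                   # es_VarId
        if t == VAR0 and s[0] == "cons":
            return s[1]                                   # es_VarCons
        if t[0] in ("app", "inter", "union"):             # es_App, es_cap, es_cup
            return (t[0], ("clos", t[1], s), ("clos", t[2], s))
        if t[0] == "forall":                              # es_forall
            return ("forall", ("clos", t[1], ("cons", VAR0, ("comp", s, SHIFT))))
        if t[0] == "clos":
            return ("clos", t[1], ("comp", t[2], s))      # es_Clos
        if t[0] == "int32":
            return t                                      # es_int32
    if x[0] == "comp":
        r, q = x[1], x[2]
        if r == ID:
            return q                                      # es_IdL
        if r == SHIFT and q == ID:
            return SHIFT                                  # es_ShiftId
        if r == SHIFT and q[0] == "cons":
            return q[2]                                   # es_ShiftCons
        if r[0] == "cons":                                # es_Map
            return ("cons", ("clos", r[1], q), ("comp", r[2], q))
        if r[0] == "comp":                                # es_Ass, oriented to nest rightwards
            return ("comp", r[1], ("comp", r[2], q))
    return None


def normalize(x):
    """Normalize subterms first, then rewrite at the root until no rule applies."""
    x = (x[0],) + tuple(normalize(c) for c in x[1:])
    y = step(x)
    return x if y is None else normalize(y)
```

In this encoding the de Bruijn index 1 is written `("clos", VAR0, SHIFT)`. Substituting `INT32` for index 0 leaves a bound variable untouched, so `normalize(("clos", ("forall", VAR0), ("cons", INT32, ID)))` yields `("forall", VAR0)`, while the same substitution applied to `("forall", ("clos", VAR0, SHIFT))` yields `("forall", INT32)`.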

$$\frac{}{\tau \subset \tau}\;\subset_{\mathrm{REFL}} \quad \frac{\tau \subset \tau'' \quad \tau'' \subset \tau'}{\tau \subset \tau'}\;\subset_{\mathrm{TRANS}} \quad \frac{\tau \subset \tau' \quad \tau' \subset \tau}{\tau =_{\tau} \tau'}\;\subset_{\mathrm{EQ}}$$

$$\frac{\tau \subset \tau}{\tau \subset \tau} \subset \operatorname{L} \quad \frac{\tau \subset \tau}{\tau \subset \tau} \subset \operatorname{L} \quad \operatorname{int} = i \subset \operatorname{int} \geq i \subset \operatorname{LNT} \pi \quad \operatorname{int} = i \subset \operatorname{int} \leq i \subset \operatorname{LNT} \pi$$

$$\frac{\tau_1 \subset \operatorname{int} \leq n_1 \quad \tau_2 \subset \operatorname{int} \leq n_2 \quad c(n_2 + n_2) \subset \operatorname{int} \sim 1}{+(\tau_1, \tau_2) \subset \operatorname{int} \sim 1} \subset \operatorname{L} \quad \frac{\tau_1 \subset \tau_2}{\operatorname{int} \tau_1 \subset \operatorname{int} = \tau_2} \subset \operatorname{LD}$$

$$\frac{\psi_2 \subset \psi_1}{\operatorname{codeptr}(\psi_1) \subset \operatorname{codeptr}(\psi_2)}\;\subset_{\mathrm{CPTR}} \quad \frac{\tau_1 \subset \tau_2}{\operatorname{offset}(i)\,\tau_1 \subset \operatorname{offset}(i)\,\tau_2}\;\subset_{\mathrm{OFFSET}} \quad \frac{\tau_1 \subset \tau_2}{\operatorname{box}\,\tau_1 \subset \operatorname{box}\,\tau_2}\;\subset_{\mathrm{BOXED}}$$

$$\frac{\tau_1 \cap \tau_2 \subset \tau_1}{\tau_1 \subset \tau_1 \subset \tau_2} \subset \operatorname{cull} \quad \frac{\tau_1 \cap \tau_2 \subset \tau_2}{\tau_2 \subset \tau_1 \subset \tau_2} \subset \operatorname{cull} \quad \frac{\tau_1 \subset \tau_2}{\tau_1 \subset \tau_2 \subset \tau_1} \subset \operatorname{cull}$$

$$\frac{\forall x. \ (\tau_1 x) \subset (\tau_2 x)}{\forall \tau_1 \subset \forall \tau_2} \subset \operatorname{cull} \quad \frac{\forall x. \ ((\tau_1 x) \subset (\tau_1 x))}{\exists \tau_1 \subset \exists \tau_2} \subset \operatorname{cull} \quad \frac{\tau_1 \subset \tau_1}{\tau_1 \subset \tau_2} \subset \operatorname{cull} \quad \frac{\tau_1 \subset \tau_1}{\tau_1 \subset \tau_2 \times \tau_1} \subset \operatorname{cull}$$

$$\frac{\forall x. \ ((\iota_1 x) \subset (\iota_2 x))}{\forall \tau_1 \subset \forall_1 \tau_2} \subset \operatorname{cull} \quad \frac{\forall x. \ ((\iota_1 x) \subset (\iota_1 x))}{\exists \tau_1 \subset \tau_1 \subset \tau_2} \subset \operatorname{cull} \quad \frac{\tau_1 \subset \tau_1 \quad \tau_2' \subset \tau_2 \quad \tau_3 \subset \tau_3'}{\tau_1 \subset \tau_1 \times \tau_2 \subset \tau_1 \times \tau_1} \subset \operatorname{cull}$$


Figure 12: Static Semantics : Subtyping Rules
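The variance rules of Figure 12 that survive legibly (reflexivity, contravariance of `codeptr`, covariance of `offset` and `box`, and projection out of an intersection) can be illustrated with a small structural checker. This is a sketch only: the tuple encoding and the function name `subtype` are assumptions made for illustration, the checker is deliberately incomplete, and in the paper subtyping is a semantic notion (inclusion between type predicates) rather than an algorithm.

```python
# Hypothetical tuple encoding (not from the paper):
#   ("int32",), ("box", t), ("codeptr", t), ("offset", i, t), ("inter", t1, t2)

def subtype(a, b):
    """Return True if a is a subtype of b is derivable from the fragment of rules used here."""
    if a == b:
        return True                                  # reflexivity
    if a[0] == "inter" and (subtype(a[1], b) or subtype(a[2], b)):
        return True                                  # intersection projection, then transitivity
    if a[0] == b[0] == "codeptr":
        return subtype(b[1], a[1])                   # contravariant in the precondition
    if a[0] == b[0] == "box":
        return subtype(a[1], b[1])                   # covariant
    if a[0] == b[0] == "offset":
        return a[1] == b[1] and subtype(a[2], b[2])  # same offset, covariant in the type
    return False                                     # nothing else is attempted


INT32 = ("int32",)
BOTH = ("inter", INT32, ("box", INT32))
```

With these definitions, `subtype(("codeptr", INT32), ("codeptr", BOTH))` holds because `BOTH` is a subtype of `INT32`, whereas the converse direction does not: a code pointer that demands the stronger precondition cannot stand in for one that demands only the weaker.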

|            |     | SPARC Kinds                               |                                      |
|------------|-----|-------------------------------------------|--------------------------------------|
| κ          | ::= | Oper                                      | ALU operators                        |
|            | \|  | RegImm                                    | instruction modes                    |
|            | \|  | Ω                                         | types                                |
|            |     | SPARC ALU Operators (<i>Oper</i>)         |                                      |
| op         | ::= | add(add_cc)                               | integer add (set codes)              |
|            | \|  | addc(addc_cc)                             | integer add with carry (set codes)   |
|            | \|  | and(and_cc)                               | logical AND (set codes)              |
|            | \|  | andn(andn_cc)                             | logical AND-NOT (set codes)          |
|            | \|  | xor(xor_cc)                               | logical XOR (set codes)              |
|            | \|  | xnor(xnor_cc)                             | logical XNOR (set codes)             |
|            |     | SPARC Instruction Modes (<i>RegImm</i>)   |                                      |
| ri         | ::= | RMode <i>n</i>                            | Register mode                        |
|            | \|  | IMode <i>n</i>                            | Immediate mode                       |
|            |     | SPARC types (Ω)                           |                                      |
| $\tau_{s}$ | ::= | $calc(op, n_1, n_2)$                      | ALUop result                         |
|            | \|  | cc_a \| cc_n \| cc_ne                     | Condition codes                      |
|            | \|  | /cc                                       | Restrict on condition codes          |
|            | \|  | $set\_cc(op, n_1, n_2)$                   | ALUop cc result                      |
|            |     | SPARC instructions (INSTR)                |                                      |
| $\iota_{s}$ | ::= |                                          | ALU instruction                      |
|            | \|  | $ibranch(\tau_s, a, n_1)$                 | integer branch instruction           |
|            | \|  | load(i, ri, j)                            | load instruction                     |
|            | \|  | store(i, ri, j)                           | store instruction                    |

Figure 13: TML Syntax : SPARC Kinds and Types

$$\frac{1 \le i < 32}{\vdash \text{ rmode}(i) : \Omega_{regimm}} \qquad \frac{-2^{12} \le i < 2^{12}}{\vdash \text{ imode}(i) : \Omega_{regimm}}$$

$$\frac{x \in \{0, 1\} \quad c \in \{0, 1\} \quad \vdash \text{ oper} : Oper \quad \vdash \text{ regimm} : RegImm \quad i, k : \Omega_{num}}{\vdash \text{ alu}(x, c, oper)(i, regimm, k) : \Omega_1}$$

Figure 14: Wellformedness of SPARC Instructions
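The side conditions of Figure 14 amount to simple range and membership checks. A minimal Python sketch follows, assuming a hypothetical encoding in which a register/immediate operand is a `(mode, value)` pair and operators are the strings listed in Figure 13; the function names are illustrative, and the kind $\Omega_{num}$ is approximated here by Python's `int`.

```python
# Operator names taken from Figure 13; the string encoding is an assumption.
OPERS = {"add", "addc", "and", "andn", "xor", "xnor"}


def wf_regimm(ri):
    """Check the rmode/imode premises of Figure 14 for a (mode, value) pair."""
    mode, i = ri
    if mode == "rmode":
        return 1 <= i < 32             # register number, bounds as stated in the figure
    if mode == "imode":
        return -2**12 <= i < 2**12     # 13-bit signed immediate
    return False


def wf_alu(x, c, oper, i, regimm, k):
    """Check the premises of the alu(x, c, oper)(i, regimm, k) rule."""
    return (
        x in (0, 1)
        and c in (0, 1)
        and oper in OPERS
        and wf_regimm(regimm)
        and isinstance(i, int)         # stand-in for i : Omega_num
        and isinstance(k, int)         # stand-in for k : Omega_num
    )
```

For example, `wf_alu(0, 1, "add", 3, ("imode", 42), 5)` is accepted, while an immediate of `4096` is rejected because it falls outside the signed 13-bit range.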



Figure 15: Static Semantics : SPARC Instructions in TML