Protection Against Untrusted Code
by Andrew W. Appel

(originally appeared in IBM Developer Works, September 1999)

Abstract

In principle, untrusted Java applets and class libraries are safe to execute, because the bytecode verifier prevents many kinds of malicious attacks. But the Just-In-Time compiler (JIT) inside your Java Virtual Machine can be a security weakness. Recent research has shown how to build secure JITs, but until they are available, you should not try to build secure systems that run untrusted code.

Introduction

There are two basic ways to ensure that it is safe to run someone else's program: by trust and by protection mechanisms. When you buy shrink-wrapped software you are relying on trust: the software vendor preserves its good name through quality control, ensuring that no viruses are embedded on the disk. When you download a Java applet, you are relying on a protection mechanism: the Java bytecode verifier ensures that the applet cannot access the private variables of the browser, or put arbitrary machine code in memory and then jump to it.

Traditional protection mechanisms use hardware: the virtual memory subsystem of a computer is used by an operating system to ensure that one process cannot "attack" the memory space of another process. But when two processes that don't quite trust each other need to communicate -- as when your browser sends events to the applet, and the applet draws in its browser window -- hardware protection is clumsy, and an object-oriented interface is much more efficient and expressive.

Computer scientists have long researched the use of programming-language type-checking as a protection mechanism that doesn't need hardware support. Java is the most widely used language that works this way, but what I'll say also applies to other languages such as Modula-3 and ML.

Perhaps applets are merely toys, but in the real world it's very common to build applications from components, where you don't have control over all the components. What can you say about the security of a system you build from untrusted components?

In a type-safe language, if a program type-checks then it can't go wrong. The technical term "go wrong" means to mistake an integer for a pointer, to dereference a pointer at the wrong type, to access a private variable from outside its class, and so on. If a Java class doesn't go wrong, then it can't trash the private variables of other classes to which it is linked. If you link to someone else's Java classes, they may have bugs, but you can still rely on your own classes' private variables. So you may not be able to guarantee the correctness of the system you build (if there's a bug in the arithmetic component, it might not compute the customer's taxes right), but you can hope to guarantee its safety (it won't trash the customer's disk).
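
To make the distinction concrete, here is a minimal Java sketch; the class and method names are invented for illustration. The arithmetic bug compromises correctness, but the type system still protects the private state, so safety survives:

    // TaxCalculator stands in for an untrusted component. Its arithmetic
    // may be wrong (a correctness bug), but the type system still prevents
    // clients -- and other components -- from touching its private state.
    class TaxCalculator {
        private long[] internalTable = new long[100]; // inaccessible outside

        public long computeTax(long income) {
            return income / 4; // possibly the wrong rate: a bug, not a breach
        }
    }

    public class Client {
        private String customerData = "secret"; // TaxCalculator can't read this

        public static void main(String[] args) {
            TaxCalculator calc = new TaxCalculator();
            System.out.println(calc.computeTax(40000));
            // calc.internalTable[0] = 0;  // rejected by the compiler and by
            //                             // the bytecode verifier: private
        }
    }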

The JIT security hole

Most JVMs don't interpret the Java bytecodes directly -- they Just-In-Time compile the bytecodes to native machine code, and execute the native code, which is about ten times faster. The problem is that the Java verifier type-checks the bytecodes, not the native code. So if there's a bug in the JIT, then the native code might go wrong.

But you trust the JIT, right, because it comes from a major software vendor? Unfortunately, the JIT is a large, complicated program -- an optimizing compiler -- and it will inevitably have bugs. Some JITs contain a million lines of code! You can trust that the JIT vendor has not put malicious attacks inside the JIT, and has kept viruses out of it, but you can't realistically trust that there are no bugs at all.

For most software, it's all right if there are a few nonmalicious bugs. But bugs in the JIT can be exploited by a malicious attacker who sends you an applet (or provides a class that you want to use as a component of your software). A clever attacker can tweak his code on purpose to make the JIT compiler produce code that goes wrong, mistaking an integer of his choice for an array of ints. Then all bets are off: he can put whatever bits he wants in memory at whatever location he wants, and you have no security left at all.
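
Here is a small Java illustration of what "going wrong" would mean (the constants are arbitrary). On a correct JVM, the checked cast throws ClassCastException and the attack never starts; the danger is a JIT bug that compiles away, or botches, exactly this kind of check:

    public class TypeConfusionDemo {
        public static void main(String[] args) {
            Object o = Integer.valueOf(0x08048000); // an integer the attacker chose
            try {
                // A correct JVM checks this cast at runtime and throws.
                int[] fake = (int[]) o;
                fake[0] = 0xdeadbeef; // unreachable unless the JIT is broken
            } catch (ClassCastException e) {
                System.out.println("cast rejected: " + e);
            }
        }
    }

If a miscompiled check lets the cast slip through, the assignment through fake writes an attacker-chosen value at an attacker-chosen location -- exactly the outcome the verifier was supposed to rule out.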

Plugging the hole with computer science

Recent research in compiler technology has provided a promising solution to the JIT security hole. First I'll explain how traditional compilers for type-safe languages (including all existing JITs) work. (See Figure 1.)

The compiler front end translates the source program into a high-level intermediate representation (IR) annotated with type declarations. The type-checker runs on this IR, rejecting any unsound programs. Then the type declarations are discarded, and the optimizer does analyses and transformations on the IR to make the program faster; the code generator translates into a machine-specific IR, the register allocator fills in some details, and out comes a machine-language program.

If there's a bug in the optimizer, the code generator, the register allocator, or any other phase, then the program may crash -- or a maliciously designed program may cause intentional harm.

The new technology uses typed intermediate languages -- at each level, the IR still has type declarations, and a type-checker can be run to verify the soundness of the program. Of course, if the compiler has no bugs, then it's unnecessary to run the type-checker at each level -- once the source program is type-checked at the first IR, each lower-level IR ought to type-check as well. But the untrusting user will want to type-check the machine-language program that comes out of the end of the compiler. (See Figure 2.)
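
As a toy illustration -- the structure here is invented, and real typed IRs are far richer -- this is what "type declarations at each level" might look like for a two-type IR in Java, with a checker that can be re-run after any pass:

    // Every IR node can report its type, and checking recomputes types
    // bottom-up instead of trusting whichever pass built the tree.
    abstract class Expr {
        abstract String check(); // returns "int" or "int[]", or throws
    }

    class IntLit extends Expr {
        final int value;
        IntLit(int v) { value = v; }
        String check() { return "int"; }
    }

    class ArrayLoad extends Expr {
        final Expr array, index;
        ArrayLoad(Expr a, Expr i) { array = a; index = i; }
        String check() {
            if (!array.check().equals("int[]"))
                throw new IllegalStateException("loading from a non-array");
            if (!index.check().equals("int"))
                throw new IllegalStateException("indexing with a non-int");
            return "int";
        }
    }

    public class TypedIRDemo {
        public static void main(String[] args) {
            // A buggy pass produced "load element 0 of the integer 42".
            Expr bad = new ArrayLoad(new IntLit(42), new IntLit(0));
            try {
                bad.check();
            } catch (IllegalStateException e) {
                System.out.println("checker rejected: " + e.getMessage());
            }
        }
    }

A buggy optimizer that confuses an int with an array now produces a tree the checker rejects, instead of native code that goes wrong.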

The lower-level IRs can be a good deal more complicated than the source language (e.g., Java). Only in the last few years have computer scientists designed type systems capable of dealing with these lower levels of the compiler. In fact, the type-checking of a machine-language program looks very much like checking a mathematical proof.

This has led to the notion of proof-carrying code. The compiler produces a machine-language program and a proof that the program is safe to execute. Checking this proof is a simple -- but long and tedious -- process, ideally suited for computers. Your trusted vendor will give you a proof-checker, and your JVM can run it on the output of the JIT. (See Figure 3.)
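
Schematically, the arrangement looks like the Java sketch below. The SafetyProof interface and the toy byte-pattern invariant are invented stand-ins -- real proofs establish type and memory safety, and real checkers verify logical deductions -- but the trust structure is the same: only the small checker is trusted, and nothing runs until the proof checks:

    // The untrusted compiler ships native code together with a proof;
    // the small, trusted checker validates the proof before anything runs.
    interface SafetyProof {
        boolean checksAgainst(byte[] nativeCode);
    }

    public class PccSketch {
        // The trusted computing base: small enough to audit line by line.
        static boolean proofChecks(byte[] nativeCode, SafetyProof proof) {
            return proof.checksAgainst(nativeCode);
        }

        static void admit(byte[] nativeCode, SafetyProof proof) {
            if (!proofChecks(nativeCode, proof))
                throw new SecurityException("proof does not establish safety");
            // ... only now hand nativeCode to the execution engine ...
        }

        public static void main(String[] args) {
            byte[] code = { 0x01, 0x02, 0x03 };
            // Toy "proof": the code contains no 0xCC byte. Real PCC proofs
            // establish type and memory safety, not a byte-pattern check.
            SafetyProof toy = nc -> {
                for (byte b : nc) if (b == (byte) 0xCC) return false;
                return true;
            };
            admit(code, toy);
            System.out.println("code admitted for execution");
        }
    }

Note that the compiler that produced the code and the proof can be arbitrarily large and buggy; a wrong proof simply fails to check, and the code is never run.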

Trusting the protection mechanism

How can you trust the proof checker if you can't even fully trust the JIT? The proof-checker is a 1000-line program that can be subjected to meticulous scrutiny by the vendor or by an independent third party. The JIT is 100,000 lines or more of complex, proprietary software that the vendor won't want to release to the world in source code. So it's very reasonable to believe that the proof-checker can be free of bugs. Once you can trust the proof-checker, you can leverage that trust by using it to check the safety of many other programs from people you don't trust.

Prescriptions

Type-safe languages -- such as Java -- have the potential to allow you to construct safe software from untrusted components. But the technology in JITs now on the market is not completely bulletproof, so you would be wise -- this year -- to know where your components (and applets) are coming from. Help is on the way, and in a couple of years we can expect to see JIT products that use typed-intermediate-language and proof-carrying-code technologies for higher-assurance safety.

Resources

Typed intermediate languages were pioneered by Robert Harper and Greg Morrisett in the TILT and TALC research compilers. FLINT is a research compiler for ML and Java that uses typed-intermediate-language technology.

Proof-carrying code was invented by George Necula and Peter Lee for the Touchstone compiler for a safe subset of C. The Secure Internet Programming project at Princeton University is conducting research on many aspects of computer security, including proof-carrying code.

A start-up company called Cedilla Systems is adapting the Touchstone technology to Java, and expects to release a high-assurance Java compiler in early 2000.

The book Securing Java by Gary McGraw and Ed Felten has comprehensive advice about interacting with untrusted Java classes. Ken Thompson's classic Turing Award lecture, Reflections on Trusting Trust, shows the limitations of working with untrusted code.

About the author

Andrew W. Appel is Professor of Computer Science at Princeton University. He has done research in the optimizing compilation of type-safe programming languages and efficient algorithms for garbage collection. His current research is in computer security, including proof-carrying code, distributed authentication frameworks, safe runtime systems, and software-engineering approaches to security. He is the author of Modern Compiler Implementation in Java, a widely used textbook.