Research Projects



3D Content Creation Made Easy Through Non-Photorealism [ archived: no longer active ]

We are developing tools for stylized content creation, via interfaces for (1) easily sketching general free-form shapes, and then (2) directly annotating those shapes with hand-drawn strokes resembling pencil, pen, pastel, or other media. The resulting 3D scene will look much like a drawing, even as it is animated or viewed interactively. Applications of this technology include technical illustration, architecture, education, virtual reality, animation, advertising, and games – any context in which dynamic imagery is used to tell stories, communicate, explain, or inform.

People: Adam Finkelstein

3D Shape-Based Retrieval and Analysis

Our goal is to investigate issues in shape-based retrieval and analysis of 3D models. As a first step, we have developed a search engine for 3D polygonal models. The main research issues are to develop effective shape representations and query interfaces.

People: David Dobkin, Thomas Funkhouser, Adam Finkelstein, Szymon Rusinkiewicz


Acoustic Modeling for Virtual Environments [ archived: no longer active ]

The primary challenge in acoustic modeling is computation of reverberation paths from a sound's source position to a listener's receiving position. We have developed data structures and algorithms to enable interactive simulation of acoustic effects in large 3D virtual environments. Our approach is to precompute and store a spatial data structure that can be later used during an interactive session for evaluation of reverberation paths.

People: Thomas Funkhouser

AMPL [ archived: no longer active ]

AMPL is a comprehensive and powerful algebraic modeling language for linear and nonlinear optimization problems, in discrete or continuous variables. Developed at Bell Laboratories, AMPL lets you use common notation and familiar concepts to formulate optimization models and examine solutions, while the computer manages communication with an appropriate solver. AMPL's flexibility and convenience render it ideal for rapid prototyping and model development, while its speed and control options make it an especially efficient choice for repeated production runs.

Anycast Services [ archived: no longer active ]

Anycast Services

People: Michael Freedman

Aphasia [ archived: no longer active ]

The Aphasia Project is a multi-disciplinary research project investigating how technology can be designed to support individuals with aphasia in their daily life. Aphasia is a cognitive deficit impairing language abilities, ranging from speaking, reading, and writing to comprehending speech. It usually results from stroke or head trauma and affects about one million people in the United States. We work with speech pathologists and people who have aphasia to explore communication and high-level applications that combine images, text, and sound.

People: Maria Klawe

Applications of data mining and mechanisms in healthcare

Applications of data mining and mechanisms in healthcare

People: Mark Braverman

Aspect-Oriented Functional Programming (AspectML) [ archived: no longer active ]

This project explores the semantics and implementation of aspect-oriented programming language features in the context of typed functional languages such as ML. Our achievements so far include the definition of the first minimalist typed calculus of aspect-oriented programs and a proof of type safety. We have also extended the basic calculus to allow for polymorphic and type-directed programming. Finally, we have defined a new concept called "harmless advice." Harmless advice is aspect-oriented advice that cannot interfere with the mainline computation. We have a prototype implementation of harmless advice in a typed functional language and have proven a strong noninterference property for it.

People: David Walker

Audicle [ archived: no longer active ]

On-the-fly, smart audio programming environ/mentality. We present a new type of audio programming environment that integrates the programmability of the development environment with elements of the runtime environment. The result, called the Audicle, is a novel integration of a concurrent smart editor, compiler, virtual machine, and debugger, all running in the same address space, sharing data, and working together at runtime.

People: Perry Cook


Big data: anonymity, privacy, ethics

Big data: anonymity, privacy, ethics

People: Arvind Narayanan

Bioinformatics & Functional Genomics

The new era of large-scale experimental methods in molecular biology has transformed it into an information-based science, making bioinformatics an integral part of genomic research. The research focus of the Laboratory of Bioinformatics and Functional Genomics is the development of integrated computational and experimental technologies for the study of gene function and regulation in biological systems through analysis, modeling, and visualization of heterogeneous biological data. This is a joint laboratory between the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics.

People: Olga Troyanskaya

Biological Process Inference from Experimental Interaction Evidence (BioPIXIE) [ archived: no longer active ]

Identification of Biological Networks from Diverse Genomic Data. This work addresses a key issue in systems biology research: how to integrate the myriad genome-wide data being generated by the research community into meaningful biological pathway and network predictions. The BioPIXIE system harnesses the vast amounts of gene expression and functional genomic data to extract information about biological networks specific to particular biological processes.

People: Olga Troyanskaya

Bitcoin and Cryptocurrencies

Bitcoin and Cryptocurrencies

People: Arvind Narayanan

Boosting [ archived: no longer active ]

Boosting is a general method of producing a very accurate prediction rule by combining rough and moderately inaccurate "rules of thumb." Much recent work has been on the "AdaBoost" boosting algorithm and its extensions.
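The AdaBoost loop can be sketched as follows; this is a minimal, illustrative version with toy threshold "stumps" standing in for the rules of thumb (the data and weak learners are hypothetical, not from the project):

```python
import math

def adaboost(examples, labels, weak_learners, rounds):
    """Toy AdaBoost: combine weak 'rules of thumb' into a strong classifier.
    examples: feature values; labels: +1/-1; weak_learners: callables -> +1/-1."""
    n = len(examples)
    w = [1.0 / n] * n           # uniform example weights
    ensemble = []               # (alpha, hypothesis) pairs
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        h, err = min(
            ((h, sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y))
             for h in weak_learners),
            key=lambda pair: pair[1])
        if err >= 0.5:          # no learner beats random guessing
            break
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # re-weight: increase the weight of misclassified examples
        w = [wi * math.exp(-alpha * y * h(x))
             for wi, x, y in zip(w, examples, labels)]
        total = sum(w)
        w = [wi / total for wi in w]

    def strong(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return strong

# threshold "stumps" as rules of thumb, on hypothetical toy data
stumps = [lambda x, t=t: 1 if x > t else -1 for t in (1, 2, 3, 4)]
xs = [0.5, 1.5, 2.5, 3.5, 4.5]
ys = [-1, -1, 1, 1, 1]
clf = adaboost(xs, ys, stumps, rounds=5)
print([clf(x) for x in xs])  # -> [-1, -1, 1, 1, 1]
```

The re-weighting step is the heart of the method: each round focuses the next weak learner on the examples the current ensemble still gets wrong.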

People: Robert Schapire


CASS: Content-Aware Search System

This project investigates how to build an efficient, high-quality content-based similarity search engine for feature-rich (non-text) data, which has dominated the increasing volume of digital information. The research topics include sketch construction, indexing for similarity search, distance functions for different feature-rich data types, integration with attribute-based search tools, content-addressable and searchable storage system, and Memex systems. The current toolkit is used to construct search systems for four data types including audio recordings, digital photos, 3D shapes, and genomic micro-array data.
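As one illustration of sketch construction (a generic technique, not necessarily the one CASS uses), a MinHash sketch summarizes a set so that similarity can be estimated from the compact sketches alone; the hash parameters below are hypothetical:

```python
import random

def minhash_sketch(items, num_hashes=64, seed=0):
    """Compact sketch of a set: one minimum per random hash function.
    Pr[two sketches agree in a row] equals the sets' Jaccard similarity."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a Mersenne prime for the hash family (hypothetical choice)
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % p for x in items) for a, b in params]

def estimated_jaccard(s1, s2):
    """Fraction of rows on which the two sketches agree."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

A = set("similarity search for feature rich data".split())
B = set("similarity search for text data".split())
sa, sb = minhash_sketch(A), minhash_sketch(B)
true_j = len(A & B) / len(A | B)
print(estimated_jaccard(sa, sb), true_j)  # estimate is close to the true value
```

The sketches are small and fixed-size regardless of set size, so they can be indexed for fast approximate nearest-neighbor search over feature-rich data.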

People: Moses Charikar, Perry Cook, Kai Li, Jennifer Rexford, Olga Troyanskaya

CertiCoq: Principled Optimizing Compilation of Dependently Typed Programs

The CertiCoq project aims to build a proven-correct compiler for dependently-typed, functional languages, such as Gallina—the core language of the Coq proof assistant. A proven-correct compiler consists of a high-level functional specification, machine-verified proofs of important properties, such as safety and correctness, and a mechanism to transport those proofs to the generated machine code. The project exposes both engineering challenges and foundational questions about compilers for dependently-typed languages.

People: Andrew Appel

ChucK [ archived: no longer active ]

ChucK is a concurrent, on-the-fly audio programming language for real-time synthesis, composition, and performance; it runs on MacOS X, Windows, and Linux. ChucK presents a new time-based, concurrent programming model that supports a precise and fundamental level of expressiveness; multiple, simultaneous, dynamic control rates; and the ability to add, remove, and modify code on the fly, while the program is running, without stopping or restarting.

People: Perry Cook

CoDeeN [ archived: no longer active ]

CoDeeN is an academic testbed Content Distribution Network (CDN) built on top of PlanetLab. This testbed CDN consists of a network of high-performance proxy servers that behave both as request redirectors and server surrogates. They cooperate with each other and collectively provide a fast and robust web content delivery service to CoDeeN users. The CoDeeN project page provides links to a number of associated projects including: CoBlitz, a scalable Web-based distribution service for large files; CoDeploy, an efficient synchronization tool for PlanetLab slices; CoDNS, a fast and reliable name lookup service; CoTop, a command-line activity monitoring tool for PlanetLab; CoMon, a Web-based general node/slice monitor that monitors most PlanetLab nodes; and CoTest, a login debugging tool that tries to be human-friendly.

People: Vivek Pai, Larry Peterson

Computational Molecular Biology

My group develops algorithms for a diverse set of problems in computational molecular biology. We are particularly interested in predicting specificity in protein interactions and uncovering how molecular interactions and functions vary across context, organisms and individuals. We leverage high-throughput biological datasets in order to develop data-driven algorithms for predicting protein interactions and specificity; for analyzing biological networks in order to uncover cellular organization, functioning, and pathways; for uncovering protein functions via sequences and structures; and for analyzing proteomics and sequencing data. An appreciation of protein structure guides much of our research.

People: Mona Singh

Computational Neuroscience

The Seung Lab uses techniques from machine learning and social computing to extract brain structure from light and electron microscopic images.

People: Sebastian Seung

Computational universality vs. noise

Computational universality vs. noise

People: Mark Braverman

Concurrent C Minor [ archived: no longer active ]

Concurrent C Minor is an international research project to connect machine-verified source programs in sequential and concurrent programming languages to machine-verified optimizing compilers.

People: Andrew Appel

COPS and Eiger

Scalable causal consistency for wide-area data replication

People: Michael Freedman


DecoBrush: Drawing Structured Decorative Patterns by Example

Structured decorative patterns are common ornamentations in a variety of media like books, web pages, greeting cards and interior design. Creating such art from scratch using conventional software is time consuming for experts and daunting for novices. We introduce DecoBrush, a data-driven drawing system that generalizes the conventional digital "painting" concept beyond the scope of natural media to allow synthesis of structured decorative patterns following user-sketched paths. The user simply selects an example library and draws the overall shape of a pattern. DecoBrush then synthesizes a pattern in the style of the exemplars that roughly matches the overall shape. If the designer wishes to alter the result, DecoBrush also supports user-guided refinement via simple drawing and erasing tools. For a variety of example styles, we demonstrate high-quality user-constrained synthesized patterns that visually resemble the exemplars while exhibiting plausible structural variations.

People: Adam Finkelstein

Detection and Analysis of Chromosomal Abnormalities [ archived: no longer active ]

Chromosomal copy number changes play an important role in cancer and in molecular evolution, and we are developing robust algorithms for identifying chromosomal abnormalities accurately on a genomic scale. In collaboration with biologists at the Lewis-Sigler Institute for Integrative Genomics, we are using these algorithms to study chromosomal abnormalities in the context of molecular evolution and cancer. Results of these experiments may shed light on how chromosomal aberrations are involved in carcinogenesis. Our goal is to develop both technologies that can uncover fundamental biology and methods that can be routinely applied clinically to identify medically relevant functional copy number changes.
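One simple way to locate copy number changes, sketched here purely for illustration rather than as the group's actual algorithm, is recursive binary segmentation of a log-ratio profile: repeatedly split the signal where the difference between the left and right means is largest.

```python
def split_point(x):
    """Best single change point: maximizes |mean(left) - mean(right)|."""
    best_i, best_gap = None, 0.0
    for i in range(1, len(x)):
        left, right = x[:i], x[i:]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i, best_gap

def segment(x, min_gap=0.5):
    """Recursive binary segmentation: split while the mean shift is large."""
    i, gap = split_point(x)
    if i is None or gap < min_gap:
        return [x]
    return segment(x[:i], min_gap) + segment(x[i:], min_gap)

# hypothetical log2 copy-number ratios: normal, then a gain, then normal
profile = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0, 0.0, -0.1, 0.1]
segs = segment(profile)
print([round(sum(s) / len(s), 1) for s in segs])  # -> [0.0, 1.0, 0.0]
```

Real methods must additionally handle noise models, significance testing, and genome-scale data volumes, but the segmentation structure is the same.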

Discovery, Analysis and Dissemination of Information (DADI) [ archived: no longer active ]

DADI is an integrative research project at Princeton on discovery, analysis and dissemination of information on the Internet. It focuses on both systems and driving applications for a new model of communication and coordination on the Internet: the asynchronous, event-based model, with particular emphasis on decoupled, content-based publish-subscribe (pub-sub) communication at Internet scale.

People: Andrea LaPaugh, Jaswinder Singh


E2E Protocol Design [ archived: no longer active ]

The goal of this work is to make end-to-end protocols (TCP in particular) more effective. Our approach is to both investigate opportunities to improve the congestion control algorithm used by transport protocols, and to explore the use of alternative end-to-end paths across the Internet.

Epigenome-wide association studies

We are currently developing methods for performing epigenome-wide scans for association of methylation status with phenotypes of interest.

People: Barbara Engelhardt

Expert-Driven Data Analysis and Visualization in Genomics [ archived: no longer active ]

Effective visualization-based analysis is critical to unlocking the full potential of genomic data and to supporting the collaborative research that is commonplace in genomics. Currently available methods are designed to visualize a single dataset in limited ways and are often hampered by the limited resolution and size of traditional displays. We are developing methodologies that enable experts to drive analysis through visualization and iterative feedback. These methods are dynamic and scalable: they can be used on desktop screens or on large wall-size displays, thereby supporting both individual analysis and collaborative analysis by groups of investigators.

Extensible Router [ archived: no longer active ]

The goal of this project is to build a prototype router that (1) is easily extended to support new network services (including overlay and peer-to-peer networks, firewalls, media gateways, proxies, and cluster-based servers), and (2) exploits commercially available hardware components (including commodity processors, network processors, and high-speed system area networks).


EyeWire

EyeWire is a game to map the brain from the Seung Lab at MIT. Anyone can play, and no scientific background is needed; over 130,000 people from 145 countries already play. Together we are mapping the 3D structure of neurons, advancing our quest to understand ourselves.

People: Sebastian Seung


Fault-tolerant Datacenter Services [ archived: no longer active ]

Fault-tolerant Datacenter Services

FCMA: Full Correlation Matrix Analysis of Human Brains

FCMA: Full Correlation Matrix Analysis of Human Brains

People: Kai Li

Foundational Proof-Carrying Code [ archived: no longer active ]

Proof-carrying code (PCC) is a promising new technology, developed in a proof-of-concept prototype by Necula and Lee. We propose to fully realize the ultimate impact of this technology by scaling it up to real programming languages, production compilers, and sophisticated security policies. We will pay particular attention to scalability, interoperability, efficiency, and the principled interaction of security policies with containment mechanisms.

People: Andrew Appel, David Walker


Programming language for software-defined networks

People: Michael Freedman

FunctionalFlow [ archived: no longer active ]

This project develops algorithms for analyzing protein interaction graphs in order to predict protein function.


GeneVAND [ archived: no longer active ]

Visualization-based Analysis of Microarray Data. This visualization system enables scientists to visually analyze microarray datasets in multiple ways to examine data quality and relationships among various clusters or biologically relevant groups of genes. GeneVAND is dynamic and scales from regular desktop displays to large display walls.

Gretchen [ archived: no longer active ]

Sound/video installation.

People: Perry Cook


Illustration and Analysis of Images with Normals [ archived: no longer active ]

We are investigating acquisition, analysis, and rendering techniques that
begin with datasets inherently more powerful than single images, yet easier
to acquire with high quality than full 3D scans. In particular, we are
interested in the power of images with a per-pixel color and normal
("RGBN" images), and recent work shows how to adapt signal processing,
segmentation, and stylized depiction algorithms to this new datatype.

People: Adam Finkelstein, Szymon Rusinkiewicz


ImageNet

ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.

People: Kai Li

Immune System Modeling & Simulation (IMMSIM) [ archived: no longer active ]

IMMSIM is an in machina cellular automata model of the humoral and cellular immune responses. When a foreign substance is introduced into our bodies, our immune system acts to neutralize that substance. This is a complex process involving the collective and coordinated response of approximately 10^12 cells. This project aims to fit detailed experimental findings into a comprehensive model of the immune system by using computer models and simulations.

Incrementally Deployable Secure Inter-domain Routing [ archived: no longer active ]

The Internet's interdomain routing system is notoriously vulnerable to malicious attacks and configuration mistakes. Proposals for a secure interdomain-routing protocol have been stymied, at least in part, by the inability to have a "flag day" on which routers throughout the Internet upgrade to the new protocol. In this project, we investigate incrementally deployable techniques for improving interdomain routing security, building on the Routing Control Platform (RCP) that selects routes on behalf of each router in a network, while remaining backwards compatible with the legacy equipment. The RCP provides a natural place to run anomaly-detection algorithms (to avoid selecting suspicious routes), apply network-wide routing policies, and upgrade a network to a more secure routing protocol.

People: Jennifer Rexford

Information Plane [ archived: no longer active ]

The goal of this project is to architect and demonstrate an information plane: a distributed system running throughout the network that collects, stores, propagates, aggregates, analyzes and reacts to observations about the network's behavior. An information plane can be used to manage networks and network services; detect and defend against anomalies; and diagnose and recover from failures.

Interactive Architectural Walkthroughs [ archived: no longer active ]

Interactive computer programs that simulate the experience of "walking" through a building interior are useful for visualization and evaluation of building models before they are constructed. The challenge is to identify the relevant portions of the model, swap them into memory and render them at interactive frame rates (at least fifteen frames per second) as the observer's viewpoint is moved under user control. This project develops systems that support interactive walkthroughs of such large, fully furnished building models.

People: Thomas Funkhouser

Interactive coding theory

Interactive coding theory

People: Mark Braverman


Learning with partial feedback

Numerous decision problems reveal noisy, corrupt or otherwise partial feedback on choices. The Bandit Convex Optimization framework is a general methodology for tackling such missing-feedback problems, including the famed multi-armed-bandit problem and its generalizations. Applications include online routing, rank aggregation, matrix completion and more. Our research addresses efficient algorithms for bandit convex optimization, both in the form of general methodologies for exploration, as well as new techniques from optimization, and variance exploitation.
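The multi-armed bandit is the simplest instance of this setting: only the chosen arm's reward is observed, never the rewards of the arms not chosen. A minimal UCB1 sketch (with hypothetical Bernoulli arms) illustrates the resulting exploration/exploitation trade-off:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1 for the multi-armed bandit: feedback is partial, so the learner
    must balance trying under-explored arms against exploiting good ones."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # try every arm once
        else:
            # optimism in the face of uncertainty: best upper confidence bound
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return counts, total

# hypothetical Bernoulli arms; arm 2 pays off most often
rng = random.Random(1)
probs = [0.2, 0.5, 0.8]
counts, total = ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0,
                     n_arms=3, horizon=2000)
print(counts.index(max(counts)))  # the best arm ends up pulled most often
```

Bandit convex optimization generalizes this: the arms become points in a convex set and the learner sees only the loss value at the point it played.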

People: Elad Hazan

Liberty Research

The Liberty Computer Architecture Research Group exploits unique opportunities exposed by considering the interaction of compilers and architectures to increase performance, to improve reliability, to reduce cost, to lower power, and to shorten the time to market of microprocessor systems. This objective is accomplished by providing critical computer architecture and compiler research, expertise, and prototypes to the community.

People: David August


Measurement and representation of realistic surface appearance [ archived: no longer active ]

The appearance of a surface can be characterized by the amount of light it reflects at each surface position, for each pair of directions of incident and reflected light. We are working on methods for capturing and representing these high-dimensional, multi-gigabyte reflectance datasets using techniques including reparameterization and dimensionality reduction based on factorization. The goal is to enable efficient, compact representations that permit high-quality rendering and editing of complex materials with high visual fidelity.
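As a toy illustration of factorization-based compression (a sketch under simplified assumptions, not the group's actual pipeline), a perfectly separable reflectance table is recovered exactly by a rank-1 factor found via alternating updates:

```python
def rank1_factor(M, iters=50):
    """Approximate M by an outer product u * v^T via alternating updates
    (power iteration): the 2D table collapses to two 1D vectors."""
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = sum(ui * ui for ui in u) or 1.0
        v = [sum(M[i][j] * u[i] for i in range(rows)) / nu for j in range(cols)]
    return u, v

# hypothetical separable "reflectance" table: M[i][j] = a[i] * b[j]
a, b = [1.0, 2.0, 3.0], [0.5, 1.0, 1.5, 2.0]
M = [[ai * bj for bj in b] for ai in a]
u, v = rank1_factor(M)
recon = [[u[i] * v[j] for j in range(len(b))] for i in range(len(a))]
err = max(abs(recon[i][j] - M[i][j])
          for i in range(len(a)) for j in range(len(b)))
print(err < 1e-6)  # separable data is reproduced exactly by one factor pair
```

Real reflectance data is not exactly separable, so in practice one keeps several factor pairs (a low-rank approximation), trading a small reconstruction error for orders-of-magnitude compression.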

People: Szymon Rusinkiewicz

Mechanism design without money

Mechanism design without money

People: Mark Braverman

Modeling Science [ archived: no longer active ]

This is a browsable 100-topic model estimated from 15,745 articles from the journal Science.

People: David Blei

Modularity and Secure Linking [ archived: no longer active ]

The goal of the project is to create methods and tools that will make it easier for programmers to write software components that will function securely when linked with potentially hostile components. Our research combines research results on several topics: information hiding and language design, hierarchical modularity, dynamic linking, and access control.

MoSievius - Real-time feature-based audio mosaicing [ archived: no longer active ]

MoSievius is a system to create audio mosaics in real time. It is mainly designed to allow interactive target specification as well as interactive source selection during the mosaicing process. Interactive source selection is achieved with the use of the Sound Sieve, a new concept used to isolate segments of sound that exhibit specific qualities. Instead of implementing one particular mosaicing technique, MoSievius provides the tools needed to implement any number of different mosaicing schemes.

People: Perry Cook

Multi-User Virtual Environments [ archived: no longer active ]

Multi-user virtual environment applications incorporate computer graphics, sound simulation, and networks to simulate the experience of real-time interaction between multiple users in a shared three-dimensional virtual world. Each user is represented in the shared virtual environment by an entity rendered on every other user's computer. Multi-user interaction is supported by matching user actions to entity updates (e.g., motion/sound generation) in the shared virtual environment. This project develops network services that support multi-user virtual environments via a client-server design.

People: Thomas Funkhouser

Multivariate Polynomial Factorization for restricted models

Multivariate polynomial factorization is a fundamental algorithmic tool, used in various contexts. In theoretical computer science, it is used to prove hardness-randomness tradeoffs for arithmetic computations. In general, one wants to show that, if a polynomial has bounded complexity (say, it is computed by a small arithmetic formula or circuit), then all of its factors are also of bounded complexity. Many questions of this form are still wide open, and this project aims to close the gaps in our understanding. One recent result of PhD student Oliveira shows that factors of bounded depth circuits are also computable by bounded depth circuits.
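The underlying question can be phrased, in our paraphrase, as a bound on factor complexity in a given circuit model:

```latex
% Factor-complexity question (informal paraphrase):
% if f has a small circuit and g divides f, must g have a small circuit
% in the same (possibly restricted) model?
\[
  f \in \mathbb{F}[x_1,\dots,x_n], \qquad
  \mathrm{size}(f) \le s, \qquad g \mid f
  \;\;\stackrel{?}{\Longrightarrow}\;\;
  \mathrm{size}(g) \le \mathrm{poly}(s,\, n,\, \deg f)
\]
```

For general arithmetic circuits the answer is known to be yes; the open problems concern restricted models such as formulas and bounded-depth circuits, where the known closure arguments do not directly apply.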

People: Zeev Dvir

Music Analysis, Retrieval and Synthesis for Audio Signals (Marsyas) [ archived: no longer active ]

Marsyas is a software framework for rapid prototyping and experimentation with audio analysis and synthesis with specific emphasis to music signals and Music Information Retrieval.

People: George Tzanetakis


Natural Algorithms

For much of my professional life, designing algorithms had been my thing. Then, one day, I watched a majestic flock of geese fly over Carnegie Lake and it dawned upon me that it had been their thing, too. Having been at it for 100 million years, even longer than I had, naturally their algorithmic genius surpassed mine. Undaunted, I resolved to catch up. The premise of my current research is that interpreting biological or social self-organized systems as "natural algorithms" brings upon them a fresh, new perspective ripe for inquiry. I believe that only the algorithm has the expressive power to model complex self-adaptive systems at the right levels of abstraction. Algorithms are the differential equations of the 21st century. Beyond its trite catchiness, this line serves to remind us that mankind's grasp of PDEs vastly exceeds its mastery of algorithms. The first order of business, therefore, is to build new analytical tools for natural algorithms.

People: Bernard Chazelle

Network Servers [ archived: no longer active ]

This project focuses on techniques to improve network servers and networking software. We consider performance optimizations, improved behavior under heavy load, and more robustness to abuse. Much of this work is at the boundary between networking and operating systems.

Network X-ities for Routing Protocols

Given society's increasing reliance on communication networks such as the Internet, it is becoming increasingly important that these networks not only provide good performance, but do so in the face of a complex, uncertain, error-prone, and ever-changing environment. The need for such "robust" network operation leads to a set of design considerations that we refer to as the network X-ities (since they all end in "ity"): non-fragility, manageability, diagnosability, optimizability, scalability, and evolvability. Although these X-ities are crucially important in designing and analyzing robust networks and protocols, they often lack theoretical foundations, quantitative frameworks, or even well-defined metrics and meaning. The goal of this project is to begin to build a solid, quantitative foundation for explicitly considering the X-ities in the design and analysis of network architectures and protocols. We do so by considering a number of specific problems, broadly in the area of routing protocols, that allow us to concretely address several of the X-ities and to begin to draw larger lessons from commonalities among the problems studied.

People: Jennifer Rexford

Network-wide control and Management [ archived: no longer active ]

Today's data networks are surprisingly fragile and difficult to manage. We argue that the root of these problems lies in the complexity of the control and management planes--the software and protocols coordinating network elements--and particularly the way the decision logic and the distributed-systems issues are inextricably intertwined. We advocate a complete refactoring of the functionality and propose three key principles--network-level objectives, network-wide views, and direct control--that we believe should underlie a new architecture. Following these principles, we identify an extreme design point that we call "4D," after the architecture's four planes: decision, dissemination, discovery, and data. The 4D architecture completely separates a network's decision logic from protocols that govern the interaction among network elements. The Routing Control Platform (RCP) is one concrete realization of these ideas, with an emphasis on network-wide control of the routing in a single domain, while remaining backwards compatible with legacy routers.

People: Jennifer Rexford

New techniques for range scanning [ archived: no longer active ]

Although 3D scanning has become a widely-used technique, most commercially available systems acquire data at relatively low rates and require careful control over the setup and calibration of cameras and/or active light sources. We are working on making range scanning systems faster, more flexible, more robust, and easier to use by developing novel scanner designs and 3D scan processing algorithms, including combinations of structured light, space-time stereo, photometric stereo, and efficient registration algorithms.

People: Szymon Rusinkiewicz


On-the-fly programming [ archived: no longer active ]

Using code for runtime expressive control. On-the-fly programming (or live coding) is a style of programming in which the programmer/performer/composer augments and modifies the program while it is running, without stopping or restarting, in order to assert expressive, programmable control for performance, composition, and experimentation at run-time.

People: Perry Cook

Online Convex Optimization

In recent years, convex optimization and the notion of regret minimization in games have been combined and applied to machine learning in a general framework called online convex optimization. For more information, see the survey on the convex optimization approach to regret minimization or the draft course book on online convex optimization in machine learning.
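Online gradient descent is the canonical algorithm in this framework: at each step the learner plays a point, observes a convex loss, and steps against its gradient with a decaying rate, which guarantees regret growing only as the square root of the horizon. A minimal sketch, with hypothetical quadratic losses:

```python
import math

def online_gradient_descent(grad, project, x0, horizon):
    """Online gradient descent: play x_t, observe the loss gradient at x_t,
    then step with rate ~ 1/sqrt(t) and project back onto the feasible set."""
    x = list(x0)
    plays = []
    for t in range(1, horizon + 1):
        plays.append(list(x))
        g = grad(t, x)
        eta = 1.0 / math.sqrt(t)            # decaying step size
        x = project([xi - eta * gi for xi, gi in zip(x, g)])
    return plays

# toy losses f_t(x) = (x - c_t)^2 with targets alternating around 0.5
targets = [0.4 if t % 2 else 0.6 for t in range(1, 201)]
def grad(t, x):                             # gradient of (x - c_t)^2
    return [2 * (x[0] - targets[t - 1])]
clip = lambda x: [min(1.0, max(0.0, x[0]))] # project onto [0, 1]
plays = online_gradient_descent(grad, clip, [0.0], horizon=200)
print(round(plays[-1][0], 2))  # iterates settle near 0.5, the best fixed point
```

No distributional assumption is made about the loss sequence; the regret bound holds even against adversarially chosen losses, which is what makes the framework so broadly applicable.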

People: Elad Hazan


PADS [ archived: no longer active ]

An ad hoc data source is any data source represented in a format for which data processing tools are not readily available. The PADS language allows users to describe the format of their ad hoc data sources at a high level of abstraction. When given a format description, the PADS compiler automatically generates a set of libraries for manipulating the data including parser, printer and verifier as well as a collection of standalone tools such as an ad hoc-to-XML converter, histogram generator, and statistical analyzer. In addition to a solid implementation, we have recently formulated a semantics for PADS as a novel interpretation of dependent type theory.
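The idea can be illustrated with a toy Python sketch (this is not PADS syntax; the format, field names, and helper functions are hypothetical): from one declarative description of the fields, both a parser and a printer are derived automatically.

```python
# toy illustration of deriving tools from a declarative format description:
# a hypothetical comma-separated log format with three typed fields
FIELDS = [("host", str), ("port", int), ("latency_ms", float)]

def make_parser(fields, sep=","):
    """Generate a parser from the field description."""
    def parse(line):
        parts = line.strip().split(sep)
        if len(parts) != len(fields):
            raise ValueError("wrong field count")
        return {name: typ(raw) for (name, typ), raw in zip(fields, parts)}
    return parse

def make_printer(fields, sep=","):
    """Generate the inverse printer from the same description."""
    def show(record):
        return sep.join(str(record[name]) for name, _ in fields)
    return show

parse, show = make_parser(FIELDS), make_printer(FIELDS)
rec = parse("example.org,443,12.5")
print(rec["port"], show(rec))  # -> 443 example.org,443,12.5
```

PADS goes much further: its descriptions are dependent types, and the generated libraries include verifiers, statistical analyzers, and format converters in addition to parsers and printers.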

People: David Walker

Painting with Triangles

Although vector graphics offer a number of benefits, conventional vector painting programs offer only limited support for the traditional painting metaphor. We propose a new algorithm that translates a user’s mouse motion into a triangle mesh representation. This triangle mesh can then be composited onto a canvas containing an existing mesh representation of earlier strokes. This representation allows the algorithm to render solid colors and linear gradients. It also enables painting at any resolution. This paradigm allows artists to create complex, multi-scale drawings with gradients and sharp features while avoiding pixel sampling artifacts.

People: Adam Finkelstein

Parallel Computing [ archived: no longer active ]

Applications, Architecture, Programming Models, Workload-driven evaluation

Parallel Rendering With Clusters of PCs [ archived: no longer active ]

Our project focuses on designing high-performance rendering systems by clustering a number of low-cost commodity PCs, each of which contains a commodity PC graphics accelerator. Because our systems are composed purely of commodity parts, we are able to track the technology curve: we can easily upgrade components of our system. Our research focuses on designing algorithms that work well in such an environment. The key factors that set our system apart from traditional parallel rendering systems are the loosely coupled nature of our rendering systems and the high cost of communication between the nodes in the system.


PARSEC

The Princeton Application Repository for Shared-Memory Computers (PARSEC) is a benchmark suite composed of multithreaded programs. The suite focuses on emerging workloads and was designed to be representative of next-generation shared-memory programs for chip-multiprocessors.

People: Kai Li


Inferring and Enforcing Security Policies in Web Applications

People: Michael Freedman

Peer-to-peer CDNs [ archived: no longer active ]


Physically Inspired Stochastic Event Modeling (PhISEM) [ archived: no longer active ]

The PhISEM project supports the principle that "new algorithms lead to new controllers lead to new algorithms...". This work on synthesizing particle-type percussion and real-world sounds led to a set of new instruments, not only for controlling shaker/scraper sounds and sound effects, but also for algorithmic interactive music.

People: Perry Cook


Multi-tenant resource fairness for shared datacenter services

People: Michael Freedman


PlanetLab Consortium

The PlanetLab Consortium is a collection of academic, industrial, and government institutions cooperating to support and enhance the PlanetLab overlay network. It is responsible for overseeing the long-term growth of PlanetLab's hardware infrastructure; designing and evolving its software architecture; providing day-to-day operational support; and defining policies that govern appropriate use. The PlanetLab Consortium is managed by Princeton University, the University of California at Berkeley, and the University of Washington. Princeton currently hosts the Consortium.

People: Larry Peterson

Plato - Defining Modern Logics for Modern Programmers [ archived: no longer active ]

The goal of the Plato project is to develop software engineering tools and techniques that help modern programmers create more reliable and secure component software systems. In order to accomplish our goal, we are looking at ways modern logics such as linear logic and modal logic can be used to specify and check common program properties. For instance, over the last several years we have shown how modal logic can be used as a type system for distributed programs, used linear and hybrid logics to check stack and region allocation invariants, and used linear logic again for checking memory safety in programs with heap allocation.

People: David Walker


We are developing a scalable routing underlay that can be used by overlay networks to discover the underlying Internet topology.

Polymer - A Language for Composing Run-Time Security Policies [ archived: no longer active ]

The Polymer project studies all facets of the theory, design, and implementation of software program monitors and monitor-specification languages. On the theoretical side, we are particularly interested in understanding the range of security properties that can be enforced by program monitors. Recently, we have shown how to define program monitors as formal automata that edit the behavior of the software they monitor. Surprisingly, we have proven that these monitors can enforce a new class of non-safety properties called "infinite renewal properties." On the design and implementation side, we have developed an extension of Java that facilitates the implementation of program monitors. Our language (also called Polymer) provides high-level constructs for reacting to dangerous program actions as well as useful combinators that allow us to construct complex monitors from simpler ones.
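The edit-automaton idea can be sketched in a few lines (a Python toy, not the actual Java implementation; the backup-before-delete policy below is invented for illustration):

```python
# An 'edit automaton' watches a stream of program actions and may pass
# them through, suppress them, or insert actions of its own. This toy
# policy ensures every 'delete' is preceded by a 'backup', inserting the
# backup itself when the program omitted one.
def monitor(actions):
    out, backed_up = [], False
    for act in actions:
        if act == "backup":
            backed_up = True
            out.append(act)
        elif act == "delete":
            if not backed_up:
                out.append("backup")  # inserted compensating action
            out.append(act)
            backed_up = False         # a backup covers one delete
        else:
            out.append(act)           # benign actions pass through
    return out

edited = monitor(["write", "delete"])
# the monitor inserts a "backup" before the unprotected "delete"
```

A suppression policy would instead drop the dangerous action; the formal results referenced above characterize exactly which properties such editing monitors can enforce.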

People: David Walker

Population Structure and Matrix Factorization

Matthew Stephens and I considered the problem of identifying latent structure in a population of individuals. We considered the two methods most commonly applied to this problem, namely, admixture models and principal components analysis (PCA), in the framework of matrix factorization methods with different matrix constraints. Within this framework, we described a sparse factor analysis model (SFA) that encouraged sparsity on the factor loadings through an automatic relevance determination prior. Results from SFA bridged the gap between admixture models and PCA: SFA did not over-regularize the data as admixture models tend to do, but, unlike PCA, sparsity enabled well-separated populations to each be associated with a single factor, making the results interpretable as with admixture models. However, we found that the methods produced similar results for continuous populations; a sample of 1,387 individuals with approximately 200,000 SNPs from Europe mapped to two factors that captured the geography of the sample well in all three methods. We are currently developing factor analysis models that have effective sparsity-inducing priors that go beyond automatic relevance determination priors and have better conjugacy properties than traditional spike-and-slab priors.

People: Barbara Engelhardt

Postmanet [ archived: no longer active ]

We are developing a high-bandwidth, delay-tolerant network that seamlessly integrates traditional networks, storage media, and the postal delivery infrastructure. One application of Postmanet is a distance-learning system that would allow resource-starved village schools in rural India to benefit from the better human and content resources available in urban environments. The proposed Digital StudyHall system has two other novel key components besides Postmanet: a mechanism that turns regular TV screens into "networked thin client displays," and a web repository that collects educational content and connects learners and teaching staff across time and space, so that staff in urban schools and volunteers (potentially from overseas) can contribute with flexible time and location commitments.

People: Randy Wang

Prediction of Protein Function and Regulation [ archived: no longer active ]

Now that genomic sequencing of whole organisms is a routine technology, the next key challenge in genomics is understanding gene function and regulation. Functional genomics aims to determine what these genes do (gene function) and how they are controlled inside the cell (regulation). Experimental approaches to these problems have led to an explosion of functional genomics data, but these datasets are large, very noisy, and highly heterogeneous, making accurate analysis by existing computational methods impossible. Novel computing methodologies developed specifically for biological data are essential to realize the potential of functional genomics. We are developing such methodologies based on machine learning, statistical, and data mining techniques.

Princeton Laptop Orchestra (Plork)

The (completely insane) Princeton laptop orchestra.

People: Perry Cook

Princeton S* Network Systems (SNS) group

The Princeton S* Network Systems (SNS) group conducts systems and networking research within Princeton's Computer Science Department. The undefined S* — Scalable, Secure, Self-Organizing, Self-Managing, Service-centric, Storage-based — characterizes the broad scope of our research.

People: Michael Freedman

Princeton Shape Benchmark [ archived: no longer active ]

The Princeton Shape Benchmark provides a repository of 3D models and software tools for evaluating shape-based retrieval and analysis algorithms. The motivation is to promote the use of standardized data sets and evaluation methods for research in matching, classification, clustering, and recognition of 3D models.

People: Thomas Funkhouser

Private Information Retrieval

A Private Information Retrieval (PIR) scheme allows a user to retrieve an entry from a database replicated on two non-communicating servers in such a way that neither server learns any information about which entry the user requested. A PIR scheme is measured by its "communication cost": the total number of bits communicated, as a function of the database size n. This project has already resulted in a dramatic improvement over the previous state-of-the-art schemes, reducing their communication cost from n^{0.333} to sub-polynomial in n. Joint work with Sivakanth Gopi.
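For intuition, the classic two-server scheme with linear communication cost can be sketched as follows (this is the textbook baseline, not the sub-polynomial scheme developed in this project):

```python
# Two-server information-theoretic PIR, baseline version: the user sends a
# uniformly random subset S to server 1 and S with index i flipped to
# server 2. Each server alone sees a uniformly random subset, so neither
# learns anything about i; the XOR of their answers is exactly db[i].
import random

def pir_query(n, i):
    """User side: build the two query subsets."""
    s1 = {j for j in range(n) if random.random() < 0.5}
    s2 = s1 ^ {i}  # symmetric difference: flips membership of i only
    return s1, s2

def pir_answer(db, subset):
    """Server side: return the XOR of the requested bits."""
    ans = 0
    for j in subset:
        ans ^= db[j]
    return ans

db = [1, 0, 1, 1, 0, 0, 1, 0]  # toy database of bits
s1, s2 = pir_query(len(db), 3)
bit = pir_answer(db, s1) ^ pir_answer(db, s2)  # recovers db[3]
```

The subsets here are as large as the database, which is what better schemes improve: the project's results drive the communication cost far below this trivial bound.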

People: Zeev Dvir

Projection free learning

The computational bottleneck in applying state-of-the-art iterative methods to machine learning and optimization is often the so-called "projection step." We design projection-free optimization algorithms that replace projections with more efficient linear optimization steps. Recent results include a projection-free algorithm for online learning and the first linearly convergent projection-free algorithm.
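A minimal sketch of the idea, using the classic Frank-Wolfe method to minimize a quadratic over the probability simplex (the target vector and step count below are arbitrary): each iteration solves a linear problem over the feasible set, whose minimizer is simply a vertex, instead of projecting.

```python
# Frank-Wolfe on the probability simplex: minimize f(x) = ||x - y||^2.
# The linear step min_{s in simplex} <grad, s> is solved in closed form
# by picking the vertex e_k with the most negative gradient coordinate.
def frank_wolfe(y, steps=500):
    n = len(y)
    x = [1.0 / n] * n                                  # start at the center
    for t in range(1, steps + 1):
        grad = [2 * (x[j] - y[j]) for j in range(n)]
        k = min(range(n), key=lambda j: grad[j])       # linear minimizer = e_k
        gamma = 2.0 / (t + 2)                          # standard step size
        x = [(1 - gamma) * x[j] + (gamma if j == k else 0.0)
             for j in range(n)]
    return x

x = frank_wolfe([0.7, 0.2, 0.1])
# x converges toward y; every iterate stays on the simplex by construction
```

No projection ever occurs: each iterate is a convex combination of simplex vertices, so feasibility is automatic, which is exactly the property these algorithms exploit.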

People: Elad Hazan

Proof-Carrying Authorization [ archived: no longer active ]

We have developed a general and powerful distributed authentication framework based on higher-order logic. Authentication frameworks --- including Taos, SPKI, SDSI, and X.509 --- have been explained using logic. We show that by starting with the logic, we can implement these frameworks, all in the same concise and efficient system. Because our logic has no decision procedure --- although proof checking is simple --- users of the framework must submit proofs with their requests.

People: Andrew Appel, Edward Felten

Protein molecular function prediction

As a graduate student with Dr. Michael Jordan, collaborating with Dr. Steven Brenner, I created a statistical methodology, SIFTER (Statistical Inference of Function Through Evolutionary Relationships), to capture how protein molecular function evolves within a phylogeny in order to accurately predict function for unannotated proteins, improving over existing methods that use pairwise sequence comparisons. We relied on the assumption that function evolves in parallel with sequence evolution, implying that phylogenetic distance is the natural measure of functional divergence. In SIFTER, molecular function evolves as a first-order Markov chain within a phylogenetic tree. Posterior probabilities are computed exactly using message-passing, with an approximate method for large or functionally diverse protein families; model parameters are estimated using generalized expectation maximization. Functional predictions are extracted from protein-specific posterior probabilities for each function. I applied SIFTER to a genome-scale fungal data set, which included families of proteins from 46 fully-sequenced fungal genomes, and SIFTER substantially outperformed state-of-the-art methods in producing correct and specific predictions.

People: Barbara Engelhardt

ProteinPathGrep [ archived: no longer active ]

This project develops a system for interrogating protein interaction networks in order to retrieve matches to putative functional modules.


QSplat [ archived: no longer active ]

QSplat is a program for displaying large geometric models in real time. It was originally designed during the course of the Digital Michelangelo Project, to render the hundred-million-polygon models we were producing. Its features include fast startup and progressive loading; real-time interactive display with a user-selectable frame rate; refinement to a high-quality rendering when idle; a point-based multiresolution representation with splat rendering, level-of-detail control, and hierarchical visibility culling; a compact representation (6 bytes per input point including normals, 9 bytes including color); and fast preprocessing.
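The flavor of such compact point encodings can be sketched as follows (a simplified illustration of coordinate quantization only; QSplat's actual layout also squeezes radius, normal, and color into those bytes):

```python
# Quantize a 3D point inside a known bounding box to 16 bits per
# coordinate, giving a fixed 6-byte record. Decoding maps the integers
# back into the box, with error bounded by half a quantization step.
import struct

def pack_point(p, bbox_min, bbox_max):
    q = []
    for x, lo, hi in zip(p, bbox_min, bbox_max):
        t = (x - lo) / (hi - lo)                     # normalize to [0, 1]
        q.append(min(65535, max(0, round(t * 65535))))
    return struct.pack("<3H", *q)                    # 3 little-endian uint16

def unpack_point(data, bbox_min, bbox_max):
    q = struct.unpack("<3H", data)
    return tuple(lo + (v / 65535) * (hi - lo)
                 for v, lo, hi in zip(q, bbox_min, bbox_max))

blob = pack_point((0.5, 0.25, 0.75), (0, 0, 0), (1, 1, 1))
decoded = unpack_point(blob, (0, 0, 0), (1, 1, 1))
# len(blob) == 6; decoded is within one quantization step of the input
```

Storing positions relative to a bounding hierarchy, as QSplat does, lets even fewer bits per point suffice at each level of detail.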

People: Szymon Rusinkiewicz


Radiosity Lighting Simulation [ archived: no longer active ]

Radiosity methods accurately simulate diffuse indirect illumination and shadows, and thus are used to generate realistic-looking lighting models for a variety of virtual environments, including building interiors. A difficult challenge for radiosity systems is managing the algorithmic complexity (O(n^2)) and massive data storage requirements (GBs) typical in such computations. We have developed a radiosity system that computes radiosity solutions for very large polygonal models.
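The underlying radiosity equation B = E + ρFB can be solved by simple fixed-point iteration, sketched here on a made-up two-patch example (real systems spend most of their effort computing the O(n^2) form factors F from actual geometry):

```python
# Solve B = E + rho * F * B by iteration. B is radiosity, E is emitted
# light, rho is reflectance, and F[i][j] is the (here invented) form
# factor: the fraction of patch i's view occupied by patch j.
def solve_radiosity(E, rho, F, iters=100):
    n = len(E)
    B = E[:]                                  # start from emitted light only
    for _ in range(iters):
        B = [E[i] + rho[i] * sum(F[i][j] * B[j] for j in range(n))
             for i in range(n)]
    return B

E   = [1.0, 0.0]          # patch 0 is a light source
rho = [0.0, 0.5]          # patch 1 reflects half of what it receives
F   = [[0.0, 0.2],
       [0.2, 0.0]]
B = solve_radiosity(E, rho, F)
# B[1] > 0: patch 1 is lit indirectly even though it emits nothing
```

With n in the millions of patches, both the O(n^2) form-factor matrix and the solution vectors overwhelm memory, which is the scaling problem the system above addresses.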

People: Thomas Funkhouser

RealPigment: Paint Compositing by Example

The color of composited pigments in digital painting is generally computed in one of two ways: either alpha blending in RGB, or the Kubelka-Munk equation (KM). The former fails to reproduce paint-like appearances, while the latter is difficult to use. We present a data-driven pigment model that reproduces arbitrary compositing behavior by interpolating sparse samples in a high-dimensional space. The input is an image of a color chart, which provides the compositing samples. We propose two different prediction algorithms: one performs simple interpolation using radial basis functions (RBF), and the other trains a parametric model based on the KM equation to compute novel values. We show that RBF is able to reproduce arbitrary compositing behaviors, even non-paint-like ones such as additive blending, while KM compositing is more robust to acquisition noise and can generalize over a broader range of values.

People: Adam Finkelstein

Reassembling the Thera Wall Paintings [ archived: no longer active ]

The archaeological site of Akrotiri on the island of Thera (modern-day
Santorini) has proven a treasure trove of information about Aegean
civilization and culture. Among its most valued artifacts are wall
paintings (frescoes), which have been preserved in the volcanic ash since
the seventeenth century BC. The frescoes, however, are typically recovered in fragments of a few centimeters to a few tens of centimeters in length, and reconstructing complete wall sections occupies a major portion of the
effort at Akrotiri.

We are engaged in a project to assist archaeologists and conservators by
digitizing excavated fragments and using computer algorithms to
automatically propose matches on the basis of 3D edge profile, color, and
other cues. An intuitive user interface will allow conservators to see and evaluate matches on the basis of any or all of the above criteria. We hope to greatly reduce the time that is currently spent manually testing large numbers of fragments against each other in the search for matches.

People: Thomas Funkhouser, Szymon Rusinkiewicz


Scalable Display Wall [ archived: no longer active ]

This project explores research issues on how to build and use immersive computer systems. In particular, we are interested in building immersive systems for users to collaborate across space and time. Current research topics include seamless imaging, parallel rendering, shared data visualization, spatialized sound, camera-based tracking, and design methodologies.

People: Kai Li

Sea of Images - Image-Based Rendering of Large Environments [ archived: no longer active ]

"Sea of Images" is a practical approach to dense sampling, storage, and reconstruction of the plenoptic function in large, complex indoor environments. We use a motorized cart to capture omnidirectional images every few inches on an eye-height plane throughout an environment. The captured images are compressed and stored in a multi-resolution hierarchy suitable for real-time pre-fetching during an interactive walkthrough. Later, novel images are reconstructed for a simulated observer by re-sampling nearby captured images.

People: Thomas Funkhouser

Secure Applications for Handheld Devices [ archived: no longer active ]

We are studying how to make smartcard-like devices more secure by exploiting the fact that they may be able to interact with the user directly.

People: Edward Felten

SEEK: Search Engine for Heterogeneous Human Gene-Expression Compendia


People: Kai Li


Service-centric networking

People: Michael Freedman

Service-centric networks [ archived: no longer active ]


Sndtools [ archived: no longer active ]

Real-time audio DSP and 3D visualization.

People: Perry Cook

Spasm [ archived: no longer active ]

A synthetic vocal tract model: synthesis of the singing voice using a waveguide articulatory vocal tract model.

People: Perry Cook

SPORC and Frientegrity

Untrusted cloud storage and social networks

People: Michael Freedman


SSDAlloc

SSDAlloc is a system that allows applications to use solid state memory, such as NAND flash and solid-state disks (SSDs), as transparent extensions to the DRAM in a system, instead of just as a disk replacement technology. Solid state memory uses little power, costs about one-tenth as much as DRAM, and is available in multi-terabyte (TB) capacities. In comparison, few servers even support a single terabyte of DRAM, with most supporting only a few hundred gigabytes at most. What this means in practical terms is that companies that use hundreds of servers to support large in-memory workloads can drastically cut their server count, power, and cooling costs by using SSDAlloc to expand the memory per server.

People: Vivek Pai

Standard ML of New Jersey [ archived: no longer active ]

A compiler for the type-safe functional programming language ML.

People: Andrew Appel

Statistical Analysis of Genetic Association Studies

Survey-based GWAS. Genome-wide association studies (GWAS) identify genetic variants that are associated with the occurrence of a complex phenotype or disease in a set of individuals. Many phenotypes are difficult to quantify with a single measure. I am building methods for conducting GWAS using survey data as the phenotype. Standard dimensionality reduction techniques are not effective for scaling down the size of the data because the resulting phenotype summaries were not interpretable. In prior work, we applied SFA and found that the sparse solution had phenotypic interpretations for all of the factors, and genetic associations for a number of phenotypes. Our current work goes well beyond this model for greater robustness and inference of the number of factors from the underlying data.

People: Barbara Engelhardt

Structural Modeling

Prevalent computer architecture modeling methodologies are prone to error, make design-space exploration slow, and create barriers to collaboration. The Structural Modeling Project addresses these issues by providing viable structural modeling methodologies to the community. The Liberty Simulation Environment showcases this approach and serves as the core of a new international standardization effort called Fraternité.

People: David August

Stylized Keyframe Animation of Fluid Simulations

We present a method that combines hand-drawn artwork with fluid simulations to produce animated fluids in the visual style of the artwork. Given a fluid simulation and a set of keyframes rendered by the artist in any medium, our system produces a set of in-betweens that visually matches the style of the keyframes and roughly follows the motion from the underlying simulation. Our method leverages recent advances in patch-based regenerative morphing and image melding to produce temporally coherent sequences with visual fidelity to the target medium. Because direct application of these methods results in motion that is generally not fluid-like, we adapt them to produce motion closely matching that of the underlying simulation. The resulting animation is visually and temporally coherent, stylistically consistent with the given keyframes, and approximately matches the motion from the simulation. We demonstrate the method with animations in a variety of visual styles.

People: Adam Finkelstein

Sublinear Optimization

In many modern optimization problems, particularly those arising in machine learning, the amount of data is too large to apply standard convex optimization methods. We develop new optimization algorithms that use randomization to prune the data and produce a correct solution while running in time smaller than the size of the data representation, i.e., sublinear running time. Such sublinear-time algorithms apply to linear classification, training support vector machines, semidefinite optimization, and other problems. These new algorithms are based on a primal-dual approach, and use a combination of novel sampling techniques and randomized implementations of online learning algorithms. The results are often accompanied by information-theoretic lower bounds showing that our running times are nearly the best possible in the unit-cost RAM model.

People: Elad Hazan

Suggestive Contours [ archived: no longer active ]

When artists design imagery to portray a visual scene, they need not just render visual information veridically. They can select the visual cues to portray, and adapt the information each carries. Their results can depart dramatically from natural scenes, but can nevertheless convey visual information effectively, because viewers' perceptual inferences still work flexibly to arrive at a consistent understanding of the imagery. We suggest that lines in line-drawings can convey information about shape in this indirect way, and work to develop tools for realizing such lines automatically in non-photorealistic rendering. In the figure above, the picture on the left renders silhouettes.

People: Adam Finkelstein, Szymon Rusinkiewicz

Synthesis ToolKit in C++ (STK) [ archived: no longer active ]

STK is a set of open source audio signal processing and algorithmic synthesis classes written in the C++ programming language. STK was designed to facilitate rapid development of music synthesis and audio processing software, with an emphasis on cross-platform functionality, realtime control, ease of use, and educational example code.

People: Perry Cook



THRIFT

As chip densities and clock rates increase, processors are becoming more susceptible to error-inducing transient faults. In contrast to existing techniques, the THRIFT Project advocates adaptive approaches that match the changing reliability and performance demands of a system to improve reliability at lower cost. This project introduced the concept of software-controlled fault tolerance.

People: David August

Typed Assembly Language (TAL) [ archived: no longer active ]

TAL extends traditional untyped assembly languages with typing annotations, memory management primitives, and a sound set of typing rules. These typing rules guarantee the memory safety, control flow safety, and type safety of TAL programs. Moreover, the typing constructs are expressive enough to encode most source language programming features including records and structures, arrays, higher-order and polymorphic functions, exceptions, abstract data types, subtyping, and modules. Just as importantly, TAL is flexible enough to admit many low-level compiler optimizations. Consequently, TAL is an ideal target platform for type-directed compilers that want to produce verifiably safe code for use in secure mobile code applications or extensible operating system kernels. We have implemented a variant of TAL for Intel's IA32 architecture called TALx86, and have written a compiler for a safe C-like language called Popcorn to TALx86.

People: David Walker


Understanding how eQTLs work by looking across eQTL studies, cell types, and regulatory element data

As part of the GTEx consortium, and in collaboration with Casey Brown, we have conducted large-scale replication studies across eleven studies in seven tissue types. We have overlaid these results onto regulatory element data to enable a more profound mechanistic understanding of eQTLs by examining where eQTLs, including cell-type-specific eQTLs, are co-located with specific cis-regulatory elements.
We are currently developing statistical models for understanding eQTLs and variants that influence mRNA isoform levels in RNA-seq data. We are also working on predictive models for eQTLs across tissue types and models that consider replication in trans-eQTLs.

People: Barbara Engelhardt



VELOCITY

The VELOCITY Compiler Project aims to address computer architecture problems with a new approach to compiler organization. This compiler organization, embodied in the VELOCITY Compiler (and derivative run-time optimizers), enables true whole-program scope, practical iterative compilation, and smarter memory analysis. These properties make VELOCITY better at extracting threads, improving reliability, and enhancing security.

People: David August

Verified Software Toolchain

The software toolchain includes static analyzers to check assertions about your program; optimizing compilers to translate your program to machine language; and operating systems and libraries to supply context for your program. The Verified Software Toolchain project assures with machine-checked proofs that the assertions claimed at the top of the toolchain really hold in the machine-language program, running in the operating-system context.

People: Andrew Appel

Vision Group

We study vision science: the computational principles underlying computer vision, robot vision, and human vision. We are interested in building computer systems that automatically understand visual scenes, both inferring the semantics and extracting 3D structure for a large variety of environments. Our research is also closely related to computer graphics, perception and cognition, cognitive neuroscience, machine learning, HCI, NLP and AI in general.

At the moment, we focus on leveraging Big 3D Data for Visual Scene Understanding (e.g. RGB-D sensors, CAD models, depth, multiple viewpoints, panoramic fields of view), to look for the right representations of visual scenes that realistically describe the world. We believe that it is critical to consider the role of a computer as an active explorer in a 3D world, and learn from rich 3D data that is close to the natural input that humans have.

People: Jianxiong Xiao


Web Privacy


People: Arvind Narayanan


WordNet

WordNet is a resource used by researchers attempting to get computers to understand English (and any other language for which a WordNet exists).
Viewing a human language as a very large graph provides a theoretical framework for creating algorithms for understanding the meaning of words
in a text (e.g. determining if "fly" is an insect or a ball hit into left field) and translating documents between languages. We are working on both making WordNet more effective for these tasks and creating new approaches that use WordNet.
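The graph view can be sketched as follows (the hypernym fragment below is hand-made for illustration, not real WordNet data): a sense of an ambiguous word is chosen by its graph distance to words in the surrounding context.

```python
# Toy WordNet-style disambiguation: senses and concepts are nodes, edges
# are hypernym ("is-a") links, and we pick the sense of "fly" closest in
# the graph to a context word such as "bee".
from collections import deque

hypernyms = {               # child -> parent ("is-a") links, invented
    "fly(insect)": "insect", "insect": "animal",
    "fly(baseball)": "hit", "hit": "action",
    "bee": "insect",
}

def distance(a, b):
    """Shortest path between two nodes in the undirected hypernym graph."""
    edges = {}
    for child, parent in hypernyms.items():
        edges.setdefault(child, set()).add(parent)
        edges.setdefault(parent, set()).add(child)
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == b:
            return d
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")     # unreachable: unrelated senses

def disambiguate(senses, context):
    return min(senses, key=lambda s: distance(s, context))

sense = disambiguate(["fly(insect)", "fly(baseball)"], "bee")
```

Real WordNet-based algorithms use the same structural intuition over a graph of more than a hundred thousand synsets, combined with richer relations than hypernymy alone.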

People: Moses Charikar, Christiane Fellbaum, Robert Schapire


Zap [ archived: no longer active ]

Managing reliable software systems on unreliable hardware

People: David Walker

ZebraNet [ archived: no longer active ]

The Princeton ZebraNet Project is an inter-disciplinary effort with thrusts in both Biology and Computer Systems. On the computer systems side, ZebraNet is studying power-aware, position-aware computing/communication systems. Namely, the goals are to develop, evaluate, implement, and test systems that integrate computing, wireless communication, and non-volatile storage along with global positioning systems (GPS) and other sensors. On the biology side, the goal is to use these systems to perform novel studies of animal migrations and inter-species interactions.

People: Margaret Martonosi