Research Projects
Population Structure and Matrix Factorization
Matthew Stephens and I considered the problem of identifying latent structure in a population of individuals. We cast the two methods most commonly applied to this problem, admixture models and principal components analysis (PCA), as matrix factorization methods with different matrix constraints. Within this framework, we described a sparse factor analysis model (SFA) that encouraged sparsity on the factor loadings through an automatic relevance determination (ARD) prior. Results from SFA bridged the gap between admixture models and PCA: SFA did not over-regularize the data as admixture models tend to do, but, unlike PCA, its sparsity enabled each well-separated population to be associated with a single factor, making the results as interpretable as those of admixture models. For continuous populations, however, the methods produced similar results: in a sample of 1,387 European individuals genotyped at approximately 200,000 SNPs, two factors captured the geography of the sample well under all three methods. We are currently developing factor analysis models with effective sparsity-inducing priors that go beyond ARD priors and have better conjugacy properties than traditional spike-and-slab priors.
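To make the matrix factorization view concrete, the sketch below factors a genotype matrix into loadings and factors, with an ARD-style penalty that iteratively shrinks small loadings toward zero. This is a minimal illustration of the idea, not the published SFA implementation; the function name, update schedule, and simulated input are all placeholders.

```python
# A minimal sketch of sparse matrix factorization with an automatic
# relevance determination (ARD)-style penalty; illustrative only, not
# the published SFA implementation.
import numpy as np

def sfa_sketch(G, K=2, n_iter=50, eps=1e-6):
    """Factor G (individuals x SNPs) as L @ F with ARD-like sparsity on L."""
    rng = np.random.default_rng(0)
    n, p = G.shape
    L = rng.normal(size=(n, K))      # loadings (encouraged to be sparse)
    F = rng.normal(size=(K, p))      # factors
    phi = np.ones((n, K))            # per-loading ARD precisions
    for _ in range(n_iter):
        FFt = F @ F.T
        # Ridge regression per individual, with a separate penalty
        # (precision) on each loading.
        for i in range(n):
            L[i] = np.linalg.solve(FFt + np.diag(phi[i]), F @ G[i])
        # ARD update: precisions grow where loadings are small, so
        # those loadings shrink toward zero on the next iteration.
        phi = 1.0 / (L**2 + eps)
        # Update the factors by least squares given the loadings.
        F = np.linalg.lstsq(L, G, rcond=None)[0]
    return L, F

loadings, factors = sfa_sketch(np.random.default_rng(1).normal(size=(60, 120)))
```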
Princeton Advanced Wireless Systems (PAWS) Group
The Princeton Advanced Wireless Systems (PAWS) research group builds, experiments with, and evaluates wireless systems that enable data networking; the localization of people, objects, and devices; and intuitive interaction with machines. Our work covers all aspects of wireless computer networks, from the basic architecture of the wireless physical layer to the reliable flow of data between Internet endpoints.
Princeton S* Network Systems (SNS) group
The Princeton S* Network Systems (SNS) group is a research group within Princeton's Computer Science Department. The undefined S* (Scalable, Secure, Self-Organizing, Self-Managing, Service-centric, Storage-based) characterizes the broad scope of our research.
Prof. Martonosi Computer Architecture Research
Prof. Martonosi and her group engage in a range of computer architecture research projects in the areas of Heterogeneous Parallelism, Verifiable and Secure Memory Models, and Quantum Computing. Their work has led to top-cited papers at several major conferences, as well as real-world impact through deployments.
Pronto
Project Pronto is building and deploying a beta-production, end-to-end 5G connected edge cloud that leverages a fully programmable network with unprecedented visibility, verification, and closed-loop control capabilities, fueling innovation while helping to secure future network infrastructure. The participating universities are executing a research agenda that uses deep programmability to explore and create these verification and closed-loop control capabilities.
ONF’s Aether (an open-source Private 4G/5G Connected-Edge-Cloud-as-a-Service platform) serves as the foundation for the Pronto research. Results will be iteratively upstreamed into the Aether platform to help move the industry toward robust and secure programmable networks.
Stanford University, Cornell University, Princeton University, and the Open Networking Foundation are collaborating on Pronto, which is funded in part by a $30M grant from DARPA.
Protein molecular function prediction
As a graduate student with Dr. Michael Jordan, collaborating with Dr. Steven Brenner, I created a statistical methodology, SIFTER (Statistical Inference of Function Through Evolutionary Relationships), to capture how protein molecular function evolves within a phylogeny in order to accurately predict function for unannotated proteins, improving over existing methods that rely on pairwise sequence comparisons. We relied on the assumption that function evolves in parallel with sequence evolution, implying that phylogenetic distance is the natural measure of functional divergence. In SIFTER, molecular function evolves as a first-order Markov chain within a phylogenetic tree. Posterior probabilities are computed exactly using message passing, with an approximate method for large or functionally diverse protein families; model parameters are estimated using generalized expectation maximization. Functional predictions are extracted from the protein-specific posterior probabilities for each function. I applied SIFTER to a genome-scale fungal data set comprising protein families from 46 fully sequenced fungal genomes, where SIFTER substantially outperformed state-of-the-art methods in producing correct and specific predictions.
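As a rough illustration of the inference step, the sketch below runs the upward pass of exact message passing on a toy phylogeny, propagating leaf annotations toward the root for a discrete function state evolving as a first-order Markov chain. The tree encoding, transition matrix, and evidence vectors are illustrative stand-ins, not SIFTER's actual data structures.

```python
# Upward (leaves-to-root) message passing on a toy phylogeny for a
# discrete "function" state; illustrative stand-in for SIFTER's
# exact inference.
import numpy as np

def upward_message(node, children, T, leaf_evidence):
    """Return P(observed leaves below `node` | state at `node`)."""
    if node not in children:                  # leaf
        return leaf_evidence[node]            # one entry per function
    msg = np.ones(T.shape[0])
    for child in children[node]:
        # Sum over the child's state, weighted by the transition
        # probability T[s_parent, s_child].
        msg *= T @ upward_message(child, children, T, leaf_evidence)
    return msg

# Tiny example: 2 functions, a root with two leaves, one leaf annotated.
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])                    # shared per-branch transitions
children = {"root": ["leafA", "leafB"]}
evidence = {"leafA": np.array([1.0, 0.0]),    # annotated with function 0
            "leafB": np.array([1.0, 1.0])}    # unannotated
prior = np.array([0.5, 0.5])
lik = upward_message("root", children, T, evidence)
print(prior * lik / np.sum(prior * lik))      # posterior at the root
```

A downward pass (omitted here) would combine with these messages to give posterior function probabilities at every unannotated node.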
RealPigment: Paint Compositing by Example
The color of composited pigments in digital painting is generally computed in one of two ways: either alpha blending in RGB or the Kubelka-Munk equation (KM). The former fails to reproduce paint-like appearances, while the latter is difficult to use. We present a data-driven pigment model that reproduces arbitrary compositing behavior by interpolating sparse samples in a high-dimensional space. The input is an image of a color chart, which provides the compositing samples. We propose two different prediction algorithms: one performs simple interpolation using radial basis functions (RBF), while the other trains a parametric model based on the KM equation to compute novel values. We show that RBF is able to reproduce arbitrary compositing behaviors, even non-paint-like ones such as additive blending, while KM compositing is more robust to acquisition noise and can generalize over a broader range of values.
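A minimal sketch of the RBF variant, assuming SciPy's RBFInterpolator: fit an interpolant from (foreground, background) color pairs to their observed composites, then query it on novel pairs. The sample data here is synthetic; in the paper the samples come from a photographed color chart.

```python
# Data-driven compositing by RBF interpolation in the 6-D space of
# (foreground RGB, background RGB) pairs; synthetic samples only.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
fg = rng.uniform(size=(200, 3))                  # foreground RGB samples
bg = rng.uniform(size=(200, 3))                  # background RGB samples
# Stand-in "ground truth" compositing behavior to be learned.
composite = np.clip(0.6 * fg + 0.4 * bg, 0.0, 1.0)

# Fit an interpolant mapping 6-D (fg, bg) inputs to 3-D composites.
X = np.hstack([fg, bg])
model = RBFInterpolator(X, composite, kernel="thin_plate_spline")

# Predict the composite of a novel foreground/background pair.
query = np.hstack([[0.8, 0.2, 0.2], [0.1, 0.1, 0.9]])[None, :]
print(model(query))
```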
Resource allocation for cloud services
Multi-tenant resource fairness for shared datacenter services
SEEK: Search Engine for Heterogeneous Human Gene-Expression Compendia
Service-centric networking
Service-centric networking with Serval
Software-defined networking
Statistical Analysis of Genetic Association Studies
Survey-based GWAS. Genome-wide association studies (GWAS) identify genetic variants that are associated with the occurrence of a complex phenotype or disease in a set of individuals. Many phenotypes are difficult to quantify with a single measure. I am building methods for conducting GWAS using survey data as the phenotype. Standard dimensionality reduction techniques are not effective for scaling down the size of these data because the resulting phenotype summaries are not interpretable. In prior work, we applied SFA and found that the sparse solution had phenotypic interpretations for all of the factors, and genetic associations for a number of phenotypes. Our current work goes well beyond this model, aiming for greater robustness and for inference of the number of factors from the underlying data.
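The overall pipeline can be sketched as follows: summarize the survey responses into a handful of per-person factor scores, then run a standard per-SNP association test against each score. Everything below is simulated, and a plain SVD stands in for the sparse factorization.

```python
# Survey-based GWAS pipeline sketch: latent phenotype scores from a
# factorization, then per-SNP association tests. Simulated data only;
# SVD is a stand-in for the sparse factor analysis step.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_snps, K = 500, 1000, 3
genotypes = rng.integers(0, 3, size=(n, n_snps)).astype(float)
surveys = rng.normal(size=(n, 20))              # survey response matrix

# Stand-in for SFA: any factorization yielding per-person factor scores.
U, s, Vt = np.linalg.svd(surveys - surveys.mean(0), full_matrices=False)
scores = U[:, :K] * s[:K]                       # n x K factor scores

# Per-SNP association test against each latent phenotype.
for k in range(K):
    pvals = np.array([stats.linregress(genotypes[:, j], scores[:, k]).pvalue
                      for j in range(n_snps)])
    print(f"factor {k}: min p-value {pvals.min():.3g}")
```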
Structural Modeling
Prevalent computer architecture modeling methodologies are prone to error, make design-space exploration slow, and create barriers to collaboration. The Structural Modeling Project addresses these issues by providing viable structural modeling methodologies to the community. The Liberty Simulation Environment showcases this approach and serves as the core of a new international standardization effort called Fraternité.
Stylized Keyframe Animation of Fluid Simulations
We present a method that combines hand-drawn artwork with fluid simulations to produce animated fluids in the visual style of the artwork. Given a fluid simulation and a set of keyframes rendered by the artist in any medium, our system produces a set of in-betweens that visually matches the style of the keyframes and roughly follows the motion from the underlying simulation. Our method leverages recent advances in patch-based regenerative morphing and image melding to produce temporally coherent sequences with visual fidelity to the target medium. Because direct application of these methods results in motion that is generally not fluid-like, we adapt them to produce motion closely matching that of the underlying simulation. The resulting animation is visually and temporally coherent, stylistically consistent with the given keyframes, and approximately matches the motion from the simulation. We demonstrate the method with animations in a variety of visual styles.
SyLVer: Synthesis, Learning, and Verification
Algorithmic verification techniques have made tremendous progress by leveraging advances in decision procedures based on SAT/SMT solvers. This project aims to develop techniques that improve their scalability for program verification and synthesis by combining deductive reasoning with learning from data and examples.
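As a small taste of the underlying decision procedures, the sketch below uses the z3 SMT solver's Python bindings to verify a toy program invariant by checking that its negation is unsatisfiable; the property itself is illustrative only.

```python
# SMT-based verification with z3: prove a toy invariant by showing
# its negation has no satisfying assignment (no counterexample).
from z3 import Ints, Solver, Implies, And, Not, unsat

x, y = Ints("x y")
# Claim: if x >= 0 and y == x + 1, then y > 0.
claim = Implies(And(x >= 0, y == x + 1), y > 0)

s = Solver()
s.add(Not(claim))                    # search for a counterexample
if s.check() == unsat:
    print("verified: no counterexample exists")
else:
    print("counterexample:", s.model())
```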
THRIFT
As chip densities and clock rates increase, processors are becoming more susceptible to error-inducing transient faults. In contrast to existing techniques, the THRIFT Project advocates adaptive approaches that match the changing reliability and performance demands of a system to improve reliability at lower cost. This project introduced the concept of software-controlled fault tolerance.
Understanding how eQTLs work by looking across eQTL studies, cell types, and regulatory element data
As part of the GTEx consortium, and in collaboration with Casey Brown, we have conducted large-scale replication studies across eleven studies spanning seven tissue types. We have overlaid these results onto regulatory element data to build a deeper mechanistic understanding of eQTLs by examining where eQTLs, including cell-type-specific eQTLs, co-locate with specific cis-regulatory elements.
We are currently developing statistical models for understanding eQTLs and variants that influence mRNA isoform levels in RNA-seq data. We are also working on predictive models for eQTLs across tissue types and models that consider replication in trans-eQTLs.
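For context, the core of a basic single-variant eQTL test can be sketched in a few lines: regress a gene's expression level on genotype dosage at a nearby variant. The data below is simulated, and real analyses add covariates, multiple-testing correction, and the cross-tissue models described above.

```python
# A basic single-variant eQTL association test on simulated data:
# linear regression of expression on genotype dosage.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 300
dosage = rng.integers(0, 3, size=n).astype(float)   # 0/1/2 allele counts
expression = 0.4 * dosage + rng.normal(size=n)      # simulated cis effect

fit = stats.linregress(dosage, expression)
print(f"effect size {fit.slope:.2f}, p-value {fit.pvalue:.2e}")
```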
Untrusted cloud services
Untrusted cloud storage and social networks
VELOCITY Compiler
The VELOCITY Compiler Project aims to address computer architecture problems with a new approach to compiler organization. This compiler organization, embodied in the VELOCITY Compiler (and derivative run-time optimizers), enables true whole-program scope, practical iterative compilation, and smarter memory analysis. These properties make VELOCITY better at extracting threads, improving reliability, and enhancing security.
Verified Software Toolchain
The software toolchain includes static analyzers to check assertions about your program; optimizing compilers to translate your program to machine language; operating systems and libraries to supply context for your program. The Verified Software Toolchain project assures with machine-checked proofs that the assertions claimed at the top of the toolchain really hold in the machine-language program, running in the operating-system context.