FPO

Kathy Chen FPO

Date and Time
Tuesday, April 9, 2024 - 10:00am to 12:00pm
Location
Carl Icahn Lab 280
Type
FPO

Kathy Chen will present her FPO "Decoding the sequence basis of gene regulation" on Tuesday, April 9, 2024 at 10:00 AM in Icahn 280 and Zoom.

Location: Zoom link: https://princeton.zoom.us/j/95565860844?pwd=LzNkcHZ6bnVackxZRnZJSitXdW9NUT09

The members of Kathy’s committee are as follows:
Examiners: Olga Troyanskaya (Adviser), Mona Singh, Kai Li
Readers: Ryan Adams, Jian Zhou (UT Southwestern)

Everyone is invited to attend her talk.  

Abstract follows below:

Deciphering the regulatory code of gene expression is a critical challenge in human genetics, instrumental to unlocking the potential of personalized medicine. Modern experimental technologies have resulted in an abundance of high-dimensional genome-wide data, revealing the complex system of epigenetic interactions encoded in the genome. The development of computational approaches that can leverage this vast data to model chromatin interactions globally offers a new understanding of how genomic sequences specify regulatory functions. Specifically, sequence-based deep learning models have become the de facto standard for learning the functional properties encoded in DNA sequences based on large sequencing datasets. These models are powerful tools for interpreting molecular and phenotypic effects, capable of predicting the impact of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterizing their consequences beyond what is tractable from experiments and quantitative genetics alone.

In this thesis, we present two deep learning-based sequence models, which predict different epigenetic properties of the genome that contribute to transcriptional regulation. First, Sei is a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequences and variants based on diverse regulatory activities, such as cell type-specific enhancers.

Next, we developed a model, Hedgehog, that enables the quantification of variation at methylation sites. Hedgehog predicts 296 continuous-valued methylation profiles across a range of cell types and tissues. Hedgehog is complementary to Sei and reveals new insights into the relationship between DNA methylation and other epigenetic modifications.

Finally, we show how deep learning-based methods can be applied to elucidate the regulatory basis of human health and disease. Specifically, we use Sei to study the contribution of noncoding mutations in cancer. Collectively, we demonstrate novel frameworks for modeling the sequence dependencies of the epigenome and the capability of such approaches to delineate the regulatory mechanisms underlying complex diseases.

Angelina Wang FPO

Date and Time
Monday, May 6, 2024 - 2:30pm to 4:30pm
Location
Computer Science 402
Type
FPO

Angelina Wang will present her FPO "Operationalizing Responsible Machine Learning: From Equality Towards Equity" on Monday, May 6, 2024 at 2:30 PM in CS 402.

Location: CS 402

The members of Angelina’s committee are as follows:
Examiners: Olga Russakovsky (Adviser), Arvind Narayanan, Solon Barocas (Cornell)
Readers: Aleksandra Korolova, Janet Vertesi

A copy of her thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend her talk.

Abstract follows below:

With the widespread proliferation of machine learning, there arises both the opportunity for societal benefit as well as the risk of harm. Approaching responsible machine learning is challenging because technical approaches may prioritize a mathematical definition of fairness that correlates poorly to real-world constructs of fairness due to too many layers of abstraction. Conversely, social approaches that engage with prescriptive theories may produce findings that are too abstract to effectively translate into practice. In my research, I bridge these approaches and utilize social implications to guide technical work. I will discuss three research directions that show how, despite the technically convenient approach of considering equality acontextually, a stronger engagement with societal context allows us to operationalize a more equitable formulation. First, I will introduce a dataset tool that we developed to analyze complex, socially-grounded forms of visual bias. Then, I will provide empirical evidence to support how we should incorporate societal context in bringing intersectionality into machine learning. Finally, I will discuss how in the excitement of using LLMs for tasks like human participant replacement, we have neglected to consider the importance of human positionality. Overall, I will explore how we can expand a narrow focus on equality in responsible machine learning to encompass a broader understanding of equity that substantively engages with societal context.

Ksenia Sokolova FPO

Date and Time
Monday, March 18, 2024 - 10:30am to 12:30pm
Location
252 Nassau Street Conference room
Type
FPO

Ksenia Sokolova will present her FPO "Deep Learning for Sequence-Based Gene Expression Prediction" on Monday, March 18, 2024 at 10:30 AM in the 252 Nassau Street Conference room.

Location: 252 Nassau Street Conference room

The members of Ksenia’s committee are as follows:
Examiners: Olga Troyanskaya (Adviser), Kai Li, Ellen Zhong
Readers: Mona Singh, Yuri Pritykin

Everyone is invited to attend her talk.

Abstract follows below:

Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. While genetic variation in the enormous noncoding space is linked to the majority of disease risk, the impact of this variation is poorly understood. Recent advances in sequencing technology have made it possible to perform whole-genome sequencing of large cohorts, uncovering many variants per individual. A crucial challenge is to understand the collective impact of these variants on gene expression across varied human cell types and their subsequent roles in disease progression.

This dissertation begins by tackling the challenge of associating noncoding genetic variants with changes in gene expression in primary human cell types. We introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. With models spanning 105 primary human cell types across seven organ systems, it offers detailed insight into the effect of variation. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell type. We follow this work with an example application of ExPectoSC to the study of glomerular diseases, a major cause of end-stage renal disease in the US. Despite having similar clinical presentations, these diseases are known for their heterogeneity and variable patient outcomes. By integrating whole-genome sequencing data with ExPectoSC's predictions, we construct comprehensive gene expression disruption profiles for patients. Finally, we developed a new method for genomic-centered contrastive pre-training, called cGen, to improve training of the models from sequence alone in limited-data contexts. Utilizing sequence augmentations, after pre-training cGen generates unsupervised embeddings that highlight functional clusters and are informative of gene expression in the absence of any labeled information.

Together, these contributions highlight the power of computational approaches to decode the noncoding genome, offering new avenues for the diagnosis, prognosis, and treatment of human diseases.

Uthsav Chitra FPO

Date and Time
Friday, March 1, 2024 - 10:00am to 12:00pm
Location
Computer Science 302
Type
FPO

Uthsav Chitra will present his FPO "Algorithms for understanding the spatial and network organization of biological systems" on Friday, March 1, 2024 at 10:00 AM in COS 302 and Zoom.

Location: Zoom link: https://princeton.zoom.us/j/99220301104

The members of Uthsav’s committee are as follows:
Examiners: Ben Raphael (Adviser), Bernard Chazelle, Yuri Pritykin
Readers: Ellen Zhong, Fei Chen (Harvard)

Everyone is invited to attend his talk.

Abstract follows below:

Biological systems are characterized by their spatial organization and network interactions at a hierarchy of scales. For example, the spatial arrangement of different cells in a tissue underlies fundamental multicellular processes such as tissue differentiation and disease response, while interactions between genes/proteins comprise the biological pathways that regulate cellular state and function. Recent developments in high-throughput sequencing have enabled the systematic analysis of spatial and network processes in many complex biological systems including the brain and tumor microenvironment. However, such analyses are challenged by high levels of sparsity and/or noise in high-throughput sequencing datasets—underscoring the need for principled and rigorous computational methods for biological data analysis.

In this dissertation, we present a collection of mathematical frameworks and machine learning algorithms for modeling the spatial and network organization of biological systems. First, we derive a model of discrete and continuous spatial variation in gene expression. We present two algorithms, Belayer and GASTON, which learn the parameters of this model using complex analysis and interpretable deep learning, respectively.

Second, we present a mathematical framework for the identification of altered subnetworks, or subnetworks of a biological interaction network containing genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared to other genes/proteins. We prove that many existing algorithms are statistically biased, resolving the open question of why these algorithms often identify very large subnetworks that are difficult to interpret. We derive two altered subnetwork identification algorithms, NetMix and NetMix2, which we show are asymptotically unbiased and outperform existing approaches in practice.

Finally, we present two frameworks for learning and modeling higher-order interactions. We first derive a statistical framework for learning higher-order genetic interactions from experimental fitness data, unifying decades of existing work in the genetics literature. Then, we derive a theoretical framework for modeling random walks on hypergraphs that provably utilizes higher-order interactions in data, in contrast to many existing hypergraph methods which only utilize pairwise interactions.

Taken together, the approaches in this dissertation provide a theoretical and practical foundation for overcoming the computational challenges of modeling complex biological systems.

Yuan Wang FPO

Date and Time
Monday, February 5, 2024 - 3:00pm to 5:00pm
Location
Carl Icahn Lab 200
Type
FPO

Yuan Wang will present her FPO "Systematic analysis of cellular and immune system responses for therapy development" on Monday, February 5, 2024 at 3:00 PM in Carl C. Icahn Laboratory - 200 and Zoom.

Location: Zoom link: https://princeton.zoom.us/j/3225545420?omn=94214228673

The members of Yuan’s committee are as follows:
Examiners: Olga Troyanskaya (Adviser), Mona Singh, Ellen Zhong
Readers: Yuri Pritykin, Wendell Lim (UCSF)

A copy of her thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.
 
Everyone is invited to attend her talk.
 
Abstract follows below:
The human body is a complex program where many important regulatory molecules are reused throughout the body, but their functions are specific to particular locations and biological contexts. This notion of bio-specificity is especially valuable in the context of immunology, which can be harnessed to develop effective diagnostics and therapeutics for human diseases. However, given the extensive number of potential candidate molecules, experimental analysis of the functions of all target genes is not feasible. Moreover, integrative interpretation of public omics datasets remains challenging due to substantial noise and heterogeneity present in the data. Systematically harmonizing and mapping transcriptional regulation and gene expression is thus of critical importance.

In this thesis, we developed and applied two computational frameworks, SPEEDI and TissueGPS, which harness large-scale heterogeneous omics datasets to systematically integrate and analyze epigenetic and transcriptional data and understand immune system dynamics. We applied these frameworks in the context of cellular and systemic immunity. First, we systematically study the systemic immune transcriptional responses of vaccinated subjects to SARS-CoV-2 infections. Second, by investigating the immune landscape of infected individuals, we predict epigenetic-informed gene signatures to assist rapid diagnosis for infectious disease assessments. Third, we designed cell-based therapeutics to arm the immune system with customized sensing circuits to detect and treat central nervous system disorders, such as malignant tumors, with reduced systemic off-target toxicity. Collectively, our methods offer novel frameworks for interrogating human diseases with precision and reproducibility across immune contexts.

Arushi Gupta FPO

Date and Time
Thursday, February 8, 2024 - 1:30pm to 3:30pm
Location
Computer Science 402
Type
FPO

Arushi Gupta will present her FPO "Understanding the Role of Data in Model Decisions" on Thursday, February 8, 2024 at 1:30 PM in CS 402.

Location: CS 402

The members of Arushi’s committee are as follows:
Examiners: Sanjeev Arora (Adviser), Elad Hazan, Ryan Adams
Readers: Karthik Narasimhan, Tom Griffiths

A copy of her thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend her talk.

Abstract follows below:

As neural networks are increasingly employed in high-stakes applications such as criminal justice and medicine [1], it becomes increasingly important to understand why these models make the decisions they do. For example, it is important to develop tools to analyze whether models are perpetuating, in their future decision making, harmful demographic inequalities they have found in their training data [2]. However, neural networks typically require large training sets, have “black-box” decision making, and have costly retraining protocols, increasing the difficulty of this problem. This work considers three questions. Q1) What is the relationship between the elements of an input and the model’s decision? Q2) What is the relationship between the individual training points and the model’s decision? And finally, Q3) to what extent do there exist (efficient) approximations that would allow practitioners to predict how model performance would change given different training data or a different training protocol?

Part I addresses Q1 for masking saliency methods. These methods implicitly assume that grey pixels in an image are “uninformative.” We find experimentally that this assumption may not always be true, and define “soundness,” which measures a desirable property of a saliency map.

Part II addresses Q2 and Q3 in the context of influence functions, which aim to approximate the effect of removing a training point on the model’s decision. We use harmonic analysis to examine a particular type of influence method, namely datamodels, and find that there is a relationship between the coefficients of the datamodel and the Fourier coefficients of the target function.

Finally, Part III addresses Q3 in the context of test data. First, we assess whether held-out test data is necessary to approximate the outer loop of meta learning, or whether recycling training data constitutes a sufficient approximation. We find that held-out test data is important, as it yields representations that are low rank. Then, inspired by the PGDL competition [3], we investigate whether GAN-generated data, despite well-known limitations, can be used to approximate generalization performance when no test or validation set is available, and find that it can.

Jeff Helt FPO (CS 105)

Date and Time
Friday, February 16, 2024 - 1:00pm to 3:00pm
Location
Computer Science Small Auditorium (Room 105)
Type
FPO
Speaker
Jeff Helt
Host
Jeff Helt

Committee: Wyatt Lloyd (advisor), Amit Levy, Ravi Netravali (readers), Mike Freedman, Zak Kincaid (examiners)

details to follow

Tim Alberdingk Thijm FPO "Modular Control Plane Verification" CS 105 & Zoom

Date and Time
Friday, February 9, 2024 - 1:30pm to 3:30pm
Location
Not yet determined.
Type
FPO

"Modular Control  Plane Verification" 

Zoom link: 
https://princeton.zoom.us/j/95302742818

Naorin Hossain FPO "Navigating Emerging Complexities of Modern Systems: Advancements in Automated Verification and Security Techniques" in CS 302

Date and Time
Thursday, January 11, 2024 - 9:00am to 11:00am
Location
Computer Science 302
Type
FPO

Naorin Hossain will present her FPO "Navigating Emerging Complexities of Modern Systems: Advancements in Automated Verification and Security Techniques" on Thursday, January 11, 2024 at 9:00 AM in CS 302.

The members of her committee are as follows: 

Margaret Martonosi (advisor, examiner)
Aarti Gupta (reader)
Pradip Bose (reader, IBM)
Amit Levy (examiner)
David August (examiner)

The increasing design complexity at the end of Moore’s Law and Dennard Scaling presents a new challenge for implementing modern systems correctly and securely. This dissertation presents focused efforts for advancing automated verification and security techniques to mitigate incorrect implementations and attackers, respectively. It explores two avenues: 1) correctness verification in virtual memory systems, where shared memory interactions occur across hardware and system-level events, and 2) enhancing security of heterogeneous systems-on-a-chip (SoCs) with anomalous activity detection and localization.

In the first thrust, this dissertation develops tools for a systematic, end-to-end formalized approach for verifying and validating virtual memory system implementations using memory transistency model (MTM) specifications. First, I propose a novel formal language for reasoning about MTMs, enabling specification of the software-visible impacts of complex hardware- and system-level interactions that occur in parallel systems with virtual memory. Next, I present empirical MTM testing techniques for verification and validation on real system implementations using specialized stress measures to coax out MTM-specific bugs. The proposed techniques can alleviate complex system verification efforts by automating detection of bugs and vulnerabilities at the hardware-system interface of virtual memory systems.

The second thrust of this dissertation develops processes to enhance security of heterogeneous SoC designs. Sophisticated attacks can take down parts of the SoC, resulting in diminished performance gains and failed processes. To design built-in defenses against such attacks, this dissertation proposes the use of network-on-chip (NoC) based hardware counters to monitor ongoing SoC activity in a holistic fashion. With these counters, I develop and demonstrate an anomalous activity detection and localization system to detect and pinpoint availability attacks on SoC components, leveraging machine learning techniques and the inherent interpretability offered by the NoC counters. These techniques enable automated attack and failure detection to be built into the SoC, a particularly valuable feature in the fast-changing heterogeneous SoC design space.

Overall, this dissertation advocates for the use of formal and empirical modeling tools to effectively capture complex system behaviors such that vulnerabilities, bugs, and threats can be detected at design time and runtime with automated methods that provide elevated coverage and accuracy.

Udaya Ghai FPO

Date and Time
Monday, December 18, 2023 - 4:00pm to 6:00pm
Location
Computer Science 402
Type
FPO

Udaya Ghai will present his FPO "A Game-theoretic Lens for Robustness in Control" on Monday, December 18, 2023 at 4:00 PM in CS 402.

Location: CS 402

The members of Udaya’s committee are as follows:

Examiners: Elad Hazan (Adviser), Ryan Adams, Naomi Leonard
Readers: Sanjeev Arora, Anirudha Majumdar

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

The control of dynamical systems is a fundamental problem with a vast array of applications, from robotics to biological engineering. Recently, the game-theoretic primitive of regret minimization has been applied to control, yielding novel instance-optimal performance guarantees in more challenging nonstochastic control settings. This thesis further explores the benefits of a multi-agent perspective of control.

Concretely, we begin with a new algorithm for generating disturbances for controller verification, which relies on recasting the players in the nonstochastic control game. Next, we provide a cooperative multi-agent extension of the nonstochastic control setting, involving a reduction from our multi-agent game to single agent regret minimization. Furthermore, we show new notions of robustness to failure can be attained through this perspective, even in a single-agent setting.

While control is a powerful tool, it relies heavily on knowledge of the dynamics. The final chapters provide two very different approaches to robustness without such a model. The first approach extends the nonstochastic control methodology to model-free reinforcement learning. In an alternative approach, we consider unknown systems with dynamics that are approximately linear using tools from classical control theory.
