# FPO

## "Tony" Runzhe Yang FPO

"Tony" Runzhe Yang will present his FPO "From mind to machine: neural circuits, learning algorithms, and beyond." on Tuesday, September 26, 2023 at 12:00 PM in CS 105 and Zoom.

Zoom link: https://princeton.zoom.us/j/91084214303

The members of Runzhe’s committee are as follows:

Examiners: H. Sebastian Seung (Adviser), Karthik Narasimhan (Adviser), Mala Murthy

Readers: Ryan Adams, Dmitri Chklovskii (Flatiron Institute; NYU)

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

This thesis explores diverse topics within computational neuroscience and machine learning. The work begins by examining the organization of biological neuronal circuits reconstructed by electron microscopy. First, our study of neural connectivity patterns in the mouse primary visual cortex illustrates the need for a refined understanding of the non-random features of cortical connections, challenging conventional perspectives. Second, in the larval zebrafish hindbrain, we report overrepresented three-cycles of neurons, an observation unprecedented in electron microscopy-reconstructed neuronal wiring diagrams. Additionally, I present an exhaustive compilation of motif statistics and network characteristics for the complete adult Drosophila brain. These efforts collectively enrich our understanding of the intricate wiring diagram of neurons, offering new insights into the organizational principles of biological brains.
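As an illustration of the kind of motif statistic described above, counting directed three-cycles in a wiring diagram takes only a few lines; the edge list and neuron IDs below are hypothetical toy data, and real analyses run over millions of synaptic connections.

```python
def count_three_cycles(edges):
    """Count directed three-cycles (a -> b -> c -> a) in a wiring diagram.

    `edges` is an iterable of (presynaptic, postsynaptic) neuron-ID pairs.
    Each three-cycle is discovered once per participating edge, so the
    raw count is divided by 3.
    """
    edges = set(edges)
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    raw = 0
    for u, v in edges:
        for w in succ.get(v, ()):
            if w != u and (w, u) in edges:
                raw += 1
    return raw // 3

# A toy wiring diagram with exactly one three-cycle: 1 -> 2 -> 3 -> 1.
toy = [(1, 2), (2, 3), (3, 1), (1, 4), (4, 2)]
print(count_three_cycles(toy))  # → 1
```

Comparing such counts against those of degree-matched random networks is what makes a motif "overrepresented."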

In the second part of the thesis, I introduce three distinct machine learning algorithms. The first algorithm, a biologically plausible unsupervised learning algorithm, is implemented within artificial neural networks using Hebbian feedforward and anti-Hebbian lateral connections. The theoretical discourse explores the duality and convergence of the learning process, connecting with the generalized concept of the “correlation game” principle. The second algorithm presents a novel multi-objective reinforcement learning approach, adept at managing real-world scenarios where multiple potentially conflicting criteria must be optimized without predefined importance weighting. This innovation allows the trained neural network model to generate policies that align optimally with user-specified preferences across the entire space of preference. The third algorithm employs a cognitive science-inspired learning principle for dialog systems. The designed system engages in negotiation with others, skillfully inferring the intent of the other party and predicting how its responses may influence the opponent’s mental state.
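The preference-conditioned idea behind the second algorithm can be illustrated with linear scalarization: given a vector-valued reward per action and a user-specified preference weighting, the scalarized return determines which action is optimal. This toy sketch (with hypothetical reward vectors) only illustrates the objective; the actual algorithm trains a single network covering the whole preference space.

```python
def scalarize(reward, preference):
    """Weighted sum of a vector-valued reward under a preference vector."""
    return sum(w * r for w, r in zip(preference, reward))

def best_action(reward_vectors, preference):
    """Pick the action whose vector reward maximizes the scalarized return."""
    return max(range(len(reward_vectors)),
               key=lambda a: scalarize(reward_vectors[a], preference))

# Two conflicting objectives (e.g. speed vs. safety); hypothetical rewards.
rewards = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
print(best_action(rewards, (1.0, 0.0)))  # → 0 (all weight on objective 1)
print(best_action(rewards, (0.5, 0.5)))  # → 2 (balanced preference)
```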

Collectively, these contributions shed light on the complexities of neural circuit organization and offer new methodologies in machine learning. By examining intelligence from both biological and computational perspectives, the thesis presents insights and reference points for future research, contributing to our growing understanding of intelligence.

## Xiaoqi Chen FPO

Xiaoqi (Danny) Chen will present his FPO "Designing Compact Data Structures for Network Measurement and Control" on Wednesday, October 4, 2023 at 2pm in Friend Center 125 (and Zoom).

Zoom link: https://princeton.zoom.us/j/96617518355

The members of his committee are as follows:

Examiners: Jennifer Rexford (adviser), David Walker, and Maria Apostolaki

Readers: Mark Braverman and Minlan Yu (Harvard University)

Talk abstract follows below. All are welcome to join.

Abstract:

This dissertation explores the implementation of network measurement and closed-loop control in the data plane of high-speed programmable switches. After discussing the algorithmic constraints imposed by the switch pipeline architecture, primarily stemming from the requirement of high-speed processing, we share our experience in tailoring algorithms for the data plane. Initially, we focus on efficient measurement algorithms, and present two works for detecting heavy hitters and executing multiple distinct-count queries; both require designing novel approximate data structures to meet the tight memory access constraints. Subsequently, we pivot towards using real-time, closed-loop control in the data plane for performance optimization, and present two works for mitigating microbursts and enforcing fair bandwidth limits; both require approximate computation and exploit the sub-millisecond reaction latency unattainable through conventional control planes. We hope that by sharing our experience and techniques, which are widely applicable to various algorithms and other data-plane hardware targets, we can lay the foundation for future innovations in the field of network programming for researchers and practitioners alike.
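To give a flavor of the approximate data structures involved, a classic example in this space is the count-min sketch, which supports heavy-hitter estimation in fixed memory. The Python below is a simplified illustration only, not the dissertation's actual algorithms; switch data planes would realize the same idea with hardware hash units and register arrays.

```python
import hashlib

class CountMinSketch:
    """Fixed-memory frequency estimator: estimates never undercount."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # Minimum over rows bounds the overcount from hash collisions.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

# A hypothetical flow trace: one heavy hitter among many light flows.
cms = CountMinSketch()
for _ in range(1000):
    cms.add("flow-A")
for i in range(50):
    cms.add(f"flow-{i}")
print(cms.estimate("flow-A") >= 1000)  # → True (never underestimates)
```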

## Sven Dorkenwald FPO

Sven Dorkenwald will present his FPO "Analysis of neuronal wiring diagrams" on Friday, August 18, 2023 at 1:00 PM in Neuroscience Institute A32 and Zoom.

Zoom link: https://princeton.zoom.us/j/97076372854

The members of Sven’s committee are as follows:

Examiners: H. Sebastian Seung (Adviser), Michael Freedman, Mala Murthy

Readers: Felix Heide, Viren Jain (Google)

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

To understand how the brain works, one must know its neurons and their connections. Maps of synaptic connections between neurons can be created by acquiring and analyzing electron microscopic (EM) brain images, but until recently, even the largest datasets were insufficient to map full neuronal circuits in most organisms or even small brains. Advances in automated EM image acquisition and analysis have now given rise to neuronal reconstructions of mammalian brain circuits and entire insect brains. Errors in these automated reconstructions must still be corrected through proofreading. This thesis introduces new methods to bridge the gap between automated neuron reconstructions and the analysis of neuronal wiring diagrams. First, it presents a proofreading infrastructure that facilitates correction of whole neurons in up to petascale (∼1 mm³ of brain tissue) datasets by communities of scientists and proofreaders. Second, it embeds this proofreading system into an analysis infrastructure to enable queries of the connectome data while it is actively being edited. Third, it presents a self-supervised learning technique for efficient inference of semantic information, which is crucial for circuit analyses. These methods have already been used by hundreds of users and have enabled numerous neuroscientific analyses. Additionally, this thesis presents the analysis of the largest connectivity map to date between cortical neurons of a defined type (L2/3 pyramidal neurons in mouse visual cortex), which identified constraints on the learning algorithms employed by the cortex. Finally, this thesis reports on the completion of the wiring diagram of the Drosophila melanogaster brain, containing ∼130,000 neurons, and describes how the methods presented here enabled it. The Drosophila connectome was created through a years-long community-based effort and is an incredible resource for the science community and a milestone for connectomics on the way to large mammalian brains. The methods presented in this thesis will be useful for the analysis of larger brain samples as we aim for connectomes of whole mammalian brains.

## Mingzhe Wang FPO

Mingzhe Wang will present his FPO "Deep Learning for Automated Theorem Proving" on Monday, August 21, 2023 at 3:00 PM in CS 301 and Zoom.

Zoom link: https://princeton.zoom.us/j/3294749300

The members of Mingzhe’s committee are as follows:

Examiners: Jia Deng (Adviser), Danqi Chen, Karthik Narasimhan

Readers: Andrew Appel, Sanjeev Arora

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:

Automated theorem proving is a fundamental AI task with broad applications to the formalization of mathematics, software and hardware verification, and program synthesis. Deep learning has emerged as a promising approach for guiding theorem provers. In this dissertation, we present our work on developing deep learning approaches for automated theorem proving.

First, we propose FormulaNet, a deep learning-based approach to the problem of premise selection. FormulaNet represents a logical formula as a graph that is invariant to variable renaming and then embeds the graph into a vector via a novel graph embedding method that preserves the information of edge ordering. Our approach achieves state-of-the-art results on the HolStep dataset, improving the classification accuracy from 83% to 90.3%.
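The invariance to variable renaming that FormulaNet builds into its graph representation can be illustrated with a much simpler canonicalization: renaming variables by order of first appearance, so that alpha-equivalent token sequences map to the same form. This is only a motivating sketch of the invariance property (it ignores binder scope), not FormulaNet's graph construction.

```python
def canonicalize(tokens, variables):
    """Rename variables by order of first appearance (v0, v1, ...).

    `tokens` is a formula as a token list; `variables` is the set of
    tokens that are variable names. Formulas equal up to variable
    renaming map to the same canonical token list.
    """
    mapping = {}
    out = []
    for tok in tokens:
        if tok in variables:
            if tok not in mapping:
                mapping[tok] = f"v{len(mapping)}"
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out

# ∀x. P(x, y) and ∀a. P(a, b) are identical up to renaming:
lhs = canonicalize(["forall", "x", "P", "x", "y"], {"x", "y"})
rhs = canonicalize(["forall", "a", "P", "a", "b"], {"a", "b"})
print(lhs == rhs)  # → True
```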

Next, we propose MetaGen, a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover. On real-world tasks, we demonstrate that synthetic data from our approach improves the theorem prover and advances the state of the art of automated theorem proving in Metamath.

Finally, we extend our theorem generator to the interactive theorem prover Lean. We propose TermGen, a neural generator that automatically synthesizes theorems and proofs in Lean by directly constructing proof terms. We also propose a method of expert iteration for training TermGen to synthesize short theorems. At last, we present a case study of generating formal specifications of while-loop programs through a rule-based theorem generator.

## Qinshi Wang FPO "Formally verifiable data plane programming" in CS 402

Adviser: Appel

Readers: David Walker, Nate Foster (Cornell)

Examiners: Appel, Rexford, Kincaid

## Zhenyu Song FPO

Zhenyu Song will present his FPO "Optimizing Content Distribution Network Caches with Machine Learning" on Tuesday, August 1, 2023 at 1pm in CS 402.

The members of his committee are as follows:

Readers: Ravi Netravali and Daniel S. Berger (Microsoft & UW)

Examiners: Kai Li (Adviser), Wyatt Lloyd (Adviser), and Ryan Adams

All are welcome to attend.

Abstract:

Content Distribution Networks (CDNs) play a pivotal role in serving Internet traffic. A key part of a CDN’s caching mechanism is the eviction algorithm, which decides which old cached objects to replace. The effectiveness of the eviction algorithm significantly influences CDN performance.

This dissertation explores the application of machine learning (ML) to optimize cache eviction algorithms in CDNs. The central questions addressed in this work are: how to utilize ML to devise an eviction algorithm that surpasses existing heuristics on byte miss ratio, and how to mitigate the CPU overhead while enhancing the robustness of a learned cache in large-scale deployment.

Two major challenges in the design of a learning-based cache eviction algorithm are heterogeneous user access patterns across different locations and times, and computational and space overheads. To address these challenges, we developed two ML-based eviction algorithms, Learning Relaxed Belady (LRB) and Heuristic Aided Learned Preference (HALP). LRB is the first CDN cache algorithm to directly approximate the Belady MIN (oracle) algorithm by learning access patterns, providing a significant improvement over traditional eviction algorithms. It demonstrated a WAN traffic reduction of 4-25% across six production CDN traces in our simulation. HALP, on the other hand, achieves low CPU overhead and robust DRAM byte miss ratio improvement by augmenting a heuristic policy with ML. It has been shown to reduce the DRAM byte miss ratio during peak usage by an average of 9.1%, with a modest CPU overhead of 1.8%, when deployed in YouTube CDN production clusters.
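For context, the Belady MIN oracle that LRB approximates evicts the cached object whose next request lies furthest in the future; it is unrealizable online (it needs future knowledge) but is the natural target to learn toward. A minimal simulation, using a hypothetical request trace:

```python
def belady_misses(requests, cache_size):
    """Simulate Belady's MIN eviction; return the number of cache misses.

    On a miss with a full cache, evict the object whose next request is
    furthest in the future (never requested again counts as infinitely far).
    """
    cache = set()
    misses = 0
    for i, obj in enumerate(requests):
        if obj in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            def next_use(o):
                try:
                    return requests.index(o, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(obj)
    return misses

trace = ["a", "b", "c", "a", "b"]
print(belady_misses(trace, cache_size=2))  # → 4 (LRU takes 5 on this trace)
```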

## Jiaxin Guan FPO

Jiaxin Guan will present his FPO "Cryptography against Space-Bounded Adversaries" on Wednesday, July 12, 2023 at 1pm in CS 105.

The members of his committee are as follows:

Examiners: Mark Zhandry (Adviser), Gillat Kol, Ran Raz

Readers: Mark Braverman, Matt Weinberg

Abstract:

Traditionally in cryptography, we consider adversaries that are time-bounded by making specific computational assumptions. In this thesis, I examine the scenario where the adversaries are instead space-bounded, i.e., the adversary can only use up to a certain number of memory bits. In these scenarios, we can achieve either unconditional security properties or never-before-possible results.

First, we start with Maurer's Bounded Storage Model, a model where the adversary abides by a certain memory bound throughout the entire game. Under this model, I show simple constructions of a key-agreement protocol, a commitment scheme, and an oblivious transfer protocol, all based on Raz's lower bound on parity learning. These constructions have several advantages over prior work, including an improved number of rounds and enhanced correctness.

Subsequently, I show that if we combine computational assumptions with the bounded storage model, we can achieve results that are not possible in the standard model. I define a new object named Online Obfuscation, and show how to use it to construct disappearing encryption and signature schemes where the ciphertext and the signature effectively "disappear" after transmission.

Lastly, I notice that in the Bounded Storage Model, the memory bound on the adversary is enforced throughout the entire game. One can imagine a variant where the bound is only enforced for long-term storage, allowing the adversary to use an arbitrary amount of memory during the transmission phase. I define incompressible cryptography to capture this intuition and show constructions using randomness extractors and other cryptographic tools. Furthermore, I show that under the multi-user setting, we can still achieve desired incompressible security if we simply replace the randomness extractor with a special "somewhere randomness extractor".

## Clayton Thomas FPO

Clayton Thomas will present his FPO "Explainable Mechanism Design" on Wednesday, July 5, 2023 at 10am in CS 402 and Zoom.

Please see below for the members of his committee:

Readers: Huacheng Yu, Ran Raz

Examiners: Matt Weinberg (Adviser), Mark Braverman, and Leeat Yariv (Princeton Economics)

https://princeton.zoom.us/j/97275041458

Meeting ID: 972 7504 1458

Talk abstract follows below. All are welcome to join.

Abstract:

Mechanism design strives to construct protocols that achieve desirable outcomes in the presence of strategic behavior. Algorithmic mechanism design additionally factors in notions from theoretical computer science: the study of how efficiently these algorithms can communicate and be implemented on computers. This thesis largely takes a different direction: we study how to effectively explain these algorithms and protocols to real humans. In Part I, we directly confront the objective of better explaining a certain canonical and widely deployed class of algorithms, namely, matching mechanisms. To begin, we study certain prior frameworks for explanation and find them to be very restrictive in this setting. Thus, we introduce a new framework for describing these algorithms to participants in a fundamentally different way, with the goal of exposing strategyproofness, a crucial property dictating one’s optimal behavior in these mechanisms. We propose new descriptions, conduct a laboratory experiment on real participants to test our framework, and prove rigorous theorems about the limits of simple descriptions in our framework (according to context-relevant formal notions of simplicity). Then, we investigate a host of additional types of descriptions and explanations using the theoretical tools we have developed.
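For readers unfamiliar with the class of mechanisms in question, the canonical matching mechanism is proposer-proposing deferred acceptance (Gale-Shapley), which is strategyproof for the proposing side. The sketch below, with hypothetical preference lists, is background on the mechanism itself, not the thesis's new description framework.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Proposer-proposing Gale-Shapley deferred acceptance.

    proposer_prefs / receiver_prefs map each agent to their preference
    list, most preferred first. Returns {receiver: matched proposer}.
    """
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    free = list(proposer_prefs)          # proposers without a tentative match
    next_choice = {p: 0 for p in proposer_prefs}
    match = {}                           # receiver -> tentatively held proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in match:
            match[r] = p                 # receiver tentatively accepts
        elif rank[r][p] < rank[r][match[r]]:
            free.append(match[r])        # receiver trades up; old proposer freed
            match[r] = p
        else:
            free.append(p)               # rejected; p proposes to next choice
    return match

# Hypothetical preferences: both proposers like x best, but x prefers B.
matching = deferred_acceptance({"A": ["x", "y"], "B": ["x", "y"]},
                               {"x": ["B", "A"], "y": ["A", "B"]})
print(matching)  # → {'x': 'B', 'y': 'A'}
```

The outcome is the proposer-optimal stable matching, and it does not depend on the order in which free proposers are processed.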

In Part II, we investigate other novel directions in algorithmic mechanism design that stem from running mechanisms in the real world. We provide two new, worst-case separations in the communication requirements of certain canonical mechanism design problems: first, for incentivizing mechanisms vs. simply computing them; second, for providing strong incentive guarantees vs. providing weak ones. We provide a framework for running more complex mechanisms with computationally limited bidders; this framework allows for a broad class of reasonable behaviors of the bidders, and guarantees good performance under modest computational resources.

In Part III, we investigate perhaps more-traditional properties of matching mechanisms, the vital class of mechanisms studied in Part I. We provide new proofs of classical results on both the structure and dynamics of these mechanisms; some of these proofs are far simpler than known ones. Finally, we investigate a new random distribution over the preferences of agents in these mechanisms, characterizing the effect of heterogeneous attractiveness across different agents.

## Manik Dhar FPO (CS 105)

Manik Dhar will present his FPO "Beyond the polynomial method: Kakeya sets over finite rings and high dimensional variants" on Friday, June 30, 2023 at 12pm in CS 105.

The members of his committee are as follows:

Readers: Ran Raz and Huacheng Yu

Examiners: Zeev Dvir (Adviser), Bernard Chazelle, and Noga Alon (Math Department)

All are welcome to attend. Abstract follows below:

A Kakeya set in F_q^n is a set that contains a line in every direction. It has been known for over a decade that such sets must be large. This goes back to Dvir’s proof of the finite field Kakeya conjecture as posed by Wolff. Dvir’s proof uses the polynomial method to solve this problem. The problem over finite fields also has applications in pseudorandomness and is crucially used in the construction of currently state-of-the-art seeded mergers and extractors. Wolff’s motivation was to pose a possibly simpler problem whose resolution might help with solving the notoriously difficult Euclidean Kakeya conjecture. The Euclidean Kakeya problem also has many connections with analysis, PDEs, and combinatorics. This thesis will look at two generalizations of the finite field Kakeya problem: one where F_q is replaced by the ring Z/NZ, and another where we look at higher-dimensional analogues of this problem. In both cases, the polynomial method does not work: over Z/NZ, polynomials do not represent all functions, and in the other case a degree-d polynomial can have many more than d roots over a higher-dimensional affine flat. We will cover a series of results that solve these problems by extending, and in some cases going beyond, the polynomial method. We will also cover recent applications of the higher-dimensional Kakeya problem to give a tight analysis of the behaviour of linear hashes. At a high level, we show that if we pick subspaces of large enough dimension in comparison to the size of a set, then for a large fraction of them, every shift intersects the set in close to the expected number of points.
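Over a small field the defining property can be checked by brute force: a set K ⊆ F_p^2 is Kakeya if, for every direction, some base point's full line lies inside K. This toy verification is illustrative only; the thesis concerns Z/NZ and higher-dimensional variants, where such brute force is exactly the wrong tool.

```python
from itertools import product

def contains_line(points, p, direction):
    """Does `points` (a subset of F_p^2) contain a full line in `direction`?"""
    for base in product(range(p), repeat=2):
        if all(tuple((b + t * d) % p for b, d in zip(base, direction)) in points
               for t in range(p)):
            return True
    return False

def is_kakeya(points, p):
    """Check the Kakeya property: a line in every direction of F_p^2."""
    points = set(points)
    # One representative per direction: (1, s) for each slope s, plus vertical.
    directions = [(1, s) for s in range(p)] + [(0, 1)]
    return all(contains_line(points, p, d) for d in directions)

p = 3
full_plane = set(product(range(p), repeat=2))
one_line = {(t, 0) for t in range(p)}
print(is_kakeya(full_plane, p))  # → True (trivially contains every line)
print(is_kakeya(one_line, p))   # → False (covers only one direction)
```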

## Diana Cai FPO

Diana Cai will present her FPO "Probabilistic Inference When the Model is Wrong" on Wednesday, July 5, 2023 at 3:00 PM in COS 105.


The members of Diana’s committee are as follows:

Examiners: Ryan Adams (Adviser), Barbara Engelhardt (Adviser), Olga Russakovsky

Readers: Tom Griffiths, Tamara Broderick (MIT)

A copy of her thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend her talk.

Abstract follows below:

By simplifying complex real-world phenomena, probabilistic methods have proven capable of accelerating applications in discovery and design. However, classical theory often evaluates models under the assumption that they are perfect representations of the observed data. There remains the danger that these simplifications might sometimes lead to failure under real-world conditions. This dissertation identifies popular data analyses that can yield unreliable conclusions, in some cases arbitrarily unreliable ones, under such “misspecification.” But we also show how to practically navigate misspecification. To begin, we consider clustering, a mainstay of modern unsupervised data analysis, using Bayesian finite mixture models. Some scientists are interested not only in finding meaningful groups of data but also in learning the number of such clusters. We provide novel theoretical results and empirical studies showing that, no matter how small the misspecification, some common approaches for learning the number of clusters, including Bayesian robustness procedures, give increasingly wrong answers as one receives more data. But using imperfect models need not be hopeless. For instance, we consider a popular Bayesian modeling framework for graphs based on the assumption of vertex exchangeability. A consequence of this assumption is that the resulting graph models generate dense graphs with probability 1 and are therefore misspecified for sparse graphs, a common property of many real-world graphs. To address this undesirable scaling behavior, we introduce an alternative generative modeling framework and prove that it generates a range of sparse and dense scaling behaviors; we also show empirically that it can generate graphs with sparse power law scaling behavior.

Finally, we consider the case where a researcher has access to a sequence of approximate models that become arbitrarily more complex at the cost of more computation, which is common in applications with simulators of physical dynamics or models requiring numerical approximations of some fidelity. In this case, we show how to obtain estimates as though one had access to the most complex model. In particular, we propose a framework for constructing Markov chain Monte Carlo algorithms that asymptotically simulate from the most complex model while only ever evaluating models from the sequence of approximate models.