
Date and Time
Tuesday, May 7, 2024 - 11:00am to 1:00pm
Location
Friend Center 006
Type
FPO

Dmitry Paramonov will present his FPO "Handling Data Too Large To Handle: On Multi-pass Streaming and Interactive Coding" on May 7th, 2024 at 11am in Friend 006.

 

His committee is as follows:

Examiners: Gillat Kol (adviser), Ran Raz, Mark Braverman

Readers: Huacheng Yu and Matt Weinberg 

 

Title: Handling Data Too Large To Handle: On Multi-pass Streaming and Interactive Coding

 

Abstract:

Over the last decades, the world has become increasingly information-centric, and massive amounts of data, potentially distributed between many different sources, are being processed all the time. In this thesis, I consider two mechanisms for coping with big data and the distributed nature of timely tasks.

 

In Part I, I showcase my work on the streaming setting, where the input to the algorithm is given as a stream of elements. The algorithm’s goal is to compute a value that depends on the stream while only utilizing memory that is much smaller than the entire stream. My work in this field focuses on proving that various fundamental graph problems essentially require the streaming algorithm to store the entire graph, even if it is allowed to make several passes through the given stream of edges.
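As a toy illustration of the multi-pass model (not an algorithm from the thesis), a two-pass streaming algorithm can find a majority element of a stream using constant memory: the first pass runs the classic Boyer–Moore candidate step, and the second pass verifies the candidate.

```python
def majority_two_pass(stream_factory):
    """Return an element occurring in more than half the stream, using
    O(1) memory and two passes; returns None if no majority exists.
    stream_factory() must yield a fresh iterator over the same stream."""
    # Pass 1 (Boyer-Moore): maintain one candidate and a counter.
    candidate, count = None, 0
    for x in stream_factory():
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    # Pass 2: re-read the stream to verify the candidate.
    n = occurrences = 0
    for x in stream_factory():
        n += 1
        occurrences += (x == candidate)
    return candidate if occurrences > n // 2 else None

print(majority_two_pass(lambda: iter([1, 2, 1, 1, 3, 1])))  # 1
```

The point of the sketch is the model, not the problem: the algorithm never stores the stream, only a constant number of words, and the second pass is what multi-pass streaming buys over single-pass.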

 

In Part II, I consider error-correcting codes for distributed, interactive settings. Classical error-correcting codes assume that a sender who has all the information wishes to send it to a receiver over a noisy channel. However, in many modern, big data applications, the information is distributed amongst many parties that communicate back-and-forth to compute a value that depends on all their inputs. My work examines the noise resilience of various such settings. For some models, we can design error-correcting protocols that allow the encoding of every noiseless protocol by a noise-resilient protocol with low overhead, whereas for other models, it can be shown that this task is impossible.
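For intuition about the baseline that interactive coding improves on, a classical one-way repetition code illustrates the noise-resilience versus overhead trade-off (this is a much simpler object than the interactive, multi-party protocols the thesis studies):

```python
import random

def encode_repetition(bits, r=3):
    """Repetition code: transmit each bit r times (overhead factor r)."""
    return [b for b in bits for _ in range(r)]

def noisy_channel(bits, flip_prob, rng):
    """Binary symmetric channel: flip each bit independently w.p. flip_prob."""
    return [b ^ (rng.random() < flip_prob) for b in bits]

def decode_repetition(received, r=3):
    """Majority-decode each block of r repeated bits."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]

rng = random.Random(0)
msg = [1, 0, 1, 1, 0, 0, 1, 0]
received = noisy_channel(encode_repetition(msg, r=5), flip_prob=0.1, rng=rng)
print(decode_repetition(received, r=5))
```

Repetition tolerates noise but blows up communication by the factor r; the protocols discussed above aim for constant-rate overhead even when the computation is interactive, which is precisely what makes the problem hard.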

 

While these two topics appear greatly unrelated, and almost orthogonal to one another, the tools used to prove results in both turn out to be remarkably similar, with many standard problems and information theory lemmas being a critical part of both.

Ashwini Raina will present his FPO "Rethinking System Design with Awareness for cross-layer aspects of datacenter storage"

Date and Time
Friday, May 3, 2024 - 10:30am to 12:30pm
Location
CS 302
Type
FPO

Ashwini Raina will present his FPO "Rethinking System Design with Awareness for cross-layer aspects of datacenter storage" on Friday, May 3, 2024 at 10:30am in CS 302.

 

The members of his committee are as follows:

Examiners: Michael Freedman (adviser), Wyatt Lloyd, Ravi Netravali

Readers: Amit Levy, Asaf Cidon (Columbia University)

 

Abstract follows below.

 

Storage is a critical piece of infrastructure in modern web applications. In recent years, the storage technologies employed in building such systems have undergone significant evolution, bringing about novel cost-performance trade-offs. Concurrently, datacenter storage architectures have become increasingly layered. Software systems designed on outdated assumptions about datacenter storage often end up with poor cost-performance trade-offs or suffer from suboptimal performance. This dissertation proposes a new design approach for systems, one that incorporates awareness of cross-layer aspects of datacenter storage, and validates the effectiveness of this approach through two systems.

The first system is PrismDB, a novel key-value store that simultaneously exploits two extreme ends of the spectrum of modern NVMe storage technologies (3D XPoint and QLC NAND). In recent years, emerging storage technologies have focused on divergent goals: better performance or lower cost. Correspondingly, data systems that employ these technologies are typically optimized either to be fast (but expensive) or cheap (but slow). PrismDB takes a different approach: by architecting a storage engine to natively utilize two tiers of fast and low-cost storage technologies, it shows that a Pareto-efficient balance between performance and cost can be achieved.

The second system is Fusion, an object store for analytics that is optimized for query pushdown on erasure-coded data. Computation pushdown is a widely adopted technique for reducing the latency of highly selective queries in modern OLAP cloud databases running on disaggregated storage. However, existing pushdown solutions are inefficient on erasure-coded storage because the analytics file objects get partitioned across storage nodes. Consequently, the storage system must reassemble an object across nodes before executing the query, incurring significant network latency. Fusion addresses this problem by co-designing its erasure coding and file placement topologies, taking into account popular analytics file formats (e.g., Parquet). It employs a novel stripe construction algorithm that prevents the fragmentation of computable units within an object and minimizes storage overhead during erasure coding.

Overall, this dissertation advocates for designing software systems with an awareness of cross-layer aspects in datacenter storage, and demonstrates the benefits of that approach via two systems: PrismDB and Fusion.
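The hot/cold split behind a two-tier store can be sketched with a toy placement policy: keep the most recently accessed keys on a small fast tier and demote the rest to a cheap tier. This is a sketch of the idea only; PrismDB's actual design (an LSM-based engine over 3D XPoint and QLC NAND) is far more involved.

```python
class TieredStore:
    """Toy two-tier key-value store with LRU-based demotion."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast, self.cheap = {}, {}   # fast = hot tier, cheap = cold tier
        self.lru = []                    # access order, most recent last

    def _touch(self, key):
        if key in self.lru:
            self.lru.remove(key)
        self.lru.append(key)

    def _evict(self):
        # Demote least-recently-used keys until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            victim = next(k for k in self.lru if k in self.fast)
            self.cheap[victim] = self.fast.pop(victim)

    def put(self, key, value):
        self._touch(key)
        self.cheap.pop(key, None)
        self.fast[key] = value           # writes land on the fast tier
        self._evict()

    def get(self, key):
        self._touch(key)
        if key not in self.fast:         # promote cold keys on access
            self.fast[key] = self.cheap.pop(key)
            self._evict()
        return self.fast[key]
```

A real engine would track access frequency rather than pure recency and move data in batches, but even this sketch shows the core design question: which objects earn space on the expensive tier.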

Zheng Shi FPO

Date and Time
Friday, May 3, 2024 - 3:00pm to 5:00pm
Location
CS 302
Type
FPO

Zheng Shi will present her FPO "Task-Specific Computational Cameras" on Friday, May 3, 2024 at 3:00 PM in CS 302.

Location: CS 302

The members of Zheng’s committee are as follows:
Examiners: Felix Heide (Adviser), Adam Finkelstein, Olga Russakovsky
Readers: Ellen Zhong, Tian-Ming Fu

A copy of her thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis. 
 
Everyone is invited to attend her talk. 
 
Abstract follows below:
Machine vision, while fundamentally relying on images as inputs, has traditionally treated image acquisition and image processing as two separate tasks. However, traditional image acquisition systems are tuned for aesthetics, producing photos that please the human eye, rather than for computational tasks that require information beyond human vision. My research focuses on developing task-specific computational imaging systems to enable the capture of information that extends beyond the capabilities of standard RGB cameras, thereby enhancing the effectiveness of downstream machine vision applications.

This thesis begins with combining multiple imaging modalities to facilitate training on unpaired real-world datasets, addressing the scarcity of supervised training data. We introduce ZeroScatter, a single-image descattering method capable of removing adverse weather effects from RGB captures. By integrating model-based, temporal, and multi-view cues, as well as information contained in gated imager captures, we offer indirect supervision for training on real-world adverse weather captures lacking ground truth. This approach significantly enhances generalizability on unseen data, surpassing methods trained exclusively on synthetic adverse weather data.

Despite its great applicability, relying solely on conventional RGB image inputs limits the available information and requires the model to fill in gaps with plausible inferences based on learned priors, such as when car windshield wipers obscure objects from dash cameras. To bypass these constraints, we shift toward computational cameras and design specialized flat optics to boost the capabilities of cameras across a range of applications.

We first propose a computational monocular camera that optically cloaks unwanted near-camera obstructions. We learn a custom diffractive optical element (DOE) that performs depth-dependent optical encoding, scattering nearby occlusions while allowing paraxial wavefronts emanating from background objects to be focused. This allows us to computationally reconstruct unobstructed images without requiring captures from different camera views or hallucinated content.

Lastly, we introduce a split-aperture 2-in-1 computational camera that combines application-specific optical modulation with conventional imaging into one system. This approach simplifies complex inverse problems faced by computational cameras, enhances reconstruction quality, and offers a real-time viewfinder experience, paving the way for the adoption of computational camera technology in consumer devices.

Shunyu Yao FPO

Date and Time
Thursday, May 2, 2024 - 10:00am to 12:00pm
Location
Computer Science Small Auditorium (Room 105)
Type
FPO

Shunyu Yao will present his FPO "Language Agents: From Next-Token Prediction to Digital Automation" on Thursday, May 2, 2024 at 10:00 AM in CS 105 and Zoom.

Location: Zoom link: http://princeton.zoom.us/my/shunyuy

The members of Shunyu’s committee are as follows:
Examiners: Karthik Narasimhan (Adviser), Tom Griffiths, Benjamin Eysenbach
Readers: Sanjeev Arora, Tatsunori Hashimoto (Stanford)

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis. 
 
Everyone is invited to attend his talk. 
 
Abstract follows below:
Building autonomous agents to interact with the world lies at the core of artificial intelligence (AI). This thesis introduces “language agents”, a new category of agents that use large language models (LLMs) to reason in order to act, marking a departure from traditional agents built via extensive rule design or learning. It is developed in three parts:

Part I motivates the necessity for language agents by introducing a new set of AI problems and benchmarks based on interaction with large-scale, real-world computer environments, such as the Internet or code interfaces. These “digital automation” tasks hold tremendous value for alleviating tedious labor and improving our lives, yet pose significant challenges for prior agent or LLM methods in decision-making over open-ended natural language and long horizons, calling for new methodology.

Part II lays the methodological foundation for language agents, where the key idea is to apply LLM reasoning for versatile and generalizable agent acting and planning, which also augments LLM reasoning to be more grounded and deliberate via external feedback and internal control. We show that language agents can solve a diverse range of language and agent tasks (especially the digital automation tasks proposed in Part I), with notable improvements over prior LLM-based methods and traditional agents.

Part III consolidates insights from Parts I and II and outlines a principled framework for language agents. The framework provides modular abstractions to organize various LLM-based methods, to understand their gaps from human cognition, and to inspire and develop new methods towards general-purpose autonomous agents.

From foundational empirical tasks and methods to a unifying conceptual framework, this thesis establishes the study of language agents as a distinct and rigorously defined field at the frontier of AI research.

Samuel Barnett FPO

Date and Time
Wednesday, May 1, 2024 - 9:30am to 11:30am
Location
Computer Science 402
Type
FPO

Samuel Barnett will present his FPO "Incorporating human plausibility in single- and multi-agent AI systems" on Wednesday, May 1, 2024 at 9:30 AM in CS 402 .

Location: CS 402

The members of Samuel’s committee are as follows:
Examiners: Ryan Adams (Adviser), Tom Griffiths (Adviser), Benjamin Eysenbach
Readers: Elad Hazan, Karthik Narasimhan

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
As AI systems play a progressively larger role in human affairs, it becomes more important that these systems are built with insights from human behavior. In particular, models developed on the principle of human plausibility are more likely to yield results that are accountable and interpretable, in a way that better ensures alignment between the behavior of the system and what its stakeholders want from it. In this dissertation, I will present three projects that build on the principle of human plausibility for three distinct applications:

 

(i) Plausible representations: I present the Priority-Adjusted Replay for Successor Representations (PARSR) algorithm, a single-agent reinforcement learning algorithm that brings together the ideas of prioritization-based replay and successor representation learning. Together, these ideas lead to a more biologically plausible algorithm that captures human-like capabilities of transferring and generalizing knowledge from previous tasks to novel, unseen ones.

(ii) Plausible inference: I present a pragmatic account of the weak evidence effect, a counterintuitive phenomenon of social cognition that occurs when humans must account for persuasive goals when incorporating evidence from other speakers. This leads to a recursive, Bayesian model that encapsulates how AI systems and their human stakeholders communicate with and understand one another in a way that accounts for the vested interests that each will have.

(iii) Plausible evaluation: I introduce a tractable and generalizable measure for cooperative behavior in multi-agent systems that is counterfactually contrastive, contextual, and customizable with respect to different environmental parameters. This measure can be of practical use in disambiguating between cases in which collective welfare is achieved through genuine cooperation, or by each agent acting solely in its own self-interest, both of which result in the same outcome.
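The successor representation mentioned in (i) above has a compact core: a matrix M where M[s][t] estimates the expected discounted future occupancy of state t when starting from s, learned by a TD-style update. The sketch below shows that textbook building block only, not the PARSR algorithm itself.

```python
def sr_update(M, s, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update of the tabular successor
    representation M after observing the transition s -> s_next."""
    n = len(M)
    for t in range(n):
        indicator = 1.0 if t == s else 0.0
        td_error = indicator + gamma * M[s_next][t] - M[s][t]
        M[s][t] += alpha * td_error
    return M

# Tiny 2-state chain: state 0 transitions to state 1, which is absorbing.
# With gamma = 0.9, M[0] approaches [1.0, 9.0] and M[1] approaches
# [0.0, 10.0] as updates accumulate.
M = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(500):
    sr_update(M, 0, 1)
    sr_update(M, 1, 1)
```

Because M factors "where will I be" apart from "what is each state worth", rewards can be re-estimated without relearning dynamics, which is the transfer property the abstract highlights.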

Pranay Manocha FPO

Date and Time
Wednesday, May 1, 2024 - 1:30pm to 3:30pm
Location
Computer Science 401
Type
FPO

Pranay Manocha will present his FPO "Do we need a Reference Signal for Speech Quality Assessment?" on Wednesday, May 1, 2024 at 1:30 PM in CS 401.

Location: CS 401

The members of Pranay’s committee are as follows:
Examiners: Adam Finkelstein (Adviser), Szymon Rusinkiewicz, Paul Calamia (Meta Reality Labs Research)
Readers: Karthik Narasimhan, Zeyu Jin (Adobe Research)

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.
 
Everyone is invited to attend his talk.
 
Abstract follows below:
This thesis investigates new metrics for assessing speech quality that aim to align more closely with human auditory perception than current methods. It considers traditional methods that compare speech to a perfect (clean) reference and introduces new approaches for scenarios where such a reference is not available. It also emphasizes the significance of reference signals and explores the need for flexible evaluation techniques that can function effectively without an ideal reference. The dissertation describes three main categories of metrics: full-reference (FR), no-reference (NR), and non-matching reference (NMR), providing a detailed comparison of their benefits and limitations. Despite the general preference for FR metrics in situations where a corresponding clean reference signal is available, this research identifies specific circumstances where FR metrics may not be the most effective approach, thereby highlighting the utility and relevance of NMR metrics across different evaluative scenarios.

Another contribution of this thesis is the introduction of CoRN, a novel metric formulated through the integration of FR and NR metrics. This metric builds on an exhaustive analysis of various evaluation metrics, demonstrating its utility in advancing audio quality assessment. Additionally, applying these methods to spatial audio in augmented and virtual reality settings expands the thesis’s contribution to the more general domain of audio quality assessment. Overall, this dissertation aims to refine and expand the methodologies and understanding of speech quality evaluation, a crucial step for the evolution of digital communication technologies.
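As a concrete example of the full-reference family described above, a minimal signal-to-noise-ratio metric compares a degraded signal against its clean reference sample by sample. This is a classical baseline only; the learned FR/NR/NMR metrics the thesis studies aim to correlate far better with human perception than raw SNR does.

```python
import math

def snr_db(reference, degraded):
    """Full-reference SNR in decibels between a clean reference signal
    and a degraded version of it (both as equal-length sample lists)."""
    assert len(reference) == len(degraded)
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_power == 0:
        return float("inf")   # identical signals: no distortion
    return 10 * math.log10(signal_power / noise_power)

clean = [0.0, 1.0, 0.0, -1.0] * 100        # toy "speech" waveform
noisy = [x + 0.01 for x in clean]          # small additive offset
print(round(snr_db(clean, noisy), 1))      # 37.0
```

The limitation the abstract points at is visible even here: SNR needs the exact matching reference, and two distortions with identical SNR can sound very different, which is what motivates NR and NMR alternatives.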

Ted Sumers FPO

Date and Time
Wednesday, April 24, 2024 - 11:00am to 1:00pm
Location
Computer Science 301
Type
FPO

Ted Sumers will present his FPO "Grounding Communication in Real-World Action" on Wednesday, April 24, 2024 at 11:00 AM in CS 301

Location: CS 301

The members of Ted’s committee are as follows:
Examiners: Tom Griffiths (Adviser), Ryan Adams, Adele Goldberg
Readers: Karthik Narasimhan, Dylan Hadfield-Menell (MIT)

A copy of his thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
This dissertation bridges psychology and artificial intelligence (AI) to develop agents capable of learning through communication with humans. The first half establishes a foundation by comparing the efficacy of language and demonstration for transmitting complex concepts. Experiments reveal language’s superior ability to convey abstract rules, suggesting its importance for social learning. I then connect computational models of pragmatic language understanding to reinforcement learning settings, grounding a speaker’s utility in their listener’s decision problem. Behavioral evidence validates this as a model of human language use.

Building on these insights, the second half develops AI agents capable of learning from such language. I first extend the computational model to incorporate both commands and teaching. Experiments show this allows an AI listener to robustly infer the human’s latent reward function. I then introduce the problem of learning from fully natural language and contribute two novel approaches: utilizing aspect-based sentiment analysis and an inference network learned end-to-end. Behavioral evaluations demonstrate these models successfully learn from interactive human feedback.

Together, this dissertation provides a formal computational theory of the cognitive mechanisms supporting human social learning and embeds them in artificial agents. I discuss implications both for large language models and the continued development of AI agents that acquire and use information through genuine dialogue. This work suggests that building machines to learn as humans do – socially and linguistically – is a promising path towards beneficial artificial intelligence.

Fangyin Wei FPO

Date and Time
Monday, April 22, 2024 - 1:30pm to 3:30pm
Location
Computer Science 302
Type
FPO

Fangyin Wei will present her FPO "Learning to Edit 3D Objects and Scenes" on Monday, April 22, 2024 at 1:30 PM in CS 302

Location: CS 302

The members of Fangyin’s committee are as follows:
Examiners: Szymon Rusinkiewicz (Adviser), Thomas Funkhouser (Adviser), Jia Deng
Readers: Felix Heide, Olga Russakovsky

A copy of her thesis is available upon request.  Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend her talk.

Abstract follows below:
3D editing plays a key role in many fields ranging from AR/VR, industrial and art design, to robotics. However, existing 3D editing tools either (i) demand labor-intensive manual efforts and struggle to scale to many examples, or (ii) use optimization and machine learning but produce unsatisfactory results (e.g., losing details, supporting only coarse editing, etc.). These shortcomings often arise from editing in geometric space rather than structure-aware semantic space, where the latter is the key to automatic 3D editing at scale. While learning a structure-aware space will result in significantly improved efficiency and accuracy, labeled datasets to train 3D editing models don’t exist. In this dissertation, we present novel approaches for learning to edit 3D objects and scenes in structure-aware semantic space with noisy or no supervision.

We first address how to extract the underlying structure to edit 3D objects, with a focus on editing two critical properties: semantic shape parts and articulations.

Our semantic editing method enables specific edits to an object’s semantic parameters (e.g., the pose of a person’s arm or the length of an airplane’s wing), leading to better preservation of input details and improved accuracy compared to previous work.

Next, we introduce a 3D annotation-free method that learns to model geometry, articulation, and appearance of articulated objects from color images. The model works on an entire category (as opposed to typical NeRF extensions that only overfit on a single scene) and enables various applications such as few-shot reconstruction and static object animation. It also generalizes to real-world captures.

Then, we tackle how to extract structure for scene editing. We present an automatic system that removes clutter (frequently moving objects such as clothes or chairs) from 3D scenes and inpaints the resulting holes with coherent geometry and texture. We address challenges including the lack of well-defined clutter annotations, entangled semantics and geometry, and multi-view inconsistency.

In summary, this dissertation demonstrates techniques to exploit the underlying structure of 3D data for editing. Our work opens up new research directions such as leveraging structures from other modalities (e.g., text, images) to empower 3D editing models with stronger semantic understanding.

Marcelo Orenes Vera FPO

Date and Time
Friday, May 3, 2024 - 11:00am to 1:00pm
Location
Computer Science Small Auditorium (Room 105)
Type
FPO

Details forthcoming.

Uma Girish FPO

Date and Time
Thursday, May 2, 2024 - 11:00am to 1:00pm
Location
Computer Science Tea Room
Type
FPO

Details forthcoming.
