
CS Department Colloquium Series

The Future is Hear: Innovations from the Interactive Audio Lab

Date and Time
Monday, October 7, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Adam Finkelstein

Bryan Pardo
The Interactive Audio Lab, headed by Bryan Pardo, works at the intersection of machine learning, signal processing and human-computer interaction. The lab invents new tools to generate, modify, find, separate, and label sound. In this talk, Prof. Pardo will discuss three projects illustrative of the work in the lab: 

Text2FX: Audio effects (e.g., equalization, reverberation, compression) are a cornerstone of modern audio production. However, their complex and unintuitive controls (e.g., decay, cutoff frequency) make them challenging for non-technical musicians, podcasters and sound artists. As people naturally describe sound in terms like “bright” or “warm,” natural language can serve as a more intuitive and accessible way to navigate the complex parameter spaces of audio effects. Text2FX leverages a shared audio-text embedding space (CLAP) and differentiable digital signal processing (DDSP) to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., “make it sound in-your-face and bold”).
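
For readers curious how the pieces might fit together, here is a minimal PyTorch sketch of the general recipe: treat effect parameters as learnable tensors, render audio through a differentiable effect, and nudge the parameters so the processed audio moves toward a text prompt in a shared audio-text space. The random stand-in encoders and the toy three-band equalizer are illustrative assumptions, not the Interactive Audio Lab's code or the actual CLAP model.

import torch

torch.manual_seed(0)
EMB_DIM = 32
_audio_proj = torch.nn.Linear(64, EMB_DIM)   # random stand-in for a CLAP-style audio encoder head
_text_table = {}                             # random stand-in for a CLAP-style text encoder

def embed_text(prompt: str) -> torch.Tensor:
    if prompt not in _text_table:
        g = torch.Generator().manual_seed(hash(prompt) % (2 ** 31))
        _text_table[prompt] = torch.randn(EMB_DIM, generator=g)
    return _text_table[prompt]

def embed_audio(x: torch.Tensor) -> torch.Tensor:
    mag = torch.fft.rfft(x).abs()                                        # crude spectral features
    pooled = torch.nn.functional.adaptive_avg_pool1d(mag.unsqueeze(0), 64).squeeze(0)
    return _audio_proj(pooled)

def differentiable_eq(x: torch.Tensor, gains_db: torch.Tensor) -> torch.Tensor:
    # Toy three-band "equalizer": scale low/mid/high FFT bands by learnable gains in dB.
    X = torch.fft.rfft(x)
    n = X.shape[-1]
    edges = [0, n // 3, 2 * n // 3, n]
    scaled = [X[a:b] * 10 ** (g / 20) for a, b, g in zip(edges[:-1], edges[1:], gains_db)]
    return torch.fft.irfft(torch.cat(scaled), n=x.shape[-1])

def text2fx(x: torch.Tensor, prompt: str, steps: int = 200) -> torch.Tensor:
    gains = torch.zeros(3, requires_grad=True)          # the effect parameters we optimize
    target = embed_text(prompt)
    opt = torch.optim.Adam([gains], lr=0.05)
    for _ in range(steps):
        y = differentiable_eq(x, gains)
        sim = torch.nn.functional.cosine_similarity(embed_audio(y), target, dim=0)
        loss = -sim                                      # pull the processed audio toward the prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gains.detach()

audio = torch.randn(16000)                               # one second of noise at 16 kHz
print(text2fx(audio, "make it sound warm and bright"))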

VampNet: In recent years, advances in discrete acoustic token modeling have resulted in significant leaps in autoregressive generation of speech and music. Meanwhile, approaches that use non-autoregressive parallel iterative decoding have been developed for efficient image synthesis. In this work, we combine parallel iterative decoding with acoustic token modeling and apply the combination to music audio synthesis. The resulting model, VampNet, is fast enough for interactive performance and can be prompted with music audio, making it well suited for creating loops and variational accompaniment in artistic contexts.
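
To give a feel for what parallel iterative decoding looks like, below is a MaskGIT-style confidence-based resampling loop over codec tokens, sketched in PyTorch. The token_model interface and the random-logits stand-in in the usage lines are assumptions for illustration only, not VampNet's actual architecture or masking schedule.

import math
import torch

def iterative_decode(token_model, tokens, gen_mask, num_steps=12):
    # Parallel iterative decoding: sample all masked positions at once, keep the most
    # confident samples, and re-mask the rest on a shrinking cosine schedule.
    tokens = tokens.clone()
    masked = gen_mask.clone()                            # True where a token still has to be generated
    total = int(gen_mask.sum())
    for step in range(1, num_steps + 1):
        logits = token_model(tokens, masked)             # (T, vocab): hypothetical model interface
        probs = torch.softmax(logits, dim=-1)
        sampled = torch.multinomial(probs, 1).squeeze(-1)           # sample every position in parallel
        conf = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
        conf = torch.where(masked, conf, torch.full_like(conf, float("inf")))  # never touch the prompt
        tokens = torch.where(masked, sampled, tokens)
        num_remask = int(math.cos(math.pi / 2 * step / num_steps) * total) if step < num_steps else 0
        masked = torch.zeros_like(masked)
        if num_remask > 0:                               # lowest-confidence draws get another pass
            masked[torch.topk(conf, num_remask, largest=False).indices] = True
    return tokens

# Toy usage with a random-logits stand-in for a trained acoustic token model.
vocab, T = 1024, 256
dummy_model = lambda toks, m: torch.randn(T, vocab)
prompt = torch.randint(0, vocab, (T,))
mask = torch.zeros(T, dtype=torch.bool)
mask[T // 2:] = True                                     # regenerate the second half as a "vamp"
out = iterative_decode(dummy_model, prompt, mask)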

VoiceBlock: Deep-learning-based speaker recognition systems can facilitate mass surveillance, allowing search for a target speaker through thousands of concurrent voice communications. In this work, we propose a highly effective approach to anonymize speech with respect to an automated speaker recognition system, while leaving the voice perceptually unaltered to a human listener. Because our method does not conceal speaker identity from human listeners, it still allows high-effort targeted surveillance (e.g., authorized human-attended wiretaps of criminal enterprises), while making mass automated surveillance significantly less reliable. In this way, we hope to return to the status quo of the 20th and early 21st centuries, in which the need for human listeners provided an important check on mass surveillance.
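
The adversarial objective behind this kind of anonymization can be illustrated with a simple, non-streaming sketch: learn a tiny additive perturbation that pushes an utterance away from its own speaker embedding while bounding how much the waveform changes. The random speaker_encoder below is a stand-in for a real speaker recognition model, and this per-utterance optimization only illustrates the goal; it is not the VoiceBlock method itself.

import torch

torch.manual_seed(0)
speaker_encoder = torch.nn.Sequential(        # random stand-in for a real speaker-embedding network
    torch.nn.Linear(400, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64))

def frame_embed(x: torch.Tensor) -> torch.Tensor:
    frames = x.unfold(0, 400, 200)            # 25 ms frames, 12.5 ms hop at 16 kHz
    return speaker_encoder(frames).mean(dim=0)            # utterance-level embedding

def anonymize(x: torch.Tensor, steps: int = 100, eps: float = 0.01) -> torch.Tensor:
    enrolled = frame_embed(x).detach()        # the identity profile an automated system would match
    delta = torch.zeros_like(x, requires_grad=True)        # small additive perturbation to learn
    opt = torch.optim.Adam([delta], lr=1e-3)
    for _ in range(steps):
        emb = frame_embed(x + delta)
        # Minimize similarity to the enrolled embedding, i.e. confuse the recognizer...
        loss = torch.nn.functional.cosine_similarity(emb, enrolled, dim=0)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                 # ...while keeping the waveform change imperceptibly small
            delta.clamp_(-eps, eps)
    return (x + delta).detach()

speech = torch.randn(16000)
protected = anonymize(speech)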

Bio: Bryan Pardo studies fundamental problems in computer audition, content-based audio search, and generative modeling of audio, and also develops inclusive interfaces for audio production. He is head of Northwestern University’s Interactive Audio Lab and co-director of the Northwestern University Center for HCI+Design. Prof. Pardo has appointments in the Department of Computer Science and Department of Radio, Television and Film. He received an M.Mus. in Jazz Studies in 2001 and a Ph.D. in Computer Science in 2005, both from the University of Michigan. He has authored over 140 peer-reviewed publications. He has developed speech analysis software for the Speech and Hearing department of the Ohio State University, statistical software for SPSS, and worked as a machine learning researcher for General Dynamics. His patented technologies have been productized by companies including Bose, Adobe, Lexi, and Ear Machine. While finishing his doctorate, he taught in the Music Department of Madonna University. When he is not teaching or researching, he performs on saxophone and clarinet with the bands Son Monarcas and The East Loop.


If you need an accommodation for a disability please contact Emily Lawrence at emilyl@cs.princeton.edu at least one week before the event.

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.

Learning Models of the Environment for Reinforcement Learning

Date and Time
Monday, April 22, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sanjeev Arora

Tim Lillicrap
Model-based algorithms for decision making have long held the promise of being more powerful and data-efficient than their model-free counterparts. However, the widespread application of model-based methods has been limited by the need for perfect models of the environment. The game of Go was mastered by AlphaGo using a combination of neural networks and the MCTS planning algorithm, but planning required a perfect simulator of the game rules. In domains such as robotics or natural language understanding, where no perfect simulator is available, model-based approaches have been difficult to apply effectively. Addressing this limitation, I will describe algorithms (Dreamer and MuZero) that use deep neural networks to learn robust environment models. These models are used to imagine potential futures, which can then be used for planning and for learning policy and value functions. The advent and success of powerful model-based learning algorithms offer hints about the next wave of language models.
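
To make the idea of imagined futures concrete, here is a toy sketch of planning inside a learned latent model: a random-shooting planner rolls candidate action sequences through learned dynamics and reward heads and acts on the first step of the best imagined trajectory. The untrained stand-in networks and the planner itself are illustrative assumptions, not the Dreamer or MuZero implementations.

import torch

torch.manual_seed(0)
LATENT, ACTIONS = 32, 4
# Learned components of a world model (here: untrained stand-ins).
encoder = torch.nn.Linear(8, LATENT)                     # observation -> latent state
dynamics = torch.nn.Linear(LATENT + ACTIONS, LATENT)     # (latent, action) -> next latent
reward_head = torch.nn.Linear(LATENT, 1)                 # latent -> predicted reward

def imagine_return(z0: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    # Roll a sequence of one-hot actions through the learned model and sum predicted rewards.
    z, total = z0, 0.0
    for a in actions:
        z = torch.tanh(dynamics(torch.cat([z, a])))
        total = total + reward_head(z)
    return total

def plan(obs: torch.Tensor, horizon: int = 10, candidates: int = 64) -> int:
    # Random-shooting planner: imagine many futures, act on the best one's first step.
    z0 = torch.tanh(encoder(obs))
    best_ret, best_action = -float("inf"), 0
    with torch.no_grad():
        for _ in range(candidates):
            idx = torch.randint(0, ACTIONS, (horizon,))
            seq = torch.nn.functional.one_hot(idx, ACTIONS).float()
            ret = imagine_return(z0, seq).item()
            if ret > best_ret:
                best_ret, best_action = ret, int(idx[0])
    return best_action

action = plan(torch.randn(8))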

Bio: Timothy Lillicrap received an Hon. B.Sc. in Cognitive Science & Artificial Intelligence from the University of Toronto and a Ph.D. in Systems Neuroscience from Queen’s University in Canada. He moved to the University of Oxford in 2012, where he worked as a Postdoctoral Research Fellow. In 2014 he joined Google DeepMind as a Research Scientist and became a Director of Research in 2023. His research focuses on machine learning for optimal control and decision making, as well as using these mathematical frameworks to understand how the brain learns. He has developed new algorithms for exploiting deep networks in the context of reinforcement learning, and new recurrent memory architectures for one-shot learning problems. His projects have included applications of deep learning to robotics, solving games such as Go and StarCraft, and human interaction.


To request accommodations for a disability please contact Donna Ghilino, dg3548@princeton.edu, at least one week prior to the event.

Successes and failures of machine learning models of sensory systems

Date and Time
Tuesday, April 2, 2024 - 2:30pm to 3:30pm
Location
Princeton Neuroscience Institute A32
Type
CS Department Colloquium Series
Host
Nathaniel Daw

Jenelle Feather
The environment is full of rich sensory information. Our brain can parse this input, understand a scene, and learn from the resulting representations. The past decade has given rise to computational models that transform sensory inputs into representations useful for complex behaviors such as speech recognition or image classification. These models can improve our understanding of biological sensory systems and may provide a test bed for technology that aids sensory impairments, provided that model representations resemble those in the brain. In this talk, I will discuss my research program, which aims to develop methods to compare model representations with those of biological systems and to use insights from these methods to better understand perception and cognition. I will cover experiments in both the auditory and visual domains that bridge between neuroscience, cognitive science, and machine learning. By investigating the similarities and differences between computational model representations and those present in biological systems, we can use these insights to improve current computational models and better explain how our brain utilizes robust representations for perception and cognition. 
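
As generic background on one common way to compare model and brain representations (a standard technique sketched under simple assumptions, not necessarily the specific methods discussed in the talk), representational similarity analysis builds a stimulus-by-stimulus dissimilarity matrix for each system and correlates the two.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_acts: np.ndarray, neural_resps: np.ndarray) -> float:
    # Correlate the stimulus-by-stimulus dissimilarity structure of a model layer
    # with that of a set of neural recordings to the same stimuli.
    # model_acts:   (n_stimuli, n_units) activations per stimulus
    # neural_resps: (n_stimuli, n_neurons) responses per stimulus
    rdm_model = pdist(model_acts, metric="correlation")   # condensed dissimilarity matrices
    rdm_brain = pdist(neural_resps, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return float(rho)

# Toy usage with random data standing in for real activations and recordings.
rng = np.random.default_rng(0)
print(rsa_score(rng.normal(size=(50, 512)), rng.normal(size=(50, 80))))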

Bio: Jenelle Feather is a Flatiron Research Fellow at the Center for Computational Neuroscience (CCN), working with SueYeon Chung and Eero Simoncelli. She received her Ph.D. in 2022 from the Department of Brain and Cognitive Sciences at MIT, working in the Laboratory for Computational Audition with Josh McDermott. During that time, she was a Friends of the McGovern Institute Graduate Fellow, a DOE Computational Science Graduate Fellow, and was affiliated with the Center for Brains, Minds and Machines. Previously, she was an intern at Google, a research assistant with Nancy Kanwisher, and received undergraduate degrees in physics and brain and cognitive sciences from MIT.


Coffee and Refreshments will be available outside A32 before the seminar.

Learning from Interaction

Date and Time
Monday, April 1, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ryan Adams

Kiante Brantley
Machine learning systems have seen advancements due to large models pre-trained on vast amounts of data. These pre-trained models have led to progress on various downstream tasks when fine-tuned. However, for machine learning systems to function in real-world environments, they must overcome certain challenges that are not influenced by model or dataset sizes. One potential solution is to fine-tune machine learning models based on online interactions.

In this talk, I will present my research on developing natural language processing systems that learn from interacting in an environment. I will begin by describing the issues that arise when systems are trained on offline data and then deployed in interactive environments. Additionally, I will present an algorithm that addresses these issues using only environmental interaction without additional supervision. Moreover, I will demonstrate how learning from interaction can improve natural language processing systems. Finally, I will present a set of new interactive learning algorithms explicitly designed for natural language processing systems.

Bio: Kianté Brantley is a Postdoctoral Associate in the Department of Computer Science at Cornell University, working with Thorsten Joachims. He completed his Ph.D. in Computer Science at the University of Maryland, College Park, advised by Dr. Hal Daumé III. His research focuses on developing machine learning models that can make automated decisions in the real world with minimal supervision. His research lies at the intersection of imitation learning, reinforcement learning, and natural language processing. He is a recipient of the NSF LSAMP BD Fellowship, ACM SIGHPC Computational and Data Science Fellowship, Microsoft Dissertation Research Grant, Ann G. Wylie Dissertation Fellowship, and NSF CIFellow Postdoctoral Fellowship.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Unifying the mechanisms of the hippocampal and prefrontal cognitive maps

Date and Time
Wednesday, March 20, 2024 - 2:30pm to 3:30pm
Location
Princeton Neuroscience Institute A32
Type
CS Department Colloquium Series
Speaker
James Whittington, from the University of Oxford & Stanford University
Host
Nathaniel Daw

James Whittington
Cognitive maps have emerged as leading candidates, both conceptually and neurally, for explaining how brains seamlessly generalize structured knowledge across apparently different scenarios. Two brain systems are implicated in cognitive mapping: the hippocampal formation and the prefrontal cortex. Neural activity in these brain regions, however, differs during the same task, indicating that the regions have different mechanisms for cognitive mapping. In this talk, we first provide a mechanistic understanding of how the hippocampal and prefrontal systems could build cognitive maps (with the hippocampal mechanism related to transformers and the prefrontal mechanism related to RNNs/SSMs); second, we demonstrate how these two mechanisms explain a wealth of neural data in both brain regions; and lastly, we prove that the two different mechanisms are, in fact, mathematically equivalent.

Bio: James Whittington works at Oxford and Stanford. He has been at Oxford since he was 18, first completing an undergraduate degree and master's in physics, then medical school, then a PhD. He is now a Sir Henry Wellcome fellow. His PhD was with Rafal Bogacz and Tim Behrens. His work tries to mechanistically understand how neural networks, both artificial and biological, solve structured tasks.


To request accommodations for a disability please contact Yi Liu, irene.yi.liu@princeton.edu, at least one week prior to the event.

The Host Network (and its implications to network protocols, OS and hardware)

Date and Time
Tuesday, April 2, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Wyatt Lloyd

Saksham Agarwal
The host network enables data transfers within hosts, and forms the “last mile” for data transfers across hosts for distributed applications. This talk will reflect on my (ongoing) journey that started with a surprising phenomenon observed in a lab experiment—nanosecond-scale inefficiencies within the host network percolating through network protocols and OS to create millisecond-scale impact on distributed applications. I will discuss my work on understanding, characterizing, and resolving the above phenomenon in the lab and in production clusters. I will also discuss how this phenomenon opens up intriguing research questions at the intersection of computer networking, OS and architecture.

Bio: Saksham Agarwal is a PhD student in the Computer Science department at Cornell University, advised by Prof. Rachit Agarwal. He did his undergraduate studies at IIT Kanpur. He is a recipient of a Google PhD Fellowship, a Cornell University Fellowship, a SIGCOMM Best Student Paper Award, and a Cornell CS Outstanding TA Award.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Distributionally Robust Machine Learning

Date and Time
Tuesday, March 26, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ellen Zhong

Shiori Sagawa
Machine learning models are widely deployed today, but they can fail due to distribution shifts: mismatches in the data distribution between training and deployment. Models can fail on certain subpopulations (e.g., language models can fail on non-English languages) and on new domains unseen during training (e.g., medical models can fail on new hospitals). In this talk, I will discuss my work on algorithms for improving robustness to distribution shifts. First, to mitigate subpopulation shifts, I develop methods that leverage distributionally robust optimization (DRO). My methods overcome the computational and statistical obstacles of applying DRO on modern neural networks and on real-world shifts. Second, to tackle domain shifts, I build WILDS, a benchmark of real-world shifts, and show that existing methods fail on WILDS even though they perform well on synthetic shifts from prior benchmarks. I then develop a state-of-the-art method that successfully mitigates real-world domain shifts; my method proposes an alternative to domain invariance—a key principle behind the prior methods—to reflect the structure of real-world shifts. Altogether, my algorithms improve robustness to a wide range of distribution shifts in the wild, from subpopulation shifts in language modeling to domain shifts in wildlife monitoring and histopathology.
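
For background, the worst-group idea at the heart of group distributionally robust optimization can be sketched in a few lines. This is the textbook formulation with hypothetical group labels, not Dr. Sagawa's released implementation; practical implementations typically use an online reweighting of groups rather than a hard per-batch maximum.

import torch

def group_dro_loss(logits: torch.Tensor, labels: torch.Tensor, groups: torch.Tensor,
                   num_groups: int) -> torch.Tensor:
    # Worst-group objective: average the loss within each group and return the maximum,
    # so training focuses on whichever subpopulation is currently doing worst.
    per_example = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
    group_losses = []
    for g in range(num_groups):
        in_group = groups == g
        if in_group.any():
            group_losses.append(per_example[in_group].mean())
    return torch.stack(group_losses).max()

# Toy usage: 3 groups (e.g., languages or hospitals), batch of 16, 5 classes.
logits = torch.randn(16, 5, requires_grad=True)
labels = torch.randint(0, 5, (16,))
groups = torch.randint(0, 3, (16,))
loss = group_dro_loss(logits, labels, groups, num_groups=3)
loss.backward()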

Bio: Shiori Sagawa is a final-year PhD Candidate in Computer Science at Stanford University, advised by Percy Liang. Her research focuses on algorithms for reliable machine learning. She was awarded the Stanford Graduate Fellowship and an Apple Scholars in AI/ML PhD Fellowship. Prior to her PhD, she received her B.A. in Computer Science and Molecular and Cell Biology from UC Berkeley, and she worked at D. E. Shaw Research.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Stochastic Computer Graphics

Date and Time
Monday, March 25, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Adam Finkelstein

Silvia Sellan
Computer Graphics research has long been dominated by the interests of large film, television and social media companies, forcing other, more safety-critical applications (e.g., medicine, engineering, security) to repurpose Graphics algorithms originally designed for entertainment. In this talk, I will advocate for a perspective shift in our field that allows us to design algorithms directly for these safety-critical application realms. I will show that this begins by reinterpreting traditional Graphics tasks (e.g., 3D modeling and reconstruction) from a statistical lens and quantifying the uncertainty in our algorithmic outputs, as exemplified by the research I have conducted for the past five years. I will end by mentioning several ongoing and future research directions that carry this statistical lens to entirely new problems in Graphics and Vision and into specific applications.

Bio: Silvia is a fifth-year Computer Science PhD student at the University of Toronto, working in Computer Graphics and Geometry Processing. She is a Vanier Doctoral Scholar, an Adobe Research Fellow and the winner of the 2021 University of Toronto Arts & Science Dean’s Doctoral Excellence Scholarship. She has interned twice at Adobe Research and twice at the Fields Institute of Mathematics. She is also a founder and organizer of the Toronto Geometry Colloquium and a member of WiGRAPH.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Generalizing Beyond the Training Distribution through Compositional Generation

Date and Time
Thursday, April 4, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Felix Heide

Yilun Du
Generative AI has led to stunning successes in recent years but is fundamentally limited by the amount of data available.  This is especially limiting in the embodied setting – where an agent must solve new tasks in new environments. In this talk, I’ll introduce the idea of compositional generative modeling, which enables generalization beyond the training data by building complex generative models from smaller constituents. I’ll first introduce the idea of energy-based models and illustrate how they enable compositional generative modeling. I’ll then illustrate how such compositional models enable us to synthesize complex plans for unseen tasks at inference time. Finally, I'll show how such compositionality can be applied to multiple foundation models trained on various forms of Internet data, enabling us to construct decision-making systems that can hierarchically plan and solve long-horizon problems in a zero-shot manner.
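
As a toy illustration of why energy-based models compose so naturally (a generic sketch under simple assumptions, not the speaker's models): if each concept defines an energy over the same variable, summing the energies corresponds to a product of the underlying distributions, and Langevin dynamics can draw approximate samples that satisfy every concept at once.

import torch

def energy_a(x: torch.Tensor) -> torch.Tensor:
    return ((x - 2.0) ** 2).sum(dim=-1)        # concept A: stay near 2

def energy_b(x: torch.Tensor) -> torch.Tensor:
    return ((x + 1.0) ** 2).sum(dim=-1)        # concept B: stay near -1

def composed_energy(x: torch.Tensor) -> torch.Tensor:
    # Product of experts: p(x) is proportional to exp(-E_a(x)) * exp(-E_b(x)), so energies add.
    return energy_a(x) + energy_b(x)

def langevin_sample(energy_fn, dim: int = 2, steps: int = 500, step_size: float = 0.01):
    # Noisy gradient descent on the composed energy draws approximate samples from it.
    x = torch.randn(dim)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy_fn(x), x)[0]
        x = x - step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(x)
    return x.detach()

print(langevin_sample(composed_energy))        # concentrates around the compromise point 0.5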

Bio: Yilun Du is a final-year PhD student at MIT CSAIL, advised by Leslie Kaelbling, Tomas Lozano-Perez and Joshua Tenenbaum. His research spans the fields of machine learning and robotics, with a focus on generative models. He is supported by the NSF Graduate Research Fellowship and was previously a research fellow at OpenAI, a visiting researcher at FAIR and a student researcher at Google DeepMind.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Scalable and Efficient Systems for Large Language Models

Date and Time
Tuesday, March 5, 2024 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ravi Netravali

Lianmin Zheng
Large Language Models (LLMs) have been driving recent breakthroughs in AI. These advancements would not have been possible without the support of scalable and efficient infrastructure systems. In this talk, I will introduce several underlying systems I have designed and built to support the entire model lifecycle, from training to deployment to evaluation. First, I will present Alpa, a system for large-scale model-parallel training that automatically generates execution plans unifying data, operator, and pipeline parallelism. Next, I will discuss efficient deployment systems, covering the frontend programming interface and backend runtime optimizations for high-performance inference. Finally, I will complete the model lifecycle by presenting our model evaluation efforts, including the crowdsourced live benchmark platform, Chatbot Arena, and the automatic evaluation pipeline, LLM-as-a-Judge. These projects have collectively laid a solid foundation for large language model systems and have been widely adopted by leading LLM developers and companies. I will conclude by outlining some future directions of machine learning systems, such as co-optimizing across the full stack for building AI-centric applications.

Bio: Lianmin Zheng is a Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests include machine learning systems, large language models, compilers, and distributed systems. He builds full-stack, scalable, and efficient systems to advance the development of AI. He co-founded LMSYS.org, where he leads impactful open-source large language model projects such as Vicuna and Chatbot Arena, which have received millions of downloads and served millions of users. He also co-organized the Big Model Tutorial at ICML 2022. He has received a Meta Ph.D. Fellowship, an IEEE Micro Best Paper Award, and an a16z open-source AI grant.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.
