CS Department Colloquium Series

Grounding Language by Seeing, Hearing, and Interacting

Date and Time
Monday, March 28, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Danqi Chen

Rowan Zellers
As humans, we ground our understanding of language in a rich mental model of “how the world works” – one we learn through perception and interaction. We use this understanding to reason beyond what we literally observe or read, imagining how situations might unfold in the world. Machines today struggle at this kind of reasoning, which limits how they can communicate with humans.

In my talk, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks, using machines in the loop to filter out spurious biases. Next, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation, using this knowledge to ground language. From an English-language description of an event, PIGLeT can anticipate how the world state might change – outperforming text-only models that are orders of magnitude larger. Finally, I will introduce MERLOT, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. Through training objectives inspired by the developmental psychology idea of multimodal reentry, MERLOT learns to jointly reason over language, vision, and sound.

Together, these directions suggest a path forward for building machines that learn language rooted in the world.

Bio: Rowan Zellers is a final-year Ph.D. candidate at the University of Washington in Computer Science & Engineering, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language, vision, sound, and the world beyond these modalities. He has been recognized through an NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets, including Wired, the Washington Post, and the New York Times. He graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics and has interned at the Allen Institute for AI.


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/  

Optimizing CPU Efficiency and Tail Latency in Datacenters

Date and Time
Thursday, April 21, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jennifer Rexford

Amy Ousterhout
The slowing of Moore’s Law and increased concerns about the environmental impacts of computing are exerting pressure on datacenter operators to use resources such as CPUs and memory more efficiently. However, it is difficult to improve efficiency without degrading the performance of applications.

In this talk, I will focus on CPU efficiency and how we can increase efficiency while maintaining low tail latency for applications. The key innovation is to reallocate cores between applications on the same server very quickly, every few microseconds. First I will describe Shenango, a system design that makes such frequent core reallocations possible. Then I will show how policy choices for core reallocation and load balancing impact CPU efficiency and tail latency, and present the policies that yield the best combination of both.
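
To make the mechanism concrete, here is a minimal Python sketch of the idea of microsecond-scale core reallocation. It is an illustrative toy under assumed names and numbers (the App/reallocate structure and the 5-microsecond threshold are stand-ins), not Shenango's actual algorithm or code.

```python
# Toy simulation of microsecond-scale core reallocation (illustrative only;
# not Shenango's implementation). A controller runs every simulated
# microsecond: it reclaims cores from idle applications and grants them to
# applications whose oldest request has been waiting too long.

from dataclasses import dataclass, field
from collections import deque
import random

@dataclass
class App:
    name: str
    cores: int = 1
    queue: deque = field(default_factory=deque)

    def queue_delay(self, now_us: int) -> int:
        # Age of the oldest pending request, in simulated microseconds.
        return now_us - self.queue[0] if self.queue else 0

def reallocate(apps, spare_cores, now_us, threshold_us=5):
    """Reclaim cores from idle apps, then grant spares to the most delayed."""
    for app in apps:
        if not app.queue and app.cores > 1:
            spare_cores += app.cores - 1
            app.cores = 1
    for app in sorted(apps, key=lambda a: -a.queue_delay(now_us)):
        if spare_cores > 0 and app.queue_delay(now_us) > threshold_us:
            app.cores += 1
            spare_cores -= 1
    return spare_cores

apps = [App("latency-critical"), App("batch")]
spare = 2
for now in range(1, 51):
    if random.random() < 0.4:              # bursty arrivals to the first app
        apps[0].queue.extend([now] * 3)
    for app in apps:                       # each core retires one request/tick
        for _ in range(app.cores):
            if app.queue:
                app.queue.popleft()
    spare = reallocate(apps, spare, now)
print({app.name: app.cores for app in apps})
```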

Bio: Amy is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. She received her PhD in Computer Science from MIT and her BSE in Computer Science from Princeton University. Her research is on operating systems and distributed systems, and focuses on improving the efficiency, performance, and usability of applications in datacenters. She is a recipient of a Jacobs Presidential Fellowship at MIT, an NSF Graduate Research Fellowship, and a Hertz Foundation Fellowship.


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/

Interactive AI Model Debugging and Correction

Date and Time
Tuesday, March 22, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Andrés Monroy-Hernández & Adam Finkelstein

Sherry Wu
Research in Artificial Intelligence (AI) has advanced at an incredible pace, to the point where it is making its way into our everyday lives, explicitly and behind the scenes. However, beneath their impressive progress, many AI models hide deficiencies that amplify social biases or even cause fatal accidents. How do we identify, improve, and cope with imperfect models, while still benefiting from their use? I will discuss my work empowering humans to interact with AI models in order to debug and correct them. I will describe both (1) how I help experts run scalable and testable analyses on models in development, and (2) how I help end users collaborate with deployed AI in a transparent and controllable way. In my final remarks, I will discuss my future research perspectives on building human-centered AI through data-centric approaches.

Bio: Sherry Tongshuang Wu is a final-year Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Jeffrey Heer and Dan Weld. She received her B.Eng in CSE from the Hong Kong University of Science and Technology. Her research lies at the intersection of Human-Computer Interaction (HCI) and Natural Language Processing (NLP), and aims to empower humans to debug and correct AI models interactively, both when the model is under active development and after it is deployed for end users. Sherry has authored 19 papers in top-tier NLP, HCI, and Visualization conferences and journals such as ACL, CHI, TOCHI, and TVCG, including a best paper award and a best paper honorable mention. You can find out more about her at https://homes.cs.washington.edu/~wtshuang/.


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/

Learning-Based Program Synthesis: Learning for Program Synthesis and Program Synthesis for Learning

Date and Time
Monday, March 21, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jia Deng

Xinyun Chen
With the advancement of modern technologies, programming is becoming ubiquitous, not only among professional software developers but also for general computer users. However, gaining programming expertise is time-consuming and challenging. Program synthesis, where the computer automatically synthesizes programs from specifications such as natural language descriptions and input-output examples, therefore has many applications. In this talk, I will present my work on learning-based program synthesis, where I have developed deep learning techniques for various program synthesis problems. Despite the remarkable success of deep neural networks in many domains, including natural language processing and computer vision, existing models are still insufficient for handling challenging symbolic reasoning and generalization problems.
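
As background, the simplest instance of this problem can be made concrete with a toy enumerative synthesizer: given input-output examples, search a small expression grammar for a program consistent with all of them. This is a hedged illustration of the problem setting only; the grammar and code below are invented for exposition and are not the learning-based methods of the talk.

```python
# Enumerative program synthesis from input-output examples (a toy).
# Grammar: e ::= x | const | (e + e) | (e * e)

import itertools

def synthesize(examples, max_depth=3, consts=(1, 2)):
    """Return an expression (as a string) consistent with all (input, output) pairs."""
    # Each candidate program is an (expression_string, evaluator) pair.
    level = [("x", lambda x: x)]
    level += [(str(c), (lambda c: lambda x: c)(c)) for c in consts]
    for _ in range(max_depth):
        new = []
        for (s1, f1), (s2, f2) in itertools.product(level, repeat=2):
            new.append((f"({s1} + {s2})",
                        (lambda f1, f2: lambda x: f1(x) + f2(x))(f1, f2)))
            new.append((f"({s1} * {s2})",
                        (lambda f1, f2: lambda x: f1(x) * f2(x))(f1, f2)))
        level += new
        for s, f in level:
            if all(f(i) == o for i, o in examples):
                return s        # first program matching every example
    return None

# Recover a program behaving like f(x) = 2x + 1 from three examples.
print(synthesize([(0, 1), (1, 3), (5, 11)]))
```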

My learning-based program synthesis research has two main thrusts: (1) learning to synthesize programs from potentially ambiguous and complex specifications; and (2) neural-symbolic learning for language understanding. I will first talk about program synthesis applications, where my work demonstrates the applicability of learning-based program synthesizers for production usage. I will then present my work on neural-symbolic frameworks that integrate symbolic components into neural networks and achieve better reasoning and generalization capabilities. In closing, I will discuss the challenges and opportunities for future work in improving the complexity and generalizability of learning-based program synthesis.

Bio: Xinyun Chen is a Ph.D. candidate at UC Berkeley, working with Prof. Dawn Song. Her research lies at the intersection of deep learning, programming languages, and security. Her recent research focuses on learning-based program synthesis and adversarial machine learning. She received a Facebook Fellowship in 2020 and was named a Rising Star in Machine Learning in 2021. Her work SpreadsheetCoder for spreadsheet formula prediction was integrated into Google Sheets, and she was part of the AlphaCode team when she interned at DeepMind.


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/

Reliable machine learning in the wild

Date and Time
Thursday, March 17, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Pang Wei Koh, from Stanford University
Host
Sanjeev Arora

Pang Wei Koh
Machine learning systems are widely deployed today, but they are unreliable. They can fail – sometimes with catastrophic consequences – on subpopulations of the data, such as particular demographic groups, or when deployed in environments different from those they were trained on. In this talk, I will describe our work towards building reliable machine learning systems that are robust to these failures. First, I will show how we can use influence functions to understand the predictions and failures of existing models through the lens of their training data. Second, I will discuss the use of distributionally robust optimization to train models that perform well across all subpopulations. Third, I will describe WILDS – a benchmark of in-the-wild distribution shifts spanning applications such as pathology, conservation, remote sensing, and drug discovery – and show how current state-of-the-art methods, which perform well on synthetic distribution shifts, still fail to be robust on these real-world shifts. Finally, I will describe our work on building more reliable COVID-19 models, using anonymized cellphone mobility data, to inform public health policy; this is a challenging application as the underlying environment is often changing and there is substantial heterogeneity across demographic subpopulations.
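
To give a flavor of one of these ingredients, below is a minimal numpy sketch of the minimax update at the heart of group distributionally robust optimization: each gradient step is taken on the currently worst-performing group rather than on the average loss. The data, model, and step size are invented for illustration; this is not the exact algorithm from the speaker's work.

```python
# Group DRO in miniature: step on the worst group's loss (illustrative toy).
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, label_noise):
    X = rng.normal(0.0, 1.0, size=(n, 2))
    y = (X[:, 0] > X[:, 1]).astype(float)
    flip = rng.random(n) < label_noise         # noise makes a group harder
    return X, np.where(flip, 1 - y, y)

groups = [make_group(500, 0.0), make_group(50, 0.2)]   # group 2: rare and noisy

def loss_and_grad(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))           # logistic regression
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return loss, X.T @ (p - y) / len(y)

w = np.zeros(2)
for _ in range(500):
    evals = [loss_and_grad(w, X, y) for X, y in groups]
    worst = max(range(len(groups)), key=lambda g: evals[g][0])
    w -= 0.1 * evals[worst][1]                 # the minimax (worst-group) step

print("per-group losses:",
      [round(loss_and_grad(w, X, y)[0], 3) for X, y in groups])
```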

Bio: Pang Wei Koh is a PhD student at Stanford, advised by Percy Liang. He studies the theory and practice of building reliable machine learning systems. His research has been published in Nature and Cell, featured in media outlets such as The New York Times and The Washington Post, and recognized by best paper awards at ICML and KDD, a Meta Research PhD fellowship, and the Kennedy Prize for best honors thesis at Stanford. Prior to his PhD, he was the 3rd employee and Director of Partnerships at Coursera. 


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/

High-Performance Languages for Visual Computing

Date and Time
Tuesday, March 15, 2022 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Gilbert Bernstein, from University of California, Berkeley and MIT CSAIL
Host
Felix Heide

Gilbert Bernstein
Computer Graphics and Visual Computing problems challenge us with a nearly inexhaustible demand for more resolution and scale in order to simulate the climate, reconstruct 3D environments, train neural networks, and produce games & movies. Building such applications requires integrating disciplinary expertise (e.g., physics, numerical methods, geometry, and custom hardware) into a single system. However, abstraction barriers are regularly discarded in the name of higher performance, leading to code that must be written and maintained by super-experts – programmers who simultaneously possess deep knowledge of all relevant disciplines. Programming languages, especially domain-specific languages (DSLs), are perhaps the most promising approach to recovering a separation of concerns in such high-performance systems.

In this talk, I will first describe my work on DSLs to enable parallel portability of physical simulation and optimization programs, including the use of relational algebra and automatic differentiation to structure these problem domains.  Then I will discuss more recent work on “horizontal DSLs” designed to dig into specific sub-problems: maximizing utilization of novel hardware accelerators, formally verifying optimizations of tensor programs, and extending automatic differentiation to correctly handle discontinuities in inverse rendering and simulation problems.
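
As generic background on one ingredient named above, automatic differentiation can be illustrated with the textbook dual-number construction: every value carries its derivative, and arithmetic applies the chain rule. This is standard material, not the speaker's systems (which extend AD to handle discontinuities).

```python
# Forward-mode automatic differentiation via dual numbers (textbook sketch).
import math

class Dual:
    """A value paired with its derivative; arithmetic applies the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)  # product rule

    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)         # chain rule

def derivative(f, x0):
    return f(Dual(x0, 1.0)).dot   # seed dx/dx = 1 and read off the derivative

# d/dx [x * sin(x) + 3x] at x = 2  ->  sin(2) + 2*cos(2) + 3
print(derivative(lambda x: x * sin(x) + 3 * x, 2.0))
```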

Bio: Gilbert Bernstein is a Postdoctoral Scholar at the University of California, Berkeley and MIT CSAIL, working with Professor Jonathan Ragan-Kelley. His research lies in Computer Graphics and Programming Languages, especially the design of high-performance domain-specific languages for numeric computing applications such as physical simulation, optimization, and inverse problems. His work spans user interfaces, differentiable programming, parallel portability, and new hardware design languages. It has been published at SIGGRAPH, POPL, PLDI, & OOPSLA, and has been incorporated into products at Adobe, Autodesk, and Disney. He holds a Ph.D. in Computer Science from Stanford University, where he was advised by Pat Hanrahan.


This talk will be recorded and live-streamed at https://mediacentrallive.princeton.edu/

Solving the Cloud Efficiency Crisis with Fast and Accessible Scheduling

Date and Time
Monday, March 14, 2022 - 12:30pm to 1:30pm
Location
Zoom Webinar (off campus)
Type
CS Department Colloquium Series
Host
Ravi Netravali

Webinar registration here


Kostis Kaffes
Operating system (OS) specialization is necessary because the one-size-fits-all approach to fundamental OS operations such as scheduling is incompatible with today's diverse application landscape. Such specialization can improve application performance and cloud platform efficiency by an order of magnitude or more. Towards this goal, I will first discuss Shinjuku, a specialized OS that supports an order of magnitude higher load and lower tail latency than state-of-the-art systems by enabling better scheduling. Shinjuku leverages hardware support for virtualization to preempt as often as every 5 microseconds, disproving the conventional wisdom that interrupts are incompatible with microsecond timescales. Then, I will present Syrup, a framework that lets everyday application developers specify custom scheduling policies easily and safely deploy them across different layers of the stack on existing operating systems like Linux, bringing the benefits of specialized scheduling to everyone. For example, Syrup allowed us to implement, in less than 20 lines of code, scheduling policies that previously required specialized dataplanes, and to improve the performance of an in-memory database by 8x without any application modification.
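
To illustrate what "a scheduling policy in a few lines of code" can look like, here is a toy sketch. The Scheduler interface below is hypothetical, invented for exposition; Syrup's actual API and deployment model (into layers of a real stack such as Linux) differ.

```python
# Toy pluggable scheduling: a policy is just a function from a request and the
# current per-worker queue lengths to a worker id (hypothetical interface).
from collections import defaultdict

class Scheduler:
    """Routes requests to workers according to a user-supplied policy."""
    def __init__(self, n_workers, policy):
        self.queues = {w: 0 for w in range(n_workers)}
        self.policy = policy

    def enqueue(self, request):
        worker = self.policy(request, self.queues)
        self.queues[worker] += 1
        return worker

# A complete policy in one line: join the shortest queue.
def shortest_queue(request, queues):
    return min(queues, key=queues.get)

# A locality-aware variant: stay on the request's "home" worker unless it lags.
def locality_first(request, queues):
    home = hash(request) % len(queues)
    if queues[home] <= min(queues.values()) + 2:
        return home
    return shortest_queue(request, queues)

sched = Scheduler(4, shortest_queue)
assignments = defaultdict(int)
for req in range(20):
    assignments[sched.enqueue(req)] += 1
print(dict(assignments))    # requests spread evenly across the 4 workers
```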

Bio: Kostis Kaffes is a final-year Ph.D. candidate in Electrical Engineering at Stanford University, advised by Christos Kozyrakis. He is broadly interested in computer systems, cloud computing, and scheduling. His thesis focuses on end-host, rack-scale, and cluster-scale scheduling for microsecond-scale tail latency, with the goal of improving efficiency in the cloud. Recently, he has been looking for ways to make it easier to implement and deploy custom scheduling policies across different layers of the stack. Kostis's research has been supported by a Facebook Research Award and various scholarships and fellowships from Stanford, the A.G. Leventis Foundation, and the Gerondelis Foundation. Prior to Stanford, he received his undergraduate degree in Electrical and Computer Engineering from the National Technical University of Athens in Greece.


This talk will be recorded.

Eliminating Bugs in Real Systems

Date and Time
Thursday, March 18, 2021 - 12:30pm to 1:30pm
Location
Zoom Webinar (off campus)
Type
CS Department Colloquium Series
Host
Wyatt Lloyd

Please register here


Fraser Brown
Software is everywhere, and almost everywhere, software is broken. Some bugs just crash your printer; others hand an identity thief your bank account number; still others let nation-states spy on dissidents and persecute minorities.

This talk outlines my work preventing bugs using a blend of programming languages techniques and systems design. First, I'll talk about securing massive, security-critical codebases without clean-slate rewrites. This means rooting out hard-to-find bugs – as in Sys, which scales symbolic execution to find exploitable bugs in systems like the twenty-million-line Chrome browser. It also means proving correctness of especially vulnerable pieces of code – as in VeRA, which automatically verifies part of the Firefox JavaScript engine. Finally, I'll discuss work on stronger foundations for new systems – as in CirC, a recent project unifying compiler infrastructure for program verification, cryptographic proofs, optimization problems, and more.
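
For readers unfamiliar with the underlying technique, the core move in symbolic execution can be shown in miniature: treat the program input as a symbolic variable, collect the branch conditions along a path, and ask an SMT solver whether any concrete input both follows that path and triggers a bug. The toy below uses the Z3 solver and an invented buggy function; it illustrates the technique only and is not Sys's code.

```python
# Symbolic execution in miniature (requires: pip install z3-solver).
# Conceptual code under test:
#     void f(uint32_t x) {
#         uint32_t y = x * 3;
#         if (y > 100 && x < 50) buf[y] = 0;   // buf has only 64 bytes
#     }
from z3 import BitVec, Solver, sat

x = BitVec("x", 32)             # the input, left symbolic
y = x * 3

s = Solver()
s.add(x > 0, x < 50, y > 100)   # path condition to reach the write
s.add(y >= 64)                  # bug condition: index past the 64-byte buffer

if s.check() == sat:
    print("exploitable input: x =", s.model()[x])   # e.g. x = 34 -> y = 102
```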

Bio: Fraser Brown is a Ph.D. student at Stanford advised by Dawson Engler, an occasional visiting student at UCSD with Deian Stefan, and an NSF Graduate Research Fellowship recipient. She works at the intersection of programming languages, systems, and security, and her research has been used by several companies. She holds an undergraduate degree in English from Stanford.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Reliable Machine Learning in Feedback Systems

Date and Time
Monday, March 29, 2021 - 4:30pm to 5:30pm
Location
Zoom Webinar (off campus)
Type
CS Department Colloquium Series
Host
Elad Hazan

Please register here


Sarah Dean
Machine learning techniques have been successful for processing complex information, and thus they have the potential to play an important role in data-driven decision-making and control. However, ensuring the reliability of these methods in feedback systems remains a challenge, since classic statistical and algorithmic guarantees do not always hold.

In this talk, I will provide rigorous guarantees of safety and discovery in dynamical settings relevant to robotics and recommendation systems. I take a perspective based on reachability to specify which parts of the state space the system avoids (safety) or can be driven to (discovery). For data-driven control, we show finite-sample performance and safety guarantees that highlight relevant properties of the system to be controlled. For recommendation systems, we introduce a novel metric of discovery and show that it can be efficiently computed. In closing, I discuss how the reachability perspective can be used to design social-digital systems with a variety of important values in mind.
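
The reachability perspective itself can be made concrete with a small computation: for a discretized system, the set of states the dynamics can be driven to is a fixed point of repeatedly applying the transition map, and safety is checked by intersecting that set with an unsafe region. The toy system below is invented for illustration and is far simpler than the settings in the talk.

```python
# Reachable-set computation for a toy discrete system (illustrative only).
def reachable_set(initial, step, max_iters=1000):
    """All states the system can be driven to from `initial` (discovery)."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_iters):
        frontier = {s2 for s in frontier for s2 in step(s)} - seen
        if not frontier:
            break               # fixed point: nothing new is reachable
        seen |= frontier
    return seen

# System: an integer position on a line; actions move -1, 0, or +1, clipped to [0, 9].
def step(s):
    return {min(9, max(0, s + a)) for a in (-1, 0, 1)}

unsafe = {8, 9}
reach = reachable_set(3, step)
print("reachable:", sorted(reach))
print("avoids unsafe set:", reach.isdisjoint(unsafe))   # False: 8, 9 are reachable
```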

Bio: Sarah is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at UC Berkeley, advised by Ben Recht. She received her MS in EECS from Berkeley and her BSE in Electrical Engineering and Math from the University of Pennsylvania. Sarah is interested in the interplay between optimization, machine learning, and dynamics in real-world systems. Her research focuses on developing principled data-driven methods for control and decision-making, inspired by applications in robotics, recommendation systems, and developmental economics. She is a co-founder of a transdisciplinary student group, Graduates for Engaged and Extended Scholarship in computing and Engineering, and the recipient of a Berkeley Fellowship and an NSF Graduate Research Fellowship.


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.

Data Structures and Algorithms in Sublinear Computation

Date and Time
Thursday, March 4, 2021 - 12:30pm to 1:30pm
Location
Zoom Webinar (off campus)
Type
CS Department Colloquium Series
Host
Zeev Dvir

Please register here


Huacheng Yu
Sublinear algorithms remain practical as data volumes and processing speeds grow exponentially: they use a sublinear amount of resources, e.g., time, space, or communication asymptotically smaller than the input data size. Typical examples include data structures, which compute query functions of the data in sublinear time, and streaming algorithms, which make one pass over massive data streams while maintaining a sublinear-sized memory.
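
A classic concrete example of the streaming model is the Morris counter, which approximately counts a stream of N events in O(log log N) bits by storing only (roughly) the logarithm of the count. It is standard textbook material, included here only to illustrate sublinear space; it is not the AGM sketch discussed below.

```python
# Morris approximate counter: sublinear (O(log log N)-bit) streaming counting.
import random

class MorrisCounter:
    def __init__(self):
        self.exponent = 0    # stores roughly log2 of the count, not the count

    def increment(self):
        # Bump the exponent with probability 2^-exponent, so each successive
        # exponent value corresponds to about twice as many real increments.
        if random.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        return 2 ** self.exponent - 1   # an unbiased estimate of the true count

# One pass over a stream of 100,000 events, remembering only a tiny exponent.
c = MorrisCounter()
for _ in range(100_000):
    c.increment()
print(c.estimate())   # unbiased but noisy; average several counters in practice
```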

In this talk, I will give an overview of my work in sublinear computation, focusing on succinct data structures and distributed graph sketching algorithms. I will first discuss my work on a nearly optimal data structure for the dictionary problem, for which the textbook solution uses hash tables. Then, I will talk about detecting the connectivity of graphs using distributed sketching, and my recent work showing the optimality of a well-known sketching algorithm (the AGM sketch). I will conclude the talk with a discussion of future directions and my other work in theoretical computer science.

Bio: Huacheng Yu is an associate research scholar in the Department of Computer Science at Princeton University. His research interests include data structures and streaming algorithms, and other directions in theoretical computer science such as communication complexity and graph algorithms. Prior to Princeton, Huacheng was a postdoctoral researcher at Harvard University hosted by Jelani Nelson and Madhu Sudan. He received his Ph.D. from Stanford University (advised by Ryan Williams and Omer Reingold) and B.Eng from Tsinghua University, both in Computer Science. 


To request accommodations for a disability please contact Emily Lawrence, emilyl@cs.princeton.edu, at least one week prior to the event.
