Towards Efficient and Reliable Machine Learning for Natural Language Processing (and Beyond)

Monday, March 27, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Karthik Narasimhan

Adam Fisch
In this talk, I will introduce work on fundamental techniques for building and deploying effective natural language processing (NLP) systems that are also efficient and reliable. Specifically, I will address three interconnected challenges for modern machine learning in NLP: how to quickly adapt foundation models to new tasks with limited data, how to dynamically reconfigure large architectures for more efficient computation, and how to develop powerful theoretical tools for rigorous, yet practical, uncertainty quantification. To conclude, I will highlight a number of my future research directions, as well as extensions to interesting applications beyond natural language.

Bio: Adam Fisch is a PhD candidate at MIT working with Regina Barzilay and Tommi Jaakkola, and a recipient of an NSF Graduate Research Fellowship. His research centers on principled methods for efficient and reliable machine learning systems that work effectively in realistic scenarios, and has appeared in top-tier venues such as *ACL, ICLR, ICML, and NeurIPS. Adam also served as a co-instructor for the tutorial on Uncertainty Estimation for NLP at COLING 2022, and as a co-organizer of the Machine Reading for Question Answering workshops at EMNLP 2019 and 2021. Prior to MIT, Adam was a research engineer at Meta (Facebook) AI Research for two years, and studied mechanical engineering as an undergraduate at Princeton University.

Collaborative, Communal, & Continual Machine Learning

Monday, March 20, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Danqi Chen

Colin Raffel
Pre-trained models have become a cornerstone of machine learning because they improve performance on downstream tasks with less labeled data. However, these models are typically created by resource-rich research groups that unilaterally decide how a given model should be built, trained, and released, after which point it is never updated. In contrast, open-source development has demonstrated that it is possible for a community of contributors to work together to iteratively build complex and widely used software. This kind of large-scale distributed collaboration is made possible through a mature set of tools including version control and package management. In this talk, I will discuss a research focus in my group that aims to make it possible to build machine learning models in the way that open-source software is developed. Specifically, I will discuss our preliminary work on merging multiple models while retaining their individual capabilities, patching models with cheaply-communicable updates, designing modular model architectures, and tracking changes through a version control system for model parameters. I will conclude with an outlook on how the field will change once truly collaborative, communal, and continual machine learning is possible.

Bio: Colin Raffel is an Assistant Professor at UNC Chapel Hill and a Faculty Researcher at Hugging Face. His work aims to make it easy to get computers to do new things. Consequently, he works mainly on machine learning (enabling computers to learn from examples) and natural language processing (enabling computers to communicate in natural language). He received his Ph.D. from Columbia University in 2016 and spent five years as a research scientist at Google Brain.

Towards Responsible Machine Learning in Societal Systems

Wednesday, April 5, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Olga Russakovsky and Peter Ramadge

Lydia Liu
Machine learning systems are deployed in consequential domains such as education, employment, and credit, where decisions have profound effects on socioeconomic opportunity and life outcomes. High-stakes decision settings present new statistical, algorithmic, and ethical challenges. In this talk, we examine the distributive impact of machine learning algorithms in societal contexts, and investigate the algorithmic and sociotechnical interventions that bring machine learning systems into alignment with societal values---equity and long-term welfare. First, we study the dynamic interactions between machine learning algorithms and populations, for the purpose of mitigating disparate impact in applications such as algorithmic lending and hiring. Next, we consider data-driven decision systems in competitive environments such as markets, and devise learning algorithms to ensure efficiency and allocative fairness. We end by outlining future directions for responsible machine learning in societal systems that bridge the gap between the optimization of predictive models and the evaluation of downstream decisions and impact.

Bio: Lydia T. Liu is a postdoctoral researcher in Computer Science at Cornell University, working with Jon Kleinberg, Karen Levy, and Solon Barocas. Her research examines the theoretical foundations of machine learning and algorithmic decision-making, with a focus on societal impact and human welfare. She obtained her PhD in Electrical Engineering and Computer Sciences from UC Berkeley, advised by Moritz Hardt and Michael Jordan, and has received a Microsoft Ada Lovelace Fellowship, an Open Philanthropy AI Fellowship, an NUS Development Grant, and a Best Paper Award at the International Conference on Machine Learning.

This talk is co-sponsored with Electrical and Computer Engineering and the Center for Information Technology Policy.

Controlling Large Language Models: Generating (Useful) Text from Models We Don’t Fully Understand

Thursday, March 23, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Danqi Chen

Ari Holtzman
Generative language models have recently exploded in popularity, with services such as ChatGPT deployed to millions of users. These neural models are fascinating, useful, and incredibly mysterious: rather than designing what we want them to do, we nudge them in the right direction and must discover what they are capable of. But how can we rely on such inscrutable systems?

This talk will describe a number of key characteristics we want from generative models of text, such as coherence and correctness, and show how we can design algorithms to more reliably generate text with these properties. We will also highlight some of the challenges of using such models, including the need to discover and name new and often unexpected emergent behavior. Finally, we will discuss the implications this has for the grand challenge of understanding models at a level where we can safely control their behavior.

Bio: Ari Holtzman is a PhD student at the University of Washington. His research has focused broadly on generative models of text: how we can use them and how we can understand them better. His research interests have spanned everything from dialogue, including winning the first Amazon Alexa Prize in 2017, to fundamental research on text generation, such as proposing Nucleus Sampling, a decoding algorithm used broadly in deployed systems such as the GPT-3 API and academic research. Ari completed an interdisciplinary degree at NYU combining Computer Science and the Philosophy of Language.
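
For readers unfamiliar with Nucleus Sampling (also called top-p sampling), the sketch below illustrates the general idea: keep the smallest set of most-probable tokens whose cumulative probability reaches a threshold, renormalize, and sample from that set. It is a minimal illustration, not the speaker's implementation; the function names `nucleus_filter` and `nucleus_sample` and the example probabilities are assumptions made here.

```python
import random


def nucleus_filter(probs, top_p):
    """Return {token_index: renormalized_prob} for the smallest top-p set.

    `probs` is a list of next-token probabilities (hypothetical input).
    """
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus now covers at least top_p of the mass
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def nucleus_sample(probs, top_p, rng=random):
    """Sample one token index from the renormalized nucleus."""
    filtered = nucleus_filter(probs, top_p)
    indices = list(filtered)
    weights = [filtered[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

With probabilities `[0.5, 0.3, 0.15, 0.05]` and `top_p=0.75`, only the first two tokens remain in the nucleus, so the low-probability tail is never sampled.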

Integrating expertise into computational tools for design and media authoring

Wednesday, March 8, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Andrés Monroy-Hernández

Mackenzie Leake
Finding a good computational representation for a problem allows us to map high-level objectives to low-level details and select the appropriate set of algorithmic tools. Selecting this representation requires not only computational knowledge but also a deep understanding of the application domain. In this talk, I will discuss my work on building design and media authoring tools by combining domain expertise with a wide range of algorithmic techniques. I will describe how this approach helps us to offload tedious steps to computation and guide users’ attention toward the more creative, open-ended decisions. As two different examples of this approach, I will discuss my work on video editing and quilt design tools. I will also discuss future opportunities to combine domain expertise and algorithmic insights to build novel computational tools.

Bio: Mackenzie Leake is a METEOR postdoctoral fellow at MIT CSAIL. She received her PhD and MS in computer science from Stanford University and a BA in computational science and studio art from Scripps College. Her research in human-computer interaction and computer graphics focuses on designing computational tools for various creative domains, including textiles and video. Her research has been supported by Adobe Research, Brown Institute for Media Innovation, and Stanford Enhancing Diversity in Graduate Education (EDGE) fellowships. In 2022 she was named a Rising Star in EECS and a WiGraph Rising Star in Computer Graphics.

Designing Provably Performant Networked Systems

Monday, April 3, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Ravi Netravali

Venkat Arun
As networked systems become critical infrastructure, their design must reflect their new societal role. Today, we build systems with hundreds of heuristics but often do not understand their inherent and emergent behaviors. I will present a set of tools and techniques to prove performance properties of heuristics running in real-world conditions. Rigorous proofs can not only inspire confidence in our designs, but also give counter-intuitive insights about their performance.

A key theme in our approach is to model uncertainty in systems using non-random, non-deterministic objects that cover a wide range of possible behaviors under a single abstraction. Such models allow us to analyze complex system behaviors using automated reasoning techniques. I will present automated tools to analyze congestion control and process scheduling algorithms. These tools prove performance properties and find counter-examples where widely deployed heuristics fail. I will also show that current end-to-end congestion control algorithms that bound delay cannot avoid starvation and present a method to beamform wireless signals using thousands of antennas.

Bio: Venkat Arun is a PhD candidate at MIT working with Hari Balakrishnan and Mohammad Alizadeh. His work spans internet congestion control, video streaming, privacy-preserving computation, wireless networks, and mobile systems. Across these areas, a unifying theme of his work is to bridge the gap between heuristics that systems use in practice and proofs of how well they work. He believes that rigorous proof combined with automated reasoning will enable us to make networked systems more robust and performant. He has won two ACM SIGCOMM best paper awards and the President of India Gold Medal.

Self-Supervised Reinforcement Learning

Tuesday, March 21, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Karthik Narasimhan

Benjamin Eysenbach
Reinforcement learning (RL) promises to harness the power of machine learning to solve sequential decision making problems, with the potential to enable applications ranging from robotics to chemistry. However, what makes the RL paradigm broadly applicable is also what makes it challenging: only limited feedback is provided for learning to select good actions. In this talk, I will discuss how we have made headway on this challenge by designing a class of self-supervised RL methods, ones that can learn skills for acting using unsupervised (reward-free) experience. These skill learning methods are practically appealing and have since sparked a vibrant area of research. I will also share how we have answered some open theoretical questions in this area.

Bio: Benjamin Eysenbach is a final-year PhD student at Carnegie Mellon University. His research has developed machine learning algorithms for sequential decision making. His algorithms not only achieve a high degree of performance, but also carry theoretical guarantees, are typically simpler than prior methods, and draw connections between many areas of ML and CS. Ben is the recipient of the NSF and Hertz graduate fellowships. Prior to the PhD, he was a resident at Google Research and studied math as an undergraduate at MIT.

Programming Distributed Systems

Thursday, April 6, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Andrew Appel

Mae Milano
Our interconnected world is increasingly reliant on distributed systems of unprecedented scale, serving applications which must share state across the globe. And, despite decades of research, we're still not sure how to program them! In this talk, I'll show how to use ideas from programming languages to make programming at scale easier, without sacrificing performance, correctness, or expressive power in the process. We'll see how slight tweaks to modern imperative programming languages can provably eliminate common errors due to replica consistency or concurrency---with little to no programmer effort. We'll see how new language designs can unlock new systems designs, yielding both more comprehensible protocols and better performance. And we'll conclude by imagining together the role that a new cloud-centric programming language could play in the next generation of distributed programs.

Bio: Mae Milano is a postdoctoral scholar at UC Berkeley working at the intersection of Programming Languages, Distributed Systems, and Databases. Her work has appeared at top-tier venues including PLDI, OOPSLA, POPL, VLDB, and TOCS, and has attracted the attention of the Swift language team. She is a recipient of the NDSEG Fellowship, has won several awards for her writing and service, and is a founding member of the Computing Connections Fellowship's selection committee (https://computingconnections.org/).

Hardware-aware Algorithms for Efficient Machine Learning

Thursday, March 2, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Jia Deng

Tri Dao
Machine learning (ML) model training will continue to consume more cycles, inference will proliferate on more kinds of devices, and model capabilities will be used in more domains. Some goals central to this future are to make ML models efficient so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context. In the first half, we focus on structured sparsity, a natural approach to mitigate the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms which, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality. In the second half, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm to compute attention with no approximation. By careful accounting of reads/writes between different levels of memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory compared to the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies, just 6 months after its release. We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities.
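
To illustrate how attention can be computed exactly without holding every score in memory at once, the sketch below processes keys and values one block at a time while maintaining a running softmax maximum and denominator. It is a simplified, single-query, pure-Python illustration of that rescaling idea only, not the FlashAttention GPU kernel; the function names and the `block_size` parameter are assumptions made here.

```python
import math


def attention_reference(q, keys, values):
    """Standard softmax attention for one query; materializes all scores."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / denom
            for d in range(dim)]


def attention_blockwise(q, keys, values, block_size):
    """Same result, visiting keys/values one block at a time."""
    dim = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_denom = 0.0           # softmax denominator relative to running_max
    acc = [0.0] * dim             # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new reference maximum.
        scale = math.exp(running_max - new_max)
        running_denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_denom += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_denom for a in acc]
```

Both functions return the same output up to floating-point rounding for any block size, which is the sense in which the blockwise computation involves no approximation.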

Bio: Tri Dao is a PhD student in Computer Science at Stanford, co-advised by Christopher Ré and Stefano Ermon. He works at the interface of machine learning and systems, and his research interests include sequence models with long-range memory and structured matrices for compact deep learning models. His work has received the ICML 2022 Outstanding paper runner-up award.

The Design of a General-Purpose Distributed Execution System

Wednesday, March 1, 2023 - 12:30pm to 1:30pm
Computer Science Small Auditorium (Room 105)
Wyatt Lloyd

Stephanie Wang
Scaling applications with distributed execution has become the norm. With the rise of big data and machine learning, more and more developers must build applications that involve complex and data-intensive distributed processing.

In this talk, I will discuss the design of a general-purpose distributed execution system that can serve as a common platform for such applications. Such a system offers two key benefits: (1) common system functionality such as distributed resource management can be shared across different application domains, and (2) by building on the same platform, applications across domains can easily interoperate.

First, I will introduce the distributed futures interface, a powerful yet expressive distributed programming abstraction for remote execution and memory. Second, I will introduce ownership, an architecture for distributed futures systems that simultaneously provides horizontal scalability, low latency, and fault tolerance. Finally, I will present Exoshuffle, a large-scale shuffle system that builds on distributed futures and ownership to match the speed and reliability of specialized data processing frameworks while using an order of magnitude less code. These works have reached a broad audience through Ray, an open-source distributed futures system for Python that has more than 23,000 GitHub stars and that has been used to train ChatGPT and to break the world record for CloudSort.
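
As a rough illustration of programming with futures, the sketch below chains tasks by passing futures, rather than values, from one task to the next. It uses Python's standard-library `concurrent.futures` thread pool as a local stand-in and does not use Ray's API; in a distributed futures system the value behind each future could reside on a remote node. The task names (`load_partition`, `partial_sum`, `total`, `run_pipeline`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(partition_id):
    """Pretend to load one data partition (three consecutive integers)."""
    return list(range(partition_id * 3, partition_id * 3 + 3))


def partial_sum(partition_future):
    # Resolve the upstream future only where its value is needed.
    return sum(partition_future.result())


def total(sum_futures):
    """Combine the per-partition sums."""
    return sum(f.result() for f in sum_futures)


def run_pipeline(num_partitions):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submit returns immediately with a future for the result.
        partitions = [pool.submit(load_partition, i)
                      for i in range(num_partitions)]
        # Downstream tasks receive futures, not materialized values.
        # Dependencies are always submitted before their dependents, so a
        # blocked worker is only ever waiting on work that has already started.
        sums = [pool.submit(partial_sum, p) for p in partitions]
        return pool.submit(total, sums).result()
```

Calling `run_pipeline(3)` sums the integers 0 through 8; the caller describes the whole task graph up front and blocks only on the final result.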

Bio: Stephanie Wang is a final-year PhD student at UC Berkeley, advised by Professor Ion Stoica. She is interested in distributed systems, with a current focus on problems in cloud computing and fault tolerance. She is a co-creator and committer of the popular open-source project Ray for distributed Python. Stephanie has received the UC Berkeley Chancellor’s Fellowship and a Distinguished Artifact Award at SOSP’19, and was selected for Rising Stars in EECS in 2021.

