Quick links

Colloquium

Make Your Database Dream of Electric Sheep: Designing for Autonomous Operation

Date and Time
Friday, November 16, 2018 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium

Andy Pavlo
In the last 20 years, researchers and vendors have built advisory tools to assist DBAs in tuning and physical design. Most of this previous work is incomplete because they require humans to make the final decisions about any database changes and are reactionary measures that fix problems after they occur. What is needed for a "self-driving" DBMS are components that are designed for autonomous operation. This will enable new optimizations that are not possible today because the complexity of managing these systems has surpassed the abilities of humans.

In this talk, I present the core design principles of an autonomous DBMS. These are necessary to support ample data collection, fast state changes, and accurate reward observations. I will discuss techniques on how to build a new autonomous DBMS or the steps needed to retrofit an existing one to enable automated management. Our work is based on our experiences at CMU from developing an automatic tuning service (OtterTune) and our self-driving DBMS (Peloton).

Bio:
Andy Pavlo is an Assistant Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. He also used to raise clams.

Lunch for talk attendees will be served at 12:00pm.

Datacenters and Energy Efficiency: A Game-Theoretic Perspective

Date and Time
Tuesday, May 1, 2018 - 4:30pm to 5:30pm
Location
Andlinger Center Maeder Hall
Type
Colloquium

Sharing datacenter hardware improves energy efficiency, but whether strategic users participate in consolidated systems depends on management policies. Users who dislike allocations may refuse to participate and deploy private, less-efficient systems. We rethink systems management, drawing on game theory to model strategic behavior and incentivize participation. We illustrate this perspective for two fundamental challenges in datacenters.  For power delivery, we design sprinting games to produce equilibria in which users selfishly draw power for performance boosts yet avoid oversubscribing the shared supply. For resource allocation, we use Cobb-Douglas utility functions to produce fair allocations that incentivize users to share cache and memory. These solutions provide foundations for rigorously managing systems shared by strategic, competitive participants. 
 
Bio:
Benjamin Lee is an Associate Professor of Electrical and Computer Engineering at Duke University. He received his B.S. from the University of California at Berkeley (2004), his Ph.D. from Harvard University (2008), and his post-doctorate from Stanford University (2010). He has held visiting positions at Microsoft Research, Intel Labs, and Lawrence Livermore National Lab. Dr. Lee’s research interests include computer architecture, energy efficiency, and security / privacy. He pursues these interests by building interdisciplinary links to statistical inference and algorithmic economics. His research has been recognized by IEEE Micro Top Picks (4x), Communications of the ACM Research Highlights (3x), as well as paper honors from the ASPLOS, HPCA, MICRO, and SC conferences. He received the NSF Computing Innovation Fellowship, NSF CAREER Award, and Google Faculty Research Award.

"Quantum Supremacy" and the Complexity of Random Circuit Sampling

Date and Time
Monday, April 23, 2018 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Host
Sanjeev Arora

A critical goal for the field of quantum computation is quantum supremacy -- a demonstration of any quantum computation that is prohibitively hard for classical computers. Besides dispelling any skepticism about the viability of quantum computers, quantum supremacy also provides a test of quantum theory in the realm of high complexity. A leading near-term candidate, put forth by the Google/UCSB team, is sampling from the probability distributions of randomly chosen quantum circuits, called Random Circuit Sampling (RCS).

While RCS was defined with experimental realization in mind (the first results are expected later this year), we give the first complexity-theoretic evidence of classical hardness of RCS, placing it on par with the best theoretical proposals for supremacy. Specifically, we show that RCS satisfies an average-case hardness condition -- computing output probabilities of typical quantum circuits is as hard as computing them in the worst-case, and therefore #P-hard. Our reduction exploits the polynomial structure in the output amplitudes of random quantum circuits, enabled by the Feynman path integral. We also describe a new verification measure which in some formal sense maximizes the information gained from experimental samples.

Based on joint work with Adam Bouland, Bill Fefferman and Chinmay Nirkhe. 

Deep Learning and Cognition

Date and Time
Thursday, November 16, 2017 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Host
Dr. Sanjeev Arora

Christopher Manning
Deep learning has had enormous success on perceptual tasks but still struggles in providing a model for inference. To address this gap, we have been developing Compositional Attention Networks (CANs). The CAN design provides a strong prior for explicitly iterative reasoning, enabling it to support explainable and structured learning, as well as generalization from a modest amount of data. The model builds on the great success of existing recurrent cells such as LSTMs: A CAN is a sequence of a single recurrent Memory, Attention, and Control (MAC) cell, and by careful design imposes structural constraints on the operation of each cell and the interactions between them, incorporating explicit control and soft attention mechanisms into their interfaces. We demonstrate the model’s strength and robustness on the challenging CLEVR dataset for visual reasoning (Johnson et al. 2016), achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. More importantly, we show that the new model is more computationally efficient and data-efficient, requiring an order of magnitude less time and/or data to achieve good results. Joint work with Drew Arad.

Bio: Christopher Manning is the Thomas M. Siebel Professor in Machine Learning, Linguistics and Computer Science at Stanford University. He works on software that can intelligently process, understand, and generate human language material.  He is a leader in applying Deep Learning to Natural Language Processing, including exploring Tree Recursive Neural Networks, sentiment analysis, neural network dependency parsing, the GloVe model of word vectors, neural machine translation, and deep language understanding. He also focuses on computational linguistic approaches to parsing, robust textual inference and multilingual language processing, including being a principal developer of Stanford Dependencies and Universal Dependencies. Manning is an ACM Fellow, a AAAI Fellow, an ACL Fellow, and a Past President of ACL. He has coauthored leading textbooks on statistical natural language processing and information retrieval. He is a member of the Stanford NLP group (@stanfordnlp) and manages development of the Stanford CoreNLP software.

Data+

Date and Time
Tuesday, February 14, 2017 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Host
Amit Singer, PACM CS CSML

In this talk I will explore how data science is changing the way we practice education and research.

My title Data+ is that of a ten-week summer research experience offered by the Information Institute at Duke (iiD) in which undergraduates join small project teams, working alongside other teams in a communal environment. They learn how to marshal, analyze, and visualize data, as well as how to communicate with the client sponsoring the project. The 2016 program involved about 70 undergraduate students and 30 graduate student and postdoctoral mentors working on 25 projects. I will describe projects that illuminate different ways in which student centered enquiry can transform university institutions.

Our ambition for research is that it impacts society. We used to think of impact as starting with an idea, then developing that idea into a prototype, then turning the prototype into a product, then marketing the product, and so on – it is a long march and the problem with long marches is that most ideas don’t make it. I will describe an example of a different model, one where society is part of the research process. It is a collaboration with Apple on child mental health led by Guillermo Sapiro.
 

The Implications of Privacy-Aware Choice

Date and Time
Tuesday, February 7, 2017 - 4:30pm to 5:30pm
Location
Sherrerd Hall 101
Type
Colloquium

Privacy concerns are becoming a major obstacle to using data in the way that we want. It's often unclear how current regulations should translate into technology, and the changing legal landscape surrounding privacy can cause valuable data to go unused.  In addition, when people know that their current choices may have future consequences, they might modify their behavior to ensure that their data reveal less --- or perhaps, more favorable --- information about themselves.  Given these concerns, how can we continue to make use of potentially sensitive data, while providing satisfactory privacy guarantees to the people whose data we are using?  Answering this question requires an understanding of how people reason about their privacy and how privacy concerns affect behavior.

In this talk, we will see how strategic and human aspects of privacy interact with existing tools for data collection and analysis.  I will begin by adapting the standard model of consumer choice theory to a setting where consumers are aware of, and have preferences over, the information revealed by their choices.  In this model of privacy-aware choice, I will show that little can be inferred about a consumer's preferences once we introduce the possibility that she has concerns about privacy, even when her preferences are assumed to satisfy relatively strong structural properties.  Next, I will analyze how privacy technologies affect behavior in a simple economic model of data-driven decision making.  Intuition suggests that strengthening privacy protections will both increase utility for the individuals providing data and decrease usefulness of the computation. I will demonstrate that this intuition can fail when strategic concerns affect consumer behavior.  Finally, I'll discuss ongoing behavioral experiments, designed to empirically measure how people trade off privacy for money, and to test whether human behavior is consistent with theoretical models for the value of privacy.

Acquiring Metric and Semantic Information using Autonomous Robots

Date and Time
Tuesday, February 23, 2016 - 12:31pm
Location
Not yet determined.
Type
Colloquium

Recent years have seen impressive progress in robot control and perception including adept manipulation, aggressive quadrotor maneuvers, dense metric map reconstruction, and object recognition in real time. The grand challenge in robotics today is to capitalize on these advances in order to enable autonomy at a higher-level of intelligence.  It is compelling to envision teams of autonomous robots in environmental monitoring, precision agriculture, construction and structure inspection, security and surveillance, and search and rescue.

In this talk, I will emphasize that many such applications can be addressed by thinking about how to coordinate robots in order to extract useful information about the environment. More precisely, I will formulate a general active estimation problem that captures the common characteristics of the aforementioned scenarios. I will show how to manage the complexity of the problem over metric information spaces with respect to long planning horizons and large robot teams. These results lead to computationally scalable, non-myopic algorithms with quantified performance for problems such as distributed source seeking and active simultaneous localization and mapping (SLAM).

I will then focus on acquiring information using both metric and semantic observations (e.g., object recognition).  In this context, there are several new challenges such as missed detections, false alarms, and unknown data association.  To address them, I will model semantic observations via random sets and will discuss filtering using such models. A major contribution of our approach is in proving that the complexity of the problem is equivalent to computing the permanent of a suitable matrix. This enables us to develop and experimentally validate algorithms for semantic localization, mapping, and planning on mobile robots, Google's project Tango phone, and the KITTI visual odometry dataset.

Instruction Sets Want to Be Free: A Case for RISC-V

Date and Time
Wednesday, November 11, 2015 - 4:30pm to 5:30pm
Location
Friend Center Convocation Room
Type
Colloquium

We start by quickly reviewing 50 years of computer architecture to show there is now widespread agreement on instruction set architecture (ISA). Unlike most other fields, despite this harmony there is no open alternative to proprietary offerings from ARM and Intel.
Thus, we propose RISC-V (“RISC Five”), which targets Systems on a Chip (SoC). It has:

  • A small base of <50 classic RISC instructions that run a full open-source software stack.
  • Opcodes reserved for tailoring an SoC to applications.
  • Standard instruction extensions optionally included in an SoC.
  • Incorporated, as an open ISA, community suggestions before extensions are finalized.
  • A foundation to evolve the RISC-V slowly based solely on technical reasons voted on by members vs. by companies that inflate ISAs rapidly for business as well as technical reasons; ARM and Intel average about 2 new instructions per month.
  • No restrictions: there is no cost, no paperwork, and anyone can use it.

Attendees will get a 2-page reference card (“green card”), which lists all RISC-V extensions, to contrast this minimal ISA with the 3,600-page x86 manual and the 5,400-page ARMv8 manual.
We conclude by recapping 10 RISC-V chips built using Agile methods in just 4 years, including how shockingly cheap it is today to manufacture 100 2x2-mm, 28-nm chips. We used Chisel, a new hardware design language that reduces design effort by greatly increasing reuse.
 
David Patterson joined UC Berkeley nearly 40 years ago. He has been Director of several research labs, Chair of Berkeley’s CS Division, Chair of the Computing Research Association, and President of ACM. His most successful projects are likely Reduced Instruction Set Computers (RISC), Redundant Arrays of Inexpensive Disks (RAID), and Network of Workstations (NOW), all of which helped lead to multi-billion-dollar industries. This research led to many papers, 6 books, and about 35 honors for him and his friends, including election to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He shared the IEEE von Neumann Medal and the NEC C&C Prize with John Hennessy, co-author of two of his books and President of Stanford University.

The Resilience of the Perceptron

Date and Time
Monday, September 28, 2015 - 4:30pm to 5:30pm
Location
Fine Hall 214
Type
Colloquium
Host
PACM and CS

The most widely used optimization method in machine learning practice is the Perceptron Algorithm, also known as the Stochastic Gradient Method (SGM). This method has been used since the fifties to build statistical estimators, iteratively improving models by correcting errors observed on single data points. SGM is not only scalable, robust, and simple to implement, but achieves the state-of-the-art performance in many different domains. In contemporary systems, SGM powers enterprise analytics systems and is the workhorse tool used to train complex pattern-recognition systems in speech and vision.

In this talk, I will explore why SGM has had such staying power, focusing on notions of stability and robustness. I will first discuss how SGM is robust to perturbations of the model and the updates. From a computing systems perspective, this robustness enables parallel implementations with minimal communication, with no locking or synchronization, and with strong spatial locality. I will then show how SGM is robust to perturbations of the data itself, and prove that any model trained with stochastic gradient method in a reasonable amount of time attains small generalization error.  I will subsequently provide a new interpretation of common practices in neural networks, and provide a formal rationale for many popular techniques in training large, deep models.

Sustainable Networks and Systems for the Developing World

Date and Time
Tuesday, April 28, 2009 - 3:00pm to 4:30pm
Location
Friend Center 013
Type
Colloquium
Speaker
Lakshmi Subramanian , from NYU
Host
Jennifer Rexford
The Internet and the World Wide Web have largely remained urban phenomena and a significant fraction of the developing world, especially in rural and underdeveloped regions, remains disconnected from the rest of the world.

In this talk, I will briefly elaborate on our current efforts in addressing three broad challenges that can help in bridging the digital divide: (a) developing low-cost high-bandwidth connectivity solutions; (b) extending the Web to developing regions;(c) building new mobile applications for enhancing healthcare and financial services. This talk will specifically focus on the challenge of low-cost connectivity solutions for developing regions. I will describe our experiences in developing and deploying WiFi-based Long Distance (WiLD) Networks, a low-cost connectivity solution which provides the ability to achieve nearly 6-10 Mbps over 100s of kilometers for roughly $1000/link. WiLD networks have been deployed in 10 developing countries and have been used to provide telemedicine services in Aravind Eye hospitals in India to 50,000 patients/year in rural areas which is currently being expanded to cover 500,000 patients/year. We are currently extending WiLD networks to develop WiFi based Rural Extensions (WiRE), a low-cost alternative to cellular networks for rural settings.

Bio

Prof. Lakshminarayanan Subramanian is an Assistant Professor in the Courant Institute of Mathematical Sciences at NYU. His research interests are in the space of networks, distributed systems, security and technologies for developing regions. He received his MS and PhD from UC Berkeley and B.Tech from IIT-Madras.

Follow us: Facebook Twitter Linkedin