
CS Department Colloquium Series

Coordination Avoidance in Distributed Databases

Date and Time
Thursday, February 12, 2015 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman
Peter Bailis

The rise of Internet-scale geo-replicated services has led to considerable upheaval in the design of modern data management systems. Namely, given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., "NoSQL") has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forgo the cost of this expensive coordination, and when must we pay the price?
 
In this talk, I will discuss the potential for coordination avoidance -- the use of as little coordination as possible while still ensuring application integrity -- in several modern data-intensive domains. Specifically, I will demonstrate how to leverage the semantic requirements of applications in data serving, transaction processing, and statistical analytics to enable more efficient distributed algorithms and system designs. The prototype systems I have built demonstrate order-of-magnitude speedups compared to their traditional, coordinated counterparts on a variety of tasks, including referential integrity and index maintenance, transaction execution under common isolation models, and asynchronous convex optimization. I will also discuss our experiences studying and optimizing a range of open source applications and systems, which exhibit similar results.
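As a minimal illustration of coordination-free design (a toy grow-only replicated counter, not one of the prototype systems described above; the class and all values are hypothetical):

# Toy sketch of a coordination-free replicated counter (a G-Counter CRDT).
# Each replica accepts increments locally, with no locking or consensus;
# replicas converge by taking element-wise maxima when they exchange state.
class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.replica_id = replica_id
        self.counts = [0] * n_replicas  # one slot per replica

    def increment(self):
        # A local write: requires no coordination with other replicas.
        self.counts[self.replica_id] += 1

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        # Element-wise max makes merges commutative and convergent
        # regardless of message ordering -- the key to avoiding coordination.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3

Counters like this converge without coordination, but not every application invariant can be maintained this way; characterizing when coordination is truly required is the question the talk addresses.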
 
Peter Bailis is a Ph.D. candidate at UC Berkeley working in databases and distributed systems. As part of his dissertation work, he has studied and built high performance distributed data management systems for large scale transaction processing, data serving, and statistical analytics in the AMPLab and BOOM projects under the advisement of Joseph M. Hellerstein, Ali Ghodsi, and Ion Stoica. He is the recipient of the NSF Graduate Research Fellowship, the Berkeley Fellowship for Graduate Study, and best-of-conference citations for research appearing in both SIGMOD and VLDB. He received his A.B. in Computer Science from Harvard College in 2011, where he also received the CRA Outstanding Undergraduate Researcher Award.
 

Learning compact models from high dimensional large datasets

Date and Time
Monday, February 9, 2015 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sebastian Seung

Yoram Singer

I review the design, analysis, and implementation of stochastic optimization techniques, online algorithms, and modeling approaches for learning in high dimensional spaces using large amounts of data. The focus is on algorithms and models that are efficient, accurate, and yield compact models. Concretely, I cover the forward-backward splitting algorithm (FOBOS), mirror descent for learning composite objectives (COMID), and the adaptive gradient (AdaGrad) algorithm. We also discuss simple yet effective modeling approaches based on locality for learning from high dimensional data.
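For readers unfamiliar with AdaGrad, here is a minimal sketch of its per-coordinate update on a least-squares objective (the data and step size are illustrative, not from the talk):

import numpy as np

# Minimal AdaGrad sketch: per-coordinate step sizes shrink with the
# accumulated squared gradients, so frequently-updated coordinates take
# smaller steps. Problem sizes and hyperparameters are invented.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(20)
G = np.zeros(20)            # running sum of squared gradients, per coordinate
eta, eps = 0.5, 1e-8
for _ in range(5000):
    i = rng.integers(len(y))
    g = (X[i] @ w - y[i]) * X[i]        # stochastic gradient of squared loss
    G += g * g
    w -= eta * g / (np.sqrt(G) + eps)   # AdaGrad update

print("parameter error:", np.linalg.norm(w - w_true))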

Yoram Singer is a senior research scientist at Google. From 1999 through 2007 he was an associate professor at the Hebrew University of Jerusalem. From 1995 through 1999 he was a member of the technical staff at AT&T Research. He was the co-chair of the Conference on Computational Learning Theory (COLT) in 2004 and of Neural Information Processing Systems (NIPS) in 2007. He serves as an editor of the Journal of Machine Learning Research, IEEE Signal Processing Magazine, and IEEE Transactions on Pattern Analysis and Machine Intelligence.

Building an Operating System for the Data Center

Date and Time
Monday, February 23, 2015 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jennifer Rexford

Simon Peter

Data centers run a range of important applications with ever increasing performance demands, from cloud and server computing to Big Data and eScience. However, the scaling of CPU frequency has stalled in recent years, leading to hardware architectures that no longer transparently scale software performance. Two trends stand out: 1) Instead of frequency, hardware architectures increase the number of CPU cores, leaving complex memory system performance and CPU scheduling tradeoffs exposed to software. 2) Network and storage I/O performance continues to improve, but without matching improvements in CPU frequency. Software thus faces ever increasing I/O efficiency demands.

In my research, I address how operating systems (OSes) can cope with this growing complexity and these performance pressures without becoming the limiting factor for performance. I first explain how current OS architectures are already inadequate to address these trends, limiting application performance. I then show how the OS can be redesigned to eliminate these performance limitations without compromising existing features or application compatibility. I close with an outlook on how these hardware trends will affect software in the future and present ideas to address them.

Simon is a postdoctoral research associate at the University of Washington, where he leads research in operating systems and networks. His postdoctoral advisors are Tom Anderson and Arvind Krishnamurthy. Simon received a Ph.D. in Computer Science from ETH Zurich in 2012 and an MSc in Computer Science from the Carl von Ossietzky University of Oldenburg, Germany, in 2006.

Simon's research focus is on data center performance issues. For his work on the Arrakis high I/O performance operating system, he received the Jay Lepreau best paper award (2014) and the Madrona prize (2014). Previously, Simon has worked on the Barrelfish multicore operating system and conducted further award-winning systems research at various locations, including MSR Silicon Valley, MSR Cambridge, Intel Labs Germany, UC Riverside, and the NSF.

Randomness Vs. Memory: A Treasure(s) Hunt

Date and Time
Wednesday, February 4, 2015 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sanjeev Arora

Omer Reingold

Randomness plays a major role in computation, allowing the execution of tasks that would otherwise be impossible or prohibitively inefficient. Yet the relative power of randomness compared with other resources of computation is mostly unresolved. In this talk we will address the tradeoff between memory and randomness. Namely, can randomness significantly reduce the amount of memory needed for algorithmic tasks? This is one of the central problems in Computational Complexity and its study is tied with the study of pseudorandom generators that fool memory-bounded algorithms (i.e. combinatorial objects that look random to small-memory distinguishers).

There are many exciting questions on the path towards resolving the randomness vs memory problem.  In this talk we will focus on two such areas of research. First, we will discuss small memory algorithms for graph connectivity as well as pseudorandom walks, which we will relate to the general problem of randomness vs. memory. Then we will describe pseudorandom families of hash functions for data structures that are simultaneously small and efficient (with applications to load balancing, the two-choice paradigm, cuckoo hashing, and min-hashing). We will relate such families to progress on more traditional pseudorandom objects such as epsilon-bias sequences and functions with limited independence. 
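As a minimal illustration of the two-choice paradigm mentioned above (a toy simulation, using a generic random number generator in place of the pseudorandom hash families under study):

import random

# Power of two choices: placing each ball in the less loaded of two random
# bins sharply reduces the maximum load compared with one random choice.
def max_load(n_balls, n_bins, choices):
    bins = [0] * n_bins
    for _ in range(n_balls):
        candidates = [random.randrange(n_bins) for _ in range(choices)]
        bins[min(candidates, key=lambda b: bins[b])] += 1
    return max(bins)

random.seed(0)
n = 100_000
print("one choice :", max_load(n, n, choices=1))   # Theta(log n / log log n)
print("two choices:", max_load(n, n, choices=2))   # Theta(log log n)

The question the talk raises is how little randomness, and how simple a hash family, suffices to retain such guarantees.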

Omer Reingold is a Professor of Computer Science at the Weizmann Institute of Science and a Visiting Professor at Stanford University. Past positions include Microsoft Research, the Institute for Advanced Study in Princeton, NJ, and AT&T Labs. His research is in the Foundations of Computer Science and most notably in Computational Complexity and the Foundations of Cryptography with emphasis on randomness, derandomization and explicit combinatorial constructions. He is an ACM Fellow and among his distinctions are the 2005 Grace Murray Hopper Award and the 2009 Gödel Prize.

Data Centers, Energy, and Online Optimization

Date and Time
Thursday, January 15, 2015 - 11:00am to 12:00pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jennifer Rexford

Adam Wierman

This talk will tell two stories, one about designing sustainable data centers and one about the underlying algorithmic challenges, which fall into the context of online convex optimization.

Story 1: The typical message surrounding data centers and energy is an extremely negative one: Data centers are energy hogs. This message is pervasive in both the popular press and academia, and it certainly rings true. However, the view of data centers as energy hogs is too simplistic. One goal of this talk is to highlight that, yes, data centers use a lot of energy, but data centers can also be a huge benefit in terms of integrating renewable energy into the grid and thus play a crucial role in improving the sustainability of our energy landscape. In particular, I will highlight a powerful alternative view: data centers as demand response opportunities.

Story 2: Typically in online convex optimization it is enough to exhibit an algorithm with low (sub-linear) regret, which implies that the algorithm can match the performance of the best static solution in retrospect.  However, what if one additionally wants to maintain performance that is nearly as good as the dynamic optimal, i.e., a good competitive ratio?  In this talk, I'll highlight that it is impossible for an online algorithm to simultaneously achieve these goals.  Luckily though, in practical settings (like data centers), noisy predictions about the future are often available, and I will show that, under a general model of prediction noise, even very limited predictions about the future are enough to overcome the impossibility result.   
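As a minimal sketch of the setting (an illustrative stream of one-dimensional quadratic losses, not the algorithms from the talk), online gradient descent achieves sublinear regret against the best fixed point, while the drifting per-step optimum that it fails to track is precisely what predictions would help with:

import numpy as np

# Online gradient descent on losses f_t(x) = (x - theta_t)^2 with a slowly
# drifting theta_t. Regret is measured against the best *fixed* point; the
# dynamic optimum (theta_t itself) is the harder benchmark discussed above.
rng = np.random.default_rng(1)
T = 5000
thetas = np.sin(np.arange(T) / 200.0)           # slowly drifting targets
x, total_loss = 0.0, 0.0
for t in range(1, T + 1):
    total_loss += (x - thetas[t - 1]) ** 2
    x -= 2 * (x - thetas[t - 1]) / np.sqrt(t)   # step size ~ 1/sqrt(t)

best_fixed = thetas.mean()                      # minimizer of the summed losses
static_opt = ((thetas - best_fixed) ** 2).sum()
print("regret vs. best fixed point:", total_loss - static_opt)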

Adam Wierman is a Professor in the Department of Computing and Mathematical Sciences at the California Institute of Technology, where he is a founding member of the Rigorous Systems Research Group (RSRG) and maintains a popular blog called Rigor + Relevance. His research interests center around resource allocation and scheduling decisions in computer systems and services. He received the 2011 ACM SIGMETRICS Rising Star award and the 2014 IEEE Communications Society William R. Bennett Prize, and has coauthored papers that received best paper awards at ACM SIGMETRICS, IEEE INFOCOM, IFIP Performance (twice), IEEE Green Computing Conference, IEEE Power & Energy Society General Meeting, and ACM GREENMETRICS.

Algorithms for Cancer Genomics

Date and Time
Monday, January 12, 2015 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

Ben Raphael

Understanding the molecular alterations that cause cancer is one of the greatest challenges in biology and medicine.   Recent advances in DNA sequencing technology now enable large-scale measurement of mutations in cancer genomes and provide an unprecedented opportunity to address this challenge.  However, translating this information into deeper insight about the mutational process that drives cancer demands novel computational approaches. 

In this talk, I will describe algorithms that we have developed to address key problems in cancer genomics.  First, I will discuss algorithms that deconvolve DNA sequencing data from a single tumor and determine subpopulations of cancer cells harboring different mutations.  These algorithms exploit integrality constraints on copy number aberrations and phylogenetic tree constraints that relate subpopulations.   Next, I will describe algorithms to identify combinations of mutations that perturb cellular signaling and regulatory networks.  One algorithm employs a heat diffusion process to identify subnetworks of a genome-scale interaction network that are recurrently mutated across samples.  A second algorithm finds combinations of mutations that optimize a measure of mutual exclusivity across samples.  I will illustrate applications of these approaches to multiple cancer types in The Cancer Genome Atlas (TCGA), including a recent Pan-Cancer study of >3000 samples from 12 cancer types.
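To make the heat-diffusion idea concrete, here is a minimal sketch on a toy interaction network (the graph, scores, and restart parameter are invented for illustration):

import numpy as np

# Toy heat diffusion on an interaction network: mutation scores placed on
# genes spread along edges, so a gene adjacent to many mutated neighbors
# "heats up" even if it is rarely mutated itself.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)            # column-normalized transition matrix
h = np.array([1.0, 0, 0, 0, 0])  # initial "heat": gene 0 is mutated
beta = 0.4                       # restart parameter (limits how far heat spreads)

# Stationary heat: f = beta * (I - (1 - beta) W)^-1 h
f = beta * np.linalg.solve(np.eye(5) - (1 - beta) * W, h)
print("diffused scores:", f.round(3))

Subnetworks whose genes jointly retain high diffused scores across many samples are candidates for recurrently mutated pathways.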

Ben Raphael is an Associate Professor in the Department of Computer Science and Director of the Center for Computational Molecular Biology (CCMB) at Brown University.  His research focuses on the design of algorithms for genome sequencing and interpretation. Recent interests include structural variation in human and cancer genomes, and network/pathway analysis of genetic variants.  He received an S.B. in Mathematics from MIT, a Ph.D. in Mathematics from the University of California, San Diego (UCSD), and completed postdoctoral training in Bioinformatics and Computer Science at UCSD.  He is the recipient of an NSF CAREER award, a Career Award from the Burroughs Wellcome Fund, and a Sloan Research Fellowship.

Computational Sprinting

Date and Time
Monday, December 8, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Margaret Martonosi

Thomas Wenisch

Although transistor density continues to increase, power density increases with each chip generation. Particularly in mobile devices, which have limited cooling options, these trends lead to "dark silicon," in which sustained chip performance is limited primarily by power rather than area. However, many mobile applications do not demand sustained performance; rather, they comprise short bursts of computation in response to sporadic user activity.

To improve responsiveness for such applications, this talk describes "computational sprinting," in which the system activates otherwise powered-down cores for brief bursts of intense parallel computation. During a sprint, the chip temporarily exceeds its sustainable thermal power budget to provide instantaneous throughput, after which it must return to nominal operation to cool down. To enable longer sprints, we describe a thermal design that incorporates phase-change materials to provide thermal capacitance. Results indicate that sprinting can provide large improvements in responsiveness while actually saving energy due to race-to-idle effects.
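A back-of-the-envelope lumped thermal model (with invented constants) shows why added thermal capacitance lengthens sprints:

# Toy thermal model: while sprinting, the chip dissipates power above its
# sustainable budget, and temperature rises at a rate set by the thermal
# capacitance C. Larger C (e.g., from phase-change material) means a
# slower rise and a longer sprint before the thermal cap is reached.
def sprint_seconds(power_w, budget_w, capacitance_j_per_k, headroom_k=20.0):
    rise_rate = (power_w - budget_w) / capacitance_j_per_k  # kelvin/second
    return headroom_k / rise_rate

for C in (10.0, 50.0):
    print(f"C = {C:>4} J/K -> sprint lasts {sprint_seconds(16, 4, C):.1f} s")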

Thomas Wenisch is an Associate Professor of Computer Science and Engineering at the University of Michigan, specializing in computer architecture. His prior research includes memory streaming for commercial server applications, multiprocessor memory systems, memory disaggregation, and rigorous sampling-based performance evaluation methodologies.  His ongoing work focuses on computational sprinting, server and data center architectures, programming models for byte-addressable NVRAM, and architectures to enable hand-held 3D ultrasound. Wenisch received the NSF CAREER award in 2009. Prior to his academic career, Wenisch was a software developer at American Power Conversion, where he worked on data center thermal topology estimation. He received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.

Searching and Browsing Visual Data

Date and Time
Wednesday, December 3, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Tom Funkhouser

Kristen Grauman

Widespread visual sensors and unprecedented connectivity have left us awash with visual data---from online photo collections, home videos, news footage, medical images, or surveillance feeds.  How can we efficiently browse image and video collections based on semantically meaningful criteria?  How can we bring order to the data, beyond manually defined keyword tags?  I will present work exploring these questions in the context of interactive visual search and summarization. 

In particular, I’ll first introduce attribute representations that connect visual properties to human describable terms.  I’ll show how these attributes enable both fine-grained content-based retrieval as well as new forms of human supervision for recognition problems.  Then, I’ll overview our recent work on video summarization, where the goal is to automatically transform a long video into a short one.  Using videos captured with egocentric wearable cameras, we’ll see how hours of data can be distilled to a succinct visual storyboard that is understandable in just moments.  Together, these ideas are promising steps towards widening the channel of communication between humans and computer vision algorithms, which is critical to facilitate efficient browsing of large-scale image and video collections.
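As a minimal sketch of attribute-based retrieval (the attribute names and scores are invented for illustration), images described by scores on human-nameable attributes can be ranked against an attribute query:

import numpy as np

# Each image is a vector of attribute scores; a query over attributes ranks
# the collection by cosine similarity. Real systems learn these attribute
# predictors from data rather than hand-assigning scores.
gallery = {
    "img1": np.array([0.9, 0.1, 0.2, 0.0]),   # furry, shiny, smiling, open
    "img2": np.array([0.1, 0.8, 0.1, 0.7]),
    "img3": np.array([0.2, 0.2, 0.9, 0.3]),
}
query = np.array([0.0, 1.0, 0.0, 1.0])        # "shiny and open"

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

for name, vec in sorted(gallery.items(), key=lambda kv: -cosine(query, kv[1])):
    print(name, round(cosine(query, vec), 3))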

This is work done with Adriana Kovashka, Yong Jae Lee, Devi Parikh, Lu Zheng, Bo Xiong, and Dinesh Jayaraman.

Kristen Grauman is an Associate Professor in the Department of Computer Science at the University of Texas at Austin.  Her research in computer vision and machine learning focuses on visual search and object recognition.  Before joining UT-Austin in 2007, she received her Ph.D. in the EECS department at MIT, in the Computer Science and Artificial Intelligence Laboratory.  She is an Alfred P. Sloan Research Fellow and Microsoft Research New Faculty Fellow, a recipient of NSF CAREER and ONR Young Investigator awards, the Regents' Outstanding Teaching Award from the University of Texas System in 2012, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence, and a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013.  She and her collaborators were recognized with the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval, and the Marr Best Paper Prize at ICCV in 2011 for their work on modeling relative visual attributes.

Some Algorithmic Challenges in Statistics

Date and Time
Monday, December 1, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series

Sham Kakade

Machine learning is seeing tremendous progress in its impact on society. Along with this progress comes an increasing role for both scalable algorithms and theoretical foundations, the hope being that such progress can facilitate further breakthroughs on core AI problems. This talk will survey recent progress and future challenges at the intersection of computer science and statistics, with a focus on three areas:
 
The first concerns latent variable models: how can we learn the interactions between observed variables when certain latent (or hidden) causes help to explain the correlations in the observed data? Settings where latent variable models have seen success include document (or topic) modeling, hidden Markov models (say, for modeling time series of acoustic signals), and discovering communities of individuals in social networks.
 
The second is that of stochastic optimization. Many problems that arise in science and engineering are those in which we only have a stochastic approximation to the underlying problem at hand (e.g. linear regression or other problems where our objective function is a sample average). Such problems highlight some of the challenges we face at the interface of computer science and statistics: should we use a highly (numerically) accurate algorithm (with costly time and space requirements) or a crude stochastic approximation scheme like stochastic gradient descent (which is light on memory and simple to implement, yet has a poor convergence rate)?
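This tradeoff can be seen in miniature on linear regression (sizes and step size here are illustrative): an exact least-squares solve is accurate but heavier, while a single pass of stochastic gradient descent is cheap, simple, and cruder.

import numpy as np

# Exact least squares vs. one pass of SGD on the same synthetic problem.
rng = np.random.default_rng(2)
n, d = 20000, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_exact = np.linalg.lstsq(X, y, rcond=None)[0]   # numerically accurate route

w_sgd = np.zeros(d)                              # crude one-pass route
lr = 0.005
for i in rng.permutation(n):
    w_sgd -= lr * (X[i] @ w_sgd - y[i]) * X[i]

print("exact solve error :", np.linalg.norm(w_exact - w_true))
print("one-pass SGD error:", np.linalg.norm(w_sgd - w_true))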

Finally, I will briefly discuss future challenges inspired by the impressive successes of deep learning.

 
A recurring theme is that algorithmic advances can provide new practical techniques for statistical estimation.
 
Sham Kakade is a principal research scientist at Microsoft Research, New England. His research focus is on designing scalable and efficient algorithms for machine learning and artificial intelligence; he has worked (and has continued interests) in areas such as statistics, optimization, probability theory, algorithms, economics, and neuroscience. Previously, Dr. Kakade was an associate professor in the Department of Statistics at Wharton, University of Pennsylvania (from 2010 to 2012) and an assistant professor at the Toyota Technological Institute at Chicago. Before this, he did a postdoc in the Computer and Information Science department at the University of Pennsylvania under the supervision of Michael Kearns. Dr. Kakade completed his PhD at the Gatsby Unit, where his advisor was Peter Dayan. Before Gatsby, Dr. Kakade was an undergraduate at Caltech, where he did his BS in physics.

Interactive ML for People: The Small Data Problem

Date and Time
Wednesday, November 19, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

Emma Brunskill

Consider an intelligent tutoring system or an autonomous decision support tool for a doctor. Though such systems may in aggregate have a huge amount of data, the data collected for any single individual is typically very small, and the policy space (what to teach a student next, or how best to treat a patient) is enormous.

I will describe two machine learning efforts to tackle these small data challenges: learning across multiple tasks, and better use of previously collected task data, where the tasks in both cases involve sequential stochastic decision processes (reinforcement learning and bandits). I will also present results showing how one of these techniques allowed us to substantially increase engagement in an educational game for teaching fractions.
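As a minimal illustration of the small-data point (a toy two-armed bandit, not the methods from the talk; all numbers are invented), shrinking a new student's reward estimates toward a prior learned from earlier students outperforms starting from scratch:

import random

# Epsilon-greedy bandit for one "student" with only 20 interactions. Arm
# estimates are shrunk toward a prior; a weight near zero means cold start,
# a larger weight encodes what earlier students' data suggested.
def run_student(p_arms, prior, prior_weight, pulls=20, eps=0.1):
    counts, sums, reward = [0, 0], [0.0, 0.0], 0
    for _ in range(pulls):
        est = [(sums[a] + prior_weight * prior[a]) / (counts[a] + prior_weight)
               for a in range(2)]
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda k: est[k])
        r = 1 if random.random() < p_arms[a] else 0
        counts[a] += 1; sums[a] += r; reward += r
    return reward

random.seed(0)
p_arms = (0.3, 0.7)                      # true success rates of two actions
trials = 500
cold = sum(run_student(p_arms, (0.5, 0.5), 1e-9) for _ in range(trials)) / trials
warm = sum(run_student(p_arms, (0.3, 0.7), 5.0) for _ in range(trials)) / trials
print("avg reward, cold start:", cold, " with prior from past students:", warm)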

Emma Brunskill is an assistant professor in the computer science department at Carnegie Mellon University. She is also affiliated with the machine learning department at CMU. She works on reinforcement learning, focusing on applications that involve artificial agents interacting with people, such as intelligent tutoring systems. She is a Rhodes Scholar, Microsoft Faculty Fellow and NSF CAREER award recipient, and her work has received best paper nominations in Education Data Mining (2012, 2013) and CHI (2014).
