CS Department Colloquium Series

The CouchPotato Project: Learning Visual Concepts by Watching YouTube

Date and Time
Thursday, October 9, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jianxiong Xiao

Rahul Sukthankar

What could computers learn about visual concepts, such as objects and actions, by watching large quantities of internet video, with minimal human supervision? In this talk, I will present several of our recent explorations in this area, including spatio-temporal object segmentation in video, weakly-supervised learning of human actions and unsupervised discovery of motion patterns in animal videos.

The talk includes research contributions from many colleagues at Google, interns and faculty colleagues: L. Del Pero, I. Essa, L. Fei-Fei, V. Ferrari, M. Grundmann, G. Hartmann, J. Hoffman, A. Karpathy, T. Leung, V. Kwatra, O. Madani, J. Rehg, S. Ricco, S. Shetty, K. Tang, G. Toderici, D. Tsai, J. Yagnik.

Rahul Sukthankar is a scientist at Google Research, an adjunct research professor in the Robotics Institute at Carnegie Mellon, and courtesy faculty at UCF. He was previously a senior principal researcher at Intel Labs, a senior researcher at HP/Compaq, and a research scientist at Just Research. Rahul received his Ph.D. in Robotics from Carnegie Mellon in 1997 and his B.S.E. in Computer Science from Princeton in 1991. His current research focuses on computer vision and machine learning, particularly in the areas of object recognition and video understanding.

Mapping Neural Circuits in the Whole Mouse Brain

Date and Time
Wednesday, September 17, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sanjeev Arora

Neuroanatomy, a century-old subject, is currently undergoing a computational, technology-driven makeover. Falling data-storage prices, together with automated equipment for digital histology and imaging, have made it possible to digitize entire mammalian brains using light microscopy. The talk will describe an ongoing effort to systematically uncover the neural circuit architecture of the whole mouse brain by scaling up classical neuroanatomical methods (http://mouse.brainarchitecture.org). The resulting petabyte-sized data sets are larger than any previously encountered in neuroscience and pose new and interesting data-analysis challenges.

Partha Mitra received his B.Sc. in physics from Presidency College, Calcutta, and his Ph.D. in theoretical physics from Harvard University. He was an Assistant Professor of Physics at Caltech (1996) and a member of the Theoretical Physics department at Bell Laboratories (1995-2003). He is currently Crick-Clay Professor of Biomathematics at Cold Spring Harbor Laboratory. His research combines experimental, theoretical and computational approaches to gain an understanding of how brains work.

Quantum computation as a lens on quantum physics

Date and Time
Thursday, April 10, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sanjeev Arora

Quantum computation inspires the study of quantum many-body systems from a computational perspective. This approach leads to a remarkably rich set of insights and questions, with deep implications for both physics and quantum computation. This direction, now called "quantum Hamiltonian complexity", has developed over the past decade into an exciting, fast-growing field. I will try to give a taste of some of its main achievements, e.g., the unexpected hardness of certain physical systems, and testing quantum mechanics using interactive proofs.

Dorit Aharonov earned a B.Sc. in Physics and Mathematics at the Hebrew University, an M.Sc. in Physics at the Weizmann Institute, and a Ph.D. in Computer Science and Physics at the Hebrew University (1999). After postdocs at the IAS in Princeton and at UC Berkeley, she joined the faculty of the CS department at the Hebrew University in 2001. In 2005, Aharonov was profiled in Nature as one of four "young theorists who are making waves in their chosen fields", and in 2006 she received the Krill Prize for Excellence in Scientific Research. In 2011 she was awarded an ERC Starting Grant from the European Research Council. Her current topics of interest include quantum algorithms and complexity, and the computational view of quantum mechanics and multiparticle entanglement.

The Aha! Moment: From Data to Insight

Date and Time
Tuesday, April 15, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Andrea LaPaugh

Dafna Shahaf

The amount of data in the world is increasing at incredible rates. Large-scale data has the potential to transform almost every aspect of our world, from science to business; for this potential to be realized, we must turn data into insight. In this talk, I will describe two of my efforts to address this problem computationally:

The first project, Metro Maps of Information, aims to help people understand the underlying structure of complex topics, such as news stories or research areas. Metro Maps are structured summaries that can help us understand the information landscape, connect the dots between pieces of information, and uncover the big picture.

The second project proposes a framework for automatic discovery of insightful connections in data. In particular, we focus on identifying gaps in medical knowledge: our system recommends directions of research that are both novel and promising.

I will formulate both problems mathematically and provide efficient, scalable methods for solving them. User studies on real-world datasets demonstrate that our methods help users acquire insight efficiently across multiple domains.
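To make the "connect the dots" intuition concrete, here is a toy Python sketch, not the speaker's actual method: it scores a chain of documents by its weakest transition, using Jaccard word overlap as a crude stand-in for the coherence measure in the real work, and finds the most coherent chain with a bottleneck (widest-path) variant of Dijkstra's algorithm.

    import heapq

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def most_coherent_chain(docs, src, dst):
        """Chain from src to dst maximizing the weakest transition
        (widest-path / bottleneck variant of Dijkstra's algorithm)."""
        best, parent = {src: 1.0}, {src: None}
        heap = [(-1.0, src)]
        while heap:
            neg_b, u = heapq.heappop(heap)
            if -neg_b < best.get(u, 0.0):
                continue                      # stale heap entry
            for v in docs:
                if v == u:
                    continue
                b = min(-neg_b, jaccard(docs[u], docs[v]))
                if b > best.get(v, 0.0):
                    best[v], parent[v] = b, u
                    heapq.heappush(heap, (-b, v))
        if dst not in parent:
            return [], 0.0
        chain, node = [], dst
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain[::-1], best[dst]

    # Three toy "articles" as bags of words; the best chain routes through
    # the bailout story since no direct crisis -> reform link is coherent.
    docs = {"crisis":  {"bank", "mortgage", "default", "bailout"},
            "bailout": {"bailout", "congress", "bank", "vote"},
            "reform":  {"congress", "vote", "regulation", "reform"}}
    print(most_coherent_chain(docs, "crisis", "reform"))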

Dafna Shahaf is a postdoctoral fellow at Stanford University. She received her Ph.D. from Carnegie Mellon University; prior to that, she earned an M.S. from the University of Illinois at Urbana-Champaign and a B.Sc. from Tel Aviv University. Dafna's research focuses on helping people make sense of massive amounts of data. She has won a best research paper award at KDD 2010, a Microsoft Research Fellowship, a Siebel Scholarship, and a Magic Grant for innovative ideas.

Sublinear Optimization for Machine Learning

Date and Time
Thursday, April 3, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Robert Schapire

In many modern optimization problems, particularly those arising in machine learning, the amount of data is too large to apply standard convex optimization methods. We'll discuss new optimization algorithms that use randomization to prune the data and produce a correct solution while running in time smaller than the size of the data representation, i.e., in sublinear time. We'll present such sublinear-time algorithms for linear classification, support vector machine training, semi-definite programming and other optimization problems. These new algorithms are based on a primal-dual approach, and combine novel sampling techniques with randomized implementations of online learning algorithms. We'll describe information-theoretic lower bounds showing that our running times are nearly best possible in the unit-cost RAM model.
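
The flavor of these algorithms can be sketched in a few lines of Python. The sketch below is a simplification rather than the algorithms from the talk: it uses l2 sampling to estimate inner products from a few coordinates, so the perceptron's mistake test runs in time independent of the dimension. The published algorithms also sample the updates and carry formal guarantees.

    import numpy as np

    rng = np.random.default_rng(0)

    def sampled_dot(w, x, k=32):
        """Unbiased estimate of <w, x> that reads only k coordinates.
        Coordinate j is drawn with probability w_j^2 / ||w||^2, and
        x_j * ||w||^2 / w_j is then an unbiased estimator of <w, x>."""
        p = w**2 / np.dot(w, w)
        idx = rng.choice(len(w), size=k, p=p)
        return np.mean(x[idx] * np.dot(w, w) / w[idx])

    def sublinear_perceptron(X, y, iters=2000, k=32):
        """Perceptron whose mistake test uses the sampled estimate.
        (The update below still reads a full example; the published
        algorithms sample there too and add multiplicative weights.)"""
        n, d = X.shape
        w = rng.normal(size=d) * 1e-3   # nonzero init so sampling is defined
        for _ in range(iters):
            i = rng.integers(n)                        # random example
            if y[i] * sampled_dot(w, X[i], k) <= 0:    # estimated mistake
                w += y[i] * X[i]                       # perceptron update
        return w

    # Toy data: two Gaussian blobs with labels in {-1, +1}.
    X = np.vstack([rng.normal(+1.0, 1.0, (50, 20)),
                   rng.normal(-1.0, 1.0, (50, 20))])
    y = np.array([1] * 50 + [-1] * 50)
    print(np.mean(np.sign(X @ sublinear_perceptron(X, y)) == y))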

The talk will be self-contained; no prior knowledge of convex optimization or machine learning is assumed.

Elad Hazan is an associate professor of operations research at the Technion, Israel Institute of Technology. His main research area is machine learning and its relationship to optimization, game theory and computational complexity. Elad received his Ph.D. in computer science from Princeton University. He is the recipient of several best-paper awards, including the Goldberg Best Paper Award (twice), as well as a European Research Council starting grant.

Building systems that compute on encrypted data

Date and Time
Tuesday, February 18, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ed Felten

Raluca Popa

Theft of confidential data is prevalent. In most applications, confidential data is stored at servers. Thus, existing systems naturally try to prevent adversaries from compromising these servers. However, experience has shown that adversaries still find a way to break in and steal the data. 

In this talk, I will describe a new approach to protecting data confidentiality even when attackers get access to all server data: building practical systems that compute on encrypted data without access to the decryption key. In this setting, I designed and built a database system (CryptDB), a web application platform (Mylar), and two mobile systems, as well as developed new cryptographic schemes for them. I showed that these systems support a wide range of applications with low overhead. The talk will focus primarily on CryptDB and Mylar.
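
As a toy illustration of one layer of this design, the following Python sketch uses a keyed deterministic map (HMAC here, standing in for the deterministic encryption that CryptDB builds from a block cipher, which additionally lets the client decrypt) so that a server can evaluate equality predicates over encrypted values without holding the key.

    import hmac, hashlib

    KEY = b"client-side secret, never sent to the server"

    def det_token(value: str) -> str:
        """Deterministic keyed map: equal plaintexts -> equal tokens."""
        return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

    # Client side: encrypt both the stored data and the query constant.
    table = [{"name": det_token(n)} for n in ["alice", "bob", "alice"]]
    query_const = det_token("alice")

    # Server side: match ciphertexts directly. The server learns which
    # rows are equal (leakage inherent to deterministic encryption) but
    # never sees the plaintexts or the key.
    matches = [row for row in table if row["name"] == query_const]
    print(len(matches))   # -> 2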

My work has already had impact: Google uses CryptDB’s design for its new Encrypted BigQuery service, and a medical application at Boston’s Newton-Wellesley Hospital is secured with Mylar. Looking forward, this approach promises to solve privacy problems in other systems, such as big-data or genomics systems.

Raluca Ada Popa is a PhD candidate at MIT working in security, systems, and applied cryptography. As part of her PhD work, she built practical systems that compute over encrypted data and designed new encryption schemes that underlie these systems. Raluca is the recipient of a Google PhD Fellowship for secure cloud computing, the Johnson award for the best CS Master of Engineering thesis at MIT, and the CRA Outstanding Undergraduate Award from the ACM. Raluca graduated from MIT with two B.S. degrees, in computer science and in mathematics.

Efficient learning with combinatorial structure

Date and Time
Tuesday, April 8, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jianxiong Xiao

Stefanie Jegelka

Learning from complex data such as images, text or biological measurements invariably relies on capturing long-range, latent structure. But the combinatorial structure inherent in real-world data can pose significant computational challenges for modeling, learning and inference.

In this talk, I will view these challenges through the lens of submodular set functions. Considered a "discrete analog of convexity", the combinatorial concept of submodularity captures intuitive yet nontrivial dependencies between variables and underlies many widely used concepts in machine learning. Practical use of submodularity, however, requires care. My first example illustrates how to efficiently handle the important class of submodular composite models. The second example combines submodularity and graphs for a new family of combinatorial models that express long-range interactions while still admitting very efficient inference procedures. As a concrete application, our results enable effective realization of combinatorial sparsity priors on real data, significantly improving image segmentation results in settings where state-of-the-art methods fail. Motivated by good empirical results, we provide a detailed theoretical analysis and identify practically relevant properties that affect complexity and approximation quality of submodular optimization and learning problems.  
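
For readers unfamiliar with submodularity, the classic cardinality-constrained setting, simpler than the structured problems in the talk, shows why the concept is so useful: for any monotone submodular F, the greedy sketch below is guaranteed a (1 - 1/e)-approximation (Nemhauser, Wolsey and Fisher, 1978). Coverage, used in the example, is the textbook submodular function: marginal gains shrink as the chosen set grows.

    def greedy_max(ground, F, k):
        """Pick k elements, each maximizing the marginal gain of F."""
        S = set()
        for _ in range(k):
            best = max((e for e in ground if e not in S),
                       key=lambda e: F(S | {e}) - F(S))
            S.add(best)
        return S

    # Coverage: F(S) = number of items covered by the chosen sets.
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
    F = lambda S: len(set().union(*(sets[e] for e in S)))
    print(greedy_max(set(sets), F, 2))   # e.g. {'a', 'c'}, covering all six items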

Stefanie Jegelka is a postdoctoral researcher at UC Berkeley, working with Michael I. Jordan and Trevor Darrell. She received a Ph.D. in Computer Science from ETH Zurich in 2012, in collaboration with the Max Planck Institute for Intelligent Systems, and completed her studies for a Diploma in Bioinformatics with distinction at the University of Tuebingen (Germany) and the University of Texas at Austin. She was a fellow of the German National Academic Foundation (Studienstiftung) and its scientific college for life sciences, and has received a Google Anita Borg Europe Fellowship and an ICML Best Paper Award. She has also been a research visitor at Georgetown University Medical Center and Microsoft Research and has held workshops and tutorials on submodularity in machine learning.

Recursive Deep Learning for Modeling Compositional Meaning in Language

Date and Time
Tuesday, March 11, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sebastian Seung

Richard Socher

Great progress has been made in natural language processing thanks to many different algorithms, each often specific to one application. Most learning algorithms force language into simplified representations such as bag-of-words or fixed-size windows, or require human-designed features. I will introduce three models based on recursive neural networks that can learn linguistically plausible representations of language. These methods jointly learn compositional features and grammatical sentence structure for parsing or phrase-level sentiment prediction. They can also represent the visual meaning of a sentence, which makes it possible to find images from query sentences or to describe images with richer descriptions than single object names.

Besides achieving state-of-the-art performance, the models capture interesting phenomena in language such as compositionality. For instance, people easily see that the "with" phrase in "eating spaghetti with a spoon" specifies a way of eating, whereas in "eating spaghetti with some pesto" it specifies the dish. I show that my model solves these prepositional attachment problems well thanks to its distributed representations. In sentiment analysis, a new tensor-based recursive model learns different types of high-level negation and how they can change the meaning of longer phrases containing many positive words. The models also learn that when contrastive conjunctions such as "but" are used, the sentiment of the phrases following them usually dominates.
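
The core composition step of these models is compact enough to sketch in numpy: a parent vector is computed as p = tanh(W[c1; c2] + b) from its two children and applied bottom-up along the parse tree. Weights are random below for illustration; in the actual models W (and the classifiers on top) are trained, and the tensor variant adds a bilinear interaction between the children.

    import numpy as np

    d = 4                                        # toy embedding dimension
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(d, 2 * d))   # composition matrix
    b = np.zeros(d)

    def compose(c1, c2):
        """Parent vector p = tanh(W [c1; c2] + b)."""
        return np.tanh(W @ np.concatenate([c1, c2]) + b)

    # Toy word vectors, combined bottom-up along the parse
    # ((eating spaghetti) (with (a spoon))).
    vec = {w: rng.normal(size=d) for w in
           ["eating", "spaghetti", "with", "a", "spoon"]}
    phrase = compose(compose(vec["eating"], vec["spaghetti"]),
                     compose(vec["with"], compose(vec["a"], vec["spoon"])))
    print(phrase)                                # sentence-level vector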

Richard Socher is a PhD student at Stanford working with Chris Manning and Andrew Ng. His research interests are machine learning for NLP and vision. He is interested in developing new deep learning models that learn useful features, capture compositional structure in multiple modalities and perform well across different tasks. He was awarded the 2011 Yahoo! Key Scientific Challenges Award, the Distinguished Application Paper Award at ICML 2011, a Microsoft Research PhD Fellowship in 2012 and a 2013 "Magic Grant" from the Brown Institute for Media Innovation.

Programming for Everyone: From Solvers to Solver-Aided Languages and Beyond

Date and Time
Monday, March 31, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Walker

Emina Torlak

We live in a software-driven world.  Software helps us communicate and collaborate; create art and music; and make discoveries in biological, physical, and social sciences. Yet the growing demand for new software, to solve new kinds of problems, remains largely unmet. Because programming is still hard, developer productivity is limited, and so is end-users' ability to program on their own.

In this talk, I present a new approach to constructing programs, which exploits advances in constraint solving to make programming easier for experts and more accessible to everyone else.  The approach is based on two observations.  First, much of everyday programming involves the use of domain-specific languages (DSLs) that are embedded, in the form of APIs and interpreters, into modern host languages (for example, JavaScript, Scala or Racket). Second, productivity tools based on constraint solvers (such as verification or synthesis) work best when specialized to a given domain.  Rosette is a new kind of host language, designed for easy creation of DSLs that are equipped with solver-based tools.  These Solver-Aided DSLs (SDSLs) use Rosette's symbolic virtual machine (SVM) to automate hard programming tasks, including verification, debugging, synthesis, and programming with angelic oracles.   The SVM works by compiling SDSL programs to logical constraints understood by SMT solvers, and then translating the solver's output to counterexamples (in the case of verification), traces (in the case of angelic execution), or code snippets (in the case of synthesis and debugging).  Rosette has hosted several new SDSLs, including imperative SDSLs for data-parallel and spatial programming; a functional SDSL for specifying executable semantics of secure stack machines; and a declarative SDSL for web scraping by example.
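
Rosette itself is hosted in Racket, but the underlying move, translating a program and a property into logical constraints and asking an SMT solver for a counterexample, can be sketched directly with Z3's Python bindings. The buggy absolute-value routine below is an invented example, not one from the talk.

    from z3 import If, Int, Not, Solver, sat

    x = Int("x")
    # Symbolic encoding of a buggy abs: returns x if x > 0, else x + 1.
    buggy_abs = If(x > 0, x, x + 1)
    prop = buggy_abs >= 0            # property we hope holds for all x

    s = Solver()
    s.add(Not(prop))                 # search for a violating input
    if s.check() == sat:
        print("counterexample: x =", s.model()[x])   # e.g. x = -2
    else:
        print("property verified for all inputs")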

Emina Torlak is a researcher at U.C. Berkeley, working at the intersection of software engineering, formal methods, and programming languages.  Her focus is on developing tools that help people build better software more easily.  She received her B.Sc. (2003), M.Eng. (2004) and Ph.D. (2009) from MIT, where she developed Kodkod, an efficient SAT-based solver for relational logic.  Kodkod has since been used in over 70 tools for verification, debugging, and synthesis of code and specifications.  Emina has also worked on a wide range of domain-specific formal methods.  She won an ACM SIGSOFT distinguished paper award for her work at LogicBlox, where she built a system for synthesizing massive data sets, used in testing of decision support applications.  As a member of IBM Research, she led the development of a tool for bounded verification of memory models, enabling the first fully automatic analysis of the Java Memory Model.  These experiences inspired her current research on solver-aided languages, which aims to reduce the effort of applying formal methods to new problem domains. 

Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading

Date and Time
Monday, April 28, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman

Junfeng Yang

Our accelerating computational demand and the rise of multicore hardware have made parallel programs, especially shared-memory multithreaded programs, increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. Conventional wisdom has attributed these difficulties to nondeterminism (i.e., repeated executions of the same program on the same input may show different behaviors), and researchers have recently dedicated much effort to bringing determinism into multithreading. In this talk, I argue that determinism is not as useful as commonly perceived: it is neither sufficient nor necessary for reliability. We present our view on why multithreaded programs are difficult to get right, describe a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four years’ research on building and applying stable multithreading systems.
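
The core idea admits a toy sketch: instead of letting the OS choose among exponentially many interleavings, force synchronization points into one fixed order, so every run of the program sees the same schedule regardless of timing. The Python below hard-codes a round-robin order for illustration; the Tern/Peregrine/Parrot systems enforce schedules far more efficiently and relax determinism only where performance requires it.

    import threading

    class RoundRobinScheduler:
        """Serialize shared accesses into one fixed thread order."""
        def __init__(self, n_threads):
            self.n, self.turn = n_threads, 0
            self.cv = threading.Condition()

        def run_in_turn(self, tid, action):
            with self.cv:
                while self.turn != tid:     # wait for this thread's turn
                    self.cv.wait()
                action()                    # shared access, in schedule order
                self.turn = (self.turn + 1) % self.n
                self.cv.notify_all()

    sched, log = RoundRobinScheduler(2), []

    def worker(tid):
        for i in range(3):
            sched.run_in_turn(tid, lambda: log.append((tid, i)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(log)  # always [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]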

Junfeng Yang's research (www.cs.columbia.edu/~junfeng) centers on making systems reliable and secure. He earned his PhD at Stanford, where he created eXplode, a general, lightweight system for effectively finding storage-system errors. This work led to an OSDI '04 best paper, numerous bug fixes to real systems such as the Linux kernel, and a featured article in Linux Weekly News. He worked at Microsoft Research, Silicon Valley from 2007 to 2008, extending eXplode to check production distributed systems; MoDist, the resulting system, is being transferred to Microsoft product groups. He now co-directs the Software Systems Lab (ssl.cs.columbia.edu) at Columbia University, where his recent work on making parallel programs reliable---the Tern/Peregrine/Parrot stable multithreading systems---was featured in CACM, ACM Tech News, The Register, and many other venues. He won Sloan and AFOSR YIP awards, both in 2012, and an NSF CAREER award in 2011.
