CS Department Colloquium Series

Modeling People from Billions of Photos

Date and Time
Wednesday, March 14, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Adam Finkelstein
Internet and personal photo collections now add up to over a trillion photos, most of which contain people. The availability of so many photos presents a unique opportunity to model virtually the whole human population. Modeling humans is key to understanding how people interact with the environment, and to future human-machine interfaces. In this talk, I will describe our work on 3D shape and motion estimation of a human face from large photo collections, as well as novel techniques for browsing those collections. This represents an exciting breakthrough towards modeling and visualizing any person just from their available photos. Part of this work is now included in Google’s Picasa.

Ira Kemelmacher-Shlizerman is a postdoctoral researcher in the Department of Computer Science and Engineering at the University of Washington. She received her Ph.D. in computer science and applied mathematics at the Weizmann Institute of Science in 2009. Dr. Kemelmacher-Shlizerman works in computer vision and graphics, with a particular interest in developing computational tools for modeling people from Internet and personal photo collections. Her recent research has been covered by stories in New Scientist, CBS, Discovery News, and others. She has also consulted for Google.

Machine Learning: Higher, Faster, Stronger

Date and Time
Tuesday, February 21, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Ohad Shamir, from Microsoft Research New England
Host
David Blei
Over the past decade, machine learning has emerged as a major and highly influential discipline of computer science and engineering. As the scope and variety of its applications increase, it faces novel and increasingly challenging settings, which go beyond classical learning frameworks. In this talk, I will present two recent works which fall under this category. The first work introduces a new model of sequential decision making with partial information. The model interpolates between two well-known online learning settings ("experts" and multi-armed bandits), and trades off the information obtained per round against the total number of rounds required to reach the same performance. The second work discusses the problem of parallelizing gradient-based learning algorithms, which is increasingly important for web-scale applications, but is highly non-trivial as these algorithms are inherently sequential. We show how this can be done using a generic and simple protocol, prove its theoretical optimality, and substantiate its performance experimentally on large-scale data.

Ohad Shamir is a postdoctoral researcher at Microsoft Research New England. He joined Microsoft in 2010 after receiving a Ph.D. in computer science from the Hebrew University, advised by Prof. Naftali Tishby. His research focuses on machine learning, with emphasis on novel algorithms which combine practical applicability and theoretical insight. His work has been recognized by several awards, including the Hebrew University's Schlomiuk Ph.D. thesis prize, the COLT 2010 best paper award, and the Wolf Foundation scholarship.

Computational approaches for the DNA sequencing data deluge

Date and Time
Tuesday, March 6, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Second-generation DNA sequencers are improving rapidly and are now capable of sequencing hundreds of billions of nucleotides of data in about a week for a few thousand dollars. Consequently, sequencing has become a common tool in many fields of life science. But with these developments comes a problem: growth in per-sequencer throughput is drastically outpacing growth in computer speed. As the throughput gap widens over time, the crucial research bottlenecks are increasingly computational: computing, storage, labor, power.

Along these lines, I will discuss collaborative scientific projects in epigenetics and gene expression profiling for which I provided novel computational methods in areas such as read alignment, text indexing, and data-intensive computing. I will also discuss a new set of methods for very time- and space-efficient alignment of sequencing reads: Bowtie and Bowtie 2. These tools build on the insight that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome.

Ben Langmead is a Research Associate in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He completed his Ph.D. in Computer Science in February 2012 at the University of Maryland, advised by Steven L. Salzberg. His research addresses problems at the intersection of computer science and genomics, and he is the author of several open source software tools for analysis of high-throughput genomics data, including Bowtie, Bowtie 2, Crossbow and Myrna. His paper describing Bowtie won the Genome Biology award for outstanding paper published in 2009. At Johns Hopkins, he collaborates with biostatisticians, biomedical engineers, biologists, and other computer scientists to develop methods for analyzing second-generation DNA sequencing data.

Crowd-Powered Systems

Date and Time
Wednesday, February 22, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Michael Bernstein, from the Massachusetts Institute of Technology
Host
Rebecca Fiebrink
Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.

Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.

In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture exactly the right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in real time. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.

Michael Bernstein is a PhD candidate in Computer Science at the Massachusetts Institute of Technology. His research in human-computer interaction focuses on crowdsourcing and social computing systems. He has been awarded Best Student Paper at UIST 2010, Best Paper at ICWSM 2011, the NSF Graduate Research Fellowship, and the Microsoft Research PhD Fellowship. His work has appeared in venues such as the New York Times, Slate, CNN, and The Atlantic. He earned his master's degree in Computer Science at MIT, and a bachelor's degree in Symbolic Systems at Stanford University.

Porting the Computer Science Toolbox to Game Theory and Economics

Date and Time
Wednesday, February 8, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Moses Charikar
Theoretical computer science has brought new ideas and techniques to game and economic theory. A primary signature of the computer science approach is "approximation": the idea of building credibility for a proposed solution by proving that its performance is always within a small factor of an ideal (and typically unimplementable) solution. We explain two of our recent contributions in this area, one motivated by networks and one by auctions.

We first discuss the "price of anarchy": how well does decentralized (or "selfish") behavior approximate centralized optimization? This concept has been analyzed in many applications, including network routing, resource allocation, network formation, health care, and even models of basketball. We highlight a new theory of robust price of anarchy bounds, which apply even to systems that are not in equilibrium.
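For network routing, the definition has a classic one-line worked example, Pigou's network (a standard illustration in this literature, not a result from the talk): one unit of traffic travels from s to t over two parallel links, one with fixed cost 1 and one with cost equal to the fraction x of traffic using it.

```latex
% Selfish routing: the variable-cost link is never worse, so all traffic takes it.
\text{cost}_{\text{selfish}} = 1 \cdot 1 = 1
% Optimal routing: split the traffic evenly between the two links.
\text{cost}_{\text{opt}} = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{3}{4}
% The price of anarchy is the ratio of the two:
\mathrm{PoA} = \frac{\text{cost}_{\text{selfish}}}{\text{cost}_{\text{opt}}} = \frac{1}{3/4} = \frac{4}{3}
```

A price-of-anarchy bound asserts that this ratio stays small over a whole class of networks, and the "robust" bounds of the talk extend such guarantees beyond exact equilibria.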

Second, we consider auction design: for example, what selling procedure should be used to maximize the revenue of a seller? On the analysis side, we highlight a new framework that explicitly connects average-case (i.e., Bayesian) analysis, the dominant paradigm in economics, with the worst-case analysis approach common in computer science. On the design side, we provide a distribution-independent auction that performs, for a wide class of input distributions, almost as well as the distribution-specific optimal auction.

Tim Roughgarden received his Ph.D. from Cornell University in 2002 and joined the Stanford CS department in 2004, where he is currently an associate professor. His research interests are in theoretical computer science, especially its interfaces with game theory and networks. He wrote the book "Selfish Routing and the Price of Anarchy" (MIT Press, 2005) and co-edited the book "Algorithmic Game Theory", with Nisan, Tardos, and Vazirani (Cambridge, 2007). His significant awards include the 2002 ACM Doctoral Dissertation Award (Honorable Mention), the 2003 Tucker Prize, the 2003 INFORMS Optimization Prize for Young Researchers, speaking at the 2006 International Congress of Mathematicians, a 2007 PECASE Award, the 2008 Shapley Lectureship of the Game Theory Society, and the 2009 ACM Grace Murray Hopper Award. He's currently developing a free online course on the design and analysis of algorithms, which has over 50,000 students.

Expressive Interaction and the Evaluation of Creativity Support

Date and Time
Monday, December 5, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Rebecca Fiebrink
Visionaries in Computer Science have long seen the computer as a tool to augment our intellect. However, while it is relatively straightforward to measure the impact of a tool or technique on task efficiency for well-defined tasks, it is much more difficult to measure computers' impact on higher-level cognitive processes, such as creativity. In my own research in Human-Computer Interaction, I create novel interaction techniques, but run up against the problem of trying to evaluate how these tools impact higher-level processes such as creativity, expressiveness and exploration. In this talk, I briefly present a variety of interaction techniques that I have developed, and I then describe a new survey metric, the Creativity Support Index (CSI), that we have developed to help researchers and designers evaluate the level of creativity support provided by these types of systems, tools or interfaces. I will then present some current results using EEG and machine learning to classify the creative experience with more specific, temporal granularity. I present this work within the context of my longer term goal to develop a suite of tools that provide both stronger analytical power and a general framework for evaluating computational support for creative activities, engagement and aesthetic experience.

Dr. Celine Latulipe has a PhD in Computer Science from the University of Waterloo in Canada. She is an Assistant Professor of Human-Computer Interaction in the Department of Software and Information Systems in the College of Computing and Informatics at UNC Charlotte. Dr. Latulipe has long been fascinated by two-handed interaction in the real world, and the absence of it in the human-computer interface. She has developed numerous individual and collaborative two-handed interaction techniques and these have blossomed into a more general exploration of creative expression. Dr. Latulipe works on projects with choreographers, dancers, artists and theatre producers to better understand creative work in practice and how technology may play a role in supporting and evaluating creative work practices. Currently, Dr. Latulipe is working on many projects, including the NSF CreativeIT-funded Dance.Draw project, which has received national media attention.

C++11 Style

Date and Time
Wednesday, November 30, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Brian Kernighan
We know how to write bad code: Litter our programs with casts, macros, pointers, naked new and deletes, and complicated control structures. Alternatively (or in addition), obscure every design decision in a mess of deeply nested abstractions using the latest object-oriented programming and generic programming tricks. For good measure, complicate our algorithms with interesting special cases. Such code is incomprehensible, unmaintainable, usually inefficient, and not uncommon.

But how do we write good code? What principles, techniques, and idioms can we exploit to make it easier to produce quality code? I will make an argument for type-rich interfaces, compact data structures, integrated resource management and error handling, and highly-structured algorithmic code. I will illustrate my ideas and motivate my guidelines with a few idiomatic code examples.

I will use C++11 freely. Examples include auto, general constant expressions, uniform initialization, type aliases, type safe threading, and user-defined literals. These C++11 features are only just starting to appear in production compilers, so some of my suggestions have the nature of conjecture. However, developing a “modern style” is essential if we don’t want to maintain newly-written 1970s and 1980s style code in 2020.

This presentation reflects my thoughts on what “Modern C++” should mean in the 2010s: a language for programming based on light-weight abstraction with a direct and efficient mapping to hardware, suitable for infrastructure code.

Bjarne Stroustrup is the designer and original implementer of C++ and the author of Programming -- Principles and Practice using C++, The C++ Programming Language, The Design and Evolution of C++, and many other publications.

His research interests include distributed systems, design, programming techniques, software development tools, and programming languages. He is actively involved in the ANSI/ISO standardization of C++.

Dr. Stroustrup is a Distinguished Professor and holder of the College of Engineering Chair in Computer Science at Texas A&M University. He retains a link with AT&T Labs - Research as an AT&T Fellow. He has received numerous honors, including an honorary professorship in the University of Aarhus, Dr. Dobb's Excellence in Programming award, the William Procter Prize for Scientific Achievement from Sigma Xi, the IEEE Computer Society's Computer Entrepreneur Award, and the ACM Grace Murray Hopper award. He is a member of the National Academy of Engineering and an ACM and IEEE Fellow.

He received his Cand. Scient. (Mathematics and Computer Science) in 1975 from the University of Aarhus, Denmark, and his Ph.D. (Computer Science) in 1979 from Cambridge University, England.

Automating End-user Programming and Education using Program Synthesis

Date and Time
Monday, November 21, 2011 - 1:30pm to 2:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Walker
Recent research in program synthesis has made it possible to effectively synthesize small programs in a variety of domains. In this talk, I will describe two useful applications of this technology that have the potential to influence daily lives of billions of people. One application involves automating end-user programming using examples or keywords, which can allow non-programmers to effectively use computational devices such as computers, smartphones (and in the future robots) to perform a variety of repetitive tasks. Another application involves building intelligent tutoring systems that can help teachers and students with a variety of educational activities such as synthesizing problems, hints, solutions in various domains including math, science, and programming.

Sumit Gulwani is a senior researcher in the RiSE group at Microsoft Research, Redmond. His primary research interest is in the area of program synthesis with applications to automating end user programming and building intelligent tutoring systems. Sumit obtained his PhD in computer science from UC-Berkeley in 2005, and was awarded the C.V. Ramamoorthy Award and the ACM SIGPLAN Outstanding Doctoral Dissertation Award. He obtained his BTech in computer science and engineering from the Indian Institute of Technology (IIT) Kanpur in 2000 and was awarded the President's Gold Medal.

CryptDB: Protecting Confidentiality with Encrypted Query Processing

Date and Time
Thursday, November 17, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman
Online applications are vulnerable to theft of sensitive information because adversaries can exploit software bugs to gain access to private data, and because curious or malicious administrators may capture and leak data. In this talk, I will describe CryptDB, a system that provides practical and provable confidentiality in the face of these attacks for applications backed by SQL databases. It works by executing SQL queries over encrypted data using a collection of efficient SQL-aware encryption schemes. CryptDB can also chain encryption keys to user passwords, so that a data item can be decrypted only by using the password of one of the users with access to that data. As a result, a database administrator never gets access to decrypted data, and even if all servers are compromised, an adversary cannot decrypt the data of any user who is not logged in. An analysis of a trace of 126 million SQL queries from a production MySQL server shows that CryptDB can support operations over encrypted data for 99.5% of the 128,840 columns seen in the trace. Our evaluation shows that CryptDB has low overhead, reducing throughput by 14.5% for phpBB, a web forum application, and by 26% for queries from TPC-C, compared to unmodified MySQL.

Nickolai Zeldovich is the Douglas T. Ross Career Development Assistant Professor in MIT's EECS department and a member of the Computer Science and Artificial Intelligence Laboratory. His research interests are in building practical secure systems, from operating systems and hardware to programming languages and security analysis tools. He received his PhD from Stanford University in 2008, where he developed HiStar, an operating system designed to minimize the amount of trusted code by controlling information flow. In 2005, he co-founded MokaFive, a company focused on improving desktop management and mobility using x86 virtualization. Prof. Zeldovich received a Sloan fellowship in 2010, and an NSF CAREER award in 2011.

Uncertainty in Natural Image Segmentation

Date and Time
Wednesday, November 9, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Blei
We explore nonparametric Bayesian statistical models for image partitions which coherently model uncertainty in the size, shape, and structure of human image interpretations. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman-Yor (PY) process. This generalization of the Dirichlet process leads to segmentation algorithms which automatically adapt their resolution to each image. Generalizing previous applications of PY priors, we use non-Markov Gaussian processes (GPs) to infer spatially contiguous segments which respect image boundaries. We show how GP covariance functions can be calibrated to accurately match the statistics of human segmentations, and that robust posterior inference is possible via a variational method, expectation propagation. The resulting method produces highly accurate segmentations of complex scenes, and hypothesizes multiple image partitions to capture the variability inherent in human scene interpretations.