
CS Department Colloquium Series

Software Testing and Verification with Grammatical Inference

Date and Time
Monday, March 12, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Walker
Recent progress in automated deductive reasoning techniques has triggered a small scientific revolution in automated software testing, verification, and security. Indeed, it is becoming increasingly difficult to find new automated software analysis papers that do not use or assume the existence of propositional satisfiability and satisfiability-modulo-theories solvers.

In this talk, I will discuss the first results of my efforts to answer a simple question: What kind of automated reasoning will trigger the next scientific revolution in automated software analysis? As a potential candidate, I will present grammatical inference (GI), a class of techniques for learning formal languages. By inductively generalizing from examples, GI techniques nicely complement the capabilities of existing (deductive) approaches. Furthermore, the inferred grammars (or automata) are themselves analyzable objects, which can be used as approximations or abstractions of the behavior of the analyzed program.
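
To give a concrete flavor of grammatical inference (a toy sketch, not the technique from the talk), the fragment below builds a prefix-tree acceptor from positive example strings; state-merging GI algorithms such as RPNI start from exactly this automaton and generalize it by merging compatible states.

    // Toy sketch: prefix-tree acceptor built from positive examples.
    // State-merging GI algorithms generalize this automaton; here we only build it.
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    struct PTA {
        std::vector<std::map<char, int>> transitions;   // transitions[state][symbol] = next state
        std::set<int> accepting;

        PTA() : transitions(1) {}                       // state 0 is the initial state

        void add_example(const std::string& w) {        // one positive example
            int state = 0;
            for (char c : w) {
                if (!transitions[state].count(c)) {
                    transitions.push_back({});
                    transitions[state][c] = (int)transitions.size() - 1;
                }
                state = transitions[state][c];
            }
            accepting.insert(state);                    // the example's last state accepts
        }

        bool accepts(const std::string& w) const {
            int state = 0;
            for (char c : w) {
                auto it = transitions[state].find(c);
                if (it == transitions[state].end()) return false;
                state = it->second;
            }
            return accepting.count(state) > 0;
        }
    };

    int main() {
        PTA pta;
        for (std::string example : {"ab", "abab", "ababab"}) pta.add_example(example);
        std::cout << pta.accepts("abab") << " " << pta.accepts("ba") << "\n";   // prints 1 0
    }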

In this talk, I'll analyze two open problems, namely (1) test generation for programs whose inputs must adhere to complex grammars, and (2) automated verification of web sanitizers, and offer some solutions. The common trait of these problems is that they require an exact or approximate understanding of the relation between a program's input and output. I'll explain how GI techniques can infer such relations and serve as a basis for inventing powerful new software analyses. In particular, I'll show that GI-based automated testing can find six times more vulnerabilities than state-of-the-art methods, increasing code coverage by up to 58%. The novel symbolic learning technique that I'll present learns input-output relations of web sanitizers (for subsequent verification) in seconds, whereas constructing the same relations used to take a human expert several hours.

Domagoj Babic received his Dipl.Ing. in Electrical Engineering and M.Sc. in Computer Science from the University of Zagreb (Faculty of Electrical Engineering and Computing) in 2001 and 2003, respectively. He received his Ph.D. in Computer Science in 2008 from the University of British Columbia. After spending a bit more than a year in industry, he joined UC Berkeley as a research scientist.

Domagoj's research focuses on various aspects of software analysis (software verification, testing, and security), decision procedures, grammatical inference, and applied formal methods in general.

He is a recipient of Canada's NSERC Postdoctoral Fellowship (2010-2012), a Microsoft Graduate Research Fellowship (2005-2007), a Fulbright Fellowship (declined in order to attend UBC), and several awards at international programming competitions (1st place in the bit-vector arithmetic category at the 2007 Satisfiability Modulo Theories Competition and 3rd place in the satisfiable-crafted instances category at the 2005 Satisfiability Testing Competition).

For more, see: http://www.domagoj.info/

Modeling People from Billions of Photos

Date and Time
Wednesday, March 14, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Adam Finkelstein
Internet and personal photo collections now add up to over a trillion photos, and people appear in most of them. The availability of so many photos presents a unique opportunity to model virtually the whole human population. Modeling humans is key to understanding how people interact with their environment, and to future human-machine interfaces. In this talk, I will describe our work on 3D shape and motion estimation of a human face from large photo collections, as well as novel techniques for browsing those collections. This represents an exciting breakthrough toward modeling and visualizing any person just from their available photos. Part of this work is now included in Google's Picasa.

Ira Kemelmacher-Shlizerman is a postdoctoral researcher in the Department of Computer Science and Engineering at the University of Washington. She received her Ph.D. in computer science and applied mathematics from the Weizmann Institute of Science in 2009. Dr. Kemelmacher-Shlizerman works in computer vision and graphics, with a particular interest in developing computational tools for modeling people from Internet and personal photo collections. Her recent research has been covered by stories in New Scientist, CBS, Discovery News, and others. She has also consulted for Google.

Machine Learning: Higher, Faster, Stronger

Date and Time
Tuesday, February 21, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Ohad Shamir, from Microsoft Research New England
Host
David Blei
Over the past decade, machine learning has emerged as a major and highly influential discipline of computer science and engineering. As the scope and variety of its applications increase, it faces novel and increasingly challenging settings that go beyond classical learning frameworks. In this talk, I will present two recent works that fall into this category. The first work introduces a new model of sequential decision making with partial information. The model interpolates between two well-known online learning settings ("experts" and multi-armed bandits), and trades off the information obtained per round against the total number of rounds required to reach the same performance. The second work discusses the problem of parallelizing gradient-based learning algorithms, which is increasingly important for web-scale applications, but is highly non-trivial as these algorithms are inherently sequential. We show how this can be done using a generic and simple protocol, prove its theoretical optimality, and substantiate its performance experimentally on large-scale data.
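
To make the second problem concrete, here is a hedged sketch of one generic averaging-style protocol (not necessarily the one analyzed in the talk): simulated workers each compute a gradient on their own data shard, and the gradients are averaged into a single synchronized update per round.

    // Hedged sketch: generic parallel gradient averaging on a toy least-squares problem.
    #include <iostream>
    #include <vector>

    struct Example { double x, y; };

    // Gradient of 0.5*(w*x - y)^2 averaged over one worker's shard.
    double shard_gradient(double w, const std::vector<Example>& shard) {
        double g = 0.0;
        for (const Example& e : shard) g += (w * e.x - e.y) * e.x;
        return g / shard.size();
    }

    int main() {
        // Toy data from the line y = 3x; the learner should recover w close to 3.
        std::vector<Example> data;
        for (int i = 1; i <= 400; ++i) data.push_back({i * 0.01, 3.0 * i * 0.01});

        // Split the data once into one shard per (simulated) worker, round-robin.
        const int num_workers = 4;
        std::vector<std::vector<Example>> shards(num_workers);
        for (std::size_t i = 0; i < data.size(); ++i)
            shards[i % num_workers].push_back(data[i]);

        double w = 0.0;
        const double step = 0.1;
        for (int round = 0; round < 200; ++round) {
            // Each worker computes a local gradient; the master averages them.
            double avg_grad = 0.0;
            for (const auto& shard : shards) avg_grad += shard_gradient(w, shard);
            avg_grad /= num_workers;
            w -= step * avg_grad;   // one synchronized update per round
        }
        std::cout << "learned w = " << w << " (target 3.0)\n";
    }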

Ohad Shamir is a postdoctoral researcher at Microsoft Research New England. He joined Microsoft in 2010 after receiving a Ph.D. in computer science from the Hebrew University of Jerusalem, advised by Prof. Naftali Tishby. His research focuses on machine learning, with emphasis on novel algorithms that combine practical applicability and theoretical insight. His work has been recognized by several awards, including the Hebrew University's Schlomiuk Ph.D. thesis prize, the COLT 2010 Best Paper Award, and a Wolf Foundation scholarship.

Computational approaches for the DNA sequencing data deluge

Date and Time
Tuesday, March 6, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Second-generation DNA sequencers are improving rapidly and are now capable of sequencing hundreds of billions of nucleotides of data in about a week for a few thousand dollars. Consequently, sequencing has become a common tool in many fields of life science. But with these developments comes a problem: growth in per-sequencer throughput is drastically outpacing growth in computer speed. As the throughput gap widens over time, the crucial research bottlenecks are increasingly computational: computing, storage, labor, power.

Along these lines, I will discuss collaborative scientific projects in epigenetics and gene expression profiling for which I provided novel computational methods in areas such as read alignment, text indexing, and data-intensive computing. I will also discuss a new set of methods for very time- and space-efficient alignment of sequencing reads: Bowtie and Bowtie 2. These tools build on the insight that the Burrows-Wheeler Transform and the FM Index, previously used for data compression and exact string matching, can be extended to facilitate fast and memory-efficient alignment of DNA sequences to long reference genomes such as the human genome.
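
As a rough illustration of that insight (a toy sketch, far simpler than Bowtie itself), the fragment below builds the Burrows-Wheeler Transform of a short text and counts exact occurrences of a pattern with FM-index-style backward search; a real aligner adds compressed rank structures, a sampled suffix array, and heuristics for mismatches.

    // Toy sketch: exact pattern counting via BWT backward search (the FM-index idea).
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::string text = "ACGTACGTGACG";
        text += '$';                                    // sentinel, lexicographically smallest
        const int n = text.size();

        // Naive suffix array: sort suffix start positions by the suffixes they begin.
        std::vector<int> sa(n);
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return text.substr(a) < text.substr(b); });

        // BWT: the character preceding each sorted suffix.
        std::string bwt(n, ' ');
        for (int i = 0; i < n; ++i) bwt[i] = text[(sa[i] + n - 1) % n];

        // C[c]: number of text characters lexicographically smaller than c.
        std::vector<int> C(256, 0);
        for (char c : text) ++C[(unsigned char)c];
        for (int c = 255, total = n; c >= 0; --c) { total -= C[c]; C[c] = total; }

        // Occ(c, i): occurrences of c in bwt[0 .. i-1]. A real FM index answers this
        // in constant time with rank structures; here we just scan.
        auto occ = [&](char c, int i) {
            return (int)std::count(bwt.begin(), bwt.begin() + i, c);
        };

        // Backward search: process the pattern right to left, narrowing [sp, ep).
        std::string pattern = "ACG";
        int sp = 0, ep = n;
        for (int k = (int)pattern.size() - 1; k >= 0 && sp < ep; --k) {
            char c = pattern[k];
            sp = C[(unsigned char)c] + occ(c, sp);
            ep = C[(unsigned char)c] + occ(c, ep);
        }
        std::cout << pattern << " occurs " << (ep - sp) << " times\n";   // expect 3
    }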

Ben Langmead is a Research Associate in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He completed his Ph.D. in Computer Science in February 2012 at the University of Maryland, advised by Steven L. Salzberg. His research addresses problems at the intersection of computer science and genomics, and he is the author of several open source software tools for analysis of high-throughput genomics data, including Bowtie, Bowtie 2, Crossbow, and Myrna. His paper describing Bowtie won the Genome Biology award for outstanding paper published in 2009. At Johns Hopkins, he collaborates with biostatisticians, biomedical engineers, biologists, and other computer scientists to develop methods for analyzing second-generation DNA sequencing data.

Crowd-Powered Systems

Date and Time
Wednesday, February 22, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Michael Bernstein, from MIT
Host
Rebecca Fiebrink
Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.

Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.

In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture exactly the right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in real time. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.

Michael Bernstein is a PhD candidate in Computer Science at the Massachusetts Institute of Technology. His research in human-computer interaction focuses on crowdsourcing and social computing systems. He has been awarded Best Student Paper at UIST 2010, Best Paper at ICWSM 2011, the NSF Graduate Research Fellowship, and the Microsoft Research PhD Fellowship. His work has appeared in venues such as the New York Times, Slate, CNN, and The Atlantic. He earned his master's degree in Computer Science at MIT and a bachelor's degree in Symbolic Systems at Stanford University.

Porting the Computer Science Toolbox to Game Theory and Economics

Date and Time
Wednesday, February 8, 2012 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Moses Charikar
Theoretical computer science has brought new ideas and techniques to game and economic theory. A primary signature of the computer science approach is approximation: the idea of building credibility for a proposed solution by proving that its performance is always within a small factor of an ideal (and typically unimplementable) solution. We explain two of our recent contributions in this area, one motivated by networks and one by auctions.

We first discuss the "price of anarchy": how well does decentralized (or "selfish") behavior approximate centralized optimization? This concept has been analyzed in many applications, including network routing, resource allocation, network formation, health care, and even models of basketball. We highlight a new theory of robust price-of-anarchy bounds, which apply even to systems that are not in equilibrium.
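
As background, the price of anarchy of a cost-minimization game is the worst-case ratio between the cost of an equilibrium and the cost of an optimal outcome; in the usual notation:

    \[
    \mathrm{PoA} \;=\; \max_{s \,\in\, \text{Equilibria}} \frac{\mathrm{cost}(s)}{\mathrm{cost}(s^{*})},
    \qquad s^{*} \in \arg\min_{s} \mathrm{cost}(s).
    \]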

Second, we consider auction design: for example, what selling procedure should be used to maximize the revenue of a seller? On the analysis side, we highlight a new framework that explicitly connects average-case (i.e., Bayesian) analysis, the dominant paradigm in economics, with the worst-case analysis approach common in computer science. On the design side, we provide a distribution-independent auction that performs, for a wide class of input distributions, almost as well as the distribution-specific optimal auction.

Tim Roughgarden received his Ph.D. from Cornell University in 2002 and joined the Stanford CS department in 2004, where he is currently an associate professor. His research interests are in theoretical computer science, especially its interfaces with game theory and networks. He wrote the book "Selfish Routing and the Price of Anarchy" (MIT Press, 2005) and co-edited the book "Algorithmic Game Theory" with Nisan, Tardos, and Vazirani (Cambridge, 2007). His significant awards include the 2002 ACM Doctoral Dissertation Award (Honorable Mention), the 2003 Tucker Prize, the 2003 INFORMS Optimization Prize for Young Researchers, an invited lecture at the 2006 International Congress of Mathematicians, a 2007 PECASE Award, the 2008 Shapley Lectureship of the Game Theory Society, and the 2009 ACM Grace Murray Hopper Award. He is currently developing a free online course on the design and analysis of algorithms, which has over 50,000 enrolled students.

Expressive Interaction and the Evaluation of Creativity Support

Date and Time
Monday, December 5, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Rebecca Fiebrink
Visionaries in computer science have long seen the computer as a tool to augment our intellect. However, while it is relatively straightforward to measure the impact of a tool or technique on task efficiency for well-defined tasks, it is much more difficult to measure computers' impact on higher-level cognitive processes such as creativity. In my own research in human-computer interaction, I create novel interaction techniques, but I run up against the problem of evaluating how these tools affect higher-level processes such as creativity, expressiveness, and exploration. In this talk, I briefly present a variety of interaction techniques that I have developed, and I then describe a new survey metric, the Creativity Support Index (CSI), that we have developed to help researchers and designers evaluate the level of creativity support provided by these types of systems, tools, or interfaces. I will then present some current results using EEG and machine learning to classify the creative experience with finer temporal granularity. I present this work within the context of my longer-term goal of developing a suite of tools that provide both stronger analytical power and a general framework for evaluating computational support for creative activities, engagement, and aesthetic experience.

Dr. Celine Latulipe has a PhD in Computer Science from the University of Waterloo in Canada. She is an Assistant Professor of Human-Computer Interaction in the Department of Software and Information Systems in the College of Computing and Informatics at UNC Charlotte. Dr. Latulipe has long been fascinated by two-handed interaction in the real world, and the absence of it in the human-computer interface. She has developed numerous individual and collaborative two-handed interaction techniques and these have blossomed into a more general exploration of creative expression. Dr. Latulipe works on projects with choreographers, dancers, artists and theatre producers to better understand creative work in practice and how technology may play a role in supporting and evaluating creative work practices. Currently, Dr. Latulipe is working on many projects, including the NSF CreativeIT-funded Dance.Draw project, which has received national media attention.

C++11 Style

Date and Time
Wednesday, November 30, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Brian Kernighan
We know how to write bad code: Litter our programs with casts, macros, pointers, naked new and deletes, and complicated control structures. Alternatively (or in addition), obscure every design decision in a mess of deeply nested abstractions using the latest object-oriented programming and generic programming tricks. For good measure, complicate our algorithms with interesting special cases. Such code is incomprehensible, unmaintainable, usually inefficient, and not uncommon.

But how do we write good code? What principles, techniques, and idioms can we exploit to make it easier to produce quality code? I will make an argument for type-rich interfaces, compact data structures, integrated resource management and error handling, and highly-structured algorithmic code. I will illustrate my ideas and motivate my guidelines with a few idiomatic code examples.

I will use C++11 freely. Examples include auto, general constant expressions, uniform initialization, type aliases, type-safe threading, and user-defined literals. C++11 features are only just starting to appear in production compilers, so some of my suggestions have the nature of conjecture. However, developing a "modern style" is essential if we don't want to be maintaining newly-written 1970s and 1980s style code in 2020.
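
For a flavor of what such code might look like (an illustrative sketch, not an example from the talk), the fragment below uses auto, a constexpr function, uniform initialization, a type alias, and a user-defined literal.

    // Illustrative C++11 sketch: auto, constexpr, uniform initialization,
    // a type alias, and a user-defined literal.
    #include <iostream>
    #include <vector>

    constexpr double square(double x) { return x * x; }      // general constant expression

    using Samples = std::vector<double>;                      // type alias instead of typedef

    // User-defined literal: write lengths as 2.5_km and get meters back.
    constexpr long double operator"" _km(long double km) { return km * 1000.0; }

    int main() {
        Samples samples{1.0, 2.0, 3.0};                       // uniform initialization
        constexpr double area = square(3.0);                  // evaluated at compile time

        double sum = 0;
        for (auto s : samples) sum += square(s);              // auto deduces double

        std::cout << "sum of squares: " << sum
                  << ", area: " << area
                  << ", 2.5_km in meters: " << static_cast<double>(2.5_km) << '\n';
    }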

This presentation reflects my thoughts on what “Modern C++” should mean in the 2010s: a language for programming based on light-weight abstraction with a direct and efficient mapping to hardware, suitable for infrastructure code.

Bjarne Stroustrup is the designer and original implementer of C++ and the author of Programming -- Principles and Practice using C++, The C++ Programming Language, The Design and Evolution of C++, and many other publications.

His research interests include distributed systems, design, programming techniques, software development tools, and programming languages. He is actively involved in the ANSI/ISO standardization of C++.

Dr. Stroustrup is a Distinguished Professor and holder of the College of Engineering Chair in Computer Science at Texas A&M University. He retains a link with AT&T Labs - Research as an AT&T Fellow. He has received numerous honors, including an honorary professorship at the University of Aarhus, the Dr. Dobb's Excellence in Programming Award, the William Procter Prize for Scientific Achievement from Sigma Xi, the IEEE Computer Society's Computer Entrepreneur Award, and the ACM Grace Murray Hopper Award. He is a member of the National Academy of Engineering and an ACM and IEEE Fellow.

He received his Cand. Scient. (Mathematics and Computer Science) in 1975 from the University of Aarhus, Denmark, and his Ph.D. (Computer Science) in 1979 from Cambridge University, England.

Automating End-user Programming and Education using Program Synthesis

Date and Time
Monday, November 21, 2011 - 1:30pm to 2:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Walker
Recent research in program synthesis has made it possible to effectively synthesize small programs in a variety of domains. In this talk, I will describe two useful applications of this technology that have the potential to influence the daily lives of billions of people. One application involves automating end-user programming using examples or keywords, which can allow non-programmers to effectively use computational devices such as computers, smartphones, and, in the future, robots to perform a variety of repetitive tasks. Another application involves building intelligent tutoring systems that can help teachers and students with a variety of educational activities, such as synthesizing problems, hints, and solutions in domains including math, science, and programming.
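
As a rough illustration of programming by example (a toy sketch, far simpler than real synthesis engines), the fragment below enumerates a tiny space of candidate programs, here just "output the k-th token of the input", and keeps the first one consistent with every input-output example.

    // Toy programming-by-example sketch: pick the hypothesis consistent with all examples.
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    std::vector<std::string> tokens(const std::string& s) {
        std::istringstream in(s);
        std::vector<std::string> out;
        std::string t;
        while (in >> t) out.push_back(t);
        return out;
    }

    // A candidate program is just an index k: "output the k-th token of the input".
    int synthesize(const std::vector<std::pair<std::string, std::string>>& examples) {
        for (int k = 0; k < 8; ++k) {                  // tiny hypothesis space
            bool consistent = true;
            for (const auto& ex : examples) {
                auto toks = tokens(ex.first);
                if (k >= (int)toks.size() || toks[k] != ex.second) { consistent = false; break; }
            }
            if (consistent) return k;
        }
        return -1;                                     // no program in the space fits
    }

    int main() {
        std::vector<std::pair<std::string, std::string>> examples = {
            {"John Smith", "Smith"},
            {"Ada Lovelace", "Lovelace"},
        };
        int k = synthesize(examples);
        if (k >= 0)
            std::cout << "learned: output token #" << k
                      << "; on \"Alan Turing\" -> " << tokens("Alan Turing")[k] << '\n';
    }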

Sumit Gulwani is a senior researcher in the RiSE group at Microsoft Research, Redmond. His primary research interest is in the area of program synthesis, with applications to automating end-user programming and building intelligent tutoring systems. Sumit obtained his PhD in computer science from UC Berkeley in 2005 and was awarded the C.V. Ramamoorthy Award and the ACM SIGPLAN Outstanding Doctoral Dissertation Award. He obtained his BTech in computer science and engineering from the Indian Institute of Technology (IIT) Kanpur in 2000 and was awarded the President's Gold Medal.

CryptDB: Protecting Confidentiality with Encrypted Query Processing

Date and Time
Thursday, November 17, 2011 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman
Online applications are vulnerable to theft of sensitive information because adversaries can exploit software bugs to gain access to private data, and because curious or malicious administrators may capture and leak data. In this talk, I will describe CryptDB, a system that provides practical and provable confidentiality in the face of these attacks for applications backed by SQL databases. It works by executing SQL queries over encrypted data using a collection of efficient SQL-aware encryption schemes. CryptDB can also chain encryption keys to user passwords, so that a data item can be decrypted only by using the password of one of the users with access to that data. As a result, a database administrator never gets access to decrypted data, and even if all servers are compromised, an adversary cannot decrypt the data of any user who is not logged in. An analysis of a trace of 126 million SQL queries from a production MySQL server shows that CryptDB can support operations over encrypted data for 99.5% of the 128,840 columns seen in the trace. Our evaluation shows that CryptDB has low overhead, reducing throughput by 14.5% for phpBB, a web forum application, and by 26% for queries from TPC-C, compared to unmodified MySQL.
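
To illustrate the flavor of query execution over encrypted data (a conceptual toy only, using a keyed-hash placeholder rather than CryptDB's actual encryption schemes), the fragment below shows how deterministic encryption lets a server evaluate an equality predicate without ever seeing plaintexts: equal plaintexts map to equal ciphertexts, so a WHERE clause can be matched ciphertext-to-ciphertext.

    // Conceptual toy only: NOT real cryptography and NOT CryptDB's schemes.
    // It just shows that deterministic "encryption" preserves equality.
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Placeholder for a deterministic encryption scheme (toy, insecure).
    std::size_t det_encrypt(const std::string& secret_key, const std::string& plaintext) {
        return std::hash<std::string>{}(secret_key + "|" + plaintext);
    }

    int main() {
        std::string key = "client-side secret";

        // The client encrypts rows before sending them; the server stores only ciphertexts.
        std::vector<std::size_t> encrypted_column;
        for (std::string name : {"alice", "bob", "alice"})
            encrypted_column.push_back(det_encrypt(key, name));

        // The client encrypts the query constant; the server matches ciphertexts.
        std::size_t encrypted_query = det_encrypt(key, "alice");
        int matches = 0;
        for (std::size_t c : encrypted_column)
            if (c == encrypted_query) ++matches;       // equality evaluated on ciphertexts

        std::cout << "rows matching encrypted WHERE clause: " << matches << '\n';   // 2
    }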

Nickolai Zeldovich is the Douglas T. Ross Career Development Assistant Professor in MIT's EECS department and a member of the Computer Science and Artificial Intelligence Laboratory. His research interests are in building practical secure systems, from operating systems and hardware to programming languages and security analysis tools. He received his PhD from Stanford University in 2008, where he developed HiStar, an operating system designed to minimize the amount of trusted code by controlling information flow. In 2005, he co-founded MokaFive, a company focused on improving desktop management and mobility using x86 virtualization. Prof. Zeldovich received a Sloan fellowship in 2010 and an NSF CAREER award in 2011.
