CS Department Colloquium Series

Interactive ML for People: The Small Data Problem

Date and Time
Wednesday, November 19, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

 

Emma Brunskill

Consider an intelligent tutoring system or an autonomous decision support tool for a doctor. Though such systems may in aggregate have a huge amount of data, the data collected for a single individual is typically very small, and the policy space (what to teach a student next, or how best to treat a patient) is enormous.

I will describe two machine learning efforts to tackle these small data challenges: learning across multiple tasks, and making better use of previously collected task data, where the tasks in both cases involve sequential stochastic decision processes (reinforcement learning and bandits). I will also present results showing how one of these techniques allowed us to substantially increase engagement in an educational game for teaching fractions.
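The bandit setting mentioned above can be illustrated with a minimal sketch (an illustration of the general technique, not the speaker's methods): an epsilon-greedy agent that balances exploring arms against exploiting the current best estimate, which is exactly where small per-individual sample sizes bite.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit
    and return the per-arm reward estimates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:                           # explore a random arm
            arm = rng.randrange(n_arms)
        else:                                                # exploit the current best
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates
```

With few pulls per arm, the estimates are noisy and the agent can settle on the wrong arm, which is the "small data" difficulty the talk addresses.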

Emma Brunskill is an assistant professor in the computer science department at Carnegie Mellon University, where she is also affiliated with the machine learning department. She works on reinforcement learning, focusing on applications that involve artificial agents interacting with people, such as intelligent tutoring systems. She is a Rhodes Scholar, a Microsoft Faculty Fellow, and an NSF CAREER award recipient, and her work has received best paper nominations at Educational Data Mining (2012, 2013) and CHI (2014).

Flexible, Reliable, and Scalable Nonparametric Learning

Date and Time
Monday, November 24, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

Erik Sudderth

Applications of statistical machine learning increasingly involve datasets with rich hierarchical, temporal, spatial, or relational structure.  Bayesian nonparametric models offer the promise of effective learning from big datasets, but standard inference algorithms often fail in subtle and hard-to-diagnose ways.  We explore this issue via variants of a popular and general model family, the hierarchical Dirichlet process.  We propose a framework for "memoized" online optimization of variational learning objectives, which achieves computational scalability by processing local batches of data, while simultaneously adapting the global model structure in a coherent fashion.  Using this approach, we build improved models of text, audio, image, and social network data.
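The "memoized" idea can be conveyed in miniature (an illustration of the caching principle only, not the authors' variational algorithm): cache each batch's sufficient statistics so that revisiting a batch swaps its old contribution for a new one, keeping the global estimate exactly consistent with all batches seen.

```python
class MemoizedMean:
    """Track a global mean by memoizing per-batch sufficient statistics.

    Revisiting a batch subtracts its cached summary and adds the new one,
    so the global estimate always matches a full pass over the data."""
    def __init__(self):
        self.batch_stats = {}      # batch id -> (sum, count)
        self.total_sum = 0.0
        self.total_count = 0

    def update_batch(self, batch_id, data):
        old_sum, old_count = self.batch_stats.get(batch_id, (0.0, 0))
        new_sum, new_count = float(sum(data)), len(data)
        # swap the cached contribution for the new one
        self.total_sum += new_sum - old_sum
        self.total_count += new_count - old_count
        self.batch_stats[batch_id] = (new_sum, new_count)

    def mean(self):
        return self.total_sum / self.total_count
```

The same bookkeeping applied to richer sufficient statistics is what lets batch-at-a-time optimization remain coherent with the global model.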

Erik B. Sudderth is an Assistant Professor in the Brown University Department of Computer Science.  He received the Bachelor's degree (summa cum laude, 1999) in Electrical Engineering from the University of California, San Diego, and the Master's and Ph.D. degrees (2006) in EECS from the Massachusetts Institute of Technology.  His research interests include probabilistic graphical models; nonparametric Bayesian methods; and applications of statistical machine learning in computer vision and the sciences.  He received an NSF CAREER award in 2014, and in 2008 was named one of "AI's 10 to Watch" by IEEE Intelligent Systems Magazine.

Better Science Through Better Bayesian Computation

Date and Time
Thursday, November 13, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

Ryan Adams

As we grapple with the hype of "big data" in computer science, it is important to remember that the data are not the central objects: we collect data to answer questions and inform decisions in science, engineering, policy, and beyond.  In this talk, I will discuss my work in developing tools for large-scale data analysis, and the scientific collaborations in neuroscience, chemistry, and astronomy that motivate me and keep this work grounded.  I will focus on two lines of research that I believe capture an important dichotomy in my work and in modern probabilistic modeling more generally: identifying the "best" hypothesis versus incorporating hypothesis uncertainty.  In the first case, I will discuss my recent work in Bayesian optimization, which has become the state-of-the-art technique for automatically tuning machine learning algorithms, finding use across academia and industry. In the second case, I will discuss scalable Markov chain Monte Carlo and the new technique of Firefly Monte Carlo, which is the first provably correct MCMC algorithm that can take advantage of subsets of data.
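As a rough illustration of the Bayesian optimization loop (a 1-D toy sketch, not the speaker's system), one can fit a Gaussian-process surrogate to the evaluations made so far and query wherever expected improvement over the incumbent is highest:

```python
import math
import numpy as np

def rbf_kernel(a, b, length=0.3):
    """Squared-exponential covariance between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def bayes_opt_max(f, bounds=(0.0, 1.0), n_iter=15, jitter=1e-4, seed=0):
    """Minimal 1-D Bayesian optimization: GP surrogate + expected improvement."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=3)            # a few random initial evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(*bounds, 200)            # candidate points for the next query
    for _ in range(n_iter):
        K = rbf_kernel(X, X) + jitter * np.eye(len(X))
        Ks = rbf_kernel(grid, X)
        alpha = np.linalg.solve(K, y)
        mu = Ks @ alpha                          # GP posterior mean on the grid
        v = np.linalg.solve(K, Ks.T)
        sigma = np.sqrt(np.clip(1.0 - np.sum(Ks.T * v, axis=0), 1e-12, None))
        z = (mu - y.max()) / sigma
        cdf = 0.5 * (1.0 + np.array([math.erf(t / math.sqrt(2)) for t in z]))
        pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
        ei = (mu - y.max()) * cdf + sigma * pdf  # expected improvement
        x_next = grid[np.argmax(ei)]             # query where EI is largest
        X, y = np.append(X, x_next), np.append(y, f(x_next))
    return X[np.argmax(y)]
```

In hyperparameter tuning, each call to f is an expensive training run, which is why spending computation on choosing the next query pays off.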

Ryan Adams is an Assistant Professor of Computer Science at Harvard University, in the School of Engineering and Applied Sciences. He leads the Harvard Intelligent Probabilistic Systems group, whose research focuses on machine learning and computational statistics, with applied collaborations across the sciences.  Ryan received his undergraduate training in EECS at MIT and completed his Ph.D. in Physics at Cambridge University as a Gates Cambridge Scholar under David MacKay.  He was a CIFAR Junior Research Fellow at the University of Toronto before joining the faculty at Harvard.  His Ph.D. thesis received Honorable Mention for the Leonard J. Savage Award for Bayesian Theory and Methods from the International Society for Bayesian Analysis.  Ryan has won paper awards at ICML, AISTATS, and UAI, and received the DARPA Young Faculty Award.
 

Decentralized Anonymous Credentials and Electronic Payments from Bitcoin

Date and Time
Wednesday, November 5, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ed Felten

Traditionally, making statements about identity on the Internet, whether literal assertions of identity or statements about one's attributes, requires centralized providers who issue credentials attesting to the user's information. These organizations, which include Certificate Authorities, DNS maintainers, and login providers like Google and Facebook, play a large role in securing Internet infrastructure, email, and financial transactions. Our increasing reliance on these providers raises concerns about privacy and trust.

Anonymous credentials represent a powerful solution to this privacy concern: they deprive even colluding credential issuers and verifiers of the ability to identify and track their users. Although credentials may involve direct assertions of identity, they may also be used for a large range of useful assertions, such as “my TPM says my computer is secure”, “I have a valid subscription for content”, or “I am eligible to vote.” Anonymous credentials can also be used as a basis for constructing untraceable electronic payment systems, or “e-cash".

Unfortunately most existing anonymous credential and e-cash systems have a fundamental limitation: they require the appointment of a central, trusted party to issue credentials or tokens. This issuer represents a single point of failure and an obvious target for compromise. In distributed settings such as ad hoc or peer-to-peer networks, it may be challenging even to identify parties who can be trusted to play this critical role.

In this talk I will discuss new techniques for building anonymous credentials and electronic cash in a fully decentralized setting. The basic ingredient of these proposals is a "distributed public append-only ledger", a technology which has most famously been deployed in digital currencies such as Bitcoin. This ledger can be employed by individual nodes to make assertions about a user’s attributes in a fully anonymous fashion — without the assistance of a credential issuer. One concrete result of these techniques is a new protocol named “Zerocash”, which adds cryptographically unlinkable electronic payments to the Bitcoin currency.
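The role of the append-only ledger can be illustrated with a toy hash chain (a sketch of the ledger idea only; Bitcoin's actual ledger adds proof-of-work, Merkle trees, and distributed consensus): each entry commits to its predecessor, so modifying any entry invalidates every later hash.

```python
import hashlib
import json

class AppendOnlyLedger:
    """A toy hash-chained ledger: each entry commits to the one before it."""
    def __init__(self):
        self.entries = []            # list of (payload, chained hash)
        self.head = "0" * 64         # hash of the empty chain

    def append(self, payload):
        record = json.dumps({"prev": self.head, "data": payload}, sort_keys=True)
        self.head = hashlib.sha256(record.encode()).hexdigest()
        self.entries.append((payload, self.head))

    def verify(self):
        """Recompute the chain; any modified entry breaks all later hashes."""
        prev = "0" * 64
        for payload, h in self.entries:
            record = json.dumps({"prev": prev, "data": payload}, sort_keys=True)
            if hashlib.sha256(record.encode()).hexdigest() != h:
                return False
            prev = h
        return True
```

Because anyone can recompute the chain, no single trusted issuer is needed to vouch for the ledger's contents, which is the property the decentralized constructions build on.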

Prof. Matthew Green is a Research Professor at the Johns Hopkins University Information Security Institute. His research focus is on cryptographic techniques for maintaining users’ privacy, and on technologies that enable the deployment of privacy-preserving protocols. From 2004-2011, Green served as CTO of Independent Security Evaluators, a custom security evaluation firm with a global client base. Along with a team at Johns Hopkins and RSA Laboratories, he discovered flaws in the Texas Instruments Digital Signature Transponder, a cryptographically-enabled RFID device used in the Exxon Speedpass payment system and in millions of vehicle immobilizers.

Machine Learning for Robots: Perception, Planning and Motor Control

Date and Time
Monday, November 3, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sebastian Seung

Daniel Lee

Machines today excel at seemingly complex games such as chess and Jeopardy, yet still struggle with basic perceptual, planning, and motor tasks in the physical world.  What are the appropriate representations needed to execute and adapt robust behaviors in real-time?  I will present some examples of learning algorithms from my group that have been applied to robots for monocular visual odometry, high-dimensional trajectory planning, and legged locomotion. These algorithms employ a variety of techniques central to machine learning: dimensionality reduction, online learning, and reinforcement learning.  I will show and discuss applications of these algorithms to autonomous vehicles and humanoid robots.
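One classic dimensionality-reduction technique in this family, nonnegative matrix factorization, can be sketched with the well-known multiplicative update rules (an illustrative toy, not code from the talk): factor a nonnegative data matrix V into low-rank nonnegative factors W and H.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factor a nonnegative matrix V ~ W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, (n, rank))
    H = rng.uniform(0.1, 1.0, (rank, m))
    eps = 1e-10                                  # avoid division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # update H, keeping it nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)     # update W, keeping it nonnegative
    return W, H
```

The multiplicative form preserves nonnegativity at every step, which is why the learned factors tend to be interpretable as additive parts.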

Daniel Lee on Comedy Central's 'The Colbert Report', October 28, 2010

Daniel Lee is the Evan C. Thompson Term Chair, Raymond S. Markowitz Faculty Fellow, and Professor in the School of Engineering and Applied Science at the University of Pennsylvania. He received his B.A. summa cum laude in Physics from Harvard University in 1990 and his Ph.D. in Condensed Matter Physics from the Massachusetts Institute of Technology in 1995. Before coming to Penn, he was a researcher at AT&T and Lucent Bell Laboratories in the Theoretical Physics and Biological Computation departments. He is a Fellow of the IEEE and has received the National Science Foundation CAREER award and the University of Pennsylvania Lindback award for distinguished teaching. He was also a fellow of the Hebrew University Institute of Advanced Studies in Jerusalem, an affiliate of the Korea Advanced Institute of Science and Technology, and organized the US-Japan National Academy of Engineering Frontiers of Engineering symposium. As director of the GRASP Robotics Laboratory and co-director of the CMU-Penn University Transportation Center, he leads a group that focuses on understanding general computational principles in biological systems and on applying that knowledge to build autonomous systems.

Statistical and machine learning challenges in the analysis of large networks

Date and Time
Tuesday, December 2, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt
Network data --- i.e., collections of measurements on pairs, or tuples, of units in a population of interest --- are ubiquitous nowadays in a wide range of machine learning applications, from molecular biology to marketing on social media platforms. Surprisingly, assumptions underlying popular statistical methods are often untenable in the presence of network data. Established machine learning algorithms often break when dealing with combinatorial structure. And the classical notions of variability, sample size, and ignorability take on unintended connotations. These failures open the door to a number of technical challenges, and to opportunities for introducing new fundamental ideas and developing new insights. In this talk, I will discuss open statistical and machine learning problems that arise when dealing with large networks, mostly focusing on modeling and inferential issues, and provide an overview of key technical ideas, recent results, and trends.
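As one concrete example of the modeling issues involved, a stochastic block model (a standard textbook model of community structure, not necessarily the talk's) generates a graph in which edge probabilities depend only on the communities of the endpoints:

```python
import numpy as np

def sample_sbm(block_sizes, P, seed=0):
    """Sample an undirected graph from a stochastic block model.

    block_sizes: number of nodes per community.
    P[a][b]: probability of an edge between communities a and b (symmetric)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = len(labels)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):               # each dyad sampled independently
            if rng.random() < P[labels[i]][labels[j]]:
                A[i, j] = A[j, i] = 1
    return A, labels
```

Note that the "sample" here is a single dependent object, not n independent observations, which is one reason classical notions of sample size misbehave on network data.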
 
Edoardo M. Airoldi is an Associate Professor of Statistics at Harvard University, where he leads the Harvard Laboratory for Applied Statistical Methodology. He holds a Ph.D. in Computer Science and an M.Sc. in Statistics from Carnegie Mellon University, and a B.Sc. in Mathematical Statistics and Economics from Bocconi University. His current research focuses on statistical theory and methods for designing and analyzing experiments in the presence of network interference, and on inferential issues that arise in models of network data. He works on applications in molecular biology and proteomics, and in social media analytics and marketing. Airoldi is the recipient of several research awards, including the ONR Young Investigator Award, the NSF CAREER Award, and the Alfred P. Sloan Research Fellowship, and has received several outstanding paper awards, including the Thomas R. Ten Have Award for his work on causal inference and the John Van Ryzin Award for his work in biology. He recently advised the Obama for America 2012 campaign on their social media efforts, and serves as a technical advisor at Nanigans and MaxPoint.
 

The Network Inside Out: New Vantage Points for Internet Security

Date and Time
Wednesday, October 15, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ed Felten

J. Alex Halderman

The Internet's size, and the diversity of connected hosts, create difficult challenges for security.  Conventionally, most vulnerabilities are discovered through labor-intensive scrutiny of individual implementations, but this scales poorly, and important classes of vulnerabilities can be hard to detect when considering hosts in isolation. Moreover, the security of the Internet as a whole is affected by management decisions made by individual system operators, but it is difficult to make sense of these choices--or to influence them to improve security--without a global perspective.

In recent work, I have been developing new approaches to these challenges, based on the analysis of large-scale Internet measurement data. By collecting and analyzing the public keys used for HTTPS and SSH, my team discovered serious weaknesses in key generation affecting millions of machines, and we were able to efficiently factor the RSA moduli used by almost 0.5% of all HTTPS servers. By clustering and investigating the vulnerable hosts, we exposed flawed cryptographic implementations in network devices manufactured by more than 60 companies and uncovered a critical design flaw in the Linux kernel. 
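The shared-factor weakness scales down to a few lines: if two RSA moduli were generated with bad randomness and happen to share a prime, a single GCD recovers that prime without factoring either modulus. (Toy primes below for illustration; the real study ran pairwise-GCD-style computations over millions of moduli with ~1024-bit primes.)

```python
from math import gcd

# Two "RSA moduli" that, through bad randomness, share the prime factor p.
p, q1, q2 = 101, 113, 127          # toy primes standing in for ~1024-bit primes
n1, n2 = p * q1, p * q2

shared = gcd(n1, n2)               # recovers p instantly
q1_recovered = n1 // shared        # and with it, n1's full factorization
```

Euclid's algorithm runs in time polynomial in the bit length, so the hard problem (factoring) is bypassed entirely whenever two keys share a factor.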

To help other researchers apply similar techniques, we developed ZMap, a tool for performing Internet-wide network surveys that can probe the entire IPv4 address space in minutes, thousands of times faster than prior approaches. ZMap has become a thriving open-source project and is available in major Linux distributions. We've used it to develop defenses against compromised HTTPS certificate authorities, to study the Internet's response to the infamous OpenSSL Heartbleed vulnerability, and to significantly increase the global rate of patching for vulnerable hosts. Ultimately, measurement-driven approaches to Internet security may help shift the security balance of power to favor defenders over attackers.
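The core trick behind fast stateless scanning can be sketched at toy scale (an illustration of the idea, not ZMap's implementation): iterate a cyclic multiplicative group so that every address is visited exactly once, in pseudorandom order, with only O(1) state.

```python
def scan_order(p=101, g=2, start=1):
    """Yield 0..p-2 in pseudorandom order by iterating x -> (g * x) mod p.

    With p prime and g a primitive root mod p, the sequence passes through
    every nonzero residue exactly once before returning to its start, so the
    scanner needs only the current value as state -- no table of addresses
    already probed."""
    x = start
    while True:
        yield x - 1              # map residues 1..p-1 down to indices 0..p-2
        x = (x * g) % p
        if x == start:
            return
```

Scaled up to a prime just above 2^32, the same construction randomizes probe order across the IPv4 space (spreading load across networks) while guaranteeing complete coverage.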

J. Alex Halderman is an assistant professor of computer science and engineering at the University of Michigan and director of Michigan's Center for Computer Security and Society. His research focuses on computer security and privacy, with an emphasis on problems that broadly impact society and public policy. Prof. Halderman's interests include application security, network security, anonymous and censorship-resistant communication, electronic voting, digital rights management, mass surveillance, and online crime, as well as the interaction of technology with law, regulatory policy, and international affairs.

Prof. Halderman is widely known for developing the "cold boot" attack against disk encryption, which altered widespread security assumptions about the behavior of RAM, influenced computer forensics practice, and inspired the creation of a new subfield of theoretical cryptography. A noted expert on electronic voting security, he helped lead the first independent review of the election technology used by half a billion voters in India, which prompted the national government to undertake major technical reforms. He has authored more than 50 publications, and his work has won numerous distinctions, including two best paper awards from USENIX Security, a top systems security venue.

Robust Abstractions for Replicated Shared State

Date and Time
Thursday, November 20, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Walker

Sebastian Burckhardt

In the age of cloud-connected mobile devices, users want responsive apps that read and write shared data everywhere, at all times, even if network connections are slow or unavailable. Replication and eventual consistency, while able to deliver this experience, require us to face the complexity of asynchronous update propagation and conflict resolution. Our research goal is to find abstractions that encapsulate this complexity, in order to simplify the programming of distributed applications that are responsive, reactive, and collaborative.

In this talk, we first discuss the general principles of eventual consistency. Then, we introduce our programming model, consisting of cloud types (for declarative type-based conflict resolution) and the GLUT model (an operational consistency model based on a global log of update transactions). Finally, we report on our practical experiences with supporting cloud types and GLUT in the TouchDevelop programming language and mobile development environment.
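The flavor of declarative conflict resolution can be conveyed by the simplest replicated data type, a grow-only counter (a generic CRDT sketch, not the cloud-types or GLUT implementation): each replica increments its own slot, and merging takes an elementwise maximum, so merges commute and replicas converge regardless of update order.

```python
class GCounter:
    """A replicated grow-only counter.

    Each replica increments only its own slot; merge takes the elementwise
    max, so concurrent updates never conflict and all replicas converge."""
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas

    def increment(self, amount=1):
        self.counts[self.id] += amount          # local, always-available update

    def merge(self, other):
        """Absorb another replica's state; commutative and idempotent."""
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)
```

Encapsulating conflict resolution in the data type, rather than in application code, is the kind of simplification the talk's programming model aims for.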

Sebastian Burckhardt was born and raised in Basel, Switzerland, where he studied Mathematics at the local university. During an exchange year at Brandeis University, he discovered his affinity for Computer Science and immigrated to the United States. After a few years of industry experience at IBM, he returned to academia and earned his Ph.D. in Computer Science at the University of Pennsylvania. Since then, he has worked as a researcher at Microsoft Research in Redmond. His general research interest is the study of programming models for concurrent, parallel, and distributed systems. More specific interests include consistency models, concurrency testing, self-adjusting computation, and the concurrent revisions programming model.

Google Strength Neural Networks

Date and Time
Monday, November 10, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sebastian Seung

Greg Corrado

Industrial-scale applications of machine learning are surprisingly important in the products and services we enjoy today. Over the last few years, classical artificial neural networks have reemerged as one of the most powerful, practical machine learning tools available. This "deep learning" renaissance has been fueled less by algorithmic advances than by the availability of ever larger data stores and the clever use of vast computational resources. Greg will describe Google's large-scale distributed neural network framework and the applications of neural networks to image recognition, speech recognition, and text understanding.
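The distributed-training idea can be caricatured on a linear model (a toy synchronous sketch under simplifying assumptions; Google's production framework is asynchronous and vastly larger): workers compute gradients on shards of the data, and a parameter server applies their average to the shared weights.

```python
import numpy as np

def data_parallel_sgd(X, y, n_workers=4, steps=300, lr=0.1):
    """Synchronous data-parallel training of a linear model: each 'worker'
    computes the MSE gradient on its shard of the data, and a central
    parameter server averages the gradients and updates shared weights."""
    w = np.zeros(X.shape[1])
    shards = np.array_split(np.arange(len(X)), n_workers)
    for _ in range(steps):
        # In a real system each shard's gradient is computed on a separate machine.
        grads = [2 * X[i].T @ (X[i] @ w - y[i]) / len(i) for i in shards]
        w -= lr * np.mean(grads, axis=0)   # server step with the averaged gradient
    return w
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the parallel version follows the same trajectory while spreading the computation over many machines.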

Greg Corrado is a senior research scientist at Google working in artificial intelligence, computational neuroscience, and scalable machine learning. He has worked for some time on brain-inspired computing, and most recently has served as one of the founding members and a technical lead on Google's large-scale deep learning project. Before coming to Google, he worked at IBM Research on the SyNAPSE neuromorphic silicon chip. He did his graduate work in Neuroscience and in Computer Science at Stanford University, and his undergraduate work in Physics at Princeton University.

 