CS Department Colloquium Series

The devil is in the details: on the real costs and benefits of WOM codes for Flash

Date and Time
Wednesday, December 7, 2016 - 4:30pm to 5:30pm
Location
Computer Science 402
Type
CS Department Colloquium Series
Host
Kai Li

NAND flash, used in modern SSDs, is a write-once medium, where each memory cell must be erased prior to writing. The lifetime of an SSD is limited by the number of erasures allowed on each cell. Thus, minimizing erasures is a key objective in SSD design.

A promising approach to eliminate erasures and extend SSD lifetime is to use write-once memory (WOM) codes, designed to accommodate additional writes on write-once media. However, these codes inflate the physically stored data by at least 29%, require an extra read operation before each additional write, and their applicability to MLC and TLC flash is restricted.
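For intuition about what page reuse with a WOM code involves, here is a minimal Python sketch (my own illustration, not the codes evaluated in this work) of the classic Rivest-Shamir two-write WOM code, which stores 2 data bits in 3 write-once cells, twice. This toy code inflates each write by 50%; the practical codes discussed in the talk are more efficient, but as noted above still require at least 29% extra capacity and an extra read before the second write.

```python
# Toy sketch (not the speakers' construction): the Rivest-Shamir two-write WOM
# code stores 2 data bits in 3 write-once cells, twice. Cells may only change
# 0 -> 1 until the whole block is erased.

FIRST = {(0, 0): (0, 0, 0), (0, 1): (0, 0, 1),
         (1, 0): (0, 1, 0), (1, 1): (1, 0, 0)}
# The second-write codeword is the bitwise complement of the first-write one.
SECOND = {d: tuple(1 - c for c in cw) for d, cw in FIRST.items()}

def write_first(data):
    return FIRST[data]

def write_second(cells, data):
    """Overwrite with new data using only 0 -> 1 transitions (no erase)."""
    if decode(cells) == data:          # data unchanged: leave the cells alone
        return cells
    target = SECOND[data]
    assert all(c <= t for c, t in zip(cells, target)), "illegal 1 -> 0 change"
    return target

def decode(cells):
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(d for d, cw in table.items() if cw == cells)

cells = write_first((1, 0))            # first write: data 10 -> cells 010
cells = write_second(cells, (0, 1))    # second write, no erase: 010 -> 110
assert decode(cells) == (0, 1)
```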

I will describe the first evaluation of the possible benefit of reusing flash pages with WOM codes on real flash chips, along with an end-to-end FTL design and implementation. Our results expose a considerable gap between the theoretical benefits of page reuse and those achievable on current state-of-the-art hardware. However, they also lay the groundwork for promising future improvements.

Joint work with Fabio Margaglia, Eitan Yaakobi, Yue Li, Assaf Schuster and Andre Brinkmann.

Controlling the rate of false discoveries in tandem mass spectrum identifications

Date and Time
Thursday, November 10, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ben Raphael
Uri Keich
A typical shotgun proteomics experiment produces thousands of tandem mass spectra, each of which can be tentatively assigned a corresponding peptide by using a database search procedure that looks for a peptide-spectrum match (PSM) that optimizes the score assigned to a matched pair. Some of the resulting PSMs will be correct while others will be false, and we have no way to verify which is which. The statistical problem we face is that of controlling the false discovery rate (FDR), or the expected proportion of false PSMs among all reported pairings. While there is a rich statistical literature on controlling the FDR in the multiple hypothesis testing context, controlling the FDR in the PSM context is mostly done through the "home-grown" method called target-decoy competition (TDC).
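As a rough illustration of how TDC estimates the FDR in practice (a hedged sketch with hypothetical inputs, not the speaker's procedure): each spectrum is searched against both the target database and a shuffled or reversed decoy database, only the better-scoring of the two matches is kept, and the number of decoy wins above a score threshold serves as an estimate of the number of false target wins above that threshold.

```python
# Hedged sketch of target-decoy competition (TDC). `psms` is a hypothetical list
# of (spectrum_id, target_score, decoy_score) triples obtained by searching each
# spectrum against the target database and a decoy database.

def tdc_threshold(psms, fdr_level=0.01):
    """Return the most permissive score cutoff whose estimated FDR <= fdr_level."""
    # Competition: for each spectrum keep only the better-scoring of the two
    # matches (ties are counted as target wins here, for simplicity).
    wins = sorted(((max(t, d), t >= d) for _, t, d in psms), reverse=True)

    threshold, targets, decoys = None, 0, 0
    for score, is_target in wins:               # scan from best score downward
        if is_target:
            targets += 1
        else:
            decoys += 1
        # Decoy wins estimate the number of false target wins above this score;
        # the +1 makes the estimate conservative.
        if (decoys + 1) / max(targets, 1) <= fdr_level:
            threshold = score
    return threshold
```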
 
After a brief introduction to the problem of tandem mass spectrum identification, we will explore the reasons why the mass spec community has been using this non-standard approach to controlling the FDR. We will then discuss how calibration can increase the number of correct discoveries and offer an alternative method for controlling the FDR in the presence of calibrated scores. We will conclude by arguing that our analysis extends to a more general setup than the mass spectrum identification problem.
 
Joint work with Bill Noble (University of Washington)
 

An introduction to (genomic) counting

Date and Time
Tuesday, December 13, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt

Lior Pachter
I will survey several recently developed sequencing-based assays for functional genomics, and discuss a number of fundamental statistical and computational ideas that allow genomic information to be extracted from sequencing reads.

Examples will be drawn from assays such as RNA-Seq, DMS/SHAPE-Seq, PseudoSeq and iCLIP. I will be discussing work that is joint with Sharon Aviran, Nicolas Bray, Shannon Hateley, Bo Li, Shannon McCurdy, Páll Melsted, Vasilis Ntranos, Harold Pimentel, Akshay Tambe and Lynn Yi.
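As one example of the kind of statistical idea involved, here is a deliberately simplified Python sketch (not the speaker's software; transcript lengths and error models are ignored) of the EM algorithm commonly used to allocate ambiguous RNA-Seq reads among transcripts when counting:

```python
# Minimal EM sketch for RNA-Seq "counting": each read is compatible with a set
# of transcripts, and abundances are estimated by iteratively splitting each
# ambiguous read in proportion to the current abundance estimates.
# (Transcript lengths and sequencing-error models are deliberately omitted.)

def em_abundances(read_compat, transcripts, n_iters=100):
    """read_compat: list of sets of transcript ids, one set per read."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iters):
        counts = {t: 0.0 for t in transcripts}
        for compat in read_compat:                 # E-step: split each read
            z = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / z
        total = sum(counts.values())               # M-step: renormalize
        theta = {t: c / total for t, c in counts.items()}
    return theta

reads = [{"A"}, {"A", "B"}, {"A", "B"}, {"B", "C"}]
print(em_abundances(reads, ["A", "B", "C"]))
```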

I was born in Ramat Gan, Israel, and grew up in Pretoria, South Africa, where I attended Pretoria Boys High School. After receiving a B.S. in Mathematics from Caltech in 1994, I left for MIT, where I was awarded a PhD in applied mathematics in 1999. I then moved to the University of California at Berkeley, where I was a postdoctoral researcher (1999-2001), assistant professor (2001-2005), and associate professor (2005-2009). I am currently the Raymond and Beverly Sackler Professor of Computational Biology at UC Berkeley and a professor of mathematics and of molecular and cellular biology, with a joint appointment in computer science.

My research interests span the mathematical and biological sciences, and I have authored over 100 research articles in the areas of algorithms, combinatorics, comparative genomics, algebraic statistics, molecular biology and evolution. I've been awarded a National Science Foundation CAREER award, a Sloan Research Fellowship, the Miller Professorship, and a Federal Laboratory Consortium award for the successful technology transfer of widely used sequence alignment software developed in my group.  

Predicting the impact of rare regulatory variation

Date and Time
Tuesday, November 29, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Barbara Engelhardt
The enormous increase in availability of full human genomic sequences presents a great opportunity for understanding the impact of rare genetic variants. Based on current knowledge, however, we are still limited in our ability to interpret or predict the regulatory consequences of rare variants. Association methods are inherently limited for variants observed in only one or a few individuals. Diverse genomic annotations, such as growing resources of epigenetic data, have been shown to be informative regarding regulatory elements, but are only moderately predictive of impact for individual variants. The availability of personal RNA-seq and other cellular measurements for the same individuals with genome sequencing offers a new avenue for integrated methods for prioritizing rare functional variants. By considering informative genomic annotations along with molecular phenotyping, we are able to identify the regulatory impact of rare genetic variants largely excluded from previous analyses.

We have evaluated the impact of rare regulatory variants using whole-genome sequences and corresponding RNA-sequencing data from 44 different tissues sampled in the GTEx project. We demonstrate that rare variants in promoter elements, conserved regions, and variants that affect splicing are associated with extreme changes in gene expression across multiple tissues. Additionally, we have developed a Bayesian machine learning approach that integrates whole-genome sequencing with RNA-seq data from the same individual, leveraging gene expression levels along with diverse genomic annotations and performing joint inference to identify likely functional rare regulatory variants. We have applied this model to the GTEx data to prioritize the most likely functional rare regulatory variants for each individual. We demonstrate that integrative models perform better than predictions from DNA sequencing or RNA sequencing alone. Our probabilistic model of rare regulatory genetic variants offers potential for identifying deleterious regulatory variants from individual genomes.
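As a simplified illustration of the first step of such an analysis (a generic sketch, not the speaker's Bayesian model; the variable names and cutoff are assumptions), one can flag individuals whose expression of a gene is an extreme outlier across the cohort and intersect them with carriers of rare variants near that gene:

```python
# Hedged sketch (not the speaker's model): flag per-gene expression outliers via
# z-scores, then intersect with carriers of rare variants near each gene. The
# Bayesian approach described above integrates many more annotations.
import numpy as np

def expression_outliers(expr, z_cutoff=3.0):
    """expr: genes x individuals matrix of normalized expression values."""
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    return np.abs(z) > z_cutoff                    # genes x individuals mask

def candidate_rare_regulatory(outlier_mask, rare_carrier_mask):
    """rare_carrier_mask[g, i] is True if individual i carries a rare variant
    near gene g; the intersection marks candidate functional rare variants."""
    return outlier_mask & rare_carrier_mask
```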
 
Alexis Battle’s research focuses on unraveling the impact of genetics on human health, using machine learning and probabilistic methods to analyze large-scale genomic data.  She is interested in applications to personal genomics, genetics of gene expression, and gene networks in disease, leveraging diverse data to infer more comprehensive models of genetic effects on the cell.  She earned her Ph.D. and Master's in Computer Science from Stanford University in 2014, where she also received her Bachelor's in Symbolic Systems (2003).  Alexis also spent several years in industry as a member of the technical staff at Google.  Prior to joining Hopkins, Alexis spent a year as a postdoc with Jonathan Pritchard in HHMI and the Department of Genetics at Stanford.  She joined JHU in July 2014.

Constitutional Malware

Date and Time
Tuesday, October 25, 2016 - 12:30pm to 1:30pm
Location
Friend Center Convocation Room
Type
CS Department Colloquium Series
Host
Nick Feamster

Jonathan Mayer '09

The United States government hacks computer systems for law enforcement purposes. As encryption becomes more pervasive, and as anonymization tools become easier to use, the government will foreseeably increase its resort to malware. This article provides a comprehensive examination of how federal law regulates law enforcement hacking.
 
The article's first contribution is descriptive. It explains why and how law enforcement agencies hack computer systems, and it offers a framework for examining critical steps in the operation of government malware.
 
The article's next contributions are doctrinal, analyzing the appropriate legal procedures for law enforcement hacking. About half of courts have, surprisingly, concluded that government malware does not necessarily implicate constitutional privacy protections. The article argues that law enforcement hacking constitutes a Fourth Amendment "search," and presumptively requires a warrant and associated protections.
 
The article also makes theoretical contributions. Government malware is the latest flashpoint for electronic surveillance, and it illuminates longstanding scholarly debates about the Fourth Amendment. Law enforcement hacking is a case study in how the three branches of government articulate surveillance regulation, the inherent challenges of rebalancing privacy law to account for new technology, and the relationship between statutory and constitutional privacy protections.
 
Jonathan Mayer is Chief Technologist of the Federal Communications Commission Enforcement Bureau, where he works to protect the security and privacy of America's telecommunications infrastructure. Jonathan is also a Cybersecurity Fellow at Stanford University, where he is completing a Ph.D. in computer science and received a J.D. in 2013.

Meta-unsupervised-learning: a principled approach to unsupervised learning

Date and Time
Monday, November 7, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Adam Kalai, from Microsoft Research
Host
Elad Hazan

Adam Kalai
Unsupervised learning and exploratory data analysis are among the most important and yet murkiest areas within machine learning. Debates rage even about how to choose which objective function to optimize. We introduce a principled data-driven approach: “meta-unsupervised-learning” using a collection of related or unrelated learning problems. We present simple agnostic models and algorithms illustrating how the meta approach circumvents impossibility results for novel "meta" problems such as meta-clustering, meta-outlier-removal, meta-feature-selection, and meta-embedding. We also present empirical results showing how the meta approach improves over standard techniques for problems such as outlier removal and choosing a clustering algorithm and the number of clusters. Finally, we train an unsupervised neural network that learns from prior supervised classification problems drawn from openml.org.
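To convey the flavor of the meta approach (a generic, heavily simplified sketch, not the authors' algorithms; the candidate rules and setup are hypothetical): prior clustering problems with known ground truth can be used to pick, empirically, a rule for choosing the number of clusters, which is then applied to new, unlabeled problems.

```python
# Generic sketch of the "meta" idea for choosing the number of clusters
# (hypothetical setup, not the algorithms from the talk): evaluate candidate
# selection rules on prior problems whose true clusterings are known, keep the
# rule that does best on average, then apply it to new, unlabeled problems.
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

RULES = {
    "silhouette": lambda X: max(
        range(2, 10),
        key=lambda k: silhouette_score(X, KMeans(n_clusters=k).fit_predict(X))),
    "always_3": lambda X: 3,
}

def pick_rule(prior_problems):
    """prior_problems: list of (X, true_labels) pairs from related tasks."""
    def avg_quality(rule):
        return sum(
            adjusted_rand_score(y, KMeans(n_clusters=rule(X)).fit_predict(X))
            for X, y in prior_problems) / len(prior_problems)
    best = max(RULES, key=lambda name: avg_quality(RULES[name]))
    return RULES[best]

# Usage: rule = pick_rule(prior_problems); k = rule(X_new)
```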

Joint work with Vikas Garg from MIT

Adam Tauman Kalai received his BA from Harvard, and his MA and PhD from CMU under the supervision of Avrim Blum. After an NSF postdoctoral fellowship at M.I.T. with Santosh Vempala, he served as an assistant professor at the Toyota Technological Institute at Chicago and then at Georgia Tech. He is now a Principal Researcher at Microsoft Research New England. His honors include an NSF CAREER award and an Alfred P. Sloan fellowship. His research focuses on machine learning, human computation, and algorithms.

Learning Multimodal Deep Models

Date and Time
Friday, October 21, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Elad Hazan

Ruslan Salakhutdinov
Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. In this talk I will first introduce a broad class of deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities as well as present the Reverse Annealed Importance Sampling Estimator (RAISE) for evaluating these deep generative models. Finally, I will discuss models that can generate natural language descriptions (captions) of images, as well as generate images from captions using an attention mechanism. I will show that on several tasks, including modelling images and text, these models significantly improve upon many of the existing techniques.
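As a minimal sketch of the fusion idea (plain NumPy with random, untrained weights; not the deep generative models discussed in the talk), each modality gets its own encoder and both are mapped into a shared joint representation:

```python
# Minimal NumPy sketch of multimodal fusion: modality-specific encoders feed a
# shared joint layer. Weights here are random stand-ins; the models in the talk
# are trained, deep, and generative. Dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    return np.tanh(x @ w + b)

# Hypothetical dimensions: 4096-d image features, 2000-d bag-of-words text.
w_img, b_img = rng.standard_normal((4096, 256)) * 0.01, np.zeros(256)
w_txt, b_txt = rng.standard_normal((2000, 256)) * 0.01, np.zeros(256)
w_joint, b_joint = rng.standard_normal((512, 128)) * 0.01, np.zeros(128)

def joint_representation(image_feats, text_feats):
    h_img = layer(image_feats, w_img, b_img)       # image pathway
    h_txt = layer(text_feats, w_txt, b_txt)        # text pathway
    fused = np.concatenate([h_img, h_txt], axis=-1)
    return layer(fused, w_joint, b_joint)          # shared multimodal code

z = joint_representation(rng.standard_normal(4096), rng.standard_normal(2000))
print(z.shape)   # (128,)
```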

Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Google Faculty Award, Nvidia's Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Structuring Peer Interactions for Learning Expertise at Scale

Date and Time
Thursday, November 17, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Olga Russakovsky

Learning with peers helps students reflect, generalize knowledge and apply it more successfully to new problems. How can we scale successful peer learning from the controlled environment of the small classroom to the wild, massive scale of online classes? In his talk, Chinmay Kulkarni will introduce computational systems that structure peer learning at massive scale. These systems demonstrate how insights from educational theory can be distilled into interfaces that scale teaching to thousands of learners. For instance, he will describe how students using his PeerStudio obtain improvement-oriented feedback on open-ended work in just twenty minutes at any time of day, enabling them to revise and gain mastery. Similarly, students use the Talkabout system to engage in globe-spanning discussions that improve learning through diversity. This talk will be a reflection of the growing maturity of research in this area, from articulating design patterns, to randomized controlled experiments at scale, to studies of long-term effects on learners and learning. 

Finally, Kulkarni hopes to reflect on how the large scale and diversity of online classes can enrich learning at universities, and on his ongoing efforts to prepare engineering students for tomorrow's grand challenges. 

Chinmay Kulkarni is an Assistant Professor in the Human-Computer Interaction Institute at Carnegie Mellon University. His research group investigates how new software and pedagogical systems can leverage peer processes at massive scale. Over the past three years, his work has been used by more than 100,000 learners online, in classes taught by more than twenty universities, including Harvard. In his idle time, Chinmay is either making coffee, drinking coffee, or clicking "View Source" on web pages he visits.

Grand Challenges in Phylogenomics

Date and Time
Wednesday, October 5, 2016 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Mona Singh

Tandy Warnow
Estimating the Tree of Life will likely involve a two-step procedure, where in the first step trees are estimated on many genes, and then the gene trees are combined into a tree on all the taxa. However, the true gene trees may not agree with the species tree due to biological processes such as deep coalescence, gene duplication and loss, and horizontal gene transfer. Statistically consistent methods based on the multi-species coalescent model have been developed to estimate species trees in the presence of incomplete lineage sorting; however, the relative accuracy of these methods compared to the usual "concatenation" approach is a matter of substantial debate within the research community.  In this talk I will present new state-of-the-art methods (statistical binning, Science 2014; ASTRAL, ECCB 2014 and ISMB 2015) we have developed for estimating species trees in the presence of incomplete lineage sorting (ILS), and show how they can be used to estimate species trees from genome-scale datasets.
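For intuition about why such summary methods can be statistically consistent under ILS (a toy illustration, not ASTRAL or statistical binning): under the multi-species coalescent, for any three species the most probable rooted gene-tree topology matches the species tree, so a simple majority vote over gene trees recovers each rooted triplet.

```python
# Toy sketch (not ASTRAL / statistical binning): vote over the three possible
# rooted topologies of a species triplet, with each gene tree's topology encoded
# as the species that splits off first, e.g. "C" means ((A,B),C).
from collections import Counter

def triplet_vote(gene_tree_outgroups):
    """gene_tree_outgroups: one entry per gene tree naming which of the three
    species splits off first; returns the majority topology."""
    counts = Counter(gene_tree_outgroups)
    outgroup, _ = counts.most_common(1)[0]
    return outgroup

# 60% of gene trees show ((A,B),C); the rest are discordant due to ILS.
genes = ["C"] * 60 + ["A"] * 20 + ["B"] * 20
print(triplet_vote(genes))   # "C", i.e. the species tree ((A,B),C)
```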

Tandy Warnow is the Founder Professor of Engineering at the University of Illinois at Urbana-Champaign. She has a dual appointment between Computer Science and Bioengineering, and is a member of the Carl R. Woese Institute for Genomic Biology, a member of the National Center for Supercomputing Applications, and an affiliate in six other departments at UIUC (Statistics, Mathematics, Electrical and Computer Engineering, Plant Biology, Animal Biology, and Entomology). Tandy received her PhD in Mathematics at UC Berkeley under the direction of Gene Lawler, and did postdoctoral training with Simon Tavaré and Michael Waterman at the University of Southern California. She received the National Science Foundation Young Investigator Award in 1994, the David and Lucile Packard Foundation Award in Science and Engineering in 1996, an Emeline Bigelow Conland Fellowship at the Radcliffe Institute for Advanced Study in 2006, a Guggenheim Foundation Fellowship for 2011, and was elected an ACM Fellow in 2016. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in both biology and historical linguistics. Her current research focuses on phylogeny and alignment estimation for very large datasets (10,000 to 1,000,000 sequences), estimating species trees from collections of gene trees, and metagenomics.  Overall, Tandy has published more than 130 papers in the area of computational molecular biology.

Design and Implementation of a Freshman Algorithms Course with Contracts

Date and Time
Monday, September 26, 2016 - 12:30pm to 1:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Ananda Gunawardena

Frank Pfenning
We provide an overview of the introductory computer science curriculum at Carnegie Mellon University and then provide some detail about its perhaps most unusual component: a first-semester freshman-level course on imperative programming, algorithms, data structures, and computational thinking.  A key aspect of this course is its use of C0, a small safe subset of C augmented with a language for expressing contracts that capture pre- and post-conditions as well as loop and data structure invariants.  We provide some sample lecture material and report on our experience with this course over the last six years.
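In C0, contracts are written as special annotations (//@requires, //@ensures, //@loop_invariant) that can be checked dynamically. As a rough analogue in Python (illustrative only, not C0 or course material), here is the kind of contract a student might attach to a binary search:

```python
# Rough Python analogue of a C0-style contract on binary search: the assert
# statements play the role of //@requires, //@loop_invariant, and //@ensures.
# (Illustrative only; the course writes these as C0 contract annotations.)

def binary_search(xs, target):
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), "requires: sorted"
    lo, hi = 0, len(xs)
    while lo < hi:
        # loop invariant: target can only occur within xs[lo:hi]
        assert 0 <= lo <= hi <= len(xs)
        assert target not in xs[:lo] and target not in xs[hi:]
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid                         # ensures: xs[result] == target
        elif xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    assert target not in xs                    # ensures: -1 only if target absent
    return -1
```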

Frank Pfenning studied Mathematics and Computer Science at the Technical University Darmstadt and then left for Carnegie Mellon University on a Fulbright scholarship, where he obtained his Ph.D. in Mathematics in 1987 under the supervision of Professor Peter Andrews.

He subsequently joined the Department of Computer Science at Carnegie Mellon University as research faculty where he became Professor in 2002 and served as Director of Graduate Programs from 2004 to 2008 and Associate Dean for Graduate Education from 2009 to 2010. He was appointed Head of the Computer Science Department in January 2013 and the Joseph F. Traub Professor of Computer Science in October 2015.

He has spent time as visiting scientist at the Max-Planck-Institute for Computer Science in Saarbrücken, as Alexander-von-Humboldt fellow at the Technical University Darmstadt, and as visiting professor at École Polytechnique and INRIA-Futurs. He has advised 24 completed Ph.D. theses and won the Herbert A. Simon Award for Teaching Excellence in the School of Computer Science in 2002.

He served as trustee, vice president, and president of CADE, Inc., the governing body of the International Conference on Automated Deduction, and on advisory boards for INRIA, the Max-Planck-Institute for Computer Science, and Seoul National University. He has chaired several conferences and program committees, including CADE and LICS, and has been a member of the editorial boards for Theoretical Computer Science, Journal of Automated Reasoning, and the Journal of Symbolic Computation. He was named Fellow of the ACM in 2015.

His research interests include programming languages, logic and type theory, logical frameworks, automated deduction, and computer security. In his spare time he enjoys playing squash, running, hiking, cooking, and reading. 
