CS Department Colloquium Series

Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading

Date and Time
Monday, April 28, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman

Junfeng Yang


Our accelerating computational demand and the rise of multicore hardware have made parallel programs, especially shared-memory multithreaded programs, increasingly pervasive and critical. Yet these programs remain extremely difficult to write, test, analyze, debug, and verify. Conventional wisdom has attributed these difficulties to nondeterminism (i.e., repeated executions of the same program on the same input may show different behaviors), and researchers have recently dedicated much effort to bringing determinism into multithreading. In this talk, I argue that determinism is not as useful as commonly perceived: it is neither sufficient nor necessary for reliability. I will present our view on why multithreaded programs are difficult to get right, describe a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four years' research on building and applying stable multithreading systems.

Junfeng Yang's research (www.cs.columbia.edu/~junfeng) centers on making reliable and secure systems. He earned his PhD at Stanford, where he created eXplode, a general, lightweight system for effectively finding storage system errors. This work led to an OSDI '04 best paper, numerous bug fixes to real systems such as the Linux kernel, and a featured article in Linux Weekly News. He worked at Microsoft Research, Silicon Valley from 2007 to 2008, extending eXplode to check production distributed systems. MoDist, the resultant system, is being transferred to Microsoft product groups. He now co-directs the Software Systems Lab (ssl.cs.columbia.edu) at Columbia University, where his recent work on making parallel programs reliable---the Tern/Peregrine/Parrot stable multithreading systems---was featured in CACM, ACM Tech News, The Register, and many other venues. He won a Sloan Research Fellowship and an AFOSR Young Investigator (YIP) award in 2012, and an NSF CAREER award in 2011.

Bots, Bubbles, and Bottlenecks: Safeguarding the User’s Internet Experience

Date and Time
Thursday, April 17, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Michael Freedman

Nick Feamster


A user's experience on the Internet rests in the hands of a large and increasingly diverse set of stakeholders. Internet service providers and content providers point fingers at one another about performance problems. Miscreants launch attacks against both other users and the Internet infrastructure itself. Content providers continually engage in practices to "personalize" what we see and when we see it. Many governments aim to control their citizens' access to information, while activists design circumvention tools. Safeguarding the user's Internet experience requires both gathering empirical network measurements to detect threats (typically in the absence of any "ground truth") and developing large-scale systems to mitigate them. In this talk, I will present three classes of safeguards against different threats to the user's Internet experience: (1) technologies to characterize and improve performance in the Internet's "last mile", including a worldwide deployment of home routers in around 200 home networks and ongoing studies with the Federal Communications Commission; (2) methods for lightweight and fast detection of message abuse, such as spam, that have since been widely adopted by industry; and (3) defenses against information manipulation attacks, a new class of attacks against personalization algorithms. I will also discuss other such threats and how networking can draw from other disciplines to tackle them.

Nick Feamster is an associate professor in the College of Computing at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research spans many aspects of computer networking and networked systems, with a focus on network operations, network security, and censorship-resistant communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review "Top Young Innovators Under 35" (TR35) award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, the IRTF Applied Networking Research Prize, and award papers at the ACM Internet Measurement Conference (measuring Web performance bottlenecks), SIGCOMM (network-level behavior of spammers), NSDI (fault detection in router configuration), and USENIX Security (circumventing web censorship using Infranet, and web cookie analysis).

Machine Learning in the Wild

Date and Time
Thursday, March 13, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Blei

Ameet Talwalkar


Modern datasets are rapidly growing in size and complexity, and this wealth of data holds the promise for many transformational applications. Machine learning is seemingly poised to deliver on this promise, having proposed and rigorously evaluated a wide range of data processing techniques over the past several decades. However, concerns over scalability and usability present major roadblocks to the wider adoption of these methods, and in this talk I will present work that addresses these concerns. In terms of scalability, my work relies on a careful application of divide-and-conquer methodology. In terms of usability, I focus on developing tools to diagnose the applicability of learning techniques and to autotune components of typical machine learning pipelines. I will discuss applications in the context of matrix factorization, estimator quality assessment and genomic variant calling.

Ameet Talwalkar is a postdoctoral fellow in the Computer Science Division at UC Berkeley. He obtained a Ph.D. in Computer Science from the Courant Institute at New York University, and prior to that graduated summa cum laude from Yale University. His work addresses scalability and ease-of-use issues in the field of machine learning, as well as applications related to large-scale genomic sequencing analysis. He has won the Janet Fabri Prize for best doctoral dissertation and the Henning Biermann Award for exceptional service at NYU, received Yale's undergraduate prize in Computer Science, and is an NSF OCI postdoctoral scholar.

Packet Transport Mechanisms for Data Center Networks

Date and Time
Monday, March 10, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jennifer Rexford
Mohammad Alizadeh


Recent years have witnessed a fundamental shift in the computing landscape, as much of the world's application workloads move to massive "cloud" data centers. These data centers enable the services we rely on daily, such as web search, social networking, and Internet commerce, and they require huge investments to deploy and operate. This has spurred significant interest in industry and the research community in innovation for data centers, and in particular for data center networks.
 
A crucial feature of a data center network is its transport mechanism: the method by which data is transferred from one server to another. Ideally, this should occur at the highest possible rate and with the lowest possible latency. But there is an inherent tension between these requirements, especially with today's state-of-the-art Transmission Control Protocol (TCP) in data centers. In this talk, I will present three data center transport designs with increasing levels of sophistication. First, I will describe Data Center TCP (DCTCP), a new congestion control algorithm for data centers that is now shipping with Windows Server 2012. DCTCP uses a simple modification to the TCP algorithm to achieve full throughput while reducing queueing latency in the switches by ~10x compared to TCP. I will present a control-theoretic analysis of DCTCP's control loop. I will then discuss HULL, an architecture that builds on DCTCP to deliver near-zero fabric latency: only propagation and switching latency, no queueing. Finally, I will describe pFabric, a minimalistic design that leverages application-level hints and very simple switch mechanisms to achieve near theoretically optimal scheduling of flows in the data center fabric. 
 
Mohammad Alizadeh is a Researcher at Insieme Networks (recently acquired by Cisco Systems). He received his Ph.D. in Electrical Engineering from Stanford University where he was advised by Balaji Prabhakar. Before that, he completed his undergraduate degree in Electrical Engineering at Sharif University of Technology. His research interests are broadly in networked systems, data center networking, and cloud computing. His dissertation work focused on designing high performance data center transport mechanisms. His research has garnered significant industry interest: the DCTCP algorithm has been implemented in Windows Server 2012; the QCN congestion control algorithm has been standardized as the IEEE 802.1Qau standard; and most recently, the CONGA adaptive load balancing mechanism has been implemented in Cisco's new flagship Application Centric Infrastructure products. Mohammad is a recipient of a Stanford Electrical Engineering Departmental Fellowship, the Caroline and Fabian Pease Stanford Graduate Fellowship, and the Numerical Technologies Inc. Prize and Fellowship. 

Software Verification: Locks and MLoCs*

Date and Time
Wednesday, April 2, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Andrew Appel

Aarti Gupta

Software verification deals with proving or falsifying correctness properties of programs, and has broad applications from ensuring safety of mission-critical software to improving program robustness and programmer productivity. This area has long held theoretical interest, but the inherent undecidability of the general problem and intractability of specific formulations have been barriers to practical success. Approaches based on automatic techniques for state space exploration, such as model checking and symbolic execution, have seen a resurgence of interest in recent years, driven largely by advancements in solvers for Boolean Satisfiability (SAT) and Satisfiability Modulo Theories (SMT). These solvers perform the underlying search for proofs or bugs, typically by reasoning over abstract models of programs. While abstraction is critical in scaling to real-life programs, it is equally important to ensure enough precision in the models and the analysis to verify specific properties of interest. In this talk, I will describe my work in foundational techniques that address these twin challenges of precision and scalability in verifying sequential and multi-threaded programs. I will also share the lessons learned from driving these techniques into industry practice, where they are routinely used in a modular methodology to find complex bugs in software projects, some with millions of lines of code. Finally, I will discuss future extensions and applications of this verification framework.

*MLoC: Million Lines of Code

Aarti Gupta received her PhD in Computer Science from Carnegie Mellon University. Her broad research interests are in the areas of formal analysis of programs/systems and automatic decision procedures. At NEC Labs she leads research in systems analysis and verification. She has given several invited keynotes highlighting the foundational research contributions of her group in verification and their successful industrial deployment. She has served as an Associate Editor for Formal Methods in System Design (since 2005) and for the ACM Transactions on Design Automation of Electronic Systems (2008-2012). She has also served as Co-Chair for the International Workshop on Satisfiability Modulo Theories (SMT) 2010, the International Conference on Computer Aided Verification (CAV) 2008, and Formal Methods in Computer Aided Design (FMCAD) 2006. She is currently serving on the Steering Committee of CAV.

Guaranteed Learning of Latent Variable Models: Overlapping Community Models and Overcomplete Representations

Date and Time
Thursday, February 27, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Sanjeev Arora

Anima Anandkumar

Incorporating latent or hidden variables is a crucial aspect of statistical modeling.  I will present a statistical and a computational framework for guaranteed learning of a wide range of latent variable models.  I will focus on two instances, viz., community detection and overcomplete representations.

The goal of community detection is to discover hidden communities from graph data. I will present a tensor decomposition approach for learning probabilistic mixed membership models. The tensor approach provably recovers the mixed membership communities, with tight guarantees. We have deployed it on many real-world networks, e.g., Facebook, Yelp, and DBLP. It is easily parallelizable and is orders of magnitude faster than the state-of-the-art stochastic variational approach.
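In sketch form, tensor methods of this kind rest on a method-of-moments identity (stated here in standard notation as background, not as the talk's exact formulation): a low-order moment of the observed data decomposes into a sum of rank-one terms, one per community.

```latex
% w_k is the weight of community k and a_k its membership vector.
% Decomposing the empirical third-order moment M_3 into its rank-one
% components recovers the pairs (w_k, a_k), i.e., the communities.
M_3 \;=\; \sum_{k=1}^{K} w_k \, a_k \otimes a_k \otimes a_k
```

The computational work is thus a tensor decomposition rather than an iterative likelihood optimization, which is what makes the approach parallelizable and fast.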

I will then discuss recent results on learning overcomplete latent representations, where the latent dimensionality can far exceed the observed dimensionality.  I will present two frameworks, viz., sparse coding and sparse topic modeling. Identifiability and efficient learning are established under some natural conditions such as incoherent dictionaries or persistent topics.

Anima Anandkumar has been a faculty member in the EECS Department at UC Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT from 2009 to 2010. She is the recipient of the Microsoft Faculty Fellowship, the ARO Young Investigator Award, the NSF CAREER Award, the IBM Fran Allen PhD Fellowship, a thesis award from the ACM SIGMETRICS society, paper awards from the ACM SIGMETRICS and IEEE Signal Processing societies, and a 2014 Sloan Fellowship.

A new class of bugs: How compiler optimizations harm our systems through undefined behavior

Date and Time
Wednesday, February 26, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Vivek Pai

Xi Wang


Software bugs introduce security vulnerabilities into our computer systems.  To understand and mitigate an increasing number of bugs, practitioners categorize them into classes, such as buffer overflow or SQL injection, and handle each class separately. 

This talk introduces a new class of bugs called unstable code: code that is unexpectedly discarded by compiler optimizations due to undefined behavior in the program.  I will discuss its prevalence and security impact in systems, and present a systematic approach for reasoning about unstable code, as well as a static checker called Stack that implements this approach to precisely identify unstable code in real systems.  Applying Stack to widely used software has uncovered 160 new bugs that have been confirmed and fixed by developers.  It has also been adopted by several companies to scan their codebases.

Xi Wang is a PhD candidate in Computer Science at MIT, advised by M. Frans Kaashoek and Nickolai Zeldovich.  His research interests are in building secure and reliable systems.  He was awarded a Best Paper Award at SOSP 2013, a Best Student Paper Award at EuroSys 2008, and an MIT Jacobs Presidential Fellowship in 2008.

Program Synthesis for the Masses

Date and Time
Tuesday, February 25, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Brian Kernighan

Rishabh Singh


New computing platforms have greatly increased the demand for programmers, but learning to program remains a big challenge. Program synthesis has the potential to revolutionize programming by making it more accessible. My work has focused on two goals: making programming more intuitive through the use of new interfaces, and using automated feedback to help students learn programming. In this talk, I will present my work on three systems that work towards these goals. The FlashFill system helps end-users perform repetitive data transformations over strings, numbers, and tables using input-output examples. FlashFill was shipped as part of Excel 2013 and was cited as one of the top features in many press reviews. The Storyboard Programming system helps students write data-structure manipulations using textbook-like visual examples and bridges the gap between high-level insights and low-level code. Finally, the Autograder system provides automated feedback to students on introductory programming assignments, and was successfully run on tens of thousands of programming exercises from edX. I will describe how ideas from constraint solving, machine learning, and formal verification enabled the new forms of interaction required by these systems.

Rishabh Singh is a PhD candidate in the Computer Science and Artificial Intelligence Laboratory at MIT. His research interests are broadly in formal methods and programming languages. His PhD work focuses on developing program synthesis techniques for making programming accessible to end-users and students. He is a Microsoft Research PhD fellow and winner of MIT's William A. Martin Outstanding Master's Thesis Award. He obtained his BTech in Computer Science and Engineering from IIT Kharagpur in 2008, where he was awarded the Institute Silver Medal and the Bigyan Sinha Memorial Award. He was also selected as a Prime Minister's National Guest at the Republic Day Parade in New Delhi in 2005.

Building Embedded Sensor Systems to Bring Ubicomp to Life

Date and Time
Tuesday, March 25, 2014 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Adam Finkelstein

Gabe Cohn


Although we have successfully created smaller, faster, and cheaper computer devices, several adoption barriers remain to realize the dream of Ubiquitous Computing (Ubicomp). By lowering these barriers, we can seamlessly embed human-computer interfaces into our home and work environments. My work focuses on building integrated hardware/software sensing systems for Ubicomp applications using my expertise in embedded systems, low-energy hardware design, and sensing, in addition to integrating communications, signal processing, and machine learning. In this talk, I will use my research to present three main techniques to lower the installation, maintenance, and scalability adoption barriers and bring Ubicomp to life. First, I will discuss my work on using the existing infrastructure in buildings to reduce the number of sensors required to enable many Ubicomp applications. Second, I will discuss how the conductive properties of the human body can be leveraged to enable novel human-computer interactions and health sensing opportunities. Finally, I will describe techniques for dramatically reducing the power consumption of embedded sensor systems for Ubicomp applications. By continually working on application-driven interdisciplinary research, we can lower the adoption barriers and enable many new high-impact application domains.

Gabe Cohn is a Ph.D. candidate in Electrical Engineering in the Ubiquitous Computing (Ubicomp) Lab at the University of Washington, advised by Shwetak Patel. His research focuses on (1) designing and implementing ultra-low-power embedded sensing systems, (2) leveraging physical phenomena to enable new sensing modalities for human-computer interaction, and (3) developing sensor systems targeted at realizing immediate change in high-impact application domains. He was awarded the Microsoft Research Ph.D. Fellowship in 2012, the National Science Foundation Graduate Research Fellowship in 2010, and six Best Paper awards and nominations. He is the co-founder of SNUPI Technologies (www.wallyhome.com), a sensor and services company focused on home safety, security, and loss prevention. He received his B.S. with honors in Electrical Engineering from the California Institute of Technology in 2009, where he specialized in embedded systems, computer architectures, and digital VLSI.


Statistical models to study how a genetic variant impacts an organism

Date and Time
Monday, March 3, 2014 - 4:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Olga Troyanskaya

Barbara Engelhardt


Consider sequencing the genome of a newborn, and selecting targeted therapeutics early in life to reduce her lifetime risk of addiction, obesity, type II diabetes, or pancreatic cancer. While genome-wide association studies (GWAS) have unquestionably been successful in identifying reproducible genomic risk factors for complex human diseases, the promise of developing therapeutics to reduce the heritable portion of disease risk is far from fulfillment. The essential technological developments to fulfill this promise, however, are mainly in statistics and computation rather than in genomic experimental methods. I describe three genomic studies from my recent work. First, I identified a genetic variant that behaves differently depending on whether or not an individual takes cholesterol-reducing drugs. Second, I found that genetic variants that are associated with different cell traits are co-localized with a large variety of different regulatory mechanisms more often than expected. Third, I developed a model to uncover genetic variants that affect many traits simultaneously, where the trait measurements have substantial technical noise. Throughout, I emphasize statistical and computational challenges, and innovations necessary to fulfill this promise of genomic studies.
