
Colloquium

Three Beautiful Quicksorts

Date and Time
Wednesday, October 17, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Jon Bentley, from Avaya Labs Research
Host
Brian Kernighan
This talk describes three of the most beautiful pieces of code that I have ever written: three different implementations of Hoare's classic Quicksort algorithm. The first implementation is a bare-bones function in about a dozen lines of C. The second implementation starts by instrumenting the first program to measure its run time; a dozen systematic code transformations proceed to make it more and more powerful yet more and more simple, until it finally disappears in a puff of mathematical smoke. It therefore becomes the most beautiful program I never wrote. The third program is an industrial-strength C library qsort function that I built with Doug McIlroy. A theme running through all three implementations is the power of elegance and simplicity. (This talk expands on my Chapter 3 in Beautiful Code, edited by Oram and Wilson and published by O'Reilly in July 2007.)

Probabilistic Graphical Models and Algorithms for Integrative Bioinformatics

Date and Time
Wednesday, October 10, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Eric Xing, from CMU
Host
Olga Troyanskaya
The probabilistic graphical model is a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. It offers a powerful language to elegantly define expressive distributions under complex scenarios in high-dimensional space, and provides a systematic computational framework for probabilistic inference. These virtues have particular relevance in bioinformatics, where many core inferential problems (e.g., linkage analysis, phylogenetic analysis, network reconstruction) are already naturally expressed in probabilistic terms, and must deal with experimental data with complex structure and temporal and/or spatial dynamics.

I will discuss our recent work on graphical model inferential methodology in three areas in bioinformatics: (1) Population structure and recombination hotspot inference, using a novel approach based on Dirichlet process priors. I present a hidden Markov version of the Dirichlet process which allows us to infer recombination events among haplotypes in an “open” ancestral space. (2) Comparative genomics prediction of imperfectly conserved transcription factor binding sites, where multi-resolution phylogenetic inference combines with Markovian inference to provide sensitive detection of motifs and their evolutionary turnovers in eleven Drosophila species. (3) Reverse-engineering of temporally rewiring networks from gene expression time courses, where a novel hidden temporal exponential random graph model is employed to model temporal evolution of network topologies during a biological process, and to facilitate the inference of transient (rather than a single universal) regulatory circuitry underlying each time-point of the microarray time series.

Bio: Eric Xing is an assistant professor in the Machine Learning Department, the Language Technology Institute, and the Computer Science Department within the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for building quantitative models and predictive understandings of the evolutionary mechanisms, regulatory circuitry, and developmental processes of biological systems, and for building computational intelligence systems involving automated learning, reasoning, and decision-making in open, evolving possible worlds. Professor Xing received his B.S. in Physics from Tsinghua University, his first Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and his second Ph.D. in Computer Science from UC Berkeley. He has been a member of the faculty at Carnegie Mellon University since 2004. His current work involves: (1) graphical models, Bayesian methodologies, inference algorithms, and optimization techniques for analyzing and mining high-dimensional, longitudinal, and relational data; (2) computational and comparative genomic analysis of biological sequences, systems-biology investigation of gene regulation, and statistical analysis of genetic variation, demography, and disease linkage; and (3) applications of statistical learning in text/image mining, vision, and machine translation.

The Future of Reputation: Gossip, Rumor, and Privacy on the Internet

Date and Time
Monday, October 8, 2007 - 4:30pm to 6:00pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Daniel Solove, from George Washington University
Host
Edward Felten

Policing Online Games -- Defining A Strong, Mutually Verifiable Reality in Virtual Games

Date and Time
Wednesday, October 3, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Host
Kenneth Steiglitz
Ronald Reagan was fond of saying "trust but verify". Alas, the modern virtual world of games gives users no basis for trust and no mechanism for verification. Foolish people wager real money on a hand of cards dealt to them by an offshore server. Some of the virtual worlds have been rocked by scandals where sys admins take bribes to add special features like unbeatable armor to favored players.

The good news is that we can build strong, virtual worlds that give users the basis for trust and the ability to verify the fairness of a game. The algorithms are well-known and tested, to some extent, by time. This talk will review a number of the classic results designed for playing poker or distributing digital cash, and explore how they can be used to stop some of the most blatant cheating affecting the more sophisticated online world.

Peter Wayner is the author of 13 books including "Policing Online Games". (www.wayner.org/books/)

Lessons Learned from the Internet Project

Date and Time
Wednesday, September 19, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Doug Comer, from Cisco Systems
Host
Larry Peterson
The Internet ranks among the greatest achievements of 20th century Computer Science. The basic technology was so well conceived that it has remained virtually unchanged despite completely new applications and dramatic growth in the number of connected computers and traffic. This eclectic talk presents a series of lessons drawn from the Internet experience that may help us better understand how to proceed with new research. It considers the design of protocols, general principles, technologies, the underlying architecture, the effect of economics on networking research, and ways that experimental research projects can be organized to ensure success.

Douglas Comer is VP of Research at Cisco Systems, and Distinguished Professor of Computer Science at Purdue University, where he is currently on an extended leave. An internationally recognized expert on computer networking, Comer has been involved in Internet research since the late 1970s. His series of ground-breaking textbooks has been translated into 16 languages, and is used by professional engineers and students around the world. For twenty years, Comer was editor-in-chief of the journal Software -- Practice And Experience. He is a Fellow of the ACM.

Improving the Performance of Highly Reliable Software Systems

Date and Time
Thursday, April 26, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Ed Nightingale, from University of Michigan
Host
Jennifer Rexford
Commodity operating systems still retain the design principles developed when processor cycles were scarce and RAM was precious. These outdated principles have led to performance/functionality trade-offs that are no longer necessary; I have found that, far from impeding performance, features such as safety, consistency and energy-efficiency can often be added while improving performance over existing systems. I will describe my work developing Speculator, which provides facilities within the operating system kernel to track and propagate causal dependencies.

Using Speculator, I will show that distributed and local file systems can provide strong consistency and safety guarantees without the poor performance these guarantees usually entail.

Bio:

Ed Nightingale is a Ph.D. candidate at the University of Michigan. His research focuses on experimental software systems, especially operating systems, distributed systems and mobile computing.

Designing Appropriate Computing Technologies for the Rural Developing World

Date and Time
Wednesday, April 25, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Tapan Parikh, from University of Washington
Host
Perry Cook
Recent history has seen an increase in disparity between the rich and poor regions of the world. Disproportionate access to information is both a symptom and a factor contributing to this disparity. People living in the rural developing world have many information needs that could be met by information technology, but are not. Technology for this context must be low-cost, accessible and appropriate given the local infrastructure, including conditions of intermittent power and connectivity.

In this talk, I describe my experiences developing CAM - a toolkit for mobile phone data collection for the rural developing world. Designing technologies for an unfamiliar context requires understanding the needs and capabilities of potential users. Drawing from the results of an extended design study conducted with microfinance group members in rural India (many of whom are semi-literate or illiterate), I outline a set of user interface design guidelines for accessibility to such users.

The results of this study are used to inform the design of CAM, a mobile phone application toolkit including support for paper-based interaction; multimedia input and output; and disconnected operation. I provide evidence of CAM's usability, breadth, and real-world applicability. Regarding real-world applicability, a CAM application for microfinance data collection is now being used by 17 NGO (non-governmental organization) staff to serve over 10,000 group members in two states of India. Regarding breadth, I list some important rural data collection applications - including retail supply-chain tracking, agricultural monitoring and health care - that we have implemented, or can be implemented, using the CAM toolkit. I conclude by discussing possible topics for future work and my long-term research vision.

Bio: Tapan S. Parikh is an Intel Fellow and Ph.D. Candidate in the Department of Computer Science & Engineering at the University of Washington. Earlier, he received an M.S. degree in Computer Science from UW and a Sc.B. degree in Molecular Modeling (with Honors) from Brown University. Tapan's research interests include human-computer interaction (HCI), systems engineering and information and communication technologies for development (ICTD).

EXPLODE: A Lightweight, General System for Finding Serious Storage System Errors

Date and Time
Wednesday, April 18, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Junfeng Yang, from Stanford
Host
David Walker
Storage systems such as file systems, databases, and RAID systems have a simple, basic contract: you give them data, they do not lose or corrupt it. Unfortunately, their code is exceptionally hard to get right: it must anticipate all failures and crashes, and always correctly recover from them, regardless of the current system state. In this talk, I will describe EXPLODE, a storage system checking tool we developed that makes it easy to systematically check real storage systems for errors. EXPLODE practically adapts ideas from model checking, a comprehensive, heavyweight formal verification technique, which makes it systematic while being lightweight. We applied EXPLODE to a broad range of real storage systems, and found serious data-loss errors in every system checked, typically with little effort.

Biography:

Junfeng Yang is a Ph.D. candidate in the Computer Science Department at Stanford. His research interests span the areas of systems, security, software engineering and programming languages, focusing on practically checking large, real-world software systems. Junfeng Yang received his MS in Computer Science from Stanford in 2002, and his BS in Computer Science from Tsinghua University, Beijing, China in 2000. He is a recipient of the Siebel Scholarship and the Stanford School of Engineering fellowship. He received the Best Paper Award at OSDI 2004.

Buffer Overflows and Group Signatures: Recent Results in Security and Cryptography

Date and Time
Wednesday, April 11, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Hovav Shacham, from Weizmann Institute
Host
Edward Felten
We analyze the effectiveness of two techniques intended to make it harder for attackers to exploit vulnerable programs: W-xor-X and ASLR. W-xor-X marks all writable locations in a process' address space nonexecutable. ASLR randomizes the locations of the stack, heap, and executable code in an address space. Intel recently added hardware to its processors (the "XD bit") to ease W-xor-X implementation. Microsoft Windows Vista ships with W-xor-X and ASLR. Linux (via the PaX project) and OpenBSD also include support for both.

We find that both measures are less effective than previously thought, on the x86 at least. A new way of organizing exploits allows the attacker to perform arbitrary computation using only code already present in the attacked process' address space, so code injection is unnecessary. Exploits organized in the new way chain together dozens of short instruction sequences, each just two or three instructions long. Because of the properties of the x86 instruction set, these sequences might not have been intentionally compiled into the binary; we find them by means of static analysis. Furthermore, the effective entropy of PaX ASLR can be searched by brute force. The attack takes just a few minutes to mount over the network.

Group signatures are a variant of digital signatures that provides anonymity for signers. Any member of a group can sign messages, but the resulting signature keeps the identity of the signer secret. In some systems there is a third party that can undo the signature anonymity (trace) using a special trapdoor. New applications for group signatures include the trusted computing initiative (TCPA) and vehicle safety ad-hoc networks (DSRC). In each case, group signatures provide privacy guarantees for tamper-resistant embedded devices.

We describe a short group signature scheme. Signatures in our scheme are approximately the size of a standard RSA signature with the same security. The mathematical setting for our scheme is certain elliptic curves featuring an efficiently computable bilinear map, a setting that has proved fruitful in recent years. We also consider two choices for handling revocation in our scheme.

Deciphering Information Encoded in the Dark Matter of the Human Genome

Date and Time
Tuesday, April 10, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Xiaohui Xie, from MIT
Host
Olga Troyanskaya
Among the three billion bases contained in the human genome, only 1.5% are well characterized, primarily in the form of protein-coding genes. Understanding all functional elements encoded in the remaining 98.5%, or the so-called dark matter, of the human genome is a fundamental challenge for human biology and medicine.

In this talk, I will describe computational methods for systematically dissecting the function of the genomic dark matter. Using pattern recognition, machine learning and comparative genomics, we have uncovered hundreds of novel regulatory motifs in these regions, creating a first systematic catalogue of both transcriptional and post-transcriptional regulatory elements in the human genome. Our systematic analyses have also led to a fundamental change of view on microRNA gene regulation, and the discovery of over 15,000 insulator sequences, which partition the human genome into domains of expression.

In a few years, genome sequences of over 50 mammals will become available. I will discuss how these data will empower our computational methods, and provide an opportunity to unravel all information encoded in the human genome.

BIO: Xiaohui Xie is a computational biologist working at the Broad Institute of MIT and Harvard. His recent research interests are in computational genomics and regulatory motif discovery in particular. Dr. Xie received his M.S. in Computer Science and Ph.D. in Computational Neuroscience, both from MIT, where he continued as a postdoc with Eric Lander.
