Colloquium

Stochastic Models on Networks: Reconstruction and Optimization

Date and Time
Wednesday, April 4, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Elchanan Mossel, from UC Berkeley
Host
Bernard Chazelle
Stochastic models on networks introduce novel algorithmic challenges. These challenges arise from diverse application fields, such as molecular biology, computer networks and social networks. In this talk I will survey some recent progress in this area. In particular, I will discuss the problems of estimating statistical quantities on a given network, reconstructing the network topology from observations at a subset of the nodes and optimization problems defined on stochastic networks.

Democratizing content distribution

Date and Time
Wednesday, March 28, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Michael J. Freedman, from Stanford / NYU
Host
Vivek Pai
In order to reach their large audiences, today's Internet publishers primarily use content distribution networks (CDNs) to deliver content. Yet the architectures of the prevalent commercial systems are tightly bound to centralized control, static deployments, and trusted infrastructure, thus inherently limiting their scope to ensure cost recovery. This talk describes a number of techniques (and the resulting systems) that realize highly-scalable cooperative content distribution. I show how to build a fully self-organizing CDN that efficiently delivers content using unreliable nodes, describe how to transparently dispatch clients to nearby servers, and touch on how to securely leverage even untrusted resources. These ideas have been implemented, deployed, and tested in production systems currently serving several terabytes of data to more than a million people every day. The view of the Internet gained from deploying these systems answered long-standing questions about network configuration, while providing important insights that helped evolve our systems' designs.

Evolutionary Escape on Fitness Landscapes

Date and Time
Tuesday, March 27, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Niko Beerenwinkel, from Harvard
Host
Mona Singh
The evolution of HIV within individual patients is associated with disease progression and failure of antiretroviral drug therapy. Using graphical models, we describe the development of HIV drug resistance mutations and show how these models improve predictions of the clinical outcome of combination therapy. We present combinatorial algorithms for computing the risk of escape of an evolving population on a given fitness landscape. The geometry of fitness landscapes and the underlying gene interactions are analyzed in an attempt to generalize the notion of pairwise epistasis to higher-order genetic systems. Finally, we discuss the new and exciting prospects for analyzing viral genetic variation that arise from recent pyrosequencing technology.

Object Recognition and Feedforward models of the Ventral Stream in Visual Cortex: What is Next?

Date and Time
Thursday, March 15, 2007 - 4:15pm to 5:45pm
Location
Computer Science Large Auditorium (Room 104)
Type
Colloquium
Speaker
Tomaso Poggio, from MIT
Host
Fei-Fei Li
T. Poggio (Center for Biological and Computational Learning, Computer Science and Artificial Intelligence Laboratory and McGovern Institute for Brain Research, M.I.T.)

Understanding the processing of information in our cortex is a significant part of understanding how the brain works and of understanding intelligence itself, arguably one of the greatest problems in science today. In particular, our visual abilities are computationally amazing and we are still far from imitating them with computers. Thus, visual cortex may well be a good proxy for the rest of the cortex and indeed for intelligence itself. But despite enormous progress in the physiology and anatomy of the visual cortex, our understanding of the underlying computations remains fragmentary.

I will briefly review hierarchical feedforward quantitative models of the ventral stream which, heavily constrained by physiology and biophysics, are surprisingly successful in explaining a range of physiological data and psychophysical results in rapid scene categorization. I will then focus on the limitations of such models for object recognition, suggesting specific questions about the computational role of attention and about recognition tasks beyond scene classification.

The Limits of Quantum Computers

Date and Time
Thursday, March 8, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Scott Aaronson, from University of Waterloo
Host
Boaz Barak
In the popular imagination, quantum computers would be almost magical devices, able to "solve impossible problems in an instant" by trying exponentially many solutions in parallel. In this talk, I'll describe four results in quantum computing theory that directly challenge this view.

First, I'll show that any quantum algorithm to decide whether a function f:[n]->[n] is one-to-one or two-to-one needs to query the function at least n^{1/5} times. This provides strong evidence that collision-resistant hash functions, and hence secure electronic commerce, would still be possible in a world with quantum computers.

Second, I'll show that in the "black-box" or "oracle" model that we know how to analyze, quantum computers could not solve NP-complete problems in polynomial time, even with the help of nonuniform "quantum advice states."

Third, I'll show that quantum computers need exponential time to find local optima -- and surprisingly, that the ideas used to prove this result also yield new classical lower bounds for the same problem.

Finally, I'll show how to do "pretty-good quantum state tomography" using a number of measurements that increases only linearly, not exponentially, with the number of qubits. This illustrates how one can sometimes turn the limitations of quantum computers on their head, and use them to develop new techniques for experimentalists.

No quantum computing background will be assumed.

Learning Predictive Models of Gene Regulation

Date and Time
Wednesday, March 7, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Christina Leslie, from Columbia University
Host
Robert Schapire
Studying the behavior of gene regulatory networks by learning from high-throughput genomic data has become one of the central problems in computational systems biology. Most work in this area has focused on learning structure from data -- e.g. finding clusters or modules of potentially co-regulated genes, or building a graph of putative regulatory "edges" between genes -- and has been successful at generating qualitative hypotheses about regulatory networks.

Instead of adopting the structure learning viewpoint, our focus is to build predictive models of gene regulation that allow us both to make accurate quantitative predictions on new or held-out experiments (test data) and to capture mechanistic information about transcriptional regulation. Our algorithm, called MEDUSA, integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, the algorithm learns to predict up/down expression of target genes by identifying condition-specific regulators and discovering regulatory motifs that may mediate their regulation of targets. We use boosting, a technique from statistical learning, to help avoid overfitting as the algorithm searches through the high dimensional space of potential regulators and sequence motifs. We will report computational results on the yeast environmental stress response, where MEDUSA achieves high prediction accuracy on held-out experiments and retrieves key stress-related transcriptional regulators, signal transducers, and transcription factor binding sites. We will also describe recent results on hypoxia in yeast, where we used MEDUSA to propose the first global model of the oxygen sensing and regulatory network, including new putative context-specific regulators. Through our experimental collaborator on this project, the Zhang Lab at Columbia University, we are in the process of validating our computational predictions with wet lab experiments, with encouraging preliminary results.
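
To make the prediction setup concrete, here is a rough, purely illustrative sketch in Python (this is not the MEDUSA code; the synthetic data, feature encoding, and use of scikit-learn are assumptions): each example is a (gene, condition) pair described by binary regulator-motif features, and boosted decision stumps predict up (+1) or down (-1) expression.

    # Hypothetical sketch of a MEDUSA-like prediction task, not the actual algorithm:
    # predict up (+1) / down (-1) expression of a target gene in a condition from
    # binary features pairing a candidate regulator's state with a promoter motif hit.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier  # default base learner is a depth-1 stump

    rng = np.random.default_rng(0)
    n_examples, n_features = 500, 200                       # (gene, condition) pairs x regulator-motif features
    X = rng.integers(0, 2, size=(n_examples, n_features))   # 1 = regulator active AND motif present
    informative = 7                                         # hypothetical informative regulator-motif pair
    y = np.where(X[:, informative] == 1, 1, -1)             # synthetic up/down labels

    # Boosting over one-feature stumps keeps the learned program interpretable
    # (each round names a regulator-motif pair) and helps guard against overfitting.
    model = AdaBoostClassifier(n_estimators=50)
    model.fit(X[:400], y[:400])
    print("held-out accuracy:", model.score(X[400:], y[400:]))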

Looking at People

Date and Time
Wednesday, February 21, 2007 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
David Forsyth, from UIUC
Host
Fei-Fei Li
There is a great need for programs that can describe what people are doing from video. This is difficult to do, because it is hard to identify and track people in video sequences, because we have no canonical vocabulary for describing what people are doing, and because the interpretation of what people are doing depends very strongly on what is nearby.

Tracking is hard because it is important to track relatively small structures that can move relatively fast, for example, lower arms. I will describe research into kinematic tracking (tracking that reports the kinematic configuration of the body) that has resulted in a fairly accurate, fully automatic tracker that can keep track of multiple people.

Once one has tracked the body, one must interpret the results. One way to do so is to have a motion synthesis system that takes the track, and produces a motion that is (a) like a human motion and (b) close to the track. Our work has produced a high-quality motion synthesis system that can produce motions that look very much like human activities. I will describe work that couples that system with a tracker to produce a description of the activities, entirely automatically.

It is often very difficult to obtain a rich set of examples of activities, because activities appear to compose in complex ways. I will describe methods to extract a representation of activity that can be searched with a text-like query.

I will speculate on some of the many open problems. What should one report? How do nearby objects affect one's interpretation of activities? How can one interpret patterns of behaviour?

We describe methods to determine what people are doing in a video sequence.

Reach for A*: Shortest Paths on Road Networks

Date and Time
Wednesday, November 15, 2006 - 4:15pm to 5:45pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Renato Werneck
Host
Robert Tarjan
We study the problem of finding point-to-point shortest paths in a setting where preprocessing is allowed. We are interested in exact algorithms only, and our focus is on road networks. Our algorithms use several different strategies to speed up the queries. First, we use A* search to direct the search towards the goal. Second, we use the notion of "reach" to identify and prune local intersections. Third, we add shortcuts to the graph to reduce the number of highway (i.e., non-local) edges. Our best query algorithm is thousands of times faster than Dijkstra's algorithm on the road networks of the United States and Western Europe, each with roughly 20 million vertices. This is joint work with Andrew Goldberg, Chris Harrelson, and Haim Kaplan.
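
A minimal sketch of how a reach-pruned A* query might look, assuming precomputed reach values and an admissible lower-bound function h (this is not the authors' implementation; the data structures are illustrative): a vertex is skipped when its reach is below both the distance already travelled from the source and the lower bound on the distance remaining to the target.

    # Sketch only: A* query with reach-based pruning on a weighted digraph.
    # graph: {u: [(v, weight), ...]}, reach: {v: precomputed reach of v},
    # h(v): admissible lower bound on the distance from v to the target.
    import heapq

    def reach_astar(graph, reach, h, source, target):
        dist = {source: 0.0}
        pq = [(h(source), source)]
        while pq:
            f, u = heapq.heappop(pq)
            if u == target:
                return dist[u]
            if f > dist.get(u, float("inf")) + h(u):
                continue                      # stale queue entry
            for v, w in graph.get(u, []):
                g = dist[u] + w
                # Reach pruning: v cannot lie on the shortest source-target path
                # if its reach is below both the prefix length g and the bound h(v).
                if reach[v] < g and reach[v] < h(v):
                    continue
                if g < dist.get(v, float("inf")):
                    dist[v] = g
                    heapq.heappush(pq, (g + h(v), v))
        return float("inf")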

Models and Algorithms for Genomic Sequences, Proteins, and Networks of Protein Interactions

Date and Time
Wednesday, November 8, 2006 - 4:00pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Serafim Batzoglou, from Stanford
Host
Olga Troyanskaya
This talk has two parts: the first part is on new ways to model and analyze biological sequences; the second part describes methods for constructing and comparing protein interaction networks, which are emerging as canonical data sets of the post-genomic era.

Algorithms for biological sequence analysis. One of the most fruitful developments in bioinformatics in the past decade was the wide adoption of Hidden Markov Models (HMMs) and related graphical models for an array of applications such as gene finding, sequence alignment, and non-coding RNA folding. Conditional Random Fields (CRFs) are a recent alternative to HMMs, and provide two main advantages: (1) they enable more elaborate modeling of biosequences by allowing us to conveniently describe and select rich feature sets. For example, when comparing two residues during protein alignment, using a CRF allows leveraging in a principled manner the chemical properties of the neighborhood of those residues. (2) CRFs allow training of parameters in a way that is more effective for making predictions on new input sequences. I will describe three practical CRF-based tools that improve upon state-of-the-art methods in terms of accuracy: CONTRAlign, a protein aligner; CONTRAST, a gene finder; and CONTRAfold, a method for predicting the secondary structure of non-coding RNAs. Our tools are available at http://contra.stanford.edu.
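
As a toy illustration of the "rich feature sets" point, the snippet below scores an aligned residue pair as a weighted sum of heterogeneous features; the features, hydrophobicity table excerpt, and weights are hypothetical and are not CONTRAlign's actual model.

    # Hypothetical CRF-style features for scoring an aligned residue pair.
    HYDRO = {"A": 1.8, "I": 4.5, "L": 3.8, "V": 4.2, "F": 2.8,
             "D": -3.5, "E": -3.5, "K": -3.9, "R": -4.5, "S": -0.8}  # Kyte-Doolittle excerpt

    def pair_features(seq1, i, seq2, j):
        """Features for aligning seq1[i] with seq2[j]; a CRF can mix
        heterogeneous signals like these, which is awkward in a plain HMM."""
        a, b = seq1[i], seq2[j]
        return {
            "identity": 1.0 if a == b else 0.0,
            "hydro_diff": abs(HYDRO.get(a, 0.0) - HYDRO.get(b, 0.0)),
            # neighborhood signal: hydrophobicity similarity of the preceding residues
            "context_hydro_diff": abs(HYDRO.get(seq1[max(i - 1, 0)], 0.0)
                                      - HYDRO.get(seq2[max(j - 1, 0)], 0.0)),
        }

    def match_score(weights, features):
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    w = {"identity": 2.0, "hydro_diff": -0.3, "context_hydro_diff": -0.1}  # illustrative weights
    print(match_score(w, pair_features("ILKE", 1, "VLKD", 1)))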

Networks of protein interactions. Graphs that summarize pairwise interactions between all proteins of an organism have emerged as canonical data sets that can be constructed using multiple sources of functional genomic data. We construct protein interaction networks for all sequenced microbes by integrating information extracted from genomic sequences as well as microarrays and other predictors of pairwise interactions. We then align these networks in multiple species using Graemlin, a tool that we developed for that purpose, and search for modules (subgraphs) of proteins that exhibit homology as well as conservation of pairwise interactions among many organisms. Graemlin provides substantial speed and sensitivity gains compared to previous network alignment methods; it can be used to compare microbial networks at http://graemlin.stanford.edu.

Leaving the Span

Date and Time
Wednesday, October 25, 2006 - 4:00pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
Colloquium
Speaker
Manfred K. Warmuth, from University of California, Santa Cruz
Host
Robert Schapire
When linear models are too simple, the following "kernel trick" is commonly used: expand the instances into a high-dimensional feature space and use any algorithm whose linear weight vector in feature space is a linear combination of the expanded instances. Linear models in feature space are typically non-linear in the original space and seemingly more powerful. Also, dot products can still be computed efficiently via the use of a kernel function.
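
For concreteness, here is a minimal kernel perceptron sketch of the dual, span-restricted representation the abstract refers to: the weight vector is kept implicitly as a combination of training instances, so predictions need only kernel evaluations (the data and kernel are illustrative; any algorithm of this class is "in the span" in the same sense).

    # Sketch of the dual ("span") representation: w = sum_i alpha_i * y_i * phi(x_i),
    # so only kernel evaluations are ever needed. Shown with a plain kernel perceptron.
    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        return np.exp(-gamma * np.sum((x - z) ** 2))

    def kernel_perceptron(X, y, kernel, epochs=10):
        alpha = np.zeros(len(X))              # dual coefficients for the training instances
        for _ in range(epochs):
            for i, (x, label) in enumerate(zip(X, y)):
                pred = np.sign(sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X))))
                if pred != label:
                    alpha[i] += 1.0           # mistake: add this instance to the span
        return alpha

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([1, 1, -1, -1])              # toy labels
    print(kernel_perceptron(X, y, rbf_kernel))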

However, we discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the embedded training instances as its weight vector, no matter what embedding is used. We show that these algorithms are inherently limited by the fact that, after seeing k instances, only a weight space of dimension at most k can be spanned.

Surprisingly, the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces "additional constraints" on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
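
For reference, a short sketch of the standard exponentiated gradient update on the probability simplex (this is textbook EG, not necessarily the exact variant analyzed in the talk): the log-weights move along the negative gradient, and renormalization keeps the weights non-negative and summing to one.

    # Standard EG update: multiplicative step followed by renormalization.
    import numpy as np

    def eg_update(w, grad, eta=0.1):
        w_new = w * np.exp(-eta * grad)       # log-weights shift by -eta * grad
        return w_new / w_new.sum()            # back onto the probability simplex

    w = np.full(4, 0.25)                      # start from the uniform distribution
    g = np.array([0.3, -0.1, 0.0, 0.2])       # gradient of the loss w.r.t. the weights
    print(eg_update(w, g))                    # components with smaller gradient gain mass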

(Joint work with S.V.N. Vishwanathan!)
