Colloquium | Page 15 | Computer Science Department at Princeton University

Knowledge-lean Approaches to Natural Language Processing

Date and Time

Wednesday, April 28, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Lillian Lee, from Cornell University

Host

Robert Schapire

Website

http://www.cs.cornell.edu/home/llee/

In order to enable computers to understand and use natural language, a massive amount of knowledge, linguistic and otherwise, must be amassed. As a result, much recent research has focused on creating systems that automatically learn high-quality information about language, and about the world, directly from the statistics of unprocessed or minimally-processed language samples alone. As examples, I will focus on two lines of work. The first uses information-theoretic distributional clustering methods trained on large language samples to induce probabilistic models of linguistic co-occurrences. The second utilizes multiple-sequence-alignment algorithms, commonly employed in computational biology, to learn how to generate English versions of computer-generated proofs, creating texts whose quality rivals that of hand-crafted systems.

Portions of this talk are based on joint work with Regina Barzilay and with Fernando Pereira.

Cascading Behavior and Bursty Dynamics in Computational Models of Social Networks

Date and Time

Tuesday, April 27, 2004 - 4:30pm to 6:00pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Jon Kleinberg, from Cornell University

Host

Moses Charikar

Models for the processes by which ideas and influence propagate through a social network have been studied in a number of domains,including the diffusion of scientific and technological innovations,the sudden and widespread adoption of various strategies in game-theoretic settings, and the effects of ``word of mouth'' in the promotion of new products. Emerging on-line media such as Weblogs offer a compelling new setting in which to study such phenomena, since they are characterized by frequent ``topic bursts'' in which a virtual discussion spreads rapidly through the network of individuals who author on-line content.

A natural way to identify highly influential individuals in such network settings is to look for collections of nodes with the power to trigger large outbreaks of cascading behavior. In other words, if we can try to convince a subset of individuals to adopt a new behavior (e.g. to try a new product or take part in an on-line discussion),and the goal is to cause a large cascade of further adoptions, which set of individuals should we target? In this talk, we will consider such problems in several of the most widely studied models in social network analysis, and describe algorithms (obtained in joint work with David Kempe and Eva Tardos) with provable approximation guarantees for the problem of selecting the most influential set of nodes. The development of these algorithms indicates how influential sets must include members from diverse parts of the network, forcing heuristics for this problem to deal, at least implicitly, with the clustering of the underlying opulation. We conclude by discussing the intriguing relationship between models of influence propagation and the problem of identifying ``bursty'' topics, which is emerging as an important issue for searching on-line information resources.

Event (no name)

Date and Time

Thursday, April 15, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

TBD

Host

Virginia Hogan

Reverse-Engineering the Internet

Date and Time

Wednesday, April 14, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Neil Spring, from University of Washington

Host

Vivek Pai

Website

http://www.cs.washington.edu/homes/nspring/app/

The Internet is a network of competing networks. Independent network operators have access to the proprietary details of their own networks, but neither researchers nor network operators have had access to the detailed, global picture of the Internet needed to find and correct network vulnerabilities and evaluate new applications and protocols. My thesis work demonstrates that detailed topological and routing information regarding the global Internet and its constituent networks is within the reach of these communities. That is, that reverse-engineering the Internet can be made practical through innovative measurement and inference techniques. In this talk, I present my Rocketfuel system which efficiently maps networks in the Internet using only externally available information. To recover a reasonably complete network graph, Rocketfuel uses hundreds of traceroute servers as vantage points to collect paths through the network. For efficiency, Rocketfuel uses global routing information and prior measurements to guide further measurement. To give the maps structure, Rocketfuel uses the information encoded in router names to assign each router to a geographic location and uses the rest of the network graph to determine the role of each router in the network. Topology is only part of the picture; the rules that govern how packets are directed across the topology are equally important but impossible to measure directly. I will describe how routing can be inferred from paths not taken. Inferred routing allows us to summarize and predict which path a packet will traverse, as well as to find interesting configuration decisions. Understanding global Internet topology and routing allows operators to predict the effects of change and allows researchers to identify problems and demonstrate the effectiveness of their solutions using realistic Internet topologies.

Rational Kernels -- A General Classification Framework for the Analysis of Text and Speech

Date and Time

Monday, April 12, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Mehryar Mohri, from AT&T Labs

Host

Robert Schapire

Website

http://www.research.att.com/~mohri/

Most classification algorithms were originally designed for fixed-size vectors. However, important learning problems in natural language processing require the analysis of variable-length sequences and more generally distributions over variable-length sequences.

Rational kernels are a new family of similarity measures over variable-length sequences and their distributions. Many similarity measures commonly used in computational biology, such as the edit distance, the convolution kernels of Haussler, and other string kernels, are shown to be special cases of rational kernels.

This talk will describe general and efficient methods for computing rational kernels, and discuss some important convergence and closure properties. It will also report the results of experiments illustrating the successful use of rational kernels for several difficult prediction problems in text and speech processing.

[Joint work with Corinna Cortes and Patrick Haffner]

Internet Traffic Measurement: From Packets to Insight

Date and Time

Thursday, April 8, 2004 - 4:30pm to 6:00pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Cristian Estan, from University of California San Diego

Host

Larry Peterson

One of the main reasons for the success of the internet is its service model that emphasizes flexibility. While this freedom enabled the widespread deployment of the applications popular today such as email and Web, it has also greatly complicated the task of administering these networks. To understand how a network is being used, or whether it is being abused, an administrator must inspect the flow of packets and "infer" the intent of users and applications. Existing measurement solutions either lack the necessary detail, do not scale up to the speeds of today's networks or are not flexible enough to keep up to the speeds of today's networks or are not flexible enough to keep up with the ever changing traffic mix. I will present two approaches to improve the state of the art.

My first approach is to develop fast and accurate algorithmic building blocks that allow routers to collect better measurement data. For example, it is often necessary to identify large flows of traffic, the "heavy hitters". I will present multistage filters which quickly and scalably identify heavy-hitters. A second useful building block scalably estimates the number of active flows or IP addresses using a family bitmap algorithms. I will show theoretical and experimental evaluations of the effectiveness of these building blocks.

My second approach is to improve the flexibility of offline analysis through a new method of traffic characterization. The conventional approach is a static analysis specialized to capture flows, applications, or network-to-network traffic matrices. By contrast, my analysis dynamically and automatically produces hybrid traffic definitions that match the underlying usage. I will describe a publicly available tool called AutoFocus that I built to implement this analysis, and its use on various production networks to infer such varied phenomena as new worms, denial of service attacks, routing changes, and traffic periodicities.

Mechanism Design for Computationally Limited Agents

Date and Time

Wednesday, April 7, 2004 - 4:30pm to 6:00pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Kate Larson, from Carnegie Mellon University

Host

Kenneth Steiglitz

The frameworks of game theory and mechanism design have exerted significant influence on formal models of multiagent systems and electronic markets by providing tools for designing and analyzing systems in order to guarantee certain desirable outcomes. However, many game-theoretic models assume idealized rational decision makers interacting in prescribed ways. In particular, the models often ignore the fact that in many systems the agents are not fully rational, but instead have constraints on their computational capabilities which inhibit them from acting as fully rational agents would. This creates a potentially hazardous gap in game theory and automated negotiation as agents may not be motivated to behave in the expected way, leading to potentially undesirable outcomes.

In this talk I will present work aimed at bridging this gap. By explicitly incorporating the deliberation (computational and information gathering) actions of agents into their strategies, we are able to provide a theory of interaction for self-interested computationally-bounded agents. I will describe a new game-theoretic solution concept, the deliberation equilibrium and use this concept to analyze several classic and commonly used auctions. In particular, I will illustrate the complex strategic behavior which arises when agents have deliberation limitations, which has been overlooked by traditional analysis. I will conclude by discussing mechanism design principles for multiagent systems and electronic markets with resource-bounded agents.

Event (no name)

Date and Time

Tuesday, April 6, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

TBD

Host

Virginia Hogan

In Search of Seamless Collaboration

Date and Time

Monday, April 5, 2004 - 4:00pm to 5:30pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

Kori Inkpen, from Dalhousie University

Host

Maria Klawe

In real life we seamless interact with others. Humans have strong interpersonal skills that enable them to utilize many subtle cues to coordinate their actions and communication. Now introduce computers to the mix. We often spend more time struggling with our technology than using it to help achieve our goals. This is especially true when people want to work together in a face-to-face environment. Often, we pull out our pens and paper and go to work at the whiteboard,abandoning computer technology all together. However, as more and more of our information is in digital form, we are compelled to find a new way to seamlessly incorporate technology into our rich face-to-face interactions.

Dr. Inkpen and members of the EDGE Lab at Dalhousie University are investigating ways to more effectively support users natural, face-to-face collaborative interactions. This talk will present their research directions in the area of shared space collaboration, exploring novel computing environments, interaction styles, and user interfaces to support small group collaboration in co-located environments. Advances in this area will fundamentally change how people work and play together in the future.

Improving the End-to-End Availability of Internet -Based Systems

Date and Time

Thursday, March 25, 2004 - 4:30pm to 6:00pm

Location

Computer Science Small Auditorium (Room 105)

Type

Colloquium

Speaker

David Andersen, from MIT

Host

Larry Peterson

The end-to-end availability of Internet services is between two and three orders of magnitude worse than other important engineered systems, including the US airline system, the 911 emergency response system, and the US public telephone system. This talk makes two contriutions to improve end-to-end availability. First, a study of three years of data collected on a 31-site testbed explores why failures happen, and finds that access network failures, inter-provider and wide-area routing anomalies, domain name system faults, and server-side failures all have a role to play in reducing availability.

Second, an overlay network with new algorithms for end-to-end path selection improves availabilty by one or two orders of magnitude compared to the current state. A purely overlay-based system, RON (resilient overlay networks), deploys nodes in different organizations and networks, carefully measures and monitors the status of available paths, and relies on them to cooperatively route packets by way of each other to bypass faults. A second system, MONET (multihomed overlay networks), uses a combination of physical path redundancy and overlays within a network of cooperative Web proxies to improve the availability for Web users. Experimental evidence suggests that RON can reduce failures by a factor of six, and that with physical path redundancy, a six-site MONET eliminates almost all network-based failures.