CS Department Colloquium Series

Network-Level Spam and Scam Defenses

Date and Time
Thursday, October 29, 2009 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Professor Nick Feamster, from Georgia Tech
Host
Jennifer Rexford
This talk introduces a new class of methods called "behavioral blacklisting", which identify spammers based on their network-level behavior. Rather than attempting to blacklist individual spam messages based on what each message contains, behavioral blacklisting classifies a message based on how it was sent (the spatial and temporal patterns of the email traffic). Behavioral blacklisting tracks the sending behavior of an email sender from a wide variety of vantage points and establishes "fingerprints" that are indicative of spamming behavior. Behavioral blacklisting can apply not only to email traffic, but also to the network-level behavior of hosting infrastructure for scam or phishing attacks. First, I will present a brief overview of our study of the network-level behavior of spammers. Second, I will describe two behavioral blacklisting algorithms that are based on insights from that study. Finally, I will describe our ongoing work applying similar behavioral detection techniques to detecting both online scam hosting infrastructure and phishing attacks.

Bio
Nick Feamster is an assistant professor in the College of Computing at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, including the design, measurement, and analysis of network routing protocols, network operations and security, and anonymous communication systems. He recently received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, and award papers at SIGCOMM 2006 (network-level behavior of spammers), the NSDI 2005 conference (fault detection in router configuration), Usenix Security 2002 (circumventing web censorship using Infranet), and Usenix Security 2001 (web cookie analysis).

Highway Dimension and Provably Efficient Shortest Path Algorithms

Date and Time
Wednesday, October 14, 2009 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Speaker
Andrew Goldberg, from Microsoft Research - Silicon Valley
Host
Robert Tarjan
Computing driving directions has motivated many shortest path heuristics that answer queries on continental scale networks, with tens of millions of intersections, in real time, and with very low storage overhead.

We give the first theoretical analysis of several underlying algorithms on a non-trivial class of networks. To do this, we introduce the notion of highway dimension. Our analysis works for networks with low highway dimension and gives a unified explanation of good performance for several seemingly different algorithms.

Joint work with Ittai Abraham, Amos Fiat, and Renato Werneck.

Hidden Grammar: Advances in Data-Driven Models of Language

Date and Time
Wednesday, October 21, 2009 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
David Blei
With the field of computational linguistics' empirical revolution of the 1990s came the realization that human intuitions about language are insufficient for accurate and robust natural language technologies. The move from hand-written, rule-based models to data-driven techniques led to huge advances, yet we still leaned on human intuition for constructing annotated linguistic datasets. Despite major advances in this paradigm (some of which we'll discuss in this talk), we now know that, in the wild world of real and diverse linguistic data, natural language technology raised on expert-made annotations remains insufficient for real, robust applications.

In this talk we adopt the premise that unsupervised learning will, in the long run, be the way forward for learning computational models of language cheaply. We focus on dependency syntax learning without trees, beginning with the classic EM algorithm and presenting several ways to alter EM for drastically improved performance using crudely represented "knowledge" of linguistic universals. We then present more recent work in the empirical Bayesian paradigm, where we encode our background knowledge as a prior over grammars, applying inference to obtain hidden structure. Of course, "background knowledge" is still human intuition. We argue, however, that by representing this knowledge compactly in a prior distribution--far more compactly than the many decisions made in building treebanks--we can experimentally explore the connection between proposed linguistic universals and unsupervised learning.

This talk includes discussion of joint work with Shay Cohen, Dipanjan Das, Jason Eisner, Kevin Gimpel, Andre Martins, and Eric Xing.

On the Internet Someone Knows You Are a Dog

Date and Time
Wednesday, October 7, 2009 - 4:30pm to 5:30pm
Location
Computer Science Small Auditorium (Room 105)
Type
CS Department Colloquium Series
Host
Jennifer Rexford
We have been examining the leakage of privacy on the Internet: how information related to individual users is aggregated as they browse seemingly unrelated Web sites. Thousands of Web sites across numerous categories, countries, and languages are studied to generate a "privacy footprint". I report on a longitudinal study consisting of multiple snapshots of this diffusion taken over five years. I'll talk about the technical ways by which third-party aggregators acquire data, the depth of user-related information acquired, the techniques for protecting against privacy diffusion, and the limitations of such techniques. This increasing aggregation of user-related data is carried out by a steadily decreasing number of entities: a handful are able to track users' movement across almost all of the popular Web sites. Virtually all the protection techniques have significant limitations, highlighting the seriousness of the problem and the need for alternate solutions.

I will also talk about a recent discovery of large-scale leakage of personally identifiable information (PII) via Online Social Networks (OSNs). Third parties can link PII with user actions both within OSN sites and elsewhere on non-OSN sites.