Research and Projects
My research interests span the areas of distributed systems, security,
and networking. I particularly enjoy devising
technologies that make new functionality broadly available, often by
coupling principled designs with real-world deployments.
Interested in joining my group? Read this first.
Research funded through the National Science Foundation (Awards
#0831374, #0904860, #0904729, #0953197, #1040123, #1040708, #1111734),
the Office of Naval Research (Young Investigator Program), DARPA (CSSG Program), the
GENI Project Office (#1759 and #1833D), the Alfred P. Sloan Foundation,
Cisco Systems, Google, Intel (Visual Computing and Cloud Computer Centers), and Princeton
University (Grand Challenges Program and Emerson Electric Award).
Ongoing research includes:
- Services in the untrusted cloud: (2009 - )
Designing services that can use servers in public clouds (i.e.,
third-party datacenters), yet provide privacy and security even in the
face of curious or malicious cloud providers. SPORC, for example,
provides a framework for building collaborative applications, such as
online, real-time text editors. Cloud servers in SPORC store and order
clients' encrypted operations, yet its clients use cryptographic
integrity checks to detect if servers act maliciously (by adding,
changing, dropping, or reordering operations). Upon detecting faults,
clients can switch to alternative servers and automatically merge (by
reordering and transforming) any divergent state, and the system
supports dynamic access control even in the face of concurrent
membership changes. The work establishes the conceptual connection
between operational transformation and fork consistency.
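The fork-detection idea behind these integrity checks can be sketched as a hash-chained operation log. The names and encoding below are illustrative only, not SPORC's actual protocol:

```python
import hashlib
import json

def chain_hash(prev_head, op):
    """Hash-chain an operation onto the history (hypothetical encoding)."""
    encoded = json.dumps(op, sort_keys=True).encode()
    return hashlib.sha256(prev_head + encoded).hexdigest().encode()

class Client:
    """Tracks the server-assigned operation order and detects tampering."""
    def __init__(self):
        self.head = b"genesis"
        self.history = []

    def apply(self, op, claimed_head):
        # The server delivers each committed op with its claimed chain head.
        expected = chain_hash(self.head, op)
        if claimed_head != expected:
            raise RuntimeError("server misbehavior detected: history forked")
        self.head = expected
        self.history.append(op)
```

If a misbehaving server shows two clients different orderings, their chain heads diverge and can never be reconciled without detection, which is the essence of fork consistency.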
- Service-centric networks: (2008 - )
Developing a new network architecture, SCAFFOLD, that makes it easier
for networked services to manage and even capitalize on system churn.
SCAFFOLD moves from human-readable host names to machine-readable
service identifiers, from individual packets to flows, and from unicast
communication to anycast with flow affinity. SCAFFOLD hides network
addresses from applications to enable dynamic remapping as end-points
change (e.g., due to virtual-machine migration, failover, or device
mobility), and directs traffic based on successively refined
identifiers to scale routing and limit churn. SCAFFOLD can be
incrementally deployed in single or geographically distributed
datacenters, or as a clean-slate Internet architecture. This project
entails the design, implementation, and evaluation of a socket API and
network stack for end-hosts, and a network infrastructure built using
OpenFlow and NOX.
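A toy version of anycast with flow affinity might look like the following. The API is invented for illustration and is not SCAFFOLD's socket interface:

```python
import random

class ServiceResolver:
    """Maps a service ID to a changing set of instances, while keeping
    existing flows pinned to their chosen instance (flow affinity).
    Purely illustrative; not SCAFFOLD's actual resolution mechanism."""
    def __init__(self):
        self.instances = {}   # service_id -> set of network addresses
        self.flows = {}       # (service_id, flow_id) -> chosen address

    def register(self, service_id, addr):
        self.instances.setdefault(service_id, set()).add(addr)

    def unregister(self, service_id, addr):
        self.instances[service_id].discard(addr)
        # Flows pinned to a departed instance must be remapped.
        for key, a in list(self.flows.items()):
            if key[0] == service_id and a == addr:
                del self.flows[key]

    def lookup(self, service_id, flow_id):
        key = (service_id, flow_id)
        if key not in self.flows:   # new flow: anycast to any live instance
            self.flows[key] = random.choice(sorted(self.instances[service_id]))
        return self.flows[key]
```

Because applications name services rather than addresses, churn (an instance leaving) only triggers a new lookup rather than breaking the application-level connection.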
- Virtual worlds: (2008 - )
Researching and building Internet-scale virtual worlds, with a focus on
scalability, extensibility, security, and federation. Ongoing work
leverages geometry- and physics-based constraints, using approaches
from computer graphics and computational geometry, to provide scalable
and efficient inter-object and inter-space communication, object
discovery, world segmentation, distributed storage and caching, and
dynamic content delivery.
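One way such geometric constraints can prune communication is an interest query based on apparent size: far-away small objects are dropped from discovery and message delivery. The threshold and field names here are invented for illustration:

```python
import math

def relevant_objects(viewer_pos, objects, min_angular_size=0.01):
    """Keep only objects that subtend a large enough angle (radius/distance)
    from the viewer's position; everything else is pruned from communication.
    A toy sketch, not the project's actual query structure."""
    out = []
    for obj in objects:
        d = math.dist(viewer_pos, obj["pos"])
        if d == 0 or obj["radius"] / d >= min_angular_size:
            out.append(obj["name"])
    return out
```

A large distant mountain stays visible while a nearby-scale pebble at the same distance is pruned, which bounds per-viewer communication independent of world size.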
- Fault-tolerant datacenter services: (2007 - )
With the trend towards sacrificing consistency for scale, this project
reconsiders the challenge of building distributed storage and
replicated state machines with strong robustness guarantees and at
scale. We do so by leveraging certain trends common to many of these
datacenter services, including simple data models, effective
object-based interfaces, and read-heavy object workloads, and we
achieve scale through a group compositional approach.
In the fail-stop model, CRAQ (Chain
Replication with Apportioned Queries) offers high
availability, high throughput, and low latency, while providing a
sliding scale of read consistency options (from eventual to
linearizable). We are also developing support in wide-area key-value
systems for our new notion of multi-key timeline consistency.
This model allows state in different datacenters to diverge, yet
ensures that accesses to dependent state remain consistent, even
while avoiding the overhead of transactional locking.
In the Byzantine (malicious) setting, Prophecy
introduces a new form of consistency, called delay-once
consistency, for replicated state machines. While slightly weaker
than linearizability, delay-once consistency can be achieved at a cost
almost identical to that of fully unreplicated reads for read-mostly workloads.
Finally, we are exploring algorithms that securely partition large
numbers of nodes (of which a certain fraction may be Byzantine) into
small groups, such that each group has a quorum of correct nodes, even
under malicious "leave-and-join" attacks.
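The read path that gives CRAQ its consistency options can be sketched in a few lines. This is a single-process toy under simplifying assumptions, not the real implementation:

```python
class CraqNode:
    """One replica in the chain: committed ("clean") and tentative
    ("dirty") versions per key."""
    def __init__(self):
        self.clean = {}
        self.dirty = {}

class Chain:
    """Writes flow head -> tail; the tail commits, and acks clean each
    replica. Reads are apportioned across all replicas."""
    def __init__(self, length=3):
        self.nodes = [CraqNode() for _ in range(length)]

    def write(self, key, value):
        for node in self.nodes:              # propagate down the chain
            node.dirty[key] = value
        for node in reversed(self.nodes):    # tail commit + acks back up
            node.clean[key] = node.dirty.pop(key)

    def read(self, key, node_index):
        node = self.nodes[node_index]
        if key in node.dirty:
            # Apportioned query: the tail knows which version is committed.
            return self.nodes[-1].clean.get(key)
        return node.clean.get(key)
```

Because any replica can answer clean reads, throughput scales with chain length, yet the tail check on dirty keys preserves strong consistency for in-flight writes.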
- Programmable enterprise networks: (2006 - )
Design and implementation contributions to the clean-slate SANE and
backwards-compatible Ethane enterprise network
architecture. Ethane network switches provide connectivity through
on-demand virtual circuits, yet enforce security policies on a per-flow
basis through centrally-managed, atomic, auditable name bindings.
A multi-month deployment at Stanford served hundreds of both wired and
wireless hosts. The technology is now being commercialized by Nicira Networks and supported by
major switch vendors as part of the OpenFlow Switch Consortium.
DIFANE preserves the capabilities of
Ethane-style networks (with fine-grained administrative policies on
flows), while decentralizing the processing of new flows and keeping
all packet processing completely in the dataplane. DIFANE can be
readily implemented with commodity switch hardware, since all dataplane
functions can be expressed in terms of wildcard rules that perform
simple actions on matching packets.
Recent work on Frenetic
focuses on developing a high-level programming language for OpenFlow
networks that enables writing programs in a declarative and
compositional style.
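The wildcard-rule model these systems build on can be illustrated with a minimal prioritized matcher; the rules, fields, and actions below are hypothetical policy, not any deployed configuration:

```python
# Prioritized wildcard rules: each field matches exactly or via "*",
# and the first (highest-priority) matching rule's action applies.
RULES = [
    ({"src": "10.0.0.5", "dst": "*",        "port": "22"}, "drop"),
    ({"src": "*",        "dst": "10.0.1.0", "port": "*"},  "forward:2"),
    ({"src": "*",        "dst": "*",        "port": "*"},  "forward:1"),
]

def matches(rule, packet):
    return all(v == "*" or packet.get(k) == v for k, v in rule.items())

def process(packet):
    """Apply the first matching rule; default-deny if none matches."""
    for rule, action in RULES:
        if matches(rule, packet):
            return action
    return "drop"
```

Since every dataplane decision reduces to this match-on-wildcards-then-act form, the policy can run entirely in commodity switch TCAMs.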
- Replica selection and anycast: (2005 - )
Designed and built OASIS, an
infrastructure for locality- and load-based server selection among
replicated Internet services. OASIS demonstrates how disparate
services can share the task of (potentially error-prone) network locality
measurement, and it scalably manages their state information. OASIS
was in production use from Nov. 2005 to July 2009, handling thousands
of replicas from more than a dozen distributed services. Background
measurement studies examine the geographic locality of IP prefixes
and the efficacy of
virtual coordinate systems.
Its successor, DONAR, demonstrates a
constraint-based optimization that supports flexible service policies,
as opposed to heuristics or centralized solutions. DONAR
decomposes the selection process into a robust, decentralized algorithm.
Deployed since Nov. 2009 at a dozen commercial PoPs worldwide, DONAR
provides server selection (through DNS or HTTP) for a portion of
CoralCDN's traffic, as well as for services on the Measurement Lab testbed, an
open-source platform co-sponsored by the New America Foundation,
PlanetLab, and Google. This includes use by the FCC for its Consumer Broadband
Test.
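As a rough intuition for constraint-based selection (DONAR's actual algorithm is a decentralized global optimization, not this greedy rule), consider picking the nearest replica subject to per-replica load caps:

```python
def select_replica(client_loc, replicas):
    """Choose the closest replica whose load is under its cap.
    `replicas` is a list of dicts: {"addr", "loc", "load", "cap"}.
    Illustrative only: distances are 1-D stand-ins for network distance."""
    def dist(a, b):
        return abs(a - b)
    eligible = [r for r in replicas if r["load"] < r["cap"]]
    if not eligible:
        eligible = replicas  # everyone overloaded: fall back to proximity
    return min(eligible, key=lambda r: dist(client_loc, r["loc"]))["addr"]
```

Expressing policies as constraints (load caps, locality weights) rather than hard-coded heuristics is what lets one mechanism serve services with very different goals.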
- Content distribution and peer-to-peer systems: (2002 - )
Conceived and led the Coral Project. Designed and built an
Internet-scale, self-organizing web-content distribution network: CoralCDN uses a network of
cooperating DNS redirectors and HTTP proxies, backed by a decentralized
indexing infrastructure, to allow oblivious clients to transparently
download content from nearby servers, while avoiding distant or
heavily-loaded ones. CoralCDN has been in production use on 300-400
PlanetLab servers since March 2004, currently receiving around 50
million HTTP requests from more than 2 million clients per day, serving
several terabytes of data.
Research on the conceptual design of untrusted peer-assisted CDNs
employed price
theory to study how peer demand can be efficiently matched to
available supply. This work characterizes the efficiency and
robustness gains that are enabled by price-based multilateral exchange,
as opposed to bilateral exchanges such as BitTorrent.
To scale beyond CoralCDN's limits, we are building a browser-based
P2P-CDN, Firecoral. Firecoral
enables mutually-distrustful users to "share" their browser caches,
yet ensures the authenticity of content and enables users to preserve
privacy by expressing flexible content sharing policies. Beta release
available.
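The content-authenticity check at the heart of such a design can be sketched simply: accept a peer's cached bytes only if they match a digest obtained from a trusted source. The function names here are illustrative, not Firecoral's API:

```python
import hashlib

def trusted_digest(content_from_origin):
    """Digest published by a trusted party (e.g., the origin or a tracker)."""
    return hashlib.sha256(content_from_origin).hexdigest()

def fetch_from_peer(peer_content, expected_digest):
    """Accept untrusted peer content only if it matches the trusted digest."""
    if hashlib.sha256(peer_content).hexdigest() != expected_digest:
        raise ValueError("peer served tampered content; fall back to origin")
    return peer_content
```

This keeps peers untrusted for integrity while still letting them carry the bandwidth load.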
- Privacy-preserving protocols: (2000 - )
Developed cryptographic protocols for private matching, which computes the
set intersection between two or more parties' inputs. PM uses the
properties of homomorphic encryption to privately evaluate a polynomial
representation of input sets. Subsequent work led to improved
constructions for private keyword
search based on oblivious pseudorandom functions, as well as
explicit consideration of detecting graph proximity for social networks.
Ongoing work looks at greatly improving performance for privacy-preserving data aggregation
between large numbers of participants. These techniques more logically
centralize functionality, yet maintain both strong participant and
data privacy by dividing trust assumptions between two or more
non-colluding service providers.
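The polynomial encoding behind private matching can be shown in the clear. The real protocol evaluates the blinded polynomial under homomorphic encryption; this plaintext analogue only illustrates the algebra:

```python
import random

P = 2**61 - 1  # prime field modulus (illustrative size)

def poly_from_set(xs):
    """Coefficients (highest degree first) of prod(t - x) over GF(P);
    the polynomial's roots are exactly the set elements."""
    coeffs = [1]
    for x in xs:
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] = (new[i] + c) % P           # multiply by t
            new[i + 1] = (new[i + 1] - c * x) % P  # multiply by -x
        coeffs = new
    return coeffs

def poly_eval(coeffs, y):
    """Horner evaluation over GF(P)."""
    acc = 0
    for c in coeffs:
        acc = (acc * y + c) % P
    return acc

def matching_reply(coeffs, y):
    """Clear-text analogue of the sender's reply: r*Poly(y) + y is y itself
    iff y is in the other party's set, and uniformly random otherwise."""
    r = random.randrange(1, P)
    return (r * poly_eval(coeffs, y) + y) % P
```

In the actual protocol the coefficients are encrypted under an additively homomorphic scheme, so the evaluator learns nothing about the set while computing exactly this expression.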
Prior research includes:
- IP analytics: (2006 - 2007)
By instrumenting CoralCDN and third-party sites, we
used active web content to measure and analyze the characteristics of
over 7 million clients with respect to "edge technologies" (NATs,
proxies, DNS and DHCP). Results quantify how Internet services can use
IP addresses to identify clients and enforce access-control decisions.
Commercialized historical and real-time techniques for proxy detection
and IP geolocation; the technology was acquired
by Quova, Inc. in November 2006 and is
currently being tested by large Internet services.
- Re: Reliable email: (2005 - 2007)
An email acceptance system that leverages social proximity for
automated whitelisting, designed to run before a mail-rejection system
that performs spam filtering. Re: employs our private matching protocols to
maintain user privacy. Recent analysis of privacy for social
networks led to more efficient protocols based only on symmetric-key
operations (or achieving stronger properties using bilinear maps).
- Secure distributed file systems and file dissemination: (2003 - 2005)
With a focus on settings with mutually-distrustful clients, Shark
provides a distributed file system that improves scalability and
performance through cooperative reads and semantic chunking, using
Coral's indexing layer to locate files. Yet Shark preserves traditional
semantics, manageability, and security. Other research provides
integrity guarantees for large files encoded with rateless erasure
codes, via a homomorphic hash
function that can verify downloaded blocks on-the-fly.
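The homomorphic-hashing idea can be demonstrated with toy parameters. Real deployments use large primes; this sketch only shows the algebraic property that makes on-the-fly verification of encoded blocks possible:

```python
import random

# Blocks are vectors over Z_q, hashed as h(b) = prod(g_i ** b_i) mod p.
# Then h(b1 + b2) == h(b1) * h(b2) mod p, so a check block that is a sum
# of source blocks verifies against the product of their hashes.
q = 1019            # prime coefficient modulus (toy size)
p = 2 * q + 1       # safe prime: squares mod p form a subgroup of order q
M = 4               # vector length per block
rng = random.Random(0)
G = [pow(rng.randrange(2, p - 1), 2, p) for _ in range(M)]  # generators

def hash_block(b):
    """Multiplicative hash of a length-M block over Z_q."""
    out = 1
    for g, x in zip(G, b):
        out = (out * pow(g, x % q, p)) % p
    return out
```

A downloader who holds the (small) hashes of the source blocks can therefore check any erasure-coded block the moment it arrives, without waiting for the whole file.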
- Anonymity systems: (2000 - 2002)
Designed and implemented Tarzan, a peer-to-peer
anonymous IP network layer that is strongly resistant to traffic
analysis. Tarzan helped protect against Sybil attacks through the
selection of neighbors in a constrained and verifiable manner. Helped
design Free Haven, a distributed
system for the anonymous publishing, storage, and retrieval of
information.