
Research and Projects

    My research interests span the areas of distributed systems, security, and networking. I particularly enjoy devising technologies that make new functionality broadly available, often by coupling principled designs with real-world deployments.

    Interested in joining my group? Read this first.

    Research funded through the National Science Foundation (Awards #0831374, #0904860, #0904729, #0953197, #1040123, #1040708, #1111734), the Office of Naval Research (Young Investigator Program), DARPA (CSSG Program), the GENI Project Office (#1759 and #1833D), the Alfred P. Sloan Foundation, Cisco Systems, Google, Intel (Visual Computing and Cloud Computer Centers), and Princeton University (Grand Challenges Program and Emerson Electric Award).


    Ongoing research includes:

  • Services in the untrusted cloud: (2009 - )   Designing services that can use servers in public clouds (i.e., third-party datacenters), yet provide privacy and security even in the face of curious or malicious cloud providers. SPORC, for example, provides a framework for building collaborative applications, such as online, real-time text editors. Cloud servers in SPORC store and order clients' encrypted operations, yet its clients use cryptographic integrity checks to detect if servers act maliciously (by adding, changing, dropping or reordering operations). Upon detecting faults, clients can switch to alternative servers and automatically merge (by reordering and transforming) any divergent state, and the system supports dynamic access control even in the face of concurrent membership changes. The work establishes the conceptual connection between operational transformation and fork consistency.
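    The integrity checks at SPORC's core can be illustrated with a toy hash chain (the operation names and genesis value below are hypothetical, and real SPORC additionally encrypts operations and merges divergent state): each history entry hashes its predecessor, so two clients who compare head hashes at a common sequence number detect a server that forked their views.

```python
import hashlib

def link(prev_hash, op):
    """Each log entry's hash covers its predecessor, forming a tamper-evident chain."""
    return hashlib.sha256(prev_hash + op.encode()).digest()

def head(ops):
    """Replay an operation log to compute its chain head."""
    h = b"\x00" * 32                      # hypothetical genesis value
    for op in ops:
        h = link(h, op)
    return h

# Two clients exchange head hashes out of band; a mismatch at the same
# sequence number proves the server served them forked histories.
log_a  = ["insert(0,'h')", "insert(1,'i')"]
log_b  = ["insert(0,'h')", "insert(1,'i')"]
forked = ["insert(0,'h')", "delete(0)"]

assert head(log_a) == head(log_b)     # consistent views
assert head(log_a) != head(forked)    # fork detected
```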

  • Service-centric networks: (2008 - )   Developing a new network architecture, SCAFFOLD, that makes it easier for networked services to manage and even capitalize on system churn. SCAFFOLD moves from human-readable host names to machine-readable service identifiers, from individual packets to flows, and from unicast communication to anycast with flow affinity. SCAFFOLD hides network addresses from applications to enable dynamic remapping as end-points change (e.g., due to virtual-machine migration, failover, or device mobility), and directs traffic based on successively-refined identifiers to scale routing and limit churn. SCAFFOLD can be incrementally deployed in single or geographically-distributed datacenters, or as a clean-slate Internet architecture. This project entails the design, implementation, and evaluation of a socket API and network stack for end-hosts, and a network infrastructure built using OpenFlow and NOX.

  • Virtual worlds: (2008 - )   Researching and building Internet-scale virtual worlds, with a focus on scalability, extensibility, security, and federation. Ongoing work leverages geometric and physics-based constraints, using approaches from computer graphics and computational geometry, to provide scalable and efficient inter-object and inter-space communication, object discovery, world segmentation, distributed storage and caching, and dynamic content delivery.

  • Fault-tolerant datacenter services: (2007 - )   With the trend towards sacrificing consistency for scale, this project reconsiders the challenge of building distributed storage and replicated state machines with strong robustness guarantees and at scale. We do so by leveraging certain trends common to many of these datacenter services, including simple data models, effective object-based interfaces, and read-heavy object workloads, and we achieve scale through a group compositional approach.

    In the fail-stop model, CRAQ (Chain Replication with Apportioned Queries) provides high availability, high throughput, and low latency, while offering a sliding scale of read consistency options (from eventual to linearizable). We are also developing support in wide-area key-value systems for our new notion of multi-key timeline consistency. This model allows state in different datacenters to diverge, yet ensures that accesses to dependent state are consistent, while avoiding the overhead of transactional locking.

    In the Byzantine (malicious) setting, Prophecy introduces a new form of consistency for replicated state machines called delay-once consistency. While slightly weaker than linearizability, Prophecy achieves this consistency at a cost almost identical to fully unreplicated reads for read-mostly workloads. Finally, we are exploring algorithms that securely partition large numbers of nodes (of which a certain fraction may be Byzantine) into small groups, such that each group has a quorum of correct nodes, even under malicious ``leave-and-join'' attacks.
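    CRAQ's apportioned reads can be sketched in a toy, single-process simulation (the three-node chain, version counters, and two-phase dirty/clean bookkeeping below are simplified stand-ins for the real protocol): a replica answers a read locally when its newest version is clean, and otherwise asks the tail which version has committed.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> {version: value}
        self.latest = {}   # key -> newest version received (possibly dirty)
        self.clean = {}    # key -> newest version known committed

chain = [Node(f"n{i}") for i in range(3)]   # head = chain[0], tail = chain[-1]
tail = chain[-1]

def write(key, value):
    # Phase 1: the head assigns a version; the write propagates down the
    # chain and is dirty at every node until acknowledged.
    ver = chain[0].latest.get(key, 0) + 1
    for node in chain:
        node.store.setdefault(key, {})[ver] = value
        node.latest[key] = ver
    tail.clean[key] = ver        # the tail commits on receipt
    return ver

def ack(key, ver):
    # Phase 2: the tail's ack flows back up; nodes mark the version clean
    # and can discard older versions.
    for node in chain:
        node.clean[key] = ver
        node.store[key] = {ver: node.store[key][ver]}

def read(key, node):
    # Apportioned query: answer locally if the newest local version is
    # clean; otherwise ask the tail which version is committed.
    ver = node.latest.get(key, 0)
    if node.clean.get(key, 0) != ver:
        ver = tail.clean.get(key, 0)
    return node.store.get(key, {}).get(ver)

v = write("x", "hello")          # propagated, but acks not yet delivered
assert read("x", chain[0]) == "hello"   # dirty at the head; tail resolves it
ack("x", v)
assert read("x", chain[1]) == "hello"   # clean everywhere now
```

Because clean reads are served by every replica rather than only the tail, read throughput scales with chain length for read-mostly workloads.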

  • Programmable enterprise networks: (2006 - )   Design and implementation contributions to the clean-slate SANE and backwards-compatible Ethane enterprise network architectures. Ethane network switches provide connectivity through on-demand virtual circuits, yet enforce security policies on a per-flow basis through centrally-managed, atomic, auditable name bindings. A multi-month deployment at Stanford served hundreds of wired and wireless hosts. The technology is now being commercialized by Nicira Networks and supported by major switch vendors as part of the OpenFlow Switch Consortium.

    DIFANE preserves the capabilities of Ethane-style networks (with fine-grained administrative policies on flows), while decentralizing the processing of new flows and keeping all packet processing entirely in the dataplane. DIFANE can be readily implemented with commodity switch hardware, since all dataplane functions can be expressed as wildcard rules that perform simple actions on matching packets.

    Recent work on Frenetic focuses on developing a high-level programming language for OpenFlow networks that enables writing programs in a declarative and compositional style.
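    The wildcard-rule model underlying this line of work can be sketched in a few lines (the field names, priorities, and actions below are hypothetical, and real switches match in TCAM rather than by scanning): a packet is classified by the highest-priority rule whose non-wildcard fields all match, with a lowest-priority catch-all redirecting unmatched flows elsewhere.

```python
# Each rule: (priority, pattern, action); None in a pattern field is a wildcard.
RULES = [
    (10, {"src": "10.0.0.1", "dst": None,       "dport": 80},   "forward:port2"),
    (5,  {"src": None,       "dst": "10.0.0.9", "dport": None}, "drop"),
    (0,  {"src": None,       "dst": None,       "dport": None}, "redirect:authority_switch"),
]

def classify(packet):
    # Scan rules highest-priority first; first full match wins.
    for _, pattern, action in sorted(RULES, key=lambda r: -r[0]):
        if all(v is None or packet.get(f) == v for f, v in pattern.items()):
            return action

assert classify({"src": "10.0.0.1", "dst": "1.2.3.4", "dport": 80}) == "forward:port2"
assert classify({"src": "a", "dst": "10.0.0.9", "dport": 22}) == "drop"
assert classify({"src": "a", "dst": "b", "dport": 443}) == "redirect:authority_switch"
```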

  • Replica selection and anycast: (2005 - )   Designed and built OASIS, an infrastructure for locality- and load-based server selection among replicated Internet services. OASIS demonstrates how disparate services can themselves perform (potentially error-prone) network locality measurement, and it scalably manages their state information. OASIS was in production use from Nov. 2005 to July 2009, handling thousands of replicas from more than a dozen distributed services. Background measurement studies examine the geographic locality of IP prefixes and the efficacy of virtual coordinate systems.

    Its successor, DONAR, demonstrates a constraint-based optimization that supports flexible service policies, as opposed to heuristics or centralized solutions. DONAR robustly decomposes the selection process into a decentralized algorithm. Deployed since Nov. 2009 at a dozen commercial PoPs worldwide, DONAR provides server selection (through DNS or HTTP) for a portion of CoralCDN's traffic, as well as for services on the Measurement Lab testbed, an open-source platform co-sponsored by the New America Foundation, PlanetLab, and Google. This includes use by the FCC for its Consumer Broadband Test.
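    The underlying problem can be sketched with a toy greedy assignment (the replica names, RTTs, and capacity caps below are invented, and DONAR itself solves this as a decentralized constrained optimization rather than greedily): send each client to its lowest-latency replica, subject to per-replica load constraints.

```python
def select_replicas(clients, replicas, rtt, cap):
    """Greedy toy: route each client to its lowest-RTT replica that still
    has spare capacity. Illustrates the latency-vs-load trade-off only."""
    load = {r: 0 for r in replicas}
    mapping = {}
    for c in clients:
        for r in sorted(replicas, key=lambda r: rtt[c][r]):
            if load[r] < cap[r]:
                mapping[c] = r
                load[r] += 1
                break
    return mapping

# Hypothetical measurements: two replicas with per-replica capacity caps.
rtt = {"c1": {"us": 20, "eu": 90},
       "c2": {"us": 25, "eu": 85},
       "c3": {"us": 95, "eu": 30}}
cap = {"us": 1, "eu": 2}
mapping = select_replicas(["c1", "c2", "c3"], ["us", "eu"], rtt, cap)
# c1 claims "us"; c2 overflows to "eu" once "us" is full; c3 prefers "eu".
```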

  • Content distribution and peer-to-peer systems: (2002 - )   Conceived and led the Coral Project. Designed and built an Internet-scale, self-organizing web-content distribution network: CoralCDN uses a network of cooperating DNS redirectors and HTTP proxies, backed by a decentralized indexing infrastructure, to allow oblivious clients to transparently download content from nearby servers, while avoiding distant or heavily-loaded ones. CoralCDN has been in production use on 300-400 PlanetLab servers since March 2004, currently receiving around 50 million HTTP requests from more than 2 million clients per day, serving several terabytes of data.

    Research on the conceptual design of untrusted peer-assisted CDNs employed price theory to study how peer demand can be efficiently matched to available supply. This work characterizes the efficiency and robustness gains enabled by price-based multilateral exchange, as opposed to bilateral exchanges such as BitTorrent.

    To scale beyond CoralCDN's limits, we are building a browser-based P2P-CDN, Firecoral. Firecoral enables mutually-distrustful users to ``share'' their browser caches, yet ensures the authenticity of content and enables users to preserve privacy by expressing flexible content sharing policies. Beta release available.

  • Privacy-preserving protocols: (2000 - )   Developed cryptographic protocols for private matching, which computes the set intersection between two or more parties' inputs. PM uses the properties of homomorphic encryption to privately evaluate a polynomial representation of input sets. Subsequent work led to improved constructions for private keyword search based on oblivious pseudorandom functions, as well as explicit consideration of detecting graph proximity for social networks. Ongoing work looks at greatly improving the performance of privacy-preserving data aggregation among large numbers of participants. These techniques logically centralize functionality, yet maintain both strong participant and data privacy by dividing trust among two or more non-colluding service providers.
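    The polynomial-evaluation idea behind private matching can be sketched end-to-end (the Paillier keys below use 14-bit demo primes and the sets are toy inputs; a real deployment needs 2048-bit keys and further hardening): the client sends encrypted coefficients of a polynomial whose roots are its set, and the server returns, for each of its own elements y, an encryption of r*P(y) + y, which reveals y exactly when y is in the intersection.

```python
import math, random

random.seed(1)

# --- Toy Paillier cryptosystem (additively homomorphic); demo primes only.
p, q = 10007, 10009
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                       # valid because we fix g = n + 1

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# --- Client: encode its set as the roots of P(x), send encrypted coefficients.
client_set = [3, 7, 11]
coeffs = [1]                               # P(x) = prod (x - s); low degree first
for s in client_set:
    new = [0] * (len(coeffs) + 1)
    for i, a in enumerate(coeffs):
        new[i + 1] = (new[i + 1] + a) % n  # a * x^(i+1)
        new[i] = (new[i] - s * a) % n      # -s * a * x^i
    coeffs = new
enc_coeffs = [enc(a) for a in coeffs]

# --- Server: for each element y, compute E(r * P(y) + y) homomorphically.
# --- If y is in the client's set, P(y) = 0 and y is revealed; otherwise
# --- the client decrypts a random-looking value.
server_set = [2, 7, 9, 11]
replies = []
for y in server_set:
    c = 1
    for i, ec in enumerate(enc_coeffs):
        c = c * pow(ec, pow(y, i, n), n2) % n2   # E(P(y))
    c = pow(c, random.randrange(1, n), n2)       # E(r * P(y))
    replies.append(c * enc(y) % n2)              # E(r * P(y) + y)

# --- Client: decrypt replies; recognized values are the intersection.
matches = [m for m in map(dec, replies) if m in set(client_set)]
```

Only the intersection leaks to the client, and the server learns nothing at all, since it computes on ciphertexts throughout.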

    Prior research includes:

  • IP analytics: (2006 - 2007)   By instrumenting CoralCDN and third-party sites, used active web content to measure and analyze the characteristics of over 7 million clients with respect to ``edge technologies'' (NATs, proxies, DNS, and DHCP). Results quantify how Internet services can use IP addresses to identify clients and enforce access-control decisions. Commercialized historical and real-time techniques for proxy detection and IP geolocation; this technology was acquired by Quova, Inc. in November 2006 and is currently being tested by large Internet services.

  • Re: Reliable email: (2005 - 2007)   An email acceptance system that leverages social proximity for automated whitelisting, designed to run ahead of a mail rejection system that performs spam filtering. Re: employs our private matching protocols to maintain user privacy. Later analysis of privacy for social networks led to more efficient protocols based only on symmetric-key operations (or achieving stronger properties using bilinear maps).

  • Secure distributed file systems and file dissemination: (2003 - 2005)   With a focus on settings with mutually-distrustful clients, Shark provides a distributed file system that improves scalability and performance through cooperative reads and semantic chunking, using Coral's indexing layer to locate files. Yet Shark preserves traditional semantics, manageability, and security. Other research provides integrity guarantees for large files encoded with rateless erasure codes, via a homomorphic hash function that can verify downloaded blocks on-the-fly.
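    The on-the-fly verification idea can be sketched with toy parameters (the primes and generators below are far too small for real security, and practical rateless codes need the scheme adapted to their combination rule): block entries live in Z_q, each position gets a public generator of an order-q subgroup of Z_p*, and the hash of a linear combination of blocks equals the same combination of their hashes.

```python
# Toy parameters: entries in Z_q, hashes in an order-q subgroup of Z_p^*
# (q | p - 1). Real use needs cryptographic sizes, e.g., a 257-bit q.
q = 101
p = 607                      # 607 = 6 * 101 + 1
k = (p - 1) // q

def gen(seed):
    g = pow(seed, k, p)      # maps into the order-q subgroup
    assert g != 1
    return g

G = [gen(s) for s in (2, 3, 5, 7)]   # one public generator per block position

def hhash(block):
    h = 1
    for g, b in zip(G, block):
        h = h * pow(g, b % q, p) % p
    return h

# An encoded check block is a linear combination of source blocks; its hash
# equals the same combination of the source hashes, so a downloader can
# verify each block as it arrives, using only the small list of source hashes.
b1, b2 = [1, 2, 3, 4], [5, 6, 7, 8]
a1, a2 = 3, 7
check_block = [(a1 * x + a2 * y) % q for x, y in zip(b1, b2)]
assert hhash(check_block) == hhash(b1) ** a1 * hhash(b2) ** a2 % p
```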

  • Anonymity systems: (2000 - 2002)   Designed and implemented Tarzan, a peer-to-peer anonymous IP network layer that is strongly resistant to traffic analysis. Tarzan protects against Sybil attacks by selecting neighbors in a constrained and verifiable manner. Helped design Free Haven, a distributed system for the anonymous publishing, storage, and retrieval of information.