Highly Available Byzantine Fault Tolerant Distributed Systems

Date and Time: Tuesday, December 1, 2009 - 12:30pm to 1:30pm
Location: Computer Science 402
Speaker: Atul Singh, from NEC Labs (Princeton)
Host: Michael Freedman
Many distributed services are hosted at large, shared, geographically diverse data centers, and they use replication to remain highly available even when an entire data center is unreachable. Recent events show that non-crash faults occur in these services and may lead to long outages. For example, Amazon's S3 service was recently down for at least 7 hours due to a Byzantine fault in its servers. While Byzantine Fault Tolerance (BFT) could be used to withstand these faults, current BFT protocols can become unavailable if a small fraction of their replicas are unreachable. This is because existing BFT protocols favor strong safety guarantees (consistency) over liveness (availability).
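
As a rough illustration of the availability limit described above, consider the standard quorum arithmetic of classical BFT replication: with n = 3f + 1 replicas tolerating f Byzantine faults, a request needs 2f + 1 matching replies. The sketch below is an assumption added for illustration and is not taken from the talk; the function names are hypothetical.

```python
def quorum_size(f: int) -> int:
    """Replies needed to commit a request when tolerating f Byzantine faults."""
    return 2 * f + 1


def can_make_progress(f: int, unreachable: int) -> bool:
    """True if enough replicas remain reachable to form a quorum.

    Assumes the classical configuration of n = 3f + 1 replicas.
    """
    n = 3 * f + 1
    return n - unreachable >= quorum_size(f)
```

Under these assumptions, a system with f = 1 (four replicas) keeps running with one replica unreachable but stalls with two, even if every reachable replica is correct.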

In this talk, I will present a novel BFT state machine replication protocol called Zeno that trades consistency for higher availability. In particular, Zeno replaces strong consistency (linearizability) with a weaker guarantee (eventual consistency): clients can temporarily miss each other's updates, but once the network is stable, the states of the individual partitions are merged by having the replicas agree on a total order for all requests. An evaluation of a Zeno prototype shows that it provides better availability than traditional BFT protocols.
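
The merge step can be pictured with a toy sketch. This is not Zeno's actual algorithm, which the abstract does not specify; it only assumes that two partitions share a common prefix of requests and then diverge, and that the divergent requests are put into one deterministic order. The `Request` layout, the tie-breaking rule, and the name `merge_histories` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical request shape: (client id, per-client sequence number, operation)
Request = Tuple[str, int, str]


def merge_histories(a: List[Request], b: List[Request]) -> List[Request]:
    """Merge two partition histories into a single total order."""
    # Find the shared prefix: requests both partitions ordered identically.
    prefix_len = 0
    while prefix_len < min(len(a), len(b)) and a[prefix_len] == b[prefix_len]:
        prefix_len += 1
    merged = list(a[:prefix_len])

    # Collect the divergent requests from both sides, dropping duplicates.
    seen = set(merged)
    tail: List[Request] = []
    for req in a[prefix_len:] + b[prefix_len:]:
        if req not in seen:
            seen.add(req)
            tail.append(req)

    # Assumed tie-break: order by sequence number, then client id. This keeps
    # each client's own requests in order and does not depend on which
    # partition's history is passed first.
    tail.sort(key=lambda r: (r[1], r[0]))
    return merged + tail
```

For example, merging `[("c1", 1, "put x=1"), ("c1", 2, "put x=2")]` with `[("c1", 1, "put x=1"), ("c2", 1, "put y=5")]` gives the same three-request order regardless of argument order, which is the property that lets replicas in this sketch converge on one history.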

Atul Singh is a researcher at NEC Labs, Princeton. He received his PhD in Computer Science from Rice University and spent the last two years visiting the Max Planck Institute for Software Systems (MPI-SWS), Saarbrücken, Germany. Before that, he spent two years visiting Intel Research Berkeley, working with the P2 group. His interests lie in the areas of dependable distributed systems, overlay networks, and declarative networking, and he is currently focusing on challenges emerging in the cloud computing arena.