The paper marked as required must be read and summarized before each class. Papers marked as repeat are papers we read for a previous class that are relevant to this class, but do not need to be reread. The unmarked papers are supplementary; one member of the class will read, summarize, and present each of them to the class.

Submit reviews via the HotCRP site. All papers are also available from this site. You will receive an email at your USC email address adding you as a "PC member" for this site.

NOTE: Slides have been moved to Piazza.

Introduction to Distributed Systems

Jan 13 - Introduction

How to Read a Paper. required (no summary)
Srinivasan Keshav.
CCR, 2007.

The Maintenance of Duplicate Databases. (no presenter)
Paul R. Johnson and Robert H. Thomas.
IETF RFC #677, 1975.

Jan 15 - Remote Procedure Calls (RPC) and MapReduce

MapReduce: Simplified Data Processing on Large Clusters. required (no summary)
Jeffrey Dean and Sanjay Ghemawat.
OSDI, 2004.

Implementing Remote Procedure Call. (no presenter)
Andrew D. Birrell and Bruce Jay Nelson.
ACM TOCS, 1984.

Jan 20 - Logical Time

Time, Clocks, and the Ordering of Events in a Distributed System. required
Leslie Lamport.
Communications of the ACM, 1978.

Jan 22 - Decomposing Safety and Liveness

Defining Liveness. required
Bowen Alpern and Fred B. Schneider.
Information Processing Letters, 1985.

Fault Tolerance

Jan 27 - Local Fault Tolerance: Atomicity, Logging, and Recovery

Torturing Databases for Fun and Profit. required
Mai Zheng, Joseph Tucek, Dachuan Huang, Feng Qin, Mark Lillibridge, Elizabeth S. Yang, Bill W Zhao and Shashank Singh.
OSDI, 2014.

All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications. (Alice and Bob)
Thanumalayan S. Pillai, Vijay Chidambaram, Ramnatthan Alagappan,
Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau.
OSDI, 2014.
Presenter: Zahaib Akhtar

EXPLODE: A Lightweight, General System for Finding Serious Storage System Errors.
Junfeng Yang, Can Sar, and Dawson Engler.
OSDI, 2006.
Presenter: Jamie Tsao

Jan 29 - Distributed Fault Tolerance: Replicated State Machines & Fault Models

Implementing Fault-Tolerant Services Using the State Machine Approach: a Tutorial. required
Fred B. Schneider.
ACM Computing Surveys, 1990.

Time, Clocks, and the Ordering of Events in a Distributed System. repeat
Leslie Lamport.
Communications of the ACM, 1978.

All about Eve: Execute-Verify Replication for Multi-Core Servers.
Manos Kapritsos, Yang Wang, Vivien Quema, Allen Clement, Lorenzo Alvisi, and Mike Dahlin.
OSDI, 2012.
Presenter: Snigdha Goel

Feb 3 - Distributed Fault Tolerance: Primary Backup

Chain Replication for Supporting High Throughput and Availability. required
Robbert van Renesse and Fred B. Schneider.
OSDI, 2004.

Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads.
Jeff Terrace and Michael J. Freedman.
USENIX Annual Technical Conference (ATC), 2009.
Presenter: Yaz Alabdulkarim

Hypervisor-based Fault-tolerance.
Thomas C. Bressoud and Fred B. Schneider.
SOSP, 1995.
Presenter: Weichen Zhao

Feb 5 - Continue Primary Backup + Jeff Dean Bekey Lecture

Feb 10 - Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance. (PBFT) required
Miguel Castro and Barbara Liskov.
OSDI, 1999.

Zyzzyva: Speculative Byzantine fault tolerance.
Rama Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong.
SOSP, 2007.
Presenter: Hassan Nawaz

Prophecy: Using History for High-Throughput Fault Tolerance.
Siddhartha Sen, Wyatt Lloyd, and Michael J. Freedman.
NSDI, 2010.
Presenter: Krishna Giri

The Next 700 BFT Protocols.
R. Guerraoui, N. Knezevic, V. Quema, and M. Vukolic.
EuroSys, 2010.
Presenter: Jun Chen

Making Byzantine fault tolerant systems tolerate Byzantine faults. (Aardvark)
Allen Clement, Marco Marchetti, Edmund Wong, Lorenzo Alvisi, and Mike Dahlin.
NSDI, 2009.
Presenter: Yufeng Jiang

Group Communication

Feb 12 - Impossibility of Consensus

Impossibility of Distributed Consensus with One Faulty Process. (FLP) required
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson.
Journal of the ACM, 1985.

On the minimal synchronism needed for distributed consensus.
Danny Dolev, Cynthia Dwork, and Larry Stockmeyer.
Journal of the ACM, 1987.
Presenter: Alan Berryhill

Consensus in the presence of partial synchrony.
Cynthia Dwork, Nancy A. Lynch, and Larry Stockmeyer.
Journal of the ACM, 1988.
Presenter: Rui Tong

Feb 17 - Failure Detectors

Unreliable Failure Detectors for Reliable Distributed Systems. required
Tushar Deepak Chandra and Sam Toueg.
Journal of the ACM, 1996.

The Weakest Failure Detector for Solving Consensus.
Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg.
Journal of the ACM, 1996.
Presenter: Haoyu Huang

Detecting Failures in Distributed Systems with the Falcon Spy Network.
Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael Walfish.
SOSP, 2011.
Presenter: Khiem Ngo

Improving Availability in Distributed Systems with Failure Informers. (Pigeon)
Joshua B. Leners, Trinabh Gupta, Marcos K. Aguilera, and Michael Walfish.
NSDI, 2013.
Presenter: Karteek Murthy

Feb 19 – No Class, Instead Attend Colloquium

Real Time Analytics at Twitter. required
Karthik Ramasamy (Twitter)
Talk Abstract
4-5:15pm, SAL 101

Feb 24 - Atomic Commit

A Formal Model of Crash Recovery in a Distributed System. (3PC) required
Dale Skeen and Michael Stonebraker.
IEEE Trans. Software Engineering, 1983.

Nonblocking commit protocols. (3PC in 1981)
Dale Skeen.
SIGMOD, 1981.
Presenter: Liz Cha

Determining the Last Process to Fail.
Dale Skeen.
ACM TOCS, 1985.
Presenter: Tushar Aggarwal

Mar 3 - Group Communication: Atomic Broadcast, Atomic Multicast, Causal Broadcast (After exam)

Lightweight Causal and Atomic Group Multicast. required
K. Birman, A. Schiper, and P. Stephenson.
ACM TOCS, 1991.

Asynchronous Consensus and Broadcast Protocols.
G. Bracha and S. Toueg.
Journal of the ACM, 1985.
Presenter: Hieu Nguyen

Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey.
X. Defago, A. Schiper, and P. Urban.
ACM Computing Surveys, 2004.
Presenter: Sivaramakrishnan Ramanathan

Genuine Atomic Multicast in Asynchronous Distributed Systems.
R. Guerraoui and A. Schiper.
Theoretical Computer Science, 2001.
Presenter: Prashant Nittoor

Exam 1

Feb 26

In class.

Consensus

Mar 5 - Paxos

The Part-Time Parliament. required
Leslie Lamport.
ACM TOCS, 1998.

Paxos Made Practical. required (no summary)
David Mazieres.
2007.

Viewstamped Replication Revisited.
Barbara Liskov and James Cowling.
MIT Tech Report, 2012.
Presenter: Lavish

In Search of an Understandable Consensus Algorithm. (RAFT)
Diego Ongaro and John Ousterhout.
USENIX ATC, 2014.
Presenter: Abhinav Sharma

Paxos Made Moderately Complex.
Robbert van Renesse.
2011.
Presenter: Karthik Kumarguru

Paxos Made Live.
Tushar D. Chandra, Robert Griesemer, Joshua Redstone.
PODC, 2007.

Mar 10 - Paxos Optimized

There is More Consensus in Egalitarian Parliaments. required
Iulian Moraru, David G. Andersen, and Michael Kaminsky.
SOSP, 2013.

Generalized Consensus and Paxos.
Leslie Lamport.
Microsoft Research Tech Report, 2005.
Presenter: Fei Yu

Mencius: building efficient replicated state machines for WANs.
Y. Mao, F. P. Junqueira, and K. Marzullo.
OSDI, 2008.
Presenter: Mohsin Ali

Consistency and Transactions

Mar 12 - Strong Consistency (Linearizability, Sequential Consistency)

Linearizability: A Correctness Condition for Concurrent Objects. required
M. P. Herlihy and J. M. Wing.
ACM TOPLAS, 1990.

How to make a multiprocessor computer that correctly executes multiprocess programs. (Sequential Consistency)
Leslie Lamport.
IEEE Trans. Computers, 1979.
Presenter: Sankar Krishnan

Sequential consistency versus linearizability.
H. Attiya and J. L. Welch.
ACM TOCS, 1994.
Presenter: Haonan Lu

Spring Break

Mar 17/19

No class, papers, or assignments.

Consistency and Transactions (continued)

Mar 24 - Weaker Consistency (Eventual, Causal) and CAP

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. required
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen.
SOSP, 2011.

Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. (CAP proof)
Seth Gilbert and Nancy Lynch.
ACM SIGACT News, 2002.

Mar 26 – SOSP Submission Day - No Class

Work on your SOSP submissions or, if you don't have one, on your assignments for class!

Mar 31 - Weaker Consistency (Eventual, Causal) Continued

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. repeat
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen.
SOSP, 2011.
Presenter: Wyatt Lloyd

Flexible Update Propagation for Weakly Consistent Replication. (Bayou)
K. Petersen, M. Spreitzer, D. Terry, M. Theimer, and A. Demers.
SOSP, 1997.
Presenter: Sankar Krishnan

PNUTS: Yahoo!’s hosted data serving platform.
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni.
VLDB, 2008.
Presenter: Sivaramakrishnan Ramanathan

Stronger Semantics for Low-Latency Geo-Replicated Storage. (Eiger)
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen.
NSDI, 2013.
Presenter: Jamie Tsao

Apr 2 – Transactions

Concurrency Control in Distributed Database Systems. required
P. A. Bernstein and N. Goodman.
ACM Computing Surveys, 1981.

Concurrency Control and Recovery.
Michael J. Franklin.
The Computer Science and Engineering Handbook, 1997.

A Critique of ANSI SQL Isolation Levels.
Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil.
SIGMOD, 1995.

Apr 7 - Distributed Transactions

Spanner: Google's Globally Distributed Database. required
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, et al.
OSDI, 2012.
ACM TOCS, 2013.

Transaction Management in the R* Distributed Database Management System.
C. Mohan, B. Lindsay, and R. Obermarck.
ACM TODS, 1986.

Transactional Storage for Geo-Replicated Systems. (Walter)
Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li.
SOSP, 2011.
Presenter: Khiem Ngo

Calvin: Fast Distributed Transactions for Partitioned Database Systems.
A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi.
SIGMOD, 2012.
Presenter: Abhinav Sharma

Extracting More Concurrency from Distributed Transactions. (Rococo)
Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li.
OSDI, 2014.
Presenter: Haonan Lu

Distribute Everything

Apr 9 - Distributed File Systems

The Google File System. required
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
SOSP, 2003.

Frangipani: A Scalable Distributed File System.
Chandramohan A. Thekkath, Timothy Mann, and Edward K. Lee.
SOSP, 1997.
Presenter: Karteek Murthy

Replication in the Harp File System.
Barbara Liskov, Sanjay Ghemawat, Robert Gruber, Paul Johnson, Liuba Shrira, and Michael Williams.
SOSP, 1991.
Presenter: Prashant Rao Nittoor

Apr 14 - Distributed Logs

CORFU: A Shared Log Design for Flash Clusters. required
Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, and John D. Davis.
NSDI, 2012.
ACM TOCS, 2013.

Tango: Distributed Data Structures Over a Shared Log.
Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran,
Michael Wei, John D. Davis, Sriram Rao, Tao Zou, and Aviad Zuck.
SOSP, 2013.
Presenter: Jun Chen

Durability with BookKeeper.
Flavio P. Junqueira, Ivan Kelly, and Benjamin Reed.
SIGOPS OSR, 2013.
Presenter: Rui Tong

Apr 16 - Distributed Debugging

Improving Visibility of Distributed Systems through Execution Tracing. (X-trace) required
Rodrigo Fonseca.
PhD Thesis, 2008.

X-Trace: A Pervasive Network Tracing Framework. repeat
Rodrigo Fonseca, George Porter, Randy Katz, Scott Shenker, and Ion Stoica.
NSDI, 2007.

Read Chapters 2, 3, 4, and 6 of Rodrigo's Thesis. You can skim Chapter 4.

The NSDI paper is for reference only; you don't need to read it for this class.

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure.
Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson,
Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag.
Google Tech Report, 2010.
Presenter: Karthik

Pinpoint: Problem Determination in Large, Dynamic Internet Services.
Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armando Fox, and Eric Brewer.
DSN, 2002.

Diagnosing Performance Changes by Comparing Request Flows. (Spectroscope)
Raja R. Sambasivan, Alice X. Zheng, Michael De Rosa, Elie Krevat, Spencer Whitman,
Michael Stroucken, William Wang, Lianghong Xu, Gregory R. Ganger.
NSDI, 2011.
Presenter: Weichen Zhao

Modern Marvels

Apr 21 - Google Stack Day

You choose your required paper for today. It must be a new paper for you, i.e., it is not a repeat of a paper you've summarized in this or a previous class.

The Google File System. repeat
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
SOSP, 2003.

MapReduce: Simplified Data Processing on Large Clusters. repeat
Jeffrey Dean and Sanjay Ghemawat.
OSDI, 2004.

Bigtable: A Distributed Storage System for Structured Data.
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber.
OSDI, 2006.
ACM TOCS, 2008.
Presenter: Lavish

Large-scale Incremental Processing Using Distributed Transactions and Notifications. (Percolator)
Daniel Peng and Frank Dabek.
OSDI, 2010.
Presenter: Zahaib Akhtar

Thialfi: A Client Notification Service for Internet-Scale Applications.
Atul Adya, Gregory Cooper, Daniel Myers, and Michael Piatek.
SOSP, 2011.
Presenter: Fei Yu

F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business.
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, and Phoenix Tong.
VLDB, 2013.

Spanner: Google's Globally Distributed Database. repeat
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, et al.
OSDI, 2012.
ACM TOCS, 2013.

Apr 23 - Facebook Stack Day

You choose your required paper for today. It must be a new paper for you, i.e., it is not a repeat of a paper you've summarized in this or a previous class.

Finding a Needle in Haystack: Facebook’s Photo Storage.
D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel.
OSDI, 2010.
Presenter: Yufeng Jiang

Scaling Memcache at Facebook.
R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy,
M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani.
NSDI, 2013.
Presenter: Alan Berryhill

TAO: Facebook’s Distributed Data Store for the Social Graph.
N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo,
S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani.
USENIX ATC, 2013.
Presenter: Yazeed Alabdulkarim

Analysis of HDFS Under HBase: A Facebook Messages Case Study.
T. Harter, D. Borthakur, S. Dong, A. S. Aiyer, L. Tang, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau.
FAST, 2014.

An Analysis of Facebook Photo Caching.
Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li.
SOSP, 2013.

f4: Facebook’s Warm BLOB Storage System.
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu,
Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar.
OSDI, 2014.
Presenter: Hieu Nguyen

RIPQ: Advanced Photo Caching on Flash for Facebook.
Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li.
FAST, 2015.

Apr 28 - Pushing systems to their limits

You choose your required paper for today. It must be a new paper for you, i.e., it is not a repeat of a paper you've summarized in this or a previous class.

FAWN: a Fast Array of Wimpy Nodes.
David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan.
SOSP, 2009.
Presenter: Tushar Aggarwal

TritonSort: A Balanced Large-Scale Sorting System.
Alexander Rasmussen, George Porter, Michael Conley, Harsha Madhyastha, Radhika Niranjan Mysore, Alexander Pucher, Amin Vahdat.
NSDI, 2011.
Presenter: Krishna Giri

SILT: a Memory-Efficient, High-Performance Key-Value Store.
Hyeontaek Lim, Bin Fan, David G. Andersen, and Michael Kaminsky.
SOSP, 2011.
Presenter: Liz Cha

Cache Craftiness for Fast Multicore Key-Value Storage. (MassTree)
Yandong Mao, Eddie Kohler, and Robert T. Morris.
EuroSys, 2012.
Presenter: Hassan Nawaz

MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing.
Bin Fan, David G. Andersen, Michael Kaminsky.
NSDI, 2013.
Presenter: Snigdha Goel

Speedy Transactions in Multicore In-Memory Databases. (Silo)
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden.
SOSP, 2013.
Presenter: Haoyu Huang

MICA: a Holistic Approach to Fast In-Memory Key-Value Storage.
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky.
NSDI, 2014.
Presenter: Mohsin

Exam 2

Apr 30

In class.

Deferred Discussions

Unfortunately, we ran out of time at the end of the semester and we did not get to cover the following topics. If you want to discuss any of the following topics with me (and the interested subset of the class), please email me which topics you are interested in and your availability in May (i.e., if/when you leave for an internship).

Alternatively, I think we can cover these topics in the Fall semester version of the class, so you're welcome to sit in on those classes.

Distributed Hash Tables

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. required
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan.
SIGCOMM, 2001.

A Scalable Content-Addressable Network. (CAN)
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker.
SIGCOMM, 2001.

Kademlia: A Peer-to-Peer Information System Based on the Xor Metric.
Petar Maymounkov and David Mazieres.
IPTPS, 2002.

Canon in G major: designing DHTs with hierarchical structure.
Prasanna Ganesan, Krishna Gummadi, and Hector Garcia-Molina.
ICDCS, 2004.

Peer-to-peer (Bitcoin)

Bitcoin and second-generation cryptocurrencies. required
Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A. Kroll, and Edward W. Felten.
S & P, 2015.

Bitcoin: A Peer-to-Peer Electronic Cash System.
Satoshi Nakamoto.
2008.