required papers must be read and responded to before each class.

repeat papers were read for a previous class and should be reviewed again, but do not require a new response.

supplemental papers will be read, summarized, and presented to the class by one student; you are encouraged to do a first pass (skim) of the paper.

reference papers provide more information on a topic. You are not required to read them.

Submit reviews via the HotCRP site. All papers are also available from this site. You will receive an email at your USC email address adding you as a "PC member" for this site.

Introduction to Distributed Systems

Aug 23 - Class Overview & Introduction

How to Read a Paper. required (no summary or response)
Srinivasan Keshav.
CCR, 2007.

The Maintenance of Duplicate Databases. reference
Paul R. Johnson and Robert H. Thomas.
IETF RFC #677, 1975.

Aug 25 - Logical Time

Time, Clocks, and the Ordering of Events in a Distributed System. required (just summary)
Leslie Lamport.
Communications of the ACM, 1978.

Aug 30 - MapReduce

MapReduce: Simplified Data Processing on Large Clusters. required (just summary)
Jeffrey Dean and Sanjay Ghemawat.
OSDI, 2004.

Dryad: distributed data-parallel programs from sequential building blocks. supplemental
Michael Isard and Mihai Budiu and Yuan Yu and Andrew Birrell and Dennis Fetterly.
EuroSys, 2007.

Sep 1 - Remote Procedure Calls & Numbers Everyone Should Know

Implementing Remote Procedure Call. reference
Andrew D. Birrell and Bruce Jay Nelson.
ACM TOCS, 1984.

Sep 6 - Assignment Overview, Go Tutorial, Git Tutorial (Guest Lecture: Haonan Lu)

Assignments are available on GitHub.

Fault Tolerance

Sep 8 - Fault Models & Replicated State Machines (Guest Lecture: Haonan Lu)

Implementing Fault-Tolerant Services Using the State Machine Approach: a Tutorial. required (just summary)
Fred B. Schneider.
ACM Computing Surveys, 1990.

Time, Clocks, and the Ordering of Events in a Distributed System. repeat
Leslie Lamport.
Communications of the ACM, 1978.

Sep 13 - Primary Backup Replication

Chain Replication for Supporting High Throughput and Availability. required
Robbert Van Renesse and Fred B. Schneider.
OSDI, 2004.

Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads. supplemental
Jeff Terrace and Michael J. Freedman.
USENIX Annual Technical Conference (ATC), 2009.

Detecting Failures in Distributed Systems with the Falcon Spy Network. supplemental
Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, and Michael Walfish.
SOSP, 2011.

Sep 15 - Impossibility of Consensus (FLP)

Impossibility of Distributed Consensus with One Faulty Process. (FLP) required
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson.
Journal of the ACM, 1985.

Consensus in the presence of partial synchrony. supplemental
Cynthia Dwork, Nancy A. Lynch, and Larry Stockmeyer.
Journal of the ACM, 1988.

Revisiting the relationship between non-blocking atomic commitment and consensus. reference
Rachid Guerraoui.
Distributed Algorithms, 1995.

Sep 20 - Atomic Commit

A Formal Model of Crash Recovery in a Distributed System. (3PC) required
Dale Skeen and Michael Stonebraker.
IEEE Trans. Software Engineering, 1983.

Sep 22 - Slack Day

We needed a slack day to catch up.

Sep 27 - Paxos

Paxos Made Simple. required
Leslie Lamport.
ACM SIGACT News, 2001.

Vive la différence: Paxos vs. viewstamped replication vs. zab. supplemental
Robbert Van Renesse, Nicolas Schiper, and Fred B Schneider.
IEEE Transactions on Dependable and Secure Computing, 2015.

The Chubby Lock Service for Loosely-Coupled Distributed Systems. supplemental
Mike Burrows.
OSDI, 2006.

The Part-Time Parliament. reference
Leslie Lamport.
ACM TOCS, 1998.

Viewstamped Replication Revisited. reference
Barbara Liskov and James Cowling.
MIT Tech Report, 2012.
(Originally described in PODC 1988 as Viewstamped Replication... by Oki and Liskov)

Zab: High-performance broadcast for primary-backup systems. reference
Flavio P Junqueira and Benjamin C Reed and Marco Serafini.
DSN, 2011.

ZooKeeper: Wait-free Coordination for Internet-scale Systems. reference
Patrick Hunt and Mahadev Konar and Flavio Paiva Junqueira and Benjamin Reed.
USENIX ATC, 2010.

In Search of an Understandable Consensus Algorithm. (Raft) reference
Diego Ongaro and John Ousterhout.
USENIX ATC, 2014.

Sep 29 - More Paxos, Quorum Systems

We'll discuss Paxos, Paxos novel responses, and Quorum Systems.

Mencius: building efficient replicated state machines for WANs. supplemental
Y. Mao, F. P. Junqueira, and K. Marzullo.
OSDI, 2008.

Oct 4 - Paxos Optimized

There is More Consensus in Egalitarian Parliaments. (EPaxos) required
Iulian Moraru, David G. Andersen, and Michael Kaminsky.
SOSP, 2013.

Designing Distributed Systems Using Approximate Synchrony in Data Center Networks. (Speculative Paxos) supplemental
Dan R. K. Ports, Jialin Li, Vincent Liu, Naveen Kr. Sharma, and Arvind Krishnamurthy.
NSDI, 2015.

Oct 6 - Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance. (PBFT) required
Miguel Castro and Barbara Liskov.
OSDI, 1999.

Zyzzyva: Speculative Byzantine fault tolerance. supplemental
Rama Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong.
SOSP, 2007.

Making Byzantine fault tolerant systems tolerate Byzantine faults. (Aardvark) supplemental
Allen Clement, Mirco Marchetti, Edmund Wong, Lorenzo Alvisi, and Mike Dahlin.
NSDI, 2009.

Oct 11 - Catchup Day

We'll finish discussing BFT and PBFT today.

Exam 1

Oct 13

In class.

Oct 18

Exam review.

Scalability, Consistency, and Transactions

Oct 20 - Distributed Hash Tables & Consistent Hashing

Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. required
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan.
SIGCOMM, 2001.

Kademlia: A Peer-to-Peer Information System Based on the Xor Metric. supplemental
Petar Maymounkov and David Mazieres.
IPTPS, 2002.

Maglev: A Fast and Reliable Software Network Load Balancer. supplemental
Daniel E. Eisenbud, Cheng Yi, Carlo Contavalli, Cody Smith, Roman Kononov, Eric Mann-Hielscher,
Ardas Cilingiroglu, Bin Cheyney, Wentao Shang, and Jinnah Dylan Hosein.
NSDI, 2016.

Oct 25 - Linearizability & Sequential Consistency (Strong Consistency)

Linearizability: A Correctness Condition for Concurrent Objects. required
M. P. Herlihy and J. M. Wing.
ACM TOPLAS, 1990.

How to make a multiprocessor computer that correctly executes multiprocess programs. (Sequential Consistency) required (no summary, joint response)
Leslie Lamport.
IEEE Trans. Computer, 1979.

Sequential consistency versus linearizability. supplemental
H. Attiya and J. L. Welch.
ACM TOCS, 1994.

Oct 27 - Eventual Consistency & CAP

Dynamo: Amazon's Highly Available Key-Value Store. required
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman,
Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels.
SOSP, 2007.

Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. (CAP proof) required (just summary)
Seth Gilbert and Nancy Lynch.
ACM SIGACT News, 2002.

PNUTS: Yahoo!’s hosted data serving platform. supplemental
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni.
VLDB, 2008.

Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary. (RedBlue) supplemental
Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno M. Preguiça, and Rodrigo Rodrigues.
OSDI, 2012.

Nov 1 - Causal Consistency

Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS. required
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen.
SOSP, 2011.

Nov 3 - OSDI

No class.

Nov 8 - Quorum Systems, COPS Discussion, some Distributed Transactions

Transactional Storage for Geo-Replicated Systems. (Walter) supplemental
Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li.
SOSP, 2011.

Salt: Combining ACID and BASE in a Distributed Database. supplemental
Chao Xie, Chunzhi Su, Manos Kapritsos, Yang Wang, Navid Yaghmazadeh, Lorenzo Alvisi, and Prince Mahajan.
OSDI, 2014.

Nov 10 - Distributed Transactions Optimized

Spanner: Google's Globally Distributed Database. required
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, et al.
OSDI, 2012.
ACM TOCS, 2013.

Extracting More Concurrency from Distributed Transactions. (Rococo) supplemental
Shuai Mu, Yang Cui, Yang Zhang, Wyatt Lloyd, and Jinyang Li.
OSDI, 2014.

Building consistent transactions with inconsistent replication. (TAPIR) supplemental
Irene Zhang and Naveen Kr Sharma and Adriana Szekeres and Arvind Krishnamurthy and Dan RK Ports.
SOSP, 2015.

Scalable Atomic Visibility with RAMP Transactions. reference
Peter Bailis, Alan Fekete, Joseph M. Hellerstein, Ali Ghodsi, and Ion Stoica.
SIGMOD, 2014.

Modern Marvels

Nov 15 - Distributed Processing

Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. required
Matei Zaharia and Mosharaf Chowdhury and Tathagata Das and Ankur Dave and Justin Ma and Murphy McCauley and Michael J Franklin and Scott Shenker and Ion Stoica.
NSDI, 2012.

Nectar: Automatic Management of Data and Computation in Datacenters. supplemental
Pradeep Kumar Gunda and Lenin Ravindranath and Chandramohan A Thekkath and Yuan Yu and Li Zhuang.
OSDI, 2010.

Naiad: a timely dataflow system. supplemental
Derek G Murray and Frank McSherry and Rebecca Isaacs and Michael Isard and Paul Barham and Martin Abadi.
SOSP, 2013.

MapReduce: Simplified Data Processing on Large Clusters. repeat
Jeffrey Dean and Sanjay Ghemawat.
OSDI, 2004.

Dryad: distributed data-parallel programs from sequential building blocks. repeat
Michael Isard and Mihai Budiu and Yuan Yu and Andrew Birrell and Dennis Fetterly.
EuroSys, 2007.

Nov 17 - Google Stack Day

Bigtable: A Distributed Storage System for Structured Data. required
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber.
OSDI, 2006.
ACM TOCS, 2008.

The Google File System. supplemental
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
SOSP, 2003.

Large-scale Incremental Processing Using Distributed Transactions and Notifications. (Percolator) supplemental
Daniel Peng and Frank Dabek.
OSDI, 2010.

MapReduce: Simplified Data Processing on Large Clusters. repeat
Jeffrey Dean and Sanjay Ghemawat.
OSDI, 2004.

The Chubby Lock Service for Loosely-Coupled Distributed Systems. repeat
Mike Burrows.
OSDI, 2006.

Thialfi: A Client Notification Service for Internet-Scale Applications. reference
Atul Adya, Gregory Cooper, Daniel Myers, and Michael Piatek.
SOSP, 2011.

F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business. reference
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, and Phoenix Tong.
VLDB, 2013.

Spanner: Google's Globally Distributed Database. repeat
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, et al.
OSDI, 2012.
ACM TOCS, 2013.

Nov 22 - Facebook Stack Day

TAO: Facebook’s Distributed Data Store for the Social Graph. required
N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo,
S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani.
USENIX ATC, 2013.

Scaling Memcache at Facebook. supplemental
R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy,
M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani.
NSDI, 2013.

Existential Consistency: Measuring and Understanding Consistency at Facebook. supplemental
Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt,
Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, and Wyatt Lloyd.
SOSP, 2015.

Finding a Needle in Haystack: Facebook’s Photo Storage. reference
D. Beaver, S. Kumar, H. C. Li, J. Sobel, and P. Vajgel.
OSDI, 2010.

An Analysis of Facebook Photo Caching. reference
Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li.
SOSP, 2013.

f4: Facebook’s Warm BLOB Storage System. reference
Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu,
Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar.
OSDI, 2014.

RIPQ: Advanced Photo Caching on Flash for Facebook. reference
Linpeng Tang, Qi Huang, Wyatt Lloyd, Sanjeev Kumar, and Kai Li.
FAST, 2015.

Analysis of HDFS Under HBase: A Facebook Messages Case Study. reference
T. Harter, D. Borthakur, S. Dong, A. S. Aiyer, L. Tang, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau.
FAST, 2014.

Wormhole: Reliable pub-sub to support geo-replicated internet services. reference
Yogeshwer Sharma, Philippe Ajoux, Petchean Ang, David Callies, Abhishek Choudhary, Laurent Demailly, et al.
NSDI, 2015.

Nov 24 - Thanksgiving

No Class.

Nov 29 - Pushing systems to their limits

MICA: a Holistic Approach to Fast In-Memory Key-Value Storage. required
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky.
NSDI, 2014.

Cache Craftiness for Fast Multicore Key-Value Storage. (MassTree) supplemental
Yandong Mao, Eddie Kohler, and Robert T. Morris.
EuroSys, 2012.

Dec 1 - Papers from OSDI 2016

To Be Determined.

Exam 2

Dec 8