COS 597D, Fall 2013
Questions on Incremental Distributed Processing (Peng and Dabek)

Due at 1:30pm, Wednesday November 20, 2013.
You may hand in a paper copy or email a file to me.
Keep a copy for your use during class discussion.

No credit for late submissions.



For our Nov. 20 discussion, we consider the paper by Peng and Dabek, which presents Google's strategy of incrementally updating its search index instead of rebuilding the index in batches with MapReduce. The questions below ask about the main ideas, and your answers should be brief. As usual, we may wish to dig deeper in class discussion.


1.  What are the main techniques (at least 2) used by Percolator to achieve efficient incremental processing? 

2.  What guarantees does snapshot isolation provide?  What doesn't it provide?

3.  What are the gains in using Percolator over MapReduce?

4.  Discuss some of the vulnerabilities or high-cost aspects of the Percolator system.