COS 597D, Fall 2013
Questions on data distribution in NoSQL papers

Due at 1:30pm, Wednesday October 23, 2013.
You may hand in a paper copy or email a file to me.
Keep a copy for your use during class discussion.

No credit for late submission.



The sections of the papers assigned for Oct. 23 discuss the distributed storage of, and access to, data in Bigtable and Cassandra.  Sections of the paper describing the Google File System (GFS) are included because Bigtable relies on GFS to take care of some of the issues of distributed storage.  For the purpose of answering the questions below, you should consider GFS to be part of Bigtable.  The questions ask about the main ideas, so your answers should be brief.  We may wish to dig deeper in class discussion.

1.  There are many ways one might organize the distributed storage of data structured as rows and columns.  What design decisions do Bigtable and Cassandra share?

2.  Bigtable uses a "master server" and "tablet servers" to manage the reading and writing of data; Cassandra does not have a distinguished master node.  What are the pros and cons of each architecture?

3.  How is replication handled in each of Bigtable and Cassandra?

4.  What are the main steps in reading and writing data in Bigtable?

5.  What are the main steps in reading and writing data in Cassandra?