Princeton University | Computer Science 598C | Spring 2013
2/4: Organizational meeting (digital universe)
2/8: MapReduce, datasets, and project ideas (Google paper, Dean’s keynote, Chapter 2 of MMDS, critic blog, which has since been removed)
Warm-up exercise:
Follow these instructions to access the C8 cluster. Modify the given WordCount example (source code is at HDFS:/user/fuse/WordCount.java) so that your code cleans up the corpus documents (HDFS:/user/fuse/essays) and outputs the word frequencies in sorted order. Submit your final program to the department dropbox (we will set it up shortly). Due: 2/22.
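As a rough illustration of the logic the warm-up asks for, here is a minimal local sketch in plain Java with no Hadoop dependency. It is not the provided WordCount.java: the class and method names are hypothetical, "clean up" is assumed to mean lowercasing and dropping every character outside a-z, and "sorted order" is assumed to mean descending frequency with alphabetical tie-breaking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, Hadoop-free sketch of the warm-up logic (not the course's WordCount.java).
class WordFrequencySketch {

    // Cleanup step (assumed rule): lowercase the line and split on any run of non-letters.
    static List<String> cleanTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counting step: plays the role of the map and reduce phases of WordCount.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : cleanTokens(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorting step (assumed order): highest count first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> sortedByFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("The cat, the dog!", "A cat.");
        for (Map.Entry<String, Integer> e : sortedByFrequency(countWords(lines))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

On the cluster, the counting step maps onto the existing mapper and reducer, while a frequency-sorted output would typically need extra work, for example a second pass keyed on the count. That choice is left open here.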
Linpeng Tang (reading review, secondary high-level structure, ICML’12)
Sharvanath Pathak (reading GFS, SOSP’03, secondary Chubby, OSDI’06)
Submit reading notes
Guest lecture by Dr. Phillip Shilane from EMC (reading acceleration for backup, FAST’12)
Brian Tubergen (reading protocol independent dedup, SIGCOMM’00, secondary network dedup acceleration)
Submit reading notes
Submit warm-up exercise
2/24
Submit project proposals
Guest lecture by Dr. Ruoming Pang from Google (reading Google’s globally distributed DB, OSDI’12)
Mehmet Basbug (primary: LSH (VLDB’99), secondary: Multi-probe (VLDB’07))
Guest: Dr. William Hanson (CMO of UPenn Hospital) on medical data and projects
Submit reading notes
Madhuvanthi Jayakumar (Primary: Google’s globally distributed storage OSDI’10)
Nayden Nedev (Primary: Google Data center network, CACM’12, SDN)
Christian Edbank (Primary: KNN-Descent, WWW’11, Secondary: 7.1-7.3 of MMDS)
Submit reading notes
Akshay Mittal (primary: Ramcloud, secondary: Spark)
Srinivas Narayana (primary: MMDS 4.1-4.4, secondary Opensketch)
Andrew Werner (Primary: Spanner OSDI’12, secondary Paxos)
Submit reading notes
Guest lecture by Dr. Sanjeev Kumar from Facebook (reading Facebook’s photo storage, OSDI’10)
Xiao Li (Primary: Amazon’s key-value store SOSP’07, secondary SILT SOSP’11).
Submit reading notes
Project mid-term progress reports and short presentations (submit)
Eric First (Primary: MMDS 3.5, Secondary: graph similarity)
Wathsala Wathawana (Primary: layered naming, Secondary: Chord)
Muneeb Ali (Primary: Kahn’s paper, Secondary: Lampson’s paper)
Submit reading notes
Mike Mckeown (Primary: energy-proportional computing; Secondary: energy-efficient MapReduce)
Trevor Bannard (Primary: ImageNet construction; Secondary: classification with 10,000 categories)
Robert Sami (Primary: 9.1-9.2 MMDS book; Secondary: 9.3 MMDS book)
Submit reading notes
Sachin Ravi (Primary: external hashing, secondary: unpublished)
Marcela Melara (Primary: info leak in cloud; secondary: attack-resource)
Tri Nguyen (Primary: energy-proportional storage; secondary: manycore key-value store)
Submit reading notes
Guest lecture by Prof. Moses Charikar (various topics)
Alp Kutlualp (primary: chapter 11.1 and 11.2, secondary: chapter 11.3; http://i.stanford.edu/~ullman/pub/ch11.pdf)
Submit reading notes
All: Project demos/presentations (each 15 minutes)
5/19
Project report due, Submit
Mining of Massive Data Sets. Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman. Cambridge University Press. 2011.
You can download the latest book from an author’s webpage.
Google paper, Dean’s keynote, Chapter 2 of MMDS, critic blog (removed but interesting for discussion)
GFS (SOSP’03), BigTable (OSDI’06), Google cluster (IEEE’03), Google Data center network (CACM’12)
Amazon’s key-value store (SOSP’07), SILT (SOSP’11)
Facebook’s photo storage (OSDI’10), Google’s globally distributed storage (OSDI’10), Google’s globally distributed DB (OSDI’12), Microsoft’s Azure storage (USENIX’12).
Venti (FAST’02), Data domain DDFS (FAST’08), others (TBD).
WAN protocol independent dedup (SIGCOMM’00), LBFS (SOSP’01), WAN acceleration for backup (FAST’12), others (TBD).
Image SIFT (ICCV’99), Audio MFCC ( ), others (TBD)
review, high-level structure (ICML’12), others (TBD)
Imagenet (CVPR’09, ECCV’10), others (TBD)
Section 3.5 of MMDS, papers (TBD)
Section 3.2-3.3 of MMDS, Fingerprinting, Document resemblance
Section 3.4 of MMDS, LSH (VLDB’99), Multi-probe (VLDB’07), Posteriori Multi-probe, …
Curse of Dimensionality (TBD), PCA (TBD), Sketches (TBD), others (TBD)
Sampling (Section 4.1 and 4.2 of MMDS), Bloom filter (Section 4.3 of MMDS), Counting (Section 4.4 of MMDS), others (TBD)
Section 6.3 of MMDS
K-means (Section 7.3 of MMDS), KNN-Descent (WWW’11), others (TBD)
Parts of Chapter 5 of MMDS
Parts of Chapter 10 of MMDS