Princeton University
Computer Science Department

Computer Science 598C
Analytics and Systems of Big Data

 

Spring 2013

 

 



Course Summary

 

The goal of this seminar is to read and discuss classical and recent publications related to big data.  Example systems topics include the MapReduce abstraction, key-value stores, globally distributed storage systems, Google data center infrastructure, deduplication storage systems, and deduplication for WAN bandwidth optimization.  Data analytics topics include feature extraction and learning, ontology construction, similarity measures, dimension reduction, summary data structures, streaming, minhashing, locality sensitive hashing, clustering in high dimensional space, frequent item sets, and mining social network graphs.  In addition, we will study several data types, such as image, audio, medical, neuroscience, and wireless network data.

 

During the class, students, faculty members, and invited speakers will present papers on specific topics.  Students who are taking the course for credit are required to read all primary papers, give a presentation of papers on a specific topic, and work (individually or jointly) on a small research project.  This course satisfies the programming requirement for our graduate program.

 


Administrative Information

Class meets: Friday 1:30-4:00pm, Room 302, Computer Science Building


 

Book

 

We recommend that students read parts of the following textbook:

Mining of Massive Datasets.  Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman. Cambridge University Press. 2011.

You can download the latest version of the book from an author’s webpage.  We will refer to this book as MMDS.

 

Infrastructure

See http://mmxserver.cs.princeton.edu/cos598c/c8hadoop.html for information about how to access the C8 cluster and use the Hadoop tools.

Please also see http://mmxserver.cs.princeton.edu/cos598c/hadoop_intro.pdf for Yida’s presentation.

 

Grading Policy

This graduate seminar will be graded roughly as follows:

·       Presentation (25%)

·       Reading and participation in discussions (25%)

·       Warm-up exercise (5%)

·       Project (45%)

Submissions are due at 11:59pm on the due date.  Late submissions of reading notes will not be accepted.  Late submissions of the warm-up exercise and the project will be graded using our typical formula:

 

grade = original_grade * exp(-time_late/three_days)
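
As a rough illustration of how this penalty behaves, the formula can be sketched in Python.  The function name late_grade and the choice of hours as the time unit are assumptions made for this example only; the formula requires only that time_late and three_days be measured in the same unit.

```python
import math

# Assumption for illustration: lateness is measured in hours,
# so "three_days" in the formula becomes 72 hours.
THREE_DAYS_HOURS = 3 * 24


def late_grade(original_grade: float, hours_late: float) -> float:
    """Apply grade = original_grade * exp(-time_late / three_days).

    Work submitted on time (hours_late <= 0) keeps its original grade.
    """
    if hours_late <= 0:
        return original_grade
    return original_grade * math.exp(-hours_late / THREE_DAYS_HOURS)


if __name__ == "__main__":
    # Show how a grade of 100 decays with increasing lateness.
    for hours in (0, 24, 72, 144):
        print(f"{hours:>3} hours late: {late_grade(100.0, hours):.1f}")
```

Under these assumptions, a submission that is one day late keeps about 72% of its original grade, one that is three days late keeps about 37%, and one that is six days late keeps about 14%.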

 

Reading Requirements

Students who are taking this course for credit are required to read all primary papers (the first paper of each presentation).   For each primary paper, please use no more than 5 sentences to answer each of the following questions.

1.     Summarize the paper.

2.     What are the strengths of the paper?

3.     What are the weaknesses of the paper?

4.     What are your thoughts on future directions?

Please submit your reading notes before each class by using the department dropbox for this course.  The specific dropbox link is provided on the syllabus web page.

 

Presentation

Each student will give a 30-40 minute presentation and lead the discussion.  The presentation should focus on one topic that includes two or more papers.  All students are required to read the primary paper, but the presenter should read at least two papers on the topic.   See the syllabus for the presentation schedule.

 

After your presentation in class, you should submit your slides to the dropbox for this class.  We will post the presentation on Blackboard.

 

Projects

Each student is required to work on a project.  The following are the deadlines for your projects:

·       2/24: Submit project proposals via dropbox: Use one page to outline the main idea and the plan of your project.  We suggest that you meet with the faculty to discuss your project plan before submitting your project proposal.

·       3/29: Submit project progress via dropbox: Use PowerPoint slides to show the progress of your project.  You should state where you are in terms of project design and implementation.

·       5/3: Project demos or presentations: Use 10 minutes to do a show and tell.  Your project should be complete by now, except for writing the project report.

 

Submit project report: You should write your report in a double-column, conference-paper format, as if you were writing a short conference paper.  Your report should have the following sections: abstract, introduction, sections on your idea, approach or design, and evaluation, conclusion, and who-did-what.  Your report should be more than 4 pages and no more than 10 pages.

Projects can be done by either an individual or a group of students.   If a group of students is working on one project, we expect you to make sure that each student contributes.  In the last section of your report, you should clearly state who did what.