COS424: Interacting with Data

Spring 2007



Computers have made it possible, even easy, to collect vast amounts of data from a wide variety of sources. It is not always clear, however, how to use that data, and how to extract useful information from it. This problem is faced in a tremendous range of business, medical and scientific applications. The purpose of this course is to teach some of the best and most general approaches to this broad problem of how to get the most out of data. The course will explore both theoretical foundations and practical applications. Students will gain experience analyzing several kinds of data, including document collections, biological data, and natural images.

Topics will include:


Tuesday/Thursday 11:00AM-12:20PM, Friend 006

Course Staff


David Blei
204 CS Building
blei [at]
Office hours: by appointment

Robert Schapire
407 CS Building
schapire [at]
Office hours: by appointment, or just stop by

Teaching Assistants

Jonathan Chang
004 CS Building
jcone [at]
Office hours: Wednesdays, 3:00PM-5:00PM

Chenwei Zhu
221 Fine Hall
czhu [at]
Office hours: Fridays, 3:00PM-4:00PM

Mailing list

As soon as possible, please join the course mailing list by visiting here and following the instructions for subscribing. When signing up for the mailing list, please provide your name, especially if you are using a non-Princeton email address. To prevent spam, email addresses that cannot be identified as legitimate will be removed from the list.

This list will be used by the course staff for general announcements such as last minute corrections to the homeworks and changes in due dates. This list can also be used by students for discussing course material and homeworks.

The course staff will monitor and respond to questions on this list. If your question is specific to your own work, please contact them directly.

You can post to the list by sending mail to the list address. Note that you can only post to the list using the email address you used to subscribe to it.


The prerequisites are MAT101, MAT201, COS126, and some exposure to probability and/or statistics (such as what is covered in COS341 or COS402). In general, you should be comfortable with computer programming and basic linear algebra, and have some familiarity with probability and statistics. Come see one of the professors if you are unsure.


There is no textbook that perfectly fits the material in this course. Instead, students will be asked to take "scribe notes" on the lectures, which will be posted on the course website (see below). Additional papers and book chapters will also be provided.

A list of other books for further background reading appears on the Syllabus page, and these books are being placed on reserve at the Engineering Library.

Course Grades and Workload

The course consists of lectures, readings, homework assignments, and a final project.

There will be about four homework assignments given roughly once every two weeks; these will be a mix of written exercises and programming. Homeworks count for 65% of your grade.

The class project will constitute about one month of work, and count for the remaining 35% of your grade. For the project, you are required to undertake a thorough piece of applied data analysis and clearly report your findings in a written report and poster presentation. You can work alone or in groups of 2-3. See this page for more details.

The homework assignments will be done using R. More information about R can be found here.

Because there is no perfect textbook for this course, students will be asked to take turns preparing "scribe notes" for posting on the course web site (specifically, on the Syllabus page).  Each class, one student will be the designated "scribe", taking careful notes during class, writing them up, and sending them to the instructor for posting on the web.  Here is more information on how to be a scribe.  These will not be graded; however, assuming there are more students than lectures, anyone who volunteers to scribe will receive extra credit.

Failure to complete any significant component of the course may result in a grade of D or F, regardless of performance on the other components. Final grades may be adjusted upward for positive and regular class participation.

Assignment Policies

Handing in. Please submit code to Moodle. Written exercises and some code must be handed in as hard copy, in an envelope that will be placed outside Jonathan's office (room 004 of the Computer Science building).

Late days. All assignments are due at 11:59pm on the due date. Each student will be allotted five free days which can be used to turn in homework assignments late without penalty. For instance, you might choose to turn in the first homework two days late, and the third homework three days late. Once your free days are used up, late homeworks will be penalized 20% per day. (For instance, a homework turned in two days late will receive only 60% credit.) Homeworks will not be accepted more than five days past the deadline, whether or not free days are being used. Exceptions to these rules will of course be made for serious illness or other genuine emergency circumstances, and free late days should not be used for these purposes; in these cases, please contact a professor as soon as you are aware of the problem. A weekend, that is, Saturday and Sunday together, counts as a single late "day". For instance, a homework that is due on Friday but turned in on Monday would be considered two days late, rather than three.
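The arithmetic in the late-day rules above can be illustrated with a short sketch. This is not an official grading tool; the function names (late_days, credit_fraction) are hypothetical, and the sketch makes several assumptions: submissions are tracked by calendar date only, a Sunday is folded into the preceding Saturday only when that Saturday was itself a late day, the five-day cutoff is measured in late days as counted here, and free days are applied automatically before any penalty.

```python
from datetime import date, timedelta
from typing import Tuple

PENALTY_PER_DAY = 0.20  # from the policy: 20% per penalized late day
MAX_LATE_DAYS = 5       # from the policy: nothing accepted past five days


def late_days(due: date, submitted: date) -> int:
    """Count late days, treating Saturday and Sunday together as one day.

    Assumption: only calendar dates matter, so anything handed in by
    11:59pm on a given date counts as submitted on that date.
    """
    count = 0
    day = due + timedelta(days=1)
    while day <= submitted:
        # Skip a Sunday when the Saturday before it was already counted.
        is_sunday = day.weekday() == 6
        saturday_was_late = day - timedelta(days=1) > due
        if not (is_sunday and saturday_was_late):
            count += 1
        day += timedelta(days=1)
    return count


def credit_fraction(days_late: int, free_days_left: int) -> Tuple[float, int]:
    """Return (fraction of credit kept, free days remaining).

    Assumption: free days are always spent before any penalty applies.
    """
    if days_late > MAX_LATE_DAYS:
        raise ValueError("not accepted more than five days past the deadline")
    used = min(days_late, free_days_left)
    penalized = days_late - used
    fraction = max(0.0, 1.0 - PENALTY_PER_DAY * penalized)
    return fraction, free_days_left - used


# The policy's own example: due on a Friday, turned in the following Monday.
n = late_days(date(2007, 3, 2), date(2007, 3, 5))  # counted as 2 late days
print(n, credit_fraction(n, free_days_left=0))     # no free days left
print(n, credit_fraction(n, free_days_left=5))     # free days cover it
```

With no free days remaining, the two late days in this example leave 60% credit, matching the figure quoted in the policy; with all five free days available, the homework keeps full credit and three free days remain.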

The final project cannot be turned in late, nor can written material be turned in beyond "Dean's Date" without a dean's permission.

If you are turning in a late homework after hours when no one is around to accept it, please indicate at the top that it is late, and clearly mark the day and time when it was turned in. Failure to do so may result in the TA considering the homework to be submitted at the time when it was picked up (which might be many hours, or even a day or two, after you actually submitted it).

Grading. Homeworks are graded largely on getting the right answer or getting the program to work. In many homeworks, there are some more "free form" questions asking for exploration and experimentation. These questions will be graded more subjectively (as in the humanities). Ideal answers are thoughtful, perceptive, critical, clear, and concise.

Collaboration. The collaboration policy for this course is based on the overarching objective of maximizing your educational experience, that is, what you gain in knowledge, understanding and the ability to solve problems. Obviously, you do not learn anything by copying someone else's solution. On the other hand, forbidding any and all discussion of course material may deprive you of the opportunity to learn from fellow students. The middle ground between these two extremes also needs to be defined with this basic principle in mind. Before working with another student, you should ask yourself whether you would gain more by working together or by working individually, and then act accordingly. Here are some specific guidelines based on this principle: