COS424: Interacting with Data
Computers have made it possible, even easy, to collect vast
amounts of data from a wide variety of sources. It is not always
clear, however, how to use that data, and how to extract useful
information from it. This problem is faced in a tremendous range of
business, medical and scientific applications. The purpose of this
course is to teach some of the best and most general approaches to
this broad problem of how to get the most out of data. The course
will explore both theoretical foundations and practical applications.
Students will gain experience analyzing several kinds of data,
including document collections, biological data, and natural images.
Topics will include:
- Dimensionality reduction
- Advanced topics and applications
Tuesday/Thursday 11:00AM-12:20PM, Friend 006
204 CS Building
blei [at] cs.princeton.edu
Office hours: by appointment
407 CS Building
schapire [at] cs.princeton.edu
Office hours: by appointment, or just stop by
004 CS Building
jcone [at] princeton.edu
Office hours: Wednesdays, 3:00PM-5:00PM
221 Fine Hall
czhu [at] princeton.edu
Office hours: Fridays, 3:00PM-4:00PM
As soon as possible,
please join the course mailing list by visiting
and following the instructions for subscribing.
When signing up for the mailing list, please provide your name,
especially if you are using a non-Princeton email address.
To prevent spam, email
addresses that cannot be identified as legitimate will be removed from the list.
This list will be used by the course staff for general
announcements such as last minute corrections to the homeworks and
changes in due dates. This list can also be used by students for
discussing course material and homeworks.
The course staff will monitor and respond to questions on this
list. If your question is specific to your own work, please contact
You can post to the list by sending mail to firstname.lastname@example.org. Note that you can only post to the
list using the email address you used to subscribe to it.
The prerequisites are MAT101, MAT201, COS126, and some exposure to
probability and/or statistics (such as what is covered in COS341 or
In general, you should be comfortable
with computer programming and basic linear algebra,
and have some familiarity with probability and statistics.
Come see one of the professors if you are unsure.
There is no textbook that perfectly fits the material in this course.
Instead, students will be asked to take "scribe notes" on the lectures
which will be posted on the course website (see below).
Additional papers and book chapters will also be provided.
A list of other books for further background reading appears on the Syllabus page; these books are being placed on reserve at the Engineering Library.
Course Grades and Workload
The course consists of lectures, readings, homework assignments,
and a final project.
There will be about four
homework assignments given roughly once every two weeks; these
will be a mix of written exercises and programming. Homeworks count
for 65% of your grade.
The class project will constitute about one month of work, and count for
the remaining 35% of your grade. For the project, you are required to
undertake a thorough piece of applied data analysis and clearly report
your findings in a written report and poster presentation. You
can work alone or in groups of 2-3. See this page
for more details.
The homework assignments will be done using
R. More information about R can be found here.
Because there is no perfect textbook for this course, students will
be asked to take turns preparing "scribe notes" for posting
on the course web site (specifically, on the Syllabus page). Each
class, one student will be the designated "scribe", taking
careful notes during class, writing them up, and sending them to the instructor
for posting on the web. Here is
more information on how to be a scribe. These will not be
graded; however, assuming there are more students than lectures,
anyone who volunteers to scribe will receive extra credit.
Failure to complete any significant component of the course may
result in a grade of D or F, regardless of performance on the other
components. Final grades may be adjusted upward for positive and
regular class participation.
Handing in. Please submit code to
Written exercises and some code must be handed in as hard copy, in an
envelope that will be placed outside Jonathan's office (room 004 of
the Computer Science building).
All assignments are due at 11:59pm on the due date.
Each student will be allotted five free days which can be used
to turn in homework assignments late without penalty. For instance, you might
choose to turn in the first homework two days late, and the third
homework three days late.
Once your free days are used up, late homeworks will be penalized 20% per
day. (For instance, a homework turned in two days late will receive only 60%
credit.) Homeworks will not be accepted more than five days past the deadline,
whether or not free days are being used. Exceptions to these rules will of
course be made for serious illness or other genuine emergency circumstances, and
free late days should not be used for these purposes; in these
cases, please contact a professor as soon as you are aware of the problem.
A weekend, that is, Saturday and Sunday together, counts as a
single late "day". For instance, a homework that is due on
Friday but turned
in on Monday would be considered two days late, rather than three.
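
The late-day rules above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function names are hypothetical, and it assumes that free days are applied before any penalty, that a Sunday is not counted when the Saturday before it was already counted as late, and that the five-day cutoff uses the same weekend-adjusted count.

```python
from datetime import date, timedelta

FREE_DAYS = 5             # free late days allotted to each student
PENALTY_PER_DAY = 20      # percent deducted per penalized late day
MAX_DAYS_LATE = 5         # homework is not accepted beyond this


def late_days(due, submitted):
    """Count late days, treating Saturday and Sunday together as one day."""
    days = 0
    day = due + timedelta(days=1)
    while day <= submitted:
        saturday_already_counted = (
            day.weekday() == 6 and day - timedelta(days=1) > due
        )
        if not saturday_already_counted:
            days += 1
        day += timedelta(days=1)
    return days


def credit_percent(days_late, free_days_left):
    """Return (percent credit, free days remaining), or None if not accepted.

    Assumption: remaining free days are used up before any penalty applies.
    """
    if days_late > MAX_DAYS_LATE:
        return None
    used = min(days_late, free_days_left)
    penalized = days_late - used
    credit = max(0, 100 - PENALTY_PER_DAY * penalized)
    return credit, free_days_left - used


# Due on a Friday, turned in the following Monday: two late days.
print(late_days(date(2024, 9, 6), date(2024, 9, 9)))
# Two days late with no free days left: 60% credit.
print(credit_percent(2, 0))
```

Under these assumptions, the example in the text (first homework two days late, third homework three days late) uses up exactly the five free days with no penalty.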
The final project cannot be turned in late, nor can written
material be turned in beyond "Dean's Date" without a dean's permission.
If you are turning in a late homework after hours when no one is
around to accept it, please indicate at the top that it is late, and
clearly mark the day and time when it was turned in.
Failure to do so may result in the TA considering the homework to be
submitted at the time when it was picked up (which might be many
hours, or even a day or two after when you actually submitted it).
Homeworks are graded largely on getting the right
answer or getting the program to work. In many homeworks, there are
some more "free form" questions asking for exploration and
experimentation. These questions will be graded more subjectively (as
in the humanities). Ideal answers are thoughtful, perceptive,
critical, clear, and concise.
The collaboration policy for this course is based on the overarching objective of maximizing your educational experience, that is, what you gain in knowledge, understanding and the ability to solve problems. Obviously, you do not learn anything by copying someone else's solution. On the other hand, forbidding any and all discussion of course material may deprive you of the opportunity to learn from fellow students. The middle ground between these two extremes also needs to be defined with this basic principle in mind. Before working with another student, you should ask yourself whether you would gain more by working together or individually, and then act accordingly. Here are some specific guidelines based on this principle:
You are certainly free (and encouraged) to talk to others about the material in this course, or for general help with R, Moodle, etc.
Before working with someone else, you should first spend a substantial amount of time trying to arrive at a solution by yourself. Easier problems, including
many or most of the written exercises, should be solved individually from start to finish.
Discussing harder problems or programming assignments with fellow students is allowed to the extent that it leads all participants to a better understanding of the problem and the material. Following such discussions, you should only take away your understanding of the problem; you should not take notes, particularly on anything that might have been written down. This is meant to ensure that you understand the discussion well enough to reproduce its conclusions on your own.
You should also note on your solution who you worked with.
Needless to say, simply telling the solution to someone else is prohibited, as is showing someone a written solution or a portion of your code. Comparing code or solutions also is not generally permitted. However, comparing and discussing the results of experiments is okay if done in the spirit of the guidelines above.
All writing and programming must be done strictly
on your own. Copying of any sort is not allowed. Unless
instructed otherwise, you may not use
code or solutions taken from any student, from the web, from prior
years' solutions, or from any other source.