- Summary
- Lectures
- Mailing List
- Prerequisites
- Reading
- Using R
- Course grades and workload
- Assignment policies

Computers have made it possible, even easy, to collect vast amounts of data from a wide variety of sources. It is not always clear, however, how to use that data, and how to extract useful information from it. This problem is faced in a tremendous range of business, medical and scientific applications. The purpose of this course is to teach some of the best and most general approaches to this broad problem of how to get the most out of data. The course will explore both theoretical foundations and practical applications. Students will gain experience analyzing several kinds of data, including document collections, biological data, and natural images.

Topics will include:

- Classification
- Clustering
- Regression
- Dimensionality reduction
- Advanced topics and applications

Tuesday/Thursday 11:00AM-12:20PM, Computer Science 104

David Blei
(Professor)

204 CS Building

blei [at] cs.princeton.edu

659-258-9907

Office hours: by appointment

Indraneel Mukherjee (Teaching Assistant)

103C CS Building

imukherj [at] cs.princeton.edu

Office hours: Monday 6:30PM-8:30PM; 103C Computer Science

Martin Suchara (Teaching Assistant)

103A CS Building

msuchara [at] cs.princeton.edu

Office hours: Wednesday 6:30PM-8:30PM; 103A Computer Science

This list will be used by the course staff for general announcements, such as last-minute corrections to the homeworks and changes in due dates. Students can also use it to discuss course material and homeworks.

The course staff will monitor and respond to questions on this list. If your question is specific to your own work, please contact them directly.

You can post to the list by sending mail to cos424@lists.cs.princeton.edu. Note that you can only post to the list using the email address you used to subscribe to it.

The prerequisites are MAT101, MAT201, COS126, and some exposure to probability and/or statistics (such as what is covered in COS341 or COS402). In general, you should be comfortable with computer programming and basic linear algebra, and have some familiarity with probability and statistics. Contact Prof. Blei if you have concerns about your prerequisite coursework.

There is no textbook that perfectly fits the material in this course. Instead, students will be asked to take "scribe notes" on the lectures, which will be posted on the course website (see below). Additional papers and book chapters will also be provided.

A lot of the material will be drawn from these two books, which are on reserve at the Engineering library.

- Christopher M. Bishop. *Pattern Recognition and Machine Learning.* Springer, 2006.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction.* Springer, 2001.

The course consists of lectures, readings, homework assignments, and a final project.

There will be about four homework assignments given roughly once every two weeks; these will be a mix of written exercises and programming. Homeworks count for 65% of your grade.

The class project will constitute about one month of work and count for the remaining 35% of your grade. For the project, you are required to undertake a thorough piece of applied data analysis and clearly report your findings in a written report and poster presentation. You can work alone or in groups of 2-3. See this page for more details.

Because there is no perfect textbook for this course, students will be asked to take turns preparing "scribe notes" for posting on the course web site (specifically, on the Syllabus page). Each class, one or two students will be the designated scribes, taking careful notes during class, writing them up, and sending them to the instructor for posting on the web. Here is more information on how to be a scribe. Scribe notes will not be graded; however, assuming there are more students than lectures, anyone who volunteers to scribe will receive extra credit.

Failure to complete any significant component of the course may result in a grade of D or F, regardless of performance on the other components. Final grades may be adjusted upward for positive and regular class participation.

The homework assignments will be done using **R**. More information about using R for this course can be found here.

*not* be used for these purposes; in these cases, please
contact a professor as soon as you are aware of the problem. A
weekend, that is, Saturday and Sunday together, counts as a single late
"day". For instance, a homework that is due on Friday
but turned in on Monday would be considered two days late, rather than
three.
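The weekend rule can be illustrated with a short sketch. This is not an official grading tool: the function name `late_days` and the example dates are hypothetical, the count uses calendar dates only (time of day is ignored), and it assumes a Sunday adds a late day only when the preceding Saturday was not already counted.

```python
from datetime import date, timedelta


def late_days(due: date, submitted: date) -> int:
    """Count late days, treating Saturday and Sunday together as one day.

    Assumption (not stated in the policy): a Sunday is counted on its own
    only when the Saturday before it was not itself a late day.
    """
    count = 0
    day = due + timedelta(days=1)
    while day <= submitted:
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        is_sunday = day.weekday() == 6
        saturday_already_counted = is_sunday and (day - timedelta(days=1)) > due
        if not saturday_already_counted:
            count += 1
        day += timedelta(days=1)
    return count


# Illustrative dates: 2024-03-01 is a Friday and 2024-03-04 is a Monday.
friday = date(2024, 3, 1)
monday = date(2024, 3, 4)
print(late_days(friday, monday))  # 2 (the weekend counts once, then Monday)
```

Under these assumptions, a homework due on Friday and turned in on Saturday or Sunday is one day late, matching the two-day count for Monday given in the example above.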

The final project cannot be turned in late, nor can written material be turned in beyond "Dean's Date" without a dean's permission.

If you are turning in a late homework after hours when no one is
around to accept it, please indicate at the top that it is late, and
*clearly mark the day and time when it was turned in*.
Failure to do so may result in the TA considering the homework to be
submitted at the time when it was picked up (which might be many
hours, or even a day or two after when you actually submitted it).

- You are certainly free (and encouraged) to talk to others about the material in this course, or to ask for general help with R, Moodle, etc.
- Before working with someone else, you should first spend a substantial amount of time trying to arrive at a solution by yourself. Easier problems, including many or most of the written exercises, should be solved individually from start to finish.
- Discussing harder problems or programming assignments with fellow students is allowed to the extent that it leads all participants to a better understanding of the problem and the material. Following such discussions, you should only take away your understanding of the problem; you should not take notes, particularly on anything that might have been written down. This is meant to ensure that you understand the discussion well enough to reproduce its conclusions on your own. You should also note on your solution whom you worked with.
- Needless to say, simply telling the solution to someone else is prohibited, as is showing someone a written solution or a portion of your code. Comparing code or solutions also is not generally permitted. However, comparing and discussing the results of experiments is okay if done in the spirit of the guidelines above.
- All writing and programming must be done strictly on your own. Copying of any sort is not allowed. Unless instructed otherwise, you may not use code or solutions taken from any student, from the web, from prior year solutions, or from any other source.