The Spring 2010 COS 424 course “Interacting with Data” will be given by Léon Bottou.


Computers have made it possible, even easy, to collect vast amounts of data from a wide variety of sources. It is not always clear, however, how to use that data, and how to extract useful information from it. This problem is faced in a tremendous range of business, medical and scientific applications. The purpose of this course is to teach some of the best and most general approaches to this broad problem. The course will explore both theoretical foundations and practical applications. Students will gain experience with several kinds of data, including text documents, biological data, signals, and images.

Topics will include:

  • Classification
  • Clustering
  • Regression
  • Dimensionality reduction
  • Exploratory statistics
  • Advanced topics and applications


Tuesday/Thursday 11:00AM-12:20PM, Computer Science 105.

Course staff

  • Professor: Léon Bottou — lbottou (at)
    (Office Hours: by arrangement; catch me after class to discuss any question.)
  • Teaching assistant: Sean Gerrish — sgerrish (at)
    (Office Hours in CS413 Fridays 1pm-2pm)

Mailing list

The mailing list for the class is You can register for this list here.


The prerequisites are MAT101, MAT201, COS126, and some exposure to probability and/or statistics (such as what is covered in COS341 or COS402). In general, you should be comfortable with computer programming and basic linear algebra, and have some familiarity with probability and statistics.


No textbook perfectly fits the material in this course. Instead, students will be asked to take “scribe notes” on the lectures, which will be posted on the course website (see below). Additional papers and book chapters will also be provided. A lot of the material will be drawn from these three books, which are on reserve at the Engineering library.

  • Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer 2006.
  • Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
  • David Hand, Heikki Mannila and Padhraic Smyth. Principles of Data Mining, MIT Press, 2001.

Course grades and workload

The course consists of lectures, readings, homework assignments, and a final project.

There will be about four homework assignments given roughly once every two weeks; these will essentially be written exercises whose solution requires some programming. Homeworks count for 65% of your grade.

The class project will constitute about one month of work, and count for the remaining 35% of your grade. For the project, you are required to undertake a thorough piece of applied data analysis and clearly report your findings in a written report and poster presentation. You can work alone or in groups of 2-3.

Because there is no perfect textbook for this course, students will be asked to take turns preparing scribe notes for posting on the course web site. Each class, 1-2 students will be the designated “scribes”, taking careful notes during class, writing them up, and sending them to the instructor for posting on the web. These will not be graded; however, assuming there are more students than lectures, anyone who volunteers to scribe will receive extra credit.

Failure to complete any significant component of the course may result in a grade of D or F, regardless of performance on the other components. Final grades may be adjusted upward for positive and regular class participation.


We recommend using R for the homework assignments. Since there are many ways to reach the correct solution, using R is not a requirement. However, you may find R practical for this course and useful in the future. More information about using R can be found here.

start.txt · Last modified: 2010/02/18 22:51 by lbottou