Princeton University
Computer Science Department

Computer Science 511
Theoretical Machine Learning

Spring 2019

 



Proposal due:  Friday, April 5.
Final report due:  Tuesday, May 7.


Proposal
Choosing a topic
Software and data
Writing (and submitting) a final report
What you will be graded on
Books on reserve


The final project for this class is completely open-ended.  You can pick just about any topic you wish so long as there is some direct connection to machine learning and its mathematical foundations.  For your project, you can run an experiment, or you can think about a theoretical problem or algorithm, or you can do a blend of both.

Projects are to be done in groups of 2, 3 or 4 students, where the size of the group should be appropriate for the ambitiousness of the project.  With special permission, groups consisting of a single individual may also be allowed, but only for compelling reasons and only in exceptional cases.  (See the Proposal section below for how to petition for permission to do a one-person project.)

This project is due on Tuesday, May 7 (with a proposal due on Friday, April 5).  Please make every effort to turn these in on time.  "Free" late days cannot be used for the final project (nor can they be used for the proposal).  Extensions beyond the due date will only be given for genuine and unforeseen emergencies, such as serious illness or death in the family.  To avoid last-minute problems with laptops, hard drives, etc., be sure to back up your work regularly and frequently, as I do not generally consider hardware or computer difficulties to be "emergencies."

I strongly advise starting early on your project.  Running experiments takes time, as does thinking about theoretical problems.


Proposal

As soon as you know what you want to do, but no later than Friday, April 5, please write and submit one brief paragraph outlining your project.  Your proposal should be submitted electronically on TigerFile using this link.  Submit a single pdf file called "proposal.pdf".  Note that, to submit, you will first need to create a "group" on TigerFile.  You will then be able to submit a single file for your entire group.  Be sure to include the name and netid of every group member on the submission itself.  Feel free to come to office hours or contact me or the TAs if you want to discuss your ideas or have questions.

Sometime after you submit your proposal, your project will be assigned to one of the course instructors, who will contact you and may provide brief feedback.  However, this process may take a week or more, so you certainly should not wait to hear from us; instead, begin working on your project forthwith.  Your assigned instructor will be your point of contact going forward, and will also read and grade your final report.

Groups consisting of a single individual may be allowed, with special permission, if you can make a compelling case that doing the project with a group would be very difficult or impossible, or would have significant negative consequences.  If you are seeking permission to do a one-person project, your proposal must also include a separate paragraph with detailed answers to the following:  (1) What are your reasons for wanting to do a one-person project?  (2) What steps have you taken to find a way to do a group project, and why were those steps unsuccessful?  (3) If you are not allowed to do a one-person project, what will be the consequences for you, if any, and what will you do instead for the project?  Note that, depending on how many students apply to do individual projects and their reasons, we might find it necessary to ask some of the students who have applied to join together into teams.


Choosing a topic

For your project, you should start by doing some reading on a topic, and then you might run an experiment, or try to simplify or improve or extend some result, or you might try applying an algorithm to a particular application, or you might think about how two different approaches or algorithms are related to each other.  Or you can do something different from any of these.

Here are examples of possible types of projects:

Places to look to get ideas for topics include the following.  (Note that for some of the links below, you will need to be on the Princeton intranet to download entire papers.)

Here are a few topic areas which have gotten a lot of attention in recent years.  But these are only examples, and you certainly should feel free (and encouraged) to do something different from any of these.

If you are doing a theoretical project, it may be that you read a paper, try improving it, and are not able to make progress.  In that case, it is okay to fall back on just explaining the paper as clearly as you can, in your own words.

Doing theoretical research is challenging and progress can be unpredictable.  It is usually not effective to immediately attempt to solve the most difficult and general case first.  Rather, try to think about what the easiest and simplest special cases are, and begin with those.  Sometimes, solving easier cases can suggest a path for solving the more general ones.  Or they might give clues about what aspects of the general problem make it hard to solve.  Along the way, try to make observations and develop insights about the problem that might be useful later, or which tell you something about its structure.  Try to understand the problem and the research that has previously been done on it in a new way based on your own fresh perspective.  All of these smaller steps forward, though perhaps well short of a complete solution, could make for a very nice project.

It is okay to do a project that is related to your primary research.  In this case, you will need to carve out a project that is focused and relevant to machine learning and its mathematical foundations.  Needless to say, turning in a project based on previously completed research would not be appropriate.


Software and data

You may use software that you legitimately find online.  If you do, please note this in your report, and, as with any project, demonstrate in your report that you understand how the underlying learning algorithm works.  If you implement code yourself, be aware that it can be tricky to be sure that a machine learning program is actually working properly, so be sure that it is carefully tested before running your experiments.  For instance, check the output of the program carefully on tiny datasets where you know what the output should be, either because you have computed it by hand, or because you have found or implemented another program (say, in another language or using a different technique) that computes it for you.  Also keep an eye out for clues that your program might have problems, such as results that violate proven theorems.  Your report should briefly describe what measures you took to be sure that your program is working properly.
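As one illustration of both kinds of check, here is a minimal sketch that tests a hand-written perceptron.  The perceptron is chosen only as an example, and the function names `perceptron` and `mistake_bound` are hypothetical.  The first check compares the output on a two-point dataset against a result traced by hand.  The second uses the perceptron convergence theorem: if every example satisfies ||x|| <= R and some unit vector u achieves y (u . x) >= gamma > 0 on every example, then the perceptron (started at zero, with no bias term) makes at most (R/gamma)^2 mistakes.  A mistake count above that bound would signal a bug.

```python
import math
import random


def perceptron(data, max_epochs=1000):
    """Homogeneous perceptron (no bias term), started from the zero vector.

    data: list of (x, y) pairs, x a tuple of floats, y in {-1, +1}.
    Returns (w, mistakes): the final weight vector and the total number of updates.
    """
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, y in data:
            # Points on the decision boundary (y * w.x == 0) also count as mistakes.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:  # a full pass with no updates: the data are separated
            break
    return w, mistakes


def mistake_bound(data, u):
    """(R / gamma)^2 from the perceptron convergence theorem, for a unit vector u."""
    R = max(math.sqrt(sum(xi * xi for xi in x)) for x, _ in data)
    gamma = min(y * sum(ui * xi for ui, xi in zip(u, x)) for x, y in data)
    return (R / gamma) ** 2


# Check 1: a tiny dataset traced by hand.  The first point triggers one update,
# w = (0, 0) + (2, 1); after that both points are classified correctly.
tiny = [((2.0, 1.0), 1), ((-1.0, -2.0), -1)]
print(perceptron(tiny))  # ([2.0, 1.0], 1)

# Check 2: synthetic data labeled by a known unit-length separator u, keeping
# only points whose margin under u is at least 0.1 (both choices are arbitrary).
rng = random.Random(0)
u = (0.6, 0.8)
data = []
while len(data) < 200:
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    margin = u[0] * x[0] + u[1] * x[1]
    if abs(margin) >= 0.1:
        data.append((x, 1 if margin > 0 else -1))

w, mistakes = perceptron(data)
assert mistakes <= mistake_bound(data, u), "violates the convergence theorem: bug?"
assert all(y * (w[0] * x[0] + w[1] * x[1]) > 0 for x, y in data)
```

Neither check proves the program correct, but a failure of either one is a strong hint that something is wrong.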

A good place for obtaining "real" data is the machine learning repository at the University of California, Irvine.  Within this repository, there are many, many datasets to choose from.  Some of these datasets have separate test sets.  Others only provide a training set.  In that case, you can randomly partition the dataset into a training set and a test set.  If you end up with a rather small test set, you will probably want to repeat this many times to get reliable results.  You can also use synthetic data of your own creation, in which case there is no problem generating a large test set.  Usually, when evaluating a machine learning algorithm, you will want to see how it performs on several datasets.
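The repeated random partitioning described above might be sketched as follows.  This is a minimal illustration under stated assumptions: the helper names `random_split`, `repeated_holdout` and `learn_majority` are hypothetical, the 20% test fraction and 50 repetitions are arbitrary defaults, and the toy majority-vote learner stands in for whatever algorithm you are actually evaluating.  The function reports the average test error over many independent splits, along with its spread across splits, which gives a rough sense of how much a single small test set could mislead you.

```python
import random
import statistics
from collections import Counter


def random_split(data, test_fraction, rng):
    """Randomly partition data into (train, test)."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def repeated_holdout(data, learn, test_fraction=0.2, repeats=50, seed=0):
    """Average test error over many independent random train/test splits.

    learn(train) must return a classifier h with h(x) -> predicted label.
    repeats must be at least 2 so that a standard deviation is defined.
    Returns (mean test error, standard deviation of test error across splits).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        train, test = random_split(data, test_fraction, rng)
        h = learn(train)
        errors.append(sum(h(x) != y for x, y in test) / len(test))
    return statistics.mean(errors), statistics.stdev(errors)


def learn_majority(train):
    """Toy learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


# Toy dataset: 100 examples, 75 labeled 1 and 25 labeled 0.
data = [(i, 1 if i % 4 else 0) for i in range(100)]
mean_err, std_err = repeated_holdout(data, learn_majority)
print(f"test error: {mean_err:.3f} +/- {std_err:.3f} over 50 random splits")
```

Fixing the seed makes the splits reproducible, which helps a reader replicate your numbers.  With real data, you would replace `learn_majority` with your own training routine.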

If you have access to more specialized data (for instance, as part of your regular research), feel free to use it.  However, if you plan to use data that is private, confidential, classified, copyrighted, controlled, sensitive, etc., it is your responsibility to be sure that it is legally and ethically okay for you to use the data for the purposes of this project (including possibly sharing the data with the COS511 course staff, should the need arise).  Please do not use any data or software in any way that might be considered illegal, unethical, immoral, offensive, or inappropriate.


Writing (and submitting) a final report

In every case, the end result of your project should be a written report clearly and concisely describing what you did, what results you got and what the results mean.  The page limit for your report is based on the size of your group, as given in the following table:

Group size                          Page limit
1 (requires special permission)     5
2                                   6
3                                   7
4                                   8

These page length limits do not include figures, citations or acknowledgments.  The report should use 12pt font, 1-inch margins, and single spacing.  Papers that deviate from these guidelines risk receiving a substantial grade deduction and/or some sections not being read.

For excellent guidance on writing clearly and concisely (which will help you to stay within these page limits), I highly recommend On Writing Well by William Zinsser (at least the first few chapters).

Your report must be submitted by Tuesday, May 7.  Reports should be submitted electronically on TigerFile using this link.  As with your proposal, you will need to submit your report as part of a group, and should be sure to include the name and netid of every group member on the submission itself.  Submit a single pdf file called "report.pdf".  Although it should not be necessary in most cases, you may also submit other electronic materials in addition to your report if you wish.

Your report should follow the general format of a scholarly paper in this area.  You should write your report as clearly as possible in a manner that would be understandable to a fellow COS511 student.  In other words, you should not assume that the reader has background beyond what has been covered in class (as well as a general computer science background).

Your report should begin by describing the problem you are studying, some background (what has been done before) and the motivation for the problem, i.e., why it is worth studying.  Previous work and outside sources should be cited throughout your report in a scholarly fashion following the style of academic papers in this area.  (See the proceedings of some of the conferences referenced above for examples.)

Next, you should clearly explain what you did, first at a high level and then in more detail.  For an experimental paper, you should explain the experiments in enough detail that there is a reasonable possibility that a motivated reader would be able to replicate them.  You should also outline some of the theory underlying the algorithms you are studying.  State your results clearly, and think about graphical tools you can use to make your results clearer (a table of numbers is usually less compelling than a graphical representation of the same data).  Look through published papers for ideas.  For a theoretical paper, the learning model and other mathematical details should be explained well enough for the results to be stated with mathematical precision and clarity.

In every case, be sure to explain the meaning of your results.  Don't just give a table of results or a dry mathematical formula.  Explain what the results mean, and what conclusions can be drawn from them.  Again, do all this in a way that would be understandable and interesting to a fellow COS511 student.  What did you expect to find?  What did you find instead?  What are the implications?  If you found something surprising, can you think of how it might be explained?

If appropriate, include an acknowledgment section briefly describing any help you received (other than from the course staff).

 


What you will be graded on

Projects will be graded along the following dimensions:

More specifically, you will be assigned a numerical grade for each one of these dimensions, and your final grade on the report will be the average of these.  (If necessary, we may also adjust grades slightly to account for differences among the various instructors.)


Books on reserve

The following books are (or soon will be) on reserve at the Engineering library and/or are available online when logged on to the Princeton intranet.