Princeton University
Computer Science Dept.

Computer Science 435
Information Retrieval, Discovery, and Delivery

Andrea LaPaugh

Spring 2016


Information about the Course Project

Each pair of students will do a final project of their choosing related to the material of the course.

Project requirements:

Preliminary proposal due 11:55 pm Wednesday, Mar. 9, 2016:

Submit via CS DropBox a paragraph describing your proposed project to Prof. LaPaugh. Submit it as a plaintext file named projectProp.txt. One partner should submit the proposal, and the other partner should submit a statement confirming the partnership (using the same file name). Be sure that both partners' names are on the submission. Include as much detail as possible, but no more than a page. Prof. LaPaugh will reply with any concerns about the content or scope of the project.

Progress report between April 11 and April 15, 2016:

Meet with Professor LaPaugh to discuss your progress on your project; partners come together. Expect to spend about 15 minutes discussing your work to date.   You will not give a formal presentation, but you should prepare slides (about  8) that summarize any algorithms, system architecture, or experiments you are developing for the project.   Email these to Professor LaPaugh ahead of your meeting time.  She will review them before your meeting.

You will sign up for your appointment using WASS, OIT's office hours scheduling system. Wait until the availability of appointment blocks is announced. To use WASS, log in and click the "Make an Appointment" menu button. Search under the name "LaPaugh" or NetId "aslp" for the calendar entitled LaPaugh course calendar. Once you have found the calendar, click "Make Appointment". If you have conflicts with all available times, email Professor LaPaugh. Caution: do not use the calendar entitled Advising calendar for Andrea LaPaugh.

UPDATE: Project Demonstration will be after Dean's date.  Please see below.

Project Report due 5:00 pm Dean's Date, Tuesday May 10, 2016: 

You are required to submit a final report that describes your project (one report for both partners). This must include the statement of the topic and the goals of the project, your methodology and the results. If it is an experimental project, you need to describe what was implemented, the major implementation decisions,  how you designed the experiments, and the experimental results. If you developed a system or tool, you may not have experiments per se, but you must describe how you are evaluating the project and the outcome.  You should also relate your work to other work on the problem.  Your code should be in an appendix or posted on a Web page with the URL provided (Web posting is preferred).  For any type of project, be sure to include a bibliography of all the sources you used, including software packages.

Your report should be typeset in 12pt Times-Roman font with 1-inch margins, double-spaced. Reports are typically 10-15 pages long, including figures. You may go longer, but not beyond 25 pages. If your report is much shorter than 10 pages, you probably have not done justice to some of the elements above.
Projects will be graded on thoroughness and depth of thought. Difficulty will be taken into consideration.

Keep in mind that evaluation is an important part of any project. Be clear on the goals of your project and how you demonstrate or measure success.

Project Demonstration between Wednesday May 11 and Wednesday May 17 (revised dates)

After submitting your final report, you and your partner will meet briefly with Professor LaPaugh to discuss the results of your project. If you have implemented something that lends itself to live demonstration, this is the time to show it. This is not a formal presentation. You do not need to prepare anything unless you have a demo. As for the progress report meetings, you will be able to sign up using the WASS scheduling system.

List of suggested projects:

These topics are fairly broad and need further refinement based on students' particular interests. Students are encouraged to suggest other project topics based on their own interests.  Check back for updates and additions.

  1. There are many properties that can be measured for graphs in general and social networks in particular, including PageRank, HITS, connectivity measures and clustering measures.  Explore the use of one or more of these measures in the graph/network model for an application domain not discussed in class.  This is intended to be an experimental project, but the literature for the application should be explored as well.
  2. Investigate the use of link analysis to determine the subject of non-text pages.  For example, if a Web page contains only an image, not only the anchor text of links pointing to the page but the subject matter of pages pointing to the page may allow one to decide the general subject of the image.  Can this be done without informative anchor text?  (An example of uninformative anchor text is here.)
  3. Investigate the use of dependence among index terms (e.g. co-occurrence) in the literature and by your own experiments.  Latent Semantic Indexing is one example of a technique that uses co-occurrence.
  4. Investigate probabilistic models for information retrieval.  For example, compare the performance of a probabilistic model to the vector model.
  5. Investigate the state of the art of compression in search engines for large corpora like those of Web search engines.  Implement and compare competing methods with respect to compression effectiveness.
  6. Experiment with image retrieval by image properties, not text labels.  You should include a summary of image retrieval techniques currently in use.  Any other non-text media can be substituted for images.  
  7. Investigate the use of clustering in some application. For example, in what ways can tweets be clustered? Experiment to determine which methods give credible clusters. Are these clusters helpful to the user?
  8. Propose new visualizations of search results and investigate their effectiveness.  Compare these to current techniques used by search engines, e.g. visualizing results clustered by topic.  The goal is to improve the user experience.
  9. Experiment with techniques for detecting duplicate documents.  
  10. Investigate personalized or topic-directed crawling techniques and their effectiveness.
  11. Apply recommendation techniques to a domain we do not consider in class.  Compare effectiveness of different techniques.  Explore combinations of techniques.
  12. Build an application that uses a customized information retrieval system.
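
As a possible starting point for topic 9, one widely used way to detect near-duplicate documents is to compare their sets of word shingles using Jaccard similarity. The sketch below is only an illustration and not a course requirement; the function names, the shingle size k=3, and the threshold of 0.5 are assumptions chosen for the example. It compares all pairs of documents, so it suits small collections only.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short documents: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; taken to be 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, k=3, threshold=0.5):
    """Return (i, j, similarity) for each pair of documents meeting the threshold."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy cat",
        "completely different text about information retrieval",
    ]
    print(near_duplicates(docs))
```

A project on this topic would likely go further, for example by replacing the all-pairs comparison with a sketching technique and by evaluating the choice of k and threshold on a real corpus.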

A sample of recent projects (can be re-used):


Please see the Resources for COS 435 Projects Web page for a list of available data sets and software. If you need something and can't find it, ask for help!

last revised Wed May  4 17:06:05 EDT 2016
Copyright © 2008-2016 Andrea S. LaPaugh