Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2017
|
Information about the Course Project
Each student or pair of students
will do a final project of their choosing related to the
material of the course.
Project requirements:
Preliminary proposal due
11:55 pm Monday, Mar. 13, 2017:
Submit to Prof. LaPaugh, via CS DropBox, a paragraph
describing your proposed project, as a plaintext file named
projectProp.txt. One partner should submit the proposal and the
other partner should submit a statement confirming the partnership
(using the same file name). Be sure that both partners' names are on the
submission. Include as much detail as possible, but no more
than a page. Prof. LaPaugh will reply with any
concerns about the content or scope of the project.
Progress
report April 14, April 17, or April 18, 2017:
Meet with Professor LaPaugh to discuss your progress
on your project; partners should attend together. Expect to spend about
15 minutes discussing your work to date. You will not give
a formal presentation, but you should prepare slides
(about 8) that summarize any algorithms, system architecture, or
experiments you are developing for the project.
Email these to Professor LaPaugh ahead of your meeting time so
that she can review them before your meeting.
Sign up for your appointment using WASS, OIT's office-hours
scheduling system.
Wait until the availability of appointment blocks is
announced. To use WASS, log in and click the "Make an
Appointment" menu button. Search by name ("LaPaugh") or NetID
("aslp") for the calendar entitled "LaPaugh course calendar". Once the
calendar is found, click "Make Appointment". If you have conflicts with
all available times, email Professor LaPaugh. Caution: do not use the
calendar entitled "Advising calendar for Andrea LaPaugh".
Project Report due 5:00 pm Dean's
Date, Tuesday May 16, 2017:
You are required to submit a final report that describes your
project (one report for both partners). It must include a
statement of the topic and goals of the project, your
methodology, and your results. If it is an experimental project, you
need to describe what was implemented, the major implementation
decisions, how you designed the experiments, and the
experimental results. If you developed a system or tool, you may not
have experiments per se, but you must describe how you evaluated
the project and the outcome of that evaluation. You should also relate
your work to other work on the problem. Your code should be in
an appendix or posted on a Web page whose URL is given in the report
(Web posting is preferred). For any type of project, be sure to include a
bibliography of all the sources you used, including software
packages.
Keep in mind that evaluation is
an important part of any project. Be clear on the goals of your
project and how you demonstrate or measure success.
Your report should be typeset in 12pt Times-Roman font with 1-inch
margins, double-spaced. Reports are typically 10-15 pages
long, including figures. You may go longer, but not beyond
25 pages. If your report is much shorter than 10 pages, you
probably have not done justice to some of the elements above.
Be sure the names of all partners appear on the title page and that
all partners sign the "this is my own work" pledge.
Projects will be graded on
thoroughness and depth of thought. Difficulty will be taken into
consideration.
Submit your final report using the
Computer Science Department DropBox submission system for COS435
at https://dropbox.cs.princeton.edu/COS435_S2017/Project_Report.
Name
your file projectReport.pdf.
Project Demonstration between Wednesday May 17 and Friday May 19, 2017:
After submitting your final report, meet briefly with Professor
LaPaugh (partners together) to discuss the results of your
project. If you have implemented something that lends itself
to live demonstration, this is the time to show it. This
is not a formal presentation; you do not need to
prepare anything unless you have a demo. As with
the progress report meetings, you will sign up using the
WASS scheduling system.
List of suggested
projects:
These topics are fairly broad and
need further refinement based on students' particular
interests. Students are
encouraged to suggest other project topics based on their
own interests. Check back for updates and
additions.
- There are many properties that can be measured for graphs in
general and social networks in particular, including PageRank,
HITS, connectivity measures and clustering measures.
Explore the use of one or more of these measures in the
graph/network model for an application domain not discussed in
class. This is intended to be an experimental
project, but the literature for the application should be
explored as well.
- Investigate the use of link analysis to determine the
subject of non-text pages. For example, if a Web
page contains only an image, not only the anchor text of links
pointing to the page but the subject matter of pages pointing to
the page may allow one to decide the general subject of the
image. Can this be done without informative anchor
text? (An example of uninformative anchor text is "here".)
- Investigate the use of dependence among index terms (e.g.,
co-occurrence), both in the literature and through your own
experiments. Latent Semantic Indexing is one example of a
technique that uses co-occurrence.
- Investigate probabilistic models for information
retrieval. For example, compare the performance of a
probabilistic model to the vector model.
- Investigate the state of the art of compression in search
engines for large corpora like those of Web search
engines. Implement and compare competing methods with
respect to compression effectiveness.
- Experiment with image retrieval by image properties, not text labels. You
should include a summary of image retrieval techniques currently
in use. Any other non-text media can be substituted for
images.
- Investigate the use of clustering in some application.
For example, in what ways can tweets be clustered? Experiment to
determine which methods give credible clusters. Are
these clusters helpful to the user?
- Propose new visualizations of search results and
investigate their effectiveness. Compare these to current
techniques used by search engines, e.g. visualizing results
clustered by topic. The goal is to improve the user
experience.
- Investigate the state of the art in techniques for
detecting duplicate documents, including experimentation.
- Investigate personalized or topic-directed crawling
techniques and their effectiveness.
- Apply recommendation techniques to a domain we do not
consider in class. Compare effectiveness of different
techniques. Explore combinations of techniques.
- Build an application that uses a customized information
retrieval system.
A sample of recent projects (these topics can be reused):
- "An Analysis of Approximate Page Ranking"
- "Newsfeed clustering"
- "Creating Image Photomosaics"
- "Using WordNet Post-Processing to Improve Information
Retrieval Precision"
- "The effectiveness of algorithms in sub-clustering tweets"
- "Personalized reddit search app"
- "An algorithmic approach to trending tweet prediction and
recommendation"
Resources:
Please see the Resources
for COS 435 Projects Web page for a list of available data
sets and software. If you need something and can't find it, ask for help!
last revised Fri Mar 31
11:37:27 EDT 2017
Copyright
2008-2017
Andrea S. LaPaugh