Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2009
|
Information about the Course Project
Each student will do a final project of his or her choosing related to
the material of the course.
Project requirements:
Proposal due 5:00 pm Thursday, Feb. 26, 2009:
Email a paragraph describing your proposed project to Prof. LaPaugh. Include as much detail as possible. This will be the starting point of a discussion with Professor LaPaugh to make sure the project is of the appropriate scope for a class project.
Checkpoint presentation April 16, 21, or 23:
Each student will give a 10-minute presentation on his or her project. The presentation should include the motivation and goals of the project, brief background on the topic, and progress to date. The presentation will be graded separately from the final project. Pairs doing a joint project should prepare one presentation during which each student speaks; pairs may have 15 minutes for their combined presentation. Two to three minutes are allotted after each presentation for questions and comments.
Prepare slides to use with your presentation; you will submit these slides after the presentation. You may, but are not required to, see Prof. LaPaugh or Chong Wang to discuss your slides and presentation before your presentation. Note that Professor LaPaugh is away April 13-15.
Sign up to speak on April 16, 21, or 23 using OIT's office-hours scheduling system, WASS. Search for the (only) calendar under name "LaPaugh" or NetId "aslp", and click "Make Appointment". At most one pair presentation can be accommodated each day. Blocks are 15 minutes to allow for transitions between speakers. Pairs sign up for one block; the extra time is accounted for in the overall schedule.
Project Report due 5:00 pm Dean's Date, Tuesday May 12, 2009:
You are required to submit a report that describes your project. It must include a statement of the topic and the goals of the project, your methodology, and the results. If it is an experimental project, you need to describe what was implemented, the major implementation decisions, how you designed the experiments, and the experimental results. If you developed a system or tool, you may not have experiments per se, but you must describe how you are evaluating the project and the outcome. You should also relate your work to other work on the problem. Your code should be included in an appendix or posted on a Web page whose URL you provide (Web posting is preferred). If your project is a theoretical study, you need to describe the problem, review what was known about the problem before your analysis, and give the details and results of your theoretical analysis. If your project is literature-based, you need to describe the major issues under study, summarize the major techniques and the theoretical and/or experimental results presented in the literature, and critically analyze those results. For any type of project, be sure to include a bibliography of all the sources you used.
Projects will be graded on thoroughness and depth of thought; difficulty will be taken into consideration. Keep in mind that evaluation is an important part of any project. Be clear about the goals of your project and how you demonstrate or measure success.
Project demonstration:
If you have implemented something that lends itself to live demonstration, I would like to see a final demonstration after I receive your report and before 5:00 pm Mon. May 18, 2009.
List of suggested projects:
These topics are fairly broad and need further refinement based on a student's particular interests. Students are encouraged to suggest other project topics based on their own interests. Check back for updates and additions.
- PageRank and/or HITS can be applied to any directed graph.
Explore the use of one or both of them in another application
domain. This is intended to be an experimental project, but the
literature for the application should be explored as well.
- Investigate the use of link analysis to determine the subject of non-text pages. For example, if a Web page contains only an image, not only the anchor text of links pointing to the page but also the subject matter of the pages linking to it may allow one to decide the general subject of the image. Can this be done without informative anchor text? (An example of uninformative anchor text is here.)
- Investigate algorithms for stemming. Implement one that is not too involved and study its effects on aspects of retrieval, e.g., term frequency, document frequency, and query satisfaction, using a small document collection.
- Investigate the use of dependence among index terms (e.g.
co-occurrence) in the literature and by your own
experiments. LSI is one example of a technique that uses
co-occurrence.
- Explore how well query expansion by adding synonyms works. Web search engines do this, and you can compare results with and without expansion as part of your project. There are also studies reported in the literature. WordNet, a lexical database developed and maintained here at Princeton, is used in many query modification tools.
- Propose and implement a visualization of the relationships among the objects in some collection (text documents, images, Web pages, etc.).
- Investigate search for handheld displays. What special techniques do companies providing this service use now? How well do search engines perform? Are special ranking algorithms needed that do really well at getting the top few results (5? 7?) right? Can something be done to improve them? Propose an approach and test it.
- Investigate probabilistic models for information retrieval.
- Do a literature search and analysis of the state of the art of image retrieval by image properties, not text labels. You should include an analysis of such retrieval systems available on the Web. Any other non-text media can be substituted for images. We will briefly discuss such non-text retrieval in class; your research must be substantially more thorough.
- Investigate the use of clustering in some application. For
example, can snippets be clustered in a way that is helpful for search
results?
- Several search engines currently use clustering in their
presentation of search results. Find out what you can about the
clustering techniques used and assess their effectiveness from a user
perspective.
- Experiment with techniques for detecting duplicate
documents.
- Investigate personalized or topic-directed crawling techniques
and their effectiveness.
- Do an in-depth investigation of cluster machine architectures for indexing and query processing on large collections. Find and compare state-of-the-art alternatives. Some simulation may be part of this project. Recent publications should be the primary source of information on the state of the art.
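For the first suggested project (applying PageRank to some other directed graph), a plain power-iteration implementation is a reasonable starting point before adapting it to a new domain. The sketch below is one minimal version under stated assumptions, not the method taught in the course: the graph is assumed to be a Python dict mapping each node to the list of nodes it links to, the damping factor defaults to the commonly used 0.85, and dangling nodes (no out-links) spread their rank uniformly over all nodes, which is only one of several conventions. The function name `pagerank` and its parameters are hypothetical.

```python
def pagerank(graph, damping=0.85, tol=1e-12, max_iter=200):
    """Compute PageRank scores by power iteration (illustrative sketch).

    graph: dict mapping each node to a list of the nodes it links to.
    Nodes that appear only as link targets are included automatically.
    Assumption: dangling nodes (no out-links) spread their rank uniformly
    over all nodes; other conventions exist.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(max_iter):
        # Total rank currently held by nodes with no out-links.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleportation term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node passes damping * rank equally along its out-links.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Small illustrative graph (hypothetical data, not from the course).
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for node, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.4f}")
```

On the small example graph, node c, which receives the most links, ranks highest, and node d, which no node links to, ranks lowest. Because each iteration preserves a total rank of 1, the scores can be read as a probability distribution, which is convenient when comparing results across application domains.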
Last revised Mon Apr 6 13:44:20 EDT 2009.
Copyright 2008-2009 Andrea S. LaPaugh