Computer Science 435
Information Retrieval, Discovery, and Delivery
This course examines the methods used to search for information in
digital collections (e.g. Google) and how digital content is gathered by
search engines. We study classic techniques of indexing documents and
searching text and also new algorithms that exploit properties of the
Web (e.g. links) and other digital collections, including multimedia
collections. Techniques include those for relevance and ranking of
documents, exploiting user history, and information clustering. We also
examine the systems side of search technology: how distributed computing
and storage are used to make information delivery efficient.
Meeting time: Monday,
Friend Center 006
Extra meetings: If we need to make up a class due to my
schedule, we may have a class during reading
period and/or an evening class during the semester. The class
will be consulted before any make-up class time is chosen.
LaPaugh, aslp@ ...
304 Computer Science Building, 258-4568
Office hours: Monday 3-4:30pm or by
appointment. The easiest way to make an appointment is by email.
Teaching Assistant: Siyu Yang, siyuy@ ...
313 Computer Science Building
Office hours: Tuesday 1-2:30pm or by appointment.
Course secretary: Mitra Kelly, 323 CS building, 258-4562,
All email addresses are at cs.princeton.edu
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich, Introduction to
Information Retrieval, Cambridge University Press, 2008.
- The link is to a complete online version
of the textbook.
We will also use reprints and other materials.
Supplemental reading (check back for additions as we progress in the semester):
On reserve at Engineering Library:
- Croft, Bruce; Metzler, Donald; Strohman, Trevor, Search Engines: Information Retrieval in
Practice, Addison Wesley, 2010.
- Grossman, David and Frieder, Ophir, Information Retrieval: Algorithms and
Heuristics, 2nd edition, Springer, 2004.
- Chakrabarti, Soumen, Mining the Web: Discovering Knowledge from Hypertext Data,
Elsevier (Morgan Kaufmann Division), 2003.
- Langville, Amy N. and Meyer, Carl D., Google's PageRank and Beyond: The Science
of Search Engine Rankings, Princeton University Press, 2006.
Work of the Course
The course will have the following components weighted as
indicated (note that these are slightly different from those in Course Offerings):
- Problem sets 25%
- Midterm exam 15%
- Second exam 20%
- Class participation 5%
- Project 35%
There will be 5 to 6 problem sets distributed throughout the semester.
There will be two take-home exams during the semester, each covering
the course material. There is no exam during final exam period.
Each student or pair of students will do a final project of their
choosing related to the material of the course.
The project must be approved in advance by the course instructor.
See the project page for more
information and a list of suggested projects.
All assignments will be
made available on the course Web site (see Schedule and
Assignments). "Handouts" and copies of any materials
used in class will be posted on the course Web site as well. Important
announcements on all aspects of the course will be made on the Announcements
page. Students are responsible for monitoring the postings
under "Announcements". Schedule changes will be made on Schedule
and Assignments and announced on Announcements.
You are encouraged to use electronic mail to set up appointments, send
messages, and ask quick questions (like "What was that reference you
mentioned today in class?" or "I've been at McCosh Infirmary all week; can I
have an extension on my assignment?"). However, an old-fashioned face-to-face
meeting is still best for clarifying confusions and other technical matters.
(This is the general list of topics and probably a superset of what
we will have time to cover. Please see Schedule and Assignments
for specific topics and reading assignments as the semester progresses.)
- Models of documents
- Query models for searching (focus on keyword-based search)
- Indexing and inverted files
- Ranking documents
- Using linking structure for Web content analysis
- Semantic and feedback techniques
- User behavior-based relevance criteria
- Privacy issues
- Manipulating search engine results (SEOs)
- Distributed computation by search engines
- Evaluating retrieval systems
- Web crawling
- Document similarity
- Non-text media search: e.g. music, images
- Adding structure to information: databases, XML, the
- System design of search engines: distributed storage and
- Searching dynamic information sources
- Information caching
- Reliability and permanence of information
A.S. LaPaugh content last changed Sun Jan 30 12:51:09 EST 2011