Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh
Spring 2012
Course Summary
This course examines the methods used to search for information in
large digital collections (e.g. Google) and how digital content is
gathered by search engines. We study classic techniques of indexing
documents and searching text and also new algorithms that exploit
properties of the Web (e.g. links) and other digital collections,
including multimedia collections. Techniques include those for
relevance and ranking of documents, exploiting user history, and
information clustering. We also examine systems aspects of search
technology: how distributed computing and storage are used to make
information delivery efficient.
Prerequisites
COS 226.
Administrative Information
Meeting time: Monday and Wednesday, 1:30-2:50pm
Meeting place: Friend Center 004
Extra meetings: If we need to make up a class due to my
schedule, we may have a class during reading period and/or an evening
class during the semester. Class participants will be consulted before
any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@ ...
304 Computer Science Building, 258-4568
Office hours: Monday 3-4:30pm or by appointment.
The easiest way to make an appointment is by email.
Teaching Assistant: Yiming Liu, yimingl@ ...
414 Computer Science Building
Office hours: Tuesday 10am-noon or by appointment.
Course secretary: Mitra Kelly, 323 CS building, 258-4562,
mkelly@ ...
All email addresses are at cs.princeton.edu
Reading
Required reading:
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich,
Introduction to Information Retrieval, Cambridge University Press, 2008.
- The link is to a complete online version
of the textbook.
- We will also use reprints and
other online material.
Supplemental reading (check back for additions as we progress in the semester):
On reserve at Engineering Library:
- Croft, Bruce; Metzler, Donald; Strohman, Trevor, Search Engines: Information Retrieval in
Practice, Addison Wesley, 2010.
- Grossman, David and Frieder, Ophir, Information Retrieval: Algorithms and
Heuristics, 2nd edition, Springer, 2004.
- Chakrabarti, Soumen, Mining the Web: Discovering Knowledge from
Hypertext Data, Elsevier (Morgan Kaufmann Division), 2003.
- Langville, Amy N. and Meyer, Carl D., Google's PageRank and Beyond: The Science
of Search Engine Rankings, Princeton University Press, 2006.
Work of the Course
The course will have the following components weighted as indicated:
- Problem sets 25%
- Midterm exam 15%
- Second exam 20%
- Class participation 5%
- Project 35%
Problem sets
There will be 6 problem sets distributed throughout the semester.
Exam
There will be two take-home exams during the semester, each covering
roughly half the course material. There is no exam during final exam
period.
Project
Each student or pair of students will do a final project of their
own choosing related to the material of the course. The project must
be approved in advance by the course instructor. See the project page for more information and a
list of suggested projects.
Communication
All assignments will be made available on the course Web site (see Schedule and Assignments). "Handouts" and
copies of any transparencies used in class will be posted on the course
Web site as well.
We will use Piazza for all course
announcements and quick questions. Students are
responsible for registering on Piazza and adding themselves to cos 435.
Students are also responsible for monitoring the postings on the
Piazza cos 435 site for important course announcements.
Piazza is great for sharing questions and answers with the class
(private questions addressed only to the instructors are also
possible). However, an old-fashioned face-to-face meeting is
still best for addressing deeper confusions and other technical
discussions.
Schedule changes will be made on Schedule and Assignments
and announced on Piazza.
Syllabus
(This is the general list of topics and probably a superset of what
we will have time to cover. Please see Schedule
and Assignments for specific topics and reading assignments as the
semester progresses.)
- Models of documents
- Query models for searching (focus on keyword-based search)
- Indexing and inverted files
- Ranking documents
- Using linking structure for Web content analysis
- Semantic and feedback techniques
- User behavior-based relevance criteria
- Privacy issues
- Manipulating search engine results (SEOs)
- Distributed computation for search engines
- Evaluating retrieval systems
- Web crawling
- Document similarity
- Clustering
- Non-text media search: e.g. music, images
- Adding structure to information: databases, XML, the
semantic Web
- System design of search engines: distributed storage and
computing
- Searching dynamic information sources
- Information caching
- Reliability and permanence of information
A.S. LaPaugh content last changed Sun Feb 5 12:33:21 EST
2012