Computer Science 435
Information Retrieval, Discovery, and Delivery
Andrea LaPaugh | Spring 2013
Course Summary
This course examines the methods used to gather, organize and search
for information in large digital collections (e.g. web search
engines). We study classic techniques of indexing documents and
searching text and also algorithms that exploit properties of the
Web (e.g. links), of social networks and of other digital
collections, including multimedia collections. Techniques include
those for relevance and ranking of documents, exploiting user
history, and information clustering. We also examine systems aspects
of search technology: how distributed computing and storage are used
to make information delivery efficient.
Prerequisites
COS 226.
Administrative Information
Meeting time: Monday, Wednesday 1:30-2:50pm
Meeting place: Friend Center 008
Extra meetings: If we need to make up a class due to my
schedule, we may have a class during reading period and/or an
evening class during the semester. Class participants will be
consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@ ...
304 Computer Science Building, 258-4568
Office hours: TBA or by appointment. The easiest
way to make an appointment is by email.
Teaching Assistant: Arpan Ghosh, akghosh@ ...
003 Computer Science Building
Office hours: TBA
Course secretary: Mitra Kelly, 323 CS building, 258-4562,
mkelly@ ...
All email addresses are at cs.princeton.edu
Reading
Required reading:
- Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich,
Introduction to Information Retrieval, Cambridge University
Press, 2008.
- The link is to a complete online version of the textbook.
- We will also use reprints and other online material.
Supplemental reading (check back for additions as we progress in the
semester):
On reserve at Engineering Library:
- Croft, Bruce; Metzler, Donald; Strohman, Trevor,
Search Engines: Information Retrieval in Practice,
Addison Wesley, 2010.
- Chakrabarti, Soumen,
Mining the Web: Discovering Knowledge from Hypertext Data,
Elsevier (Morgan Kaufmann Division), 2003.
- Langville, Amy N. and Meyer, Carl D.,
Google's PageRank and Beyond: The Science of Search Engine Rankings,
Princeton University Press, 2006.
Work of the Course
The course will have the following components weighted as indicated:
- Problem sets 25%
- Midterm exam 15%
- Second exam 20%
- Class participation 5%
- Project 35%
Problem sets
There will be 6 problem sets distributed throughout the
semester.
Exams
There will be two take-home exams during the semester, each covering
roughly half the course material. There is no exam during final exam
period.
Project
Each student or pair of students will do a final project of their
choosing related to the material of the course. The project
must be approved in advance by the course instructor. See
the project page for more information and a list of
suggested projects.
Communication
We will use the Blackboard
course Web site for all course announcements and
course materials, including assignments, "handouts" and pdf versions
of any transparencies used in class. All important
course announcements will appear on the Blackboard course
home page. All materials will be found in the Course
Materials area of the Blackboard course site
(use left menu). The Schedule
and Assignments page in Course Materials will
give all reading and homework assignments. Schedule
changes will be made on the Schedule and Assignments page
and announced on Announcements.
We will use Piazza for quick questions and discussions. Students
are responsible for registering on Piazza and adding themselves to
the Spring 2013 COS 435 enrollment. This can be done by logging in
on Blackboard, selecting "COS435_S2013 Information Retrieval,
Discovery, and Delivery", and selecting Piazza from the menu at
left. Students are also responsible for monitoring the announcements
on Blackboard and the postings on the Piazza COS 435 Q&A page.
Piazza is great for sharing questions and answers with the class
(private questions addressed only to the instructors are also
possible). However, an old-fashioned face-to-face meeting is
still best for addressing deeper confusions and other technical
discussions.
Syllabus
(This is the general list of topics and probably a superset of
what we will have time to cover. Please see
Schedule and Assignments for specific topics and reading
assignments as the semester progresses.)
- Models of documents
- Query models for searching (focus on keyword-based search)
- Indexing and inverted files
- Ranking documents
- Using linking structure for Web content analysis
- Semantic and feedback techniques
- Using user behavior
- Using social network information
- Privacy issues
- Evaluating retrieval systems
- Web crawling
- Document similarity
- Clustering
- Non-text media search: e.g. music, images
- System design of search engines: distributed storage and computing
- Searching dynamic information sources
- Information caching
- Reliability and permanence of information
A.S. LaPaugh content last changed Tue Feb 5 12:21:20 EST 2013