WEEK 1
Tues. Feb. 3: Overview of course topics and organization.
Inspiration: discussion of As We May Think.
Begin classic information retrieval of text if time permits.
Note: Introduction to Information Retrieval discusses building indexes for
document collections before it discusses models of documents and
queries. Therefore, we are reading later chapters first.
Please don't worry about the exact content of the index for now.
If you prefer to read a text that treats the topic in the same order as
we are going to, you can read the chapters assigned below in Modern
Information Retrieval instead.
*Modern Information Retrieval, Chapter 1, Section 1.4;
Chapter 2, Sections 2.1-2.4 and 2.5.1-2.5.3.
(HITS algorithm) Kleinberg, Jon,
Authoritative sources in a hyperlinked environment, Journal of
the ACM, Vol. 46, No. 5 (Sept. 1999), pp. 604-632. (Earlier
versions appeared in Proc. 9th ACM-SIAM Symposium on Discrete
Algorithms, 1998 and as IBM Research Report RJ 10076, May 1997.)
(PageRank algorithm) Page, Larry, Sergey Brin, R. Motwani, and
T. Winograd, The PageRank Citation Ranking: Bringing
Order to the Web, Stanford Digital Library
Technologies Project TR, Jan. 1998. (Early version: L. Page,
PageRank: Bringing order to the web, Stanford Digital
Libraries Working Paper 1997-0072, Stanford University, 1997.)
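As a companion to the PageRank reading, here is a minimal sketch of the power-iteration idea behind the algorithm. It is an illustration under stated assumptions, not the formulation used in the report: the function name pagerank, the dictionary-based graph format, the damping value 0.85, the fixed iteration count, and the treatment of pages with no outgoing links (their rank is spread evenly over all pages) are all choices made here for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank.

    links maps each page to the list of pages it links to (hypothetical format).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this toy graph, page c is linked from both a and b, so it ends up with a higher score than b, which is linked only from a.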
Thurs. Feb. 19: Evaluation of retrieval systems; spamming search engines
Description of bit-level variable-length encodings of
positive integers (the Elias gamma and delta codes and the Golomb code
were covered in class) in *Modern Information Retrieval, Section 7.4.5.
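To make the idea of a bit-level variable-length code concrete, here is a minimal sketch of the Elias gamma code. It assumes one common convention: a positive integer n is written as floor(log2 n) zero bits followed by the binary representation of n. Textbooks differ in how they write the unary prefix, so the bit patterns in the assigned section may look different. The function names are hypothetical, and bits are kept in a Python string for readability rather than packed into bytes.

```python
def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a string of bits."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined only for positive integers")
    binary = bin(n)[2:]  # binary digits of n; the leading digit is always 1
    # Prefix of (length - 1) zeros tells the decoder how many digits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> list:
    """Decode a concatenation of well-formed Elias gamma codes into integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zero prefix
            zeros += 1
            i += 1
        # The next zeros + 1 bits are the binary representation of the value.
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


encoded = "".join(elias_gamma_encode(n) for n in [1, 5, 9])
decoded = elias_gamma_decode(encoded)
```

Small integers get short codes (1 is a single bit), which is why such codes suit sequences dominated by small values.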
MapReduce: simplified data processing on large clusters,
Jeffrey Dean and Sanjay Ghemawat, Communications of the
ACM, 51(1), Jan. 2008. (Special 50th Anniversary issue:
Breakthrough research: a preview of things to come.)
Spring break
WEEK 7
Tues. March 24: Search refinement; using user behavior
The following book contains several relevant chapters. The chapters are available
as pdf files to members of the Princeton University community by
accessing them from the princeton.edu domain:
The Adaptive Web; P. Brusilovsky, A. Kobsa, W. Nejdl, eds.,
Lecture Notes in Computer Science book series, Vol. 4321, Springer, 2007.
The following chapter in The Adaptive Web is of particular
interest:
Introduction to Information Retrieval, Chapter 16, Section 16.3 is
recommended if you are going to read research papers on
clustering. We will touch on external evaluation criteria very briefly.
An XQuery Sandbox example tool can be found on the eXist
Project Web site. The eXist Project is centered around eXist-db, which is (in their
words) "an open source database management system entirely built
on XML technology."
WEEK 9
Assignment 5 (pdf) is now available.
Tues. April 7: Detecting near-duplicate documents
WEEK 12
Take-home EXAM 2: DISTRIBUTED end of class Tuesday, April 28.
DUE beginning of class Thursday, April 30.
Tues. April 28: Non-text retrieval: image retrieval
Also of interest - references for today:
Multimedia IR: Indexing and Searching, Christos Faloutsos, Chapter 12
in *Modern Information Retrieval. Includes a discussion
of characterizing images using color histograms.
Harnessing the Deep Web: Present and Future (pdf), Jayant Madhavan,
Loredana Afanasiev, Lyublena Antova, and Alon Halevy, 4th Biennial
Conference on Innovative Data Systems Research (CIDR), Jan. 2009.
Searching the deep web, Alex Wright, Communications
of the ACM, Vol. 51 No. 10 (Oct. 2008), pages 14-15.
Google's Deep-Web Crawl (pdf), Jayant Madhavan, David Ko, Lucja Kot,
Vignesh Ganapathy, Alex Rasmussen, and Alon Y. Halevy, 34th
Intern. Conf. on Very Large Data Bases, VLDB Endowment, Aug. 2008.
Accessing the deep web, Bin He, Mitesh Patel, Zhen Zhang, and Kevin
Chen-Chuan Chang, Communications
of the ACM, Vol. 50 No. 5 (May 2007), pages 94-101.
* on reserve in the Engineering Library
last revised Fri May 1 11:20:32 EDT 2009
Copyright 2009 Andrea S. LaPaugh