Chapter 15: Section 15.4 (read only for basic ideas)
Also of interest, original papers:
(HITS algorithm) Kleinberg, Jon, Authoritative
sources in a hyperlinked environment, Journal of the
ACM, Vol. 46, No. 5 (Sept. 1999), pp. 604-632.
(Earlier versions appeared in Proc. 9th ACM-SIAM Symposium on Discrete Algorithms,
1998 and as IBM Research Report RJ 10076, May 1997.)
(PageRank algorithm) Page, Larry, Sergey Brin, R. Motwani, and
T. Winograd, The PageRank Citation Ranking:
Bringing Order to the Web, Stanford
Digital Library Technologies Project TR, Jan. 1998.
(Early version: L. Page. PageRank: Bringing order to the web.
Stanford Digital Libraries Working Paper 1997-0072, Stanford
University, 1997. )
Clarke, Charles L. A., Craswell, Nick and Voorhees, Ellen M.,
Overview of the TREC 2012 Web Track (pdf), Proc. Twenty-First
Text REtrieval Conference (TREC 2012), National Institute of
Standards and Technology (NIST), 2012.
Wed. Feb. 19: Evaluation,
cont.; Index structure and use.
Details on B+ trees in Database Management Systems by Raghu
Ramakrishnan and Johannes Gehrke (Third Edition, McGraw-Hill,
2003): Chapter 10, Sections 3-6 (pp. 344-356). Book on reserve
in Engineering Library.
MapReduce: simplified data processing on large clusters,
Jeffrey Dean and Sanjay Ghemawat, Communications of the ACM,
51(1), Jan. 2008. (Special 50th Anniversary issue: Breakthrough
research: a preview of things to come.)
The Adaptive Web, P.
Brusilovsky, A. Kobsa, W. Nejdl, eds., Lecture Notes in Computer
Science book series Vol 4321, Springer, 2007.
This book contains several relevant chapters. Chapter
6: Personalized Search on
the World Wide Web by A. Micarelli, F. Gasparetti,
F. Sciarrone and S. Gauch is of particular interest. The
chapters are available as pdf files to members of the
Princeton University community by accessing them from the
princeton.edu domain.
Introduction to Information Retrieval, Chapter 16: Section 16.3
is recommended if you are going to read research papers on
clustering. We will touch on external evaluation
criteria very briefly.
Also of interest - today's
material drawn from these references:
Structured Data on the Web, Michael J. Cafarella, Alon
Halevy, and Jayant Madhavan, Communications of the ACM (CACM),
Vol. 54 No. 2 (Feb. 2011), pp. 72-79.
Harnessing the Deep
Web: Present and Future (pdf), Jayant Madhavan,
Loredana Afanasiev, Lyublena Antova, and Alon
Halevy, 4th Biennial
Conference on Innovative Data Systems Research (CIDR),
Jan. 2009.
Searching
the deep web, Alex Wright, Communications of the ACM, Vol. 51 No. 10
(Oct. 2008), pages 14-15.
Google's
Deep-Web Crawl (pdf), Jayant Madhavan, David Ko,
Lucja Kot, Vignesh Ganapathy, Alex Rasmussen, and Alon Y.
Halevy, 34th Intern. Conf. on Very Large Data
Bases, VLDB Endowment, Aug. 2008.
Crawling deep web entity pages, Yeye He, Dong Xin, Venkatesh Ganti, Sriram
Rajaraman, and Nirav Shah, Proc. Intern. Conf. on Web Search and Data
Mining (WSDM), ACM, 2013, pp. 355-364.
Accessing
the deep web, Bin He, Mitesh Patel, Zhen Zhang,
and Kevin Chen-Chuan Chang, Communications of the ACM, Vol. 50 No. 5 (May
2007), pages 94-101.
Searching
for Hidden-Web Databases, Luciano Barbosa and
Juliana Freire, Proceedings
of the 8th ACM SIGMOD International Workshop on Web and
Databases (WebDB), pp. 1-6, ACM 2005. (A more
recent, more complicated version of the crawler is described
at the 2007 WWW conf.)
Cognetic
Systems Inc. is a company that sells an XML database
product called XQuantum, which uses XQuery.
They have an XQuery
demo site that offers XML versions of Shakespeare plays
(all 37) - choose "Full-Text Search" at left.
We Feel
Fine: An Almanac of Human Emotion by Sep Kamvar and
Jonathan Harris, video on YouTube. See also the book We Feel Fine: An Almanac of
Human Emotion by Sep Kamvar & Jonathan Harris,
Scribner, Dec. 2009.
Learning
from Bullying Traces in Social Media, Jun-Ming Xu,
Kwang-Sung Jun, Xiaojin Zhu and Amy Bellmore, Proc.
Conf. of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies
(NAACL HLT), Assoc. for Computational Linguistics, 2012,
pp. 656-666.
WEEK 12
Mon. April 28
Social Networks: structure; Privacy Issues in Information Systems:
technical aspects
Second take-home exam distributed Wednesday April 30, 2014 at the end of
class, due 4:30 PM sharp Friday May 2, 2014.
Project Report due 5:00 PM Dean's Date, Tuesday May 13, 2014.
Project Demonstrations between May 14 and May 19.
* on reserve in the Engineering Library
last revised Wed Jun 4 17:20:03 EDT 2014
Copyright 2010, 2011, 2012, 2013, 2014 Andrea S. LaPaugh