Instead of meeting today, we will have a joint meeting this Friday (2/18),
1:30 - 4:00 in Room 301, with Kai Li's seminar COS 598E. This will be a
guest presentation by Dr. Andrei Broder of IBM Watson Research Center on
"Resemblance, Containment, and Min-Wise Permutations." Tentative papers
can be found on the COS 598E Web site; check that Web site close to
Feb. 18 for updates.
Mon. Feb 21: Content retrieval on peer-to-peer networks
Reading:
A survey of Web metrics, Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick,
ACM Computing Surveys, December 2002.
Mon. Feb 28: Using graph characteristics, continued; using Web characteristics
Reading:
(MOVED FROM LAST WED.) Block-level link analysis, Deng Cai, Xiaofei He,
Ji-Rong Wen, Wei-Ying Ma, SIGIR 2004.
(Related paper: Block-based web search, Deng Cai, Shipeng Yu, Ji-Rong Wen,
Wei-Ying Ma, SIGIR 2004.)
Text Classification from Labeled and Unlabeled Documents using EM,
K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, Machine Learning (Kluwer,
now Springer), 39(2-3), May 2000, pp. 103-134. (Note: PDF or HTML is
available from the given link to the article summary page.)
Wed. March 9: Data streams, continued.
Reading:
(MOVED FROM LAST WED.) Detecting Change in Data Streams, Dan Kifer,
Shai Ben-David, and Johannes Gehrke, Proceedings of the 30th International
Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada,
August 2004. (Presented by Anish Muttreja.)
Background for "Deploying Large ...": Reliability and Security in the
CoDeeN Content Distribution Network (link to KyoungSoo Park's Web site
containing PDF), Limin Wang, KyoungSoo Park, Ruoming Pang, Vivek Pai, and
Larry Peterson, USENIX Annual Technical Conference 2004.
Wed. March 23: Content delivery, continued, and proxy caching
Reading:
Implications of proxy caching for provisioning networks and servers,
Mohammad S. Raunak, Prashant Shenoy, Pawan Goyal, Krithi Ramamritham,
Proceedings of the 2000 ACM SIGMETRICS international conference on
Measurement and modeling of computer systems (SIGMETRICS 2000).
Clustering Data Streams: Theory and Practice (link to Guha's paper list),
S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan, IEEE
Transactions on Knowledge and Data Engineering (IEEE TKDE), Special issue
on Online Analysis and Querying of Continuous Data Streams, 15(3): 515-528,
2003.
(Combines two conference papers: Clustering data streams, S. Guha,
N. Mishra, R. Motwani, L. O'Callaghan, 41st Annual Symposium on Foundations
of Computer Science (FOCS 2000), and Streaming-Data Algorithms for
High-Quality Clustering, Liadan O'Callaghan, Adam Meyerson, Rajeev Motwani,
Nina Mishra, Sudipto Guha, 18th International Conference on Data
Engineering (ICDE'02).)
(Related paper: Better Streaming Algorithms for Clustering Problems (link
to PostScript on Moses Charikar's site), M. Charikar, L. O'Callaghan, and
R. Panigrahy, Proc. of the ACM Symposium on Theory of Computing, 2003.
Improves Guha's k-median result.)
Wed. March 30: Data streams: algorithms, continued
Reading:
Mining Surprising Periodic Patterns, Jiong Yang, Wei Wang, Philip S. Yu,
Data Mining and Knowledge Discovery (DMKD), Volume 9, Number 2,
September 2004, pp. 189-216.