Questions for class discussion (our discussion is not limited to
these, but they will help you prepare):
In "As we may think," Vannevar Bush
clearly got the technology wrong: he could not have known about the
coming digital technology revolution. But ignoring the technology
used,
*What of Vannevar Bush's vision have we achieved?
*What of Vannevar Bush's vision do you expect we will eventually achieve?
*What do you think Vannevar Bush "got wrong" in terms of his vision?
*Are there any parts of his vision that you think are impossible?
What is your concept of "ideal search"?
In The Search: How
Google and Its Rivals Rewrote the Rules of Business and Transformed Our
Culture, one of Battelle's overarching themes is trust. In Chapter 1, he
discusses several aspects of trust in the context of search. Do
you agree with his assessment? Are there aspects of trust
that he does not discuss?
Written assignment due this week: NONE
Reading for discussion today:
*Bush, Vannevar, As we may think, Atlantic Monthly, July 1945.
*Battelle, The Search: Chapter 1
References for technical material:
*Howstuffworks "Computer Memory Basics"
*Howstuffworks "Types of Computer Memory"
*Howstuffworks "How Bits and Bytes Work"
*American Standard Code for Information Interchange - Wikipedia, the
free encyclopedia: "Overview" and "ASCII printable characters"
Methodology of computer search before the Web
Model of the Web - Graph structures
Web pages
information in HTML
Using the Web in search, Part I
Class discussion: This
week we will begin by looking at the methods that search engines use to
retrieve and rank text documents (anything consisting primarily of
written words). We will then examine how things change when
documents go on the Web. Think about how you decide if a document
is relevant and how that might be turned into an automated
method. Also bring your questions about why documents get ranked
the way they do. Since Google and the other search engines use
"secret formulas," we can't know for certain, but we can make an
educated guess at what is going on.
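As a starting point for thinking about automating a relevance judgment,
here is a minimal sketch of one classic idea: score each document by how
often it contains the query words, giving more weight to words that
appear in few documents (often called TF-IDF weighting). This is only an
illustration and not the formula that Google or any other search engine
uses. The function names, the whitespace tokenizer, and the toy
documents are all assumptions made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (score, doc_index) pairs sorted from most to least relevant.

    The score is a simple TF-IDF sum: a query term counts for more when it
    occurs often in a document and occurs in few documents overall.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in term_counts:
                idf = math.log(n_docs / doc_freq[term])
                score += term_counts[term] * idf
        scores.append((score, index))

    # Highest score first; sorted() is stable, so ties keep document order.
    return sorted(scores, key=lambda pair: -pair[0])


# Hypothetical toy collection, invented for illustration only.
docs = [
    "the memex stores books and records",
    "search engines rank documents on the web",
    "the web is a graph of linked documents",
]
ranking = rank("rank web documents", docs)
for score, index in ranking:
    print(f"{score:.3f}  {docs[index]}")
```

Notice that this method looks only at the words in each document. It
knows nothing about links between pages, which is one of the things
that changes when documents go on the Web.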
Written assignment due this week: Please visit the Assignment 1 page.
Non-technical reading for today:
*Battelle, Chapters 2, 3, 4. Battelle covers a lot of ground quickly because
he concentrates on the history and only mentions the technical
aspects. We'll spend more time understanding the key ideas of
the technical aspects. The history is fun, and we will
certainly include some in our discussion (but little of the history of
all the Web search engines before Google).
References for technical material:
*(Originally for week 1) Information retrieval - Wikipedia, the free
encyclopedia: see the timeline. We will not discuss the
technical development in this entry.
*(Originally for week 1) Amit Singhal, Modern Information Retrieval:
A Brief Overview, in Bulletin of the IEEE Computer Society Technical
Committee on Data Engineering, 2001, pp. 35-43. (pdf; access limited
to Princeton University.) The mathematical
development in this article is more sophisticated than that which we
will use, especially Section 2.2 on
Probabilistic Models. Read for the main ideas. Read
the math if you are interested.
Class discussion: During
our last class we discussed a lot of technical material. Your
first task is to bring to class your questions on that material.
Also think about the issues Battelle raises in the part of Chapter 7
that I have assigned below. Does Google have any obligation to
have search results change "gracefully" as its ranking algorithm
changes? Where is the line between spamming search engine results (bad)
and presenting your page effectively to obtain a good ranking (good)?
Written assignment due this week: Please visit the Assignment 2 page (pdf).
Reading for discussion today:
References for technical material:
*(originally for week 2) Sergey Brin and Lawrence Page, Anatomy of a
search engine, Proc. Intern. World-Wide Web Conference (WWW7), 1998.
This is the original public description of Google.
Amended guidance on reading this article (update from week 2 posting):
It is a technical article, and there are many details I expect you to
skip; in particular, you can skip Sections 4.1 - 4.4 and the appendices.
We are scheduled to start discussing crawling and index building today
(although we may not get to them), and Sections 4.1 - 4.4 are
relevant. However, these sections are terse and
filled with technical jargon. We will discuss the important
points in class. The article is of historical interest and
provides a good outline of the issues of Web search even if you skip
all the technical details.
*(originally for week 2) Scientific American: Feature Article:
Hypersearching the Web: June 1999