Richard L. Smith '70 Freshman Seminar
Google and Ye Shall Find???
FRS 117
Fall 2007
Week 4, Oct. 10:
Topics:
Finding content: Web crawlers
Building an Index
Invisible Web
Economics of search
Class discussion: We will finish our study of the
principal methods search engines use to get documents, index them, and
provide search results. Bring your questions on the technical
material we discussed last week. We will then consider the
economics of search. Battelle raises many issues in Chapters 5,
6, and 7 as he describes the new economics of search. Consider
the magnitude of the change in business model brought about by
click-through ads. What do we gain and lose from this
model? Can this model survive click-through spam? Given
Google's AdSense, is Google a media company or an advertising company;
that is, is its primary business the delivery of content or of ads?
Is news becoming a commodity? These are just some of the
questions to consider. Bring your own issues for discussion as
well.
Reading for discussion today:
*Battelle Chapters 5, 6, 7
References for technical material:
*(originally for week 3) Web Search Engines: Part 1, IEEE Computer,
Vol. 39(6), 2006. (Access the pdf file from the top of the menu at the
left of the page you reach.) A very concise summary of crawling.
Use it for the outline; don't worry about the technical details
about computing resources and the technical jargon. We will
discuss the important ideas in class.
*(originally for week 3) Web Search Engines: Part 2, IEEE Computer,
Vol. 39(8), 2006. (Access the pdf file from the top of the menu at the
left of the page you reach.) A very concise summary of indexing.
Same remarks as for Part 1. You can skip the section "Speeding things
up" altogether.
*(originally for week 3) Googlebot - Wikipedia, the free encyclopedia
*Sites discussing size of the Web:
*Some Google documentation on AdWords:
Week 5, Oct. 17:
Topics:
Economics of search, continued
Postponed:
Computing resources for search
computing power
networking
Distributed computing
Scaling resources
Class discussion: Battelle
covers a lot of history in Chapters 3-7. In our focus on how
search works now, we haven't paid much attention to why earlier Web
search engines failed. We already know one reason: Google
introduced a better ranking algorithm. But economics also played
a role: other search engines either could not become profitable or
could not stay profitable. Re-examine Chapters 3-6 with an eye to
the economic factors involved. Where did economics play a role
and how? Also consider the issues I raised for last week that we
did not discuss fully: Consider
the magnitude of the change in business model brought about by
click-through ads. What do we gain and lose from this
model? Can this model survive click-through spam? Given
Google's AdSense, is Google a media company or an advertising company;
that is, is its primary business the delivery of information or of ads?
Does it matter?
By the time we reach Chapter 7 of Battelle, he is raising issues
of more fundamental changes in the economics of our society. What
are the effects on retailing? Does trademarking become obsolete?
What is the state of the news media? (We will come
back to issues of intellectual property again in a week or two.
Intellectual property is not exclusively an economic issue, but the
debate is certainly driven by economics.)
What issues related to search and economics do you find most important?
Reading for discussion today:
*review Battelle Chapters 3-7
*Chris Anderson, The Long Tail (pdf file). This article is about 15
pages if formatted as a typical print article. We will go more deeply
into long-tail economics when we talk about intellectual property. So,
if you are pressed for time, read this short summary by Anderson now,
and come back to the article when I post it again for our discussion
of new economic models for intellectual property.
*Google seeks dismissal of AA trademark suit, Dallas Morning News
*Watchdog drops Google Australia from suit, Herald Sun, Australia
(now only suing parent Google)
*Google/DoubleClick buy set for EU phase 2; Microsoft, Yahoo to
oppose, Euro2day
*Google, DoubleClick EU Review Will Focus on Market, Not Privacy,
Bloomberg.com
References for technical material:
Week 6, Oct. 24:
Topics:
Computing resources for search
computing power
networking
Distributed computing
Scaling resources
Postponed:
Quality of search engine results
trust in results
quality of results versus goals of search
Improving search engine results
Class discussion: We
will begin the class with final thoughts from last week. I have
listed Battelle Chapter 9, discussing Google's IPO, for discussion. (We
will return to Chapter 8 after fall break.) Does Google's IPO
process say anything about how fundamental a change we are seeing in
the economics of search?
Our first new topic for the week is the computing resources needed for
search. The third of three components of Google's success
is their exceptional management of their computing resources - getting
the most for their investment. A key to this is
distributing the computing the search engine needs to do among many,
many computers. Think about how all the activities of a search
engine can be shared among many computers.
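To make the idea of sharing search work among many computers concrete, here is a toy sketch in Python. It is only an illustration, not a description of how Google or any other engine is actually built. It assumes the documents are split among "shards" by document number, that each shard (standing in for one computer) indexes only its own documents, and that a query is sent to every shard and the partial results are merged. The names Shard, build_shards, search_all, and DOCS, and the simple word-count scoring, are all made up for this example.

```python
from collections import defaultdict
import heapq


class Shard:
    """Stands in for one computer: indexes only its own slice of the documents."""

    def __init__(self):
        # Inverted index: word -> {document number: how often the word occurs}
        self.index = defaultdict(dict)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.index[term][doc_id] = self.index[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return (score, doc_id) pairs for local documents containing every query word."""
        terms = query.lower().split()
        if not terms:
            return []
        postings = [self.index.get(term, {}) for term in terms]
        matches = set(postings[0])
        for posting in postings[1:]:
            matches &= set(posting)
        # Hypothetical scoring: total number of occurrences of the query words.
        return [(sum(p[d] for p in postings), d) for d in matches]


def build_shards(docs, num_shards):
    """Assign each document to a shard by document number (an assumed scheme)."""
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards].add(doc_id, text)
    return shards


def search_all(shards, query, k=3):
    """Ask every shard, then merge the partial results into one top-k list."""
    partial = []
    for shard in shards:  # on real hardware these requests would run in parallel
        partial.extend(shard.search(query))
    # Highest score first; ties broken by smaller document number.
    best = heapq.nsmallest(k, partial, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _score, doc_id in best]


# Made-up documents for the example.
DOCS = {
    0: "web crawlers find pages on the web",
    1: "an index maps each word to pages",
    2: "search engines rank pages for a query",
    3: "the web is large and search must scale",
}

if __name__ == "__main__":
    shards = build_shards(DOCS, num_shards=2)
    print(search_all(shards, "web"))
    print(search_all(shards, "search pages"))
```

In this sketch each shard does only a fraction of the indexing and lookup work, so adding shards is one way to keep up as the collection grows. The merge step is the part that still has to see results from every shard.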
After covering computing resources, we will come back to the quality
of search results. Think about the kinds of searches that have
been easiest and those that have been most difficult for
you. Do the easy searches share common features? Do the
difficult searches? What improvements to search - either new
options for specifying the search or new ways to rank or present the
results - would be helpful? We discussed trusting the results of
search a bit earlier in the semester; what issues in trusting
results would you like to revisit or introduce?
For our discussion of improving search engine results, we will start
considering what search engines other than Google have to offer that is
different, beginning with some well-known techniques that Google
chooses not to implement.
No written assignment due this week, but:
Final paper topic description due Monday, November 5, 2007 at 5pm.
See Project Guidelines online.
Reading for discussion today:
*Battelle Chapter 9
References for technical material:
*(Originally for week 5) Barroso, L.A., Dean, J. and Holzle, U., Web
search for a planet: The Google cluster architecture, IEEE Micro,
Vol. 23(2), March-April 2003, pp 22-28. (Access the pdf file from the
top of the menu at the left of the page you reach.) Don't worry about
the technical details. We will discuss the important ideas in class.
*(Originally for week 5) Behold the server farm, Fortune Magazine,
Jul. 27, 2006. A non-technical piece about where all those bits sit.
Here's the lead-in:
"They're ugly. They require a small city's worth of electricity. And
they're where the Web happens. Microsoft, Google, Yahoo, and others are
spending billions to build them as fast as they can."
last revision Fri Oct 26 12:07 EDT 2007
Copyright 2007, Andrea S. LaPaugh