Princeton University
Computer Science Dept.

Richard L. Smith '70 Freshman Seminar

Google and Ye Shall Find???


Andrea LaPaugh

FRS 117

Fall 2007



 


Week 4, Oct. 10:

Topics: 
Finding content:  Web crawlers
Building an Index
Invisible Web
Economics of search

Class discussion:  We will finish our study of the principal methods search engines use to get documents, index them, and provide search results.  Bring your questions on the technical material we discussed last week.  We will then consider the economics of search.  Battelle raises many issues in Chapters 5, 6, and 7 as he describes the new economics of search.  Consider the magnitude of the change in business model brought about by click-through ads.  What do we gain and lose from this model?  Can this model survive click-through spam?  Given Google's AdSense, is Google a media company or an advertisement company, i.e. is its primary business the delivery of content or of ads?  Is news becoming a commodity?  These are just some of the questions to consider.  Bring your own issues for discussion as well.
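As a reminder of the technical material, here is a minimal sketch of the indexing idea: record, for each word, which documents contain it, then answer a query by intersecting those lists. The three tiny "pages" and the names build_index and search are made up for illustration; a real search engine also handles ranking, word variants, and far larger scale.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical three-page "Web", for illustration only.
DOCS = {
    1: "search engines crawl the web",
    2: "an index maps words to pages",
    3: "web search needs an index",
}
INDEX = build_index(DOCS)
```

Calling search(INDEX, "web search") looks up each word separately and keeps only the pages that appear in both lists.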

Written assignment due this week:  The technical problems and the blog assignment are described in separate documents: Assignment 3 technical problems (pdf), and Assignment 3 blog assignment (html).


Reading for discussion today:
*Battelle Chapters 5, 6, 7

References for technical material:
*(originally for week 3) Web Search Engines: Part 1, IEEE Computer, Vol. 39(6), 2006.  (Access the pdf file from the top of the menu at the left of the page you reach.)  A very concise summary of crawling.  Use it for the outline; don't worry about the technical jargon or the details about computing resources.  We will discuss the important ideas in class.
*(originally for week 3) Web Search Engines: Part 2, IEEE Computer, Vol. 39(8), 2006.  (Access the pdf file from the top of the menu at the left of the page you reach.)  A very concise summary of indexing.  Same remarks as for Part 1.  You can skip the section "Speeding things up" altogether.
*(originally for week 3) Googlebot - Wikipedia, the free encyclopedia
*Sites discussing size of the Web:
Internet Domain Survey by Internet Systems Consortium, Inc.
Netcraft: September 2007 Web Server Survey
*Some Google documentation on AdWords:
Google AdWords Help Center: How Are Ads Ranked?
Google AdWords Help Center: What is a 'Quality Score' and how is it calculated?
Google AdWords: Learning Center
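
To make the ad-ranking idea concrete, here is a small sketch. It assumes the simplified rule that ads are ordered by maximum bid multiplied by Quality Score, which is how the ranking was commonly summarized at the time. The advertisers, bids, and scores below are invented, and how Google actually computes a Quality Score is not reproduced here.

```python
def rank_ads(ads):
    """Order ads by (maximum bid x quality score), highest first.

    Assumption: this simplified product stands in for the real ranking.
    """
    return sorted(
        ads,
        key=lambda ad: ad["max_bid"] * ad["quality_score"],
        reverse=True,
    )


# Invented example data: bids in dollars per click, scores on an arbitrary scale.
ADS = [
    {"advertiser": "A", "max_bid": 1.00, "quality_score": 4},
    {"advertiser": "B", "max_bid": 0.60, "quality_score": 8},
    {"advertiser": "C", "max_bid": 0.80, "quality_score": 3},
]
```

In this made-up example advertiser B bids the least but ranks first, because its higher score more than makes up for the lower bid. That is the point to notice: under this kind of rule, position is not simply sold to the highest bidder.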


Week 5, Oct. 17:

Topics:
Economics of search, continued
Postponed:
Computing resources for search
             computing power
             networking
Distributed computing
Scaling  resources

Class discussion:  Battelle covers a lot of history in Chapters 3-7.  In our focus on how search works now, we haven't paid much attention to why earlier Web search engines failed.  We already know one reason: Google introduced a better ranking algorithm.  But economics also played a role: other search engines either could not become profitable or could not stay profitable.  Re-examine Chapters 3-6 with an eye to the economic factors involved.  Where did economics play a role and how?  Also consider the issues I raised for last week that we did not discuss fully:  Consider the magnitude of the change in business model brought about by click-through ads.  What do we gain and lose from this model?  Can this model survive click-through spam?  Given Google's AdSense, is Google a media company or an advertisement company, i.e. is its primary business the delivery of information or of ads? Does it matter?
 
By the time we reach Chapter 7 of Battelle, he is raising issues of more fundamental changes in the economics of our society.  What are the effects on retailing?  Does trademarking become obsolete?  What is the state of the news media?  (We will come back to issues of intellectual property again in a week or two.  Intellectual property is not exclusively an economic issue, but the debate is certainly driven by economics.)

What issues related to search and economics do you find most important?

Written assignment due this week:  Please visit the Assignment 4 page (pdf).


Reading for discussion today:
*Review Battelle Chapters 3-7
*Chris Anderson, The Long Tail (pdf file).  This article is about 15 pages if formatted as a typical print article.  We will go more deeply into long-tail economics when we talk about intellectual property.  So, if you are pressed for time, read this short summary by Anderson now, and come back to the article when I post it again for our discussion of new economic models for intellectual property.
*Google seeks dismissal of AA trademark suit, Dallas Morning News
*Watchdog drops Google Australia from suit, Herald Sun, Australia (now only suing parent Google)
*Google/DoubleClick buy set for EU phase 2; Microsoft, Yahoo to oppose, Euro2day
*Google, DoubleClick EU Review Will Focus on Market, Not Privacy, Bloomberg.com

References for technical material:
postponed

Week 6, Oct. 24:

Topics:
Computing resources for search
             computing power
             networking
Distributed computing
Scaling  resources

Postponed:
Quality of search engine results
        trust in results
        quality of results versus goals of search
Improving search engine results

Class discussion: We will begin the class with final thoughts from last week.  I have listed Battelle Chapter 9, discussing Google's IPO, for discussion. (We will return to Chapter 8 after fall break.)  Does Google's IPO process say anything about how fundamental a change we are seeing in the economics of search? 

Our first new topic for the week is the computing resources needed for search.  The third of three components of Google's success is its exceptional management of its computing resources: getting the most for its investment.  A key to this is distributing the computing the search engine needs to do among many, many computers.  Think about how all the activities of a search engine can be shared among many computers.
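
One way to picture sharing the work is to split the documents among several machines, let each machine index only its own share, and send every query to all of them. The sketch below imitates this on a single computer. The four tiny documents, the rule for assigning documents to "machines", and the names build_shards and search_all are assumptions made for illustration, not a description of Google's actual system.

```python
from collections import defaultdict


def build_shards(docs, num_shards):
    """Assign each document to one shard and build a small index per shard."""
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]  # simple made-up assignment rule
        for word in text.lower().split():
            shard[word].add(doc_id)
    return shards


def search_all(shards, word):
    """Ask every shard for its matches and merge the answers."""
    hits = set()
    for shard in shards:  # on a real cluster these lookups would run in parallel
        hits |= shard.get(word.lower(), set())
    return sorted(hits)


# Hypothetical documents, for illustration only.
DOCS = {
    1: "google cluster architecture",
    2: "server farm electricity",
    3: "cluster of cheap servers",
    4: "distributed search cluster",
}
SHARDS = build_shards(DOCS, 2)
```

Each shard holds only part of the collection, so each lookup is smaller, and adding machines lets the collection grow without slowing any single lookup. The cost is the extra step of merging the partial answers.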

After covering computing resources, we will come back to the quality of search results.  Think about the kinds of searches that have been easiest and those that have been most difficult for you.  Do the easy searches share common features?  Do the difficult searches?  What improvements to search - either new options for specifying the search or new ways to rank or present the results - would be helpful?  We discussed trusting the results of search a bit earlier in the semester; what issues in trusting results would you like to revisit or introduce?

For our discussion of improving search engine results, we will start considering what search engines other than Google have to offer that is different, beginning with some well-known techniques that Google chooses not to implement.


No written assignment due this week, but:
Final paper topic description due Monday, November 5, 2007 at 5pm.  See Project Guidelines online.


Reading for discussion today:

*Battelle Chapter 9
*Competing for Clients, and Paying by the Click, NY Times, October 15, 2007. 

References for technical material:
*(Originally for week 5) Barroso, L.A., Dean, J., and Hölzle, U., Web search for a planet: The Google cluster architecture, IEEE Micro, Vol. 23(2), March-April 2003, pp. 22-28.  (Access the pdf file from the top of the menu at the left of the page you reach.)  Don't worry about the technical details.  We will discuss the important ideas in class.
*(Originally for week 5) Behold the server farm, Fortune Magazine, Jul. 27, 2006.  A non-technical piece about where all those bits sit.  Here's the lead-in:
"They're ugly. They require a small city's worth of electricity. And they're where the Web happens. Microsoft, Google, Yahoo, and others are spending billions to build them as fast as they can."





last revision  Fri Oct 26 12:07 EDT 2007
Copyright  2007,  Andrea S. LaPaugh