FRS 117: Google and Ye Shall Find???
Fall 2007
Assignment 3
Due Friday, October 12 at 5PM

Blog entry:

Now that you know the basic methods that Google employs to rank pages, it is time to explore a bit.   Try to find some results of a Google search that are interesting in terms of which documents are ranked highly and write a blog entry about them.  What does it mean to be interesting?  I’ll let you decide what you think is interesting, but here are some examples:

Spamming of the search is interesting.  Consider Loren Baker’s article Google Loves Transparent Links & Hit Counter Spam, which we looked at in class.  I don’t expect such a detailed investigation or such a long article, but you can point out some suspicious results and follow up a bit.
Results that illustrate aspects of Google’s ranking algorithm are interesting.  For example, a document that is ranked unexpectedly highly because of the anchor text of links to it would illustrate how much stock Google puts in anchor text.

Is it possible to find such an example without hours of search experiments?  Well, I tried about 3 queries and came up with some potential candidates.  My strategy was to try query terms where there is controversy or a couple of interpretations, one of which might be less common but more lucrative or politically charged.  One of my potentially interesting results came from comparing two queries:  weed versus weeds.   With stemming, both queries should have many of the same documents as relevant.  “Weed” has a second meaning (marijuana) that “weeds” does not have.  Here are the first results page for weed and the first results page for weeds.  Why does “Weed Identification” not make it to the first page of “weed” results?    Can we see why the marijuana sites win the high-ranked spots?  Another potentially interesting search was “toxic waste” (with quotes).  I thought some attorneys’ sites might be highly ranked, and it would be interesting to try to figure out how they achieved the high ranking.  Instead what I found (see the first results page for "toxic waste") was a candy site ranked 6th.  Now how did they achieve that?  Was it spam or does the site just have legitimately high page rank and good use of the search terms?  I didn’t investigate further, so I can’t answer that.
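To make the stemming point in the weed/weeds comparison concrete, here is a minimal sketch in Python.  The function naive_stem is a hypothetical toy rule that only strips a trailing plural "s"; it is an assumption for illustration and is not how Google (or any real stemmer) works.  It only shows why the two queries could be expected to match many of the same documents.

```python
def naive_stem(term: str) -> str:
    """Toy stemmer (assumption for illustration): strip one trailing plural 's'."""
    word = term.lower()
    # Leave short words and words ending in "ss" (e.g. "grass") alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# Both queries from the example reduce to the same stem under this toy rule.
queries = ["weed", "weeds"]
stems = {query: naive_stem(query) for query in queries}
print(stems)
```

If both queries reduce to the same stem, any difference between the two results pages has to come from something other than which documents match, for example the links and anchor text pointing at each page.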

Both Google and Yahoo provide means to see the pages linking to a given page.  In Google, searching for link:web-site-name (e.g. link:www.princeton.edu) will give you the pages that Google has found linked to the named web site (e.g. www.princeton.edu).  See Google Help Center: Advanced Operators for documentation.  In Yahoo, you can do the same inquiry, but you must include the full address: link:http://web-site-name (e.g. link:http://www.princeton.edu), or you can go to Yahoo’s Site Explorer.  See About Site Explorer for documentation.  Although Yahoo gives links from its own repository of information, these can be helpful in looking at what links Google has seen, especially since it is not clear that a link: query in Google gives all the links into a page that Google has seen.
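If you plan to check inbound links for several sites, the two link: query forms described above can be built mechanically.  The following is a small sketch in Python under stated assumptions: the function names google_link_query and yahoo_link_query are hypothetical helpers, and the code only constructs the query text to type or paste into the search box.  It does not contact either search engine.

```python
from urllib.parse import quote_plus


def google_link_query(site: str) -> str:
    """Build the Google form described above: link:web-site-name."""
    return f"link:{site}"


def yahoo_link_query(site: str) -> str:
    """Build the Yahoo form described above, which needs the full address."""
    if not site.startswith(("http://", "https://")):
        site = "http://" + site
    return f"link:{site}"


site = "www.princeton.edu"
print(google_link_query(site))  # link:www.princeton.edu
print(yahoo_link_query(site))   # link:http://www.princeton.edu

# URL-encoded form of the Yahoo query, in case it is pasted into a URL
# rather than typed into the search box.
print(quote_plus(yahoo_link_query(site)))
```

Keeping the two forms in separate functions makes the one difference between them, the required http:// prefix for Yahoo, easy to see.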

Do your exploration and write a blog entry about how you explored and what you found.  Your entry need not be long – about 250 words is fine.  If you have more to say, it is fine to be longer.