Blog entry:
Now that you know the basic methods that Google employs to rank pages, it is time to explore a bit. Try to find some results of a Google search that are interesting in terms of which documents are ranked highly and write a blog entry about them. What does it mean to be interesting? I’ll let you decide what you think is interesting, but here are some examples:
- Spamming of the search is interesting. Consider Loren Baker’s article Google Loves Transparent Links & Hit Counter Spam, which we looked at in class. I don’t expect such a detailed investigation or such a long article, but you can point out some suspicious results and follow up a bit.
- Results that illustrate aspects of Google’s ranking algorithm are interesting. For example, a document that is ranked unexpectedly highly because of the anchor text of links to it would illustrate how much stock Google puts in anchor text.
Is it possible to find such an example without hours of search experiments? Well, I tried about 3 queries and came up with some potential candidates. My strategy was to try query terms where there is controversy or a couple of interpretations, one of which might be less common but more lucrative or politically charged.
One of my potentially interesting experiments was to compare two queries: weed versus weeds. With stemming, both queries should have many of the same documents as relevant. However, “weed” has a second meaning (marijuana) that “weeds” does not have. Here are the first results page for weed and the first results page for weeds. Why does “Weed Identification” not make it to the first page of “weed” results? Can we see why the marijuana sites win the high-ranked spots?
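One way to make a comparison like this concrete is to record the URLs from each first results page and measure how much the two lists overlap. The following is a minimal sketch, not a prescribed method: the function name `compare_results` and the URLs are hypothetical placeholders, and you would paste in the results you actually observed.

```python
def compare_results(results_a, results_b):
    """Return the URLs shared by both result lists (in the order they
    appear in results_a) and the Jaccard overlap of the two lists."""
    set_a, set_b = set(results_a), set(results_b)
    shared = [url for url in results_a if url in set_b]
    union = set_a | set_b
    # Jaccard overlap: size of intersection divided by size of union.
    jaccard = len(set_a & set_b) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical placeholder URLs, NOT real search results.
weed_results = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
]
weeds_results = [
    "http://example.com/b",
    "http://example.com/d",
    "http://example.com/c",
]

shared, jaccard = compare_results(weed_results, weeds_results)
print("Shared results:", shared)
print("Jaccard overlap:", jaccard)
```

A low overlap between two queries that stemming should treat alike is a hint that something other than the query terms, such as anchor text or link structure, is driving the ranking.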
Another potentially interesting search was “toxic waste” (with quotes). I thought some attorneys’ sites might be highly ranked, and it would be interesting to try to figure out how they achieved the high ranking. Instead, what I found (see the first results page for “toxic waste”) was a candy site ranked 6th. Now how did they achieve that? Was it spam, or does the site just have legitimately high PageRank and good use of the search terms? I didn’t investigate further, so I can’t answer that.
Both Google and Yahoo provide means to see the pages linking to a given page. In Google, searching for link:web-site-name (e.g. link:www.princeton.edu) will give you the pages that Google has found linking to the named web site (e.g. www.princeton.edu). See Google Help Center: Advanced Operators for documentation. In Yahoo, you can do the same inquiry, but you must include the full address: link:http://web-site-name (e.g. link:http://www.princeton.edu), or you can go to Yahoo’s Site Explorer. See About Site Explorer for documentation.
Although Yahoo gives links from its own repository of information, these can be helpful in looking at what links Google has seen, especially since it is not clear that a link: query in Google gives all the links into a page that Google has seen.
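If you plan to check inbound links for several sites, it may help to generate the query strings rather than type each one. The sketch below only builds the text to paste into each search box, following the two formats described above (bare host name for Google, full address for Yahoo). The function names are hypothetical, and it assumes the site is given either as a host name or as an http/https address.

```python
def google_link_query(site):
    """Build a Google-style query: link:web-site-name (no scheme)."""
    for prefix in ("http://", "https://"):
        if site.startswith(prefix):
            site = site[len(prefix):]
    return "link:" + site


def yahoo_link_query(site):
    """Build a Yahoo-style query: link:http://web-site-name (full address)."""
    if not site.startswith(("http://", "https://")):
        site = "http://" + site
    return "link:" + site


print(google_link_query("www.princeton.edu"))  # link:www.princeton.edu
print(yahoo_link_query("www.princeton.edu"))   # link:http://www.princeton.edu
```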
Do your exploration and write a blog entry about how you explored and what you found. Your entry need not be long; about 250 words is fine. If you have more to say, it is fine to be longer.