Quantitative Search Engine Study:
This is the search engine study we designed together in class on Dec.
5. In this study you will make a quantitative comparison of
the results provided by Google and Yahoo -- a run-off of the "big two".
I will aggregate the numbers you all provide to declare the
winner (if any).
Query choice: Choose one query
to run on both search engines. The query should have the
following properties:
- It should be a query with the purpose of gathering information
about a subject. Your goal should be to find several sites
with good information. Imagine you are doing a report for a
class. Feel free to use a topic you actually need to research or
have researched.
- As in assignment 5, you should try to construct a query that you
think is neither too easy nor too difficult for a search engine to
provide good results.
- The query should be constructed to avoid anticipated
ambiguity. Use several search terms if necessary.
- Do not use advanced search properties for either search engine.
Exception: you may use quotes around phrases, e.g. "computer
architecture".
- Make sure you use the same query for both search engines.
You may use the same query as you used for assignment 5 if it satisfies
all the properties.
Analysis of results: Run
the query on each of the search engines and examine the first 20
results of the organic Web search. For each result, examine the
Web page and decide whether it
is relevant to the query or not. Compute the
following measures for each search engine:
- The precision of the first 20 results. This is the
percentage of the 20 results that you determine to be relevant.
For example, if you find 7 results to be relevant, the precision
is 7/20, or 35%.
- The rank (i.e. position from the top) of the third relevant
result that you find. Another way to say this is the
number of results you must examine in rank order to find 3 relevant
ones. If the search engine does not return 3
relevant results among the first 20, report "doesn't exist".
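If you would like to check your arithmetic, the two measures above can be sketched in a few lines of Python. This is only an illustration, not a required part of the assignment: the function names and the sample list of judgments are made up for the example, and the list assumes you have recorded one True/False relevance decision per result, in rank order.

```python
def precision_percent(judgments):
    """Percentage of the examined results that were judged relevant."""
    return 100.0 * sum(judgments) / len(judgments)


def rank_of_nth_relevant(judgments, n=3):
    """1-based rank of the n-th relevant result, or None if there are fewer than n."""
    found = 0
    for rank, relevant in enumerate(judgments, start=1):
        if relevant:
            found += 1
            if found == n:
                return rank
    return None


# Hypothetical judgments for the first 20 results of one search engine
# (True = relevant). Seven results are relevant, as in the 7/20 example.
judgments = [True, False, False, True, False, True, True, False, False, True,
             False, False, True, False, False, False, True, False, False, False]

precision = precision_percent(judgments)
third = rank_of_nth_relevant(judgments, n=3)

print(f"precision of first 20 results: {precision:.0f}%")
print("rank of third relevant result:",
      third if third is not None else "doesn't exist")
```

With these sample judgments the precision is 35% and the third relevant result is at rank 6. A list with fewer than three relevant results makes the second function return None, which corresponds to reporting "doesn't exist".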
Assessing relevance:
Relevance is subjective. You decide which pages are relevant. Simply
containing the query
term(s) somewhere does not make a page relevant. The page must be
useful for your purpose of gathering information. However, do not
make your criteria unrealistic. For example, do not require
some very specific information to be present that was not in any way
captured by the query terms you used -- if you use the search query
cancer collies (looking for
information about cancer in the collie breed of dogs), it seems
unfair to require that a page specifically have survival rates to be
relevant.
Google and Yahoo both have a "translate this page" feature.
Therefore, if you get results in a foreign language, you may use the
automated translation and evaluate the translated version for
relevance. If you understand the language, evaluate the
original. If you can't understand the page, treat it as
irrelevant. In other words, do what you would do if the search
were for your own information.
Email me with any concerns you have about determining relevance.
Reporting your results:
- Write a short blog entry reporting your results. Include
the query and a description of what you looked for to score a
result as relevant.
- Send me an email with just the following
information:
- your name
- the query
- Google precision of first 20 results:
% (range 0 - 100)
- Google rank of third relevant result:
(range 3 - 20 or "doesn't exist")
- Yahoo precision of first 20 results:
% (range 0 - 100)
- Yahoo rank of third relevant result:
(range 3 - 20 or "doesn't exist")
A.S. LaPaugh Fri Dec 7 14:20
EST 2007