You may discuss the general methods of solving
the problems with other students in the class. However, each student
must work out the details and write up his or her own solution to each
problem independently.
Some problems
have been used in previous offerings of COS 435. You are NOT allowed to
use any solutions posted for previous offerings of COS 435 or any
solutions produced by anyone else for the assigned
problems. You may use other reference materials; you must
give citations to all reference materials that you use.
This problem is our class experiment with evaluating search engines. We will compare Google to Microsoft's Bing. (You may be interested in comScore's January 2012 U.S. Search Engine Rankings. Take special note of the information on "Powered By" Reporting at the bottom of the page.)
This is only meant to be an exercise, so I do not expect we can do a
thorough enough job to call the study valid. But it will have the
components of a full evaluation and hopefully we will get something
interesting. You may be interested in the equally (more?)
unscientific 2011
comparison of Google and Bing by Conrad Saam of Search Engine Land.
Part A: Choose an information need. The information need should
require gathering information about a subject from several Web sites
with good information. An example of an activity that would
provide an appropriate information need is doing a report for a
course. You should choose an information need that you
think is neither too easy nor too difficult for a search
engine. For example, one expects looking for information on
rotavirus to yield essentially 100% relevant pages - too easy;
conversely, looking for information on the history of the LaPaugh
family in Europe might (at best) yield one relevant
result in 20 - too hard.
Write a description of your
information need
that can be used to judge whether any given Web search result is
relevant or not. Use the style of the TREC topic specifications,
using title,
description,
and narrative
sections. (See the examples of TREC topic specifications in the class
presentation on the evaluation of retrieval systems.) You will be
distinguishing between "highly relevant" and "simply relevant", so you
may wish to spell out this distinction in your narrative section, but it
is also fine to leave it as a quality judgment. In either case, you should be
demanding in your criteria for "highly relevant". Once you
have your information need described, write one query that you will use on both
search engines to capture the information need. The query
should have the following properties:
Before proceeding to Part B, submit your
description of your information need and your query to Professor LaPaugh by
email for approval. This is primarily to make sure no
two people have the same information need or query.
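For reference, here is a purely hypothetical TREC-style topic; the subject and wording are invented for illustration only, and your own topic must of course describe your chosen information need:

    <top>
    <num> Number: 1
    <title> rooftop solar panels for a single-family home

    <desc> Description:
    Find information on the cost, installation requirements, and expected
    energy savings of rooftop solar panels for a single-family home.

    <narr> Narrative:
    A relevant page discusses residential rooftop solar installations,
    including pricing, installation requirements, or savings estimates. A
    highly relevant page covers at least two of these aspects with specific
    figures. Pages about commercial or utility-scale solar installations
    are not relevant.
    </top>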
Part B:
Run your query on each of Google and Bing. Run the query
while remaining as anonymous as possible to the search engines: with the
Bing and Google toolbars inactive, with the "Suggested Sites" feature of
Internet Explorer off, and logged out of your Google and Windows Live
accounts. Consider only the regular search results, not sponsored
links. Ignore "image results", "video results", "news
results", and any other special results -
these are not counted as part of the first 10 results on the first
results page and may cause that page to have fewer than 10
regular results. If you are having trouble with several
results in languages other than English, you can go to the advanced
search and choose English only, but then do this for both of the search
engines. (In my trials, I did not get foreign-language results with a
regular search, so this may not be an issue.) Record the
first 30 results returned.
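Recording the results by hand in a spreadsheet is perfectly fine; if you prefer to script the bookkeeping, a small Python sketch along these lines (the file name and column layout are just one possible choice) keeps the engine, rank, and URL of each result:

    import csv

    def save_results(engine, urls, filename="results.csv"):
        """Append one row per result: engine name, 1-based rank, URL."""
        with open(filename, "a", newline="") as f:
            writer = csv.writer(f)
            for rank, url in enumerate(urls, start=1):
                writer.writerow([engine, rank, url])

    # Example usage (URLs are placeholders):
    # save_results("google", ["http://example.com/page1", "http://example.com/page2"])
    # save_results("bing", ["http://example.org/page1"])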
Pooling: To get a pool for hand assessment, take the
first 20 results from each search engine. Remove duplicates, and
visit each result to decide relevance. Score each result as
"highly relevant" , "simply relevant" or irrelevant according to your
description of Part A. Record the
size of the pool (number of unique results produced by the combined
results 1 - 20 of each search engine). Also record the number of
"highly relevant" and "simply relevant" results in the pool.
Scoring: After constructing the pool, go back and score
each of the first 30 results returned by each search engine based on
your scoring of the pool. If a result does not appear in the
pool, it receives a score of irrelevant. If a document
appears twice under different URLs in the list for one search engine,
count it only at its better rank for that search engine and delete
any additional appearances within the same list. In this case there
will be fewer than 30 distinct results returned by the search
engine. Do not go back to the search engine to get more
results. Keep only what was returned in the first 30, with their original
ranks. For each search engine, calculate the following
measures. For
all but discounted cumulative gain (measure 4), "simply relevant" and
"highly relevant" should be lumped together as "relevant".
The
first 4 measures are ways of capturing the quality of the first 20
results, which is about as far as most people look. The
fifth measure gives credit to one search engine for finding relevant
documents returned earlier by the other search engine.
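As an illustration of the kind of calculation involved, here is a short Python sketch of precision at k (lumping the two relevance levels together, as specified above) and of discounted cumulative gain. The gain values (2 for "highly relevant", 1 for "simply relevant", 0 for irrelevant) and the base-2 logarithm are assumptions; use the exact definition given in the class presentation:

    import math

    GAIN = {"highly relevant": 2, "simply relevant": 1, "irrelevant": 0}  # assumed gains

    def precision_at(judgment_list, k):
        """Fraction of the first k results judged relevant (either level)."""
        return sum(1 for j in judgment_list[:k] if j != "irrelevant") / k

    def dcg(judgment_list, k):
        """Discounted cumulative gain of the first k results; rank 1 is undiscounted."""
        total = 0.0
        for rank, judgment in enumerate(judgment_list[:k], start=1):
            discount = 1.0 if rank == 1 else 1.0 / math.log2(rank)
            total += GAIN[judgment] * discount
        return total

    # Example: judgments of one engine's first results, in rank order.
    sample = ["highly relevant", "irrelevant", "simply relevant"]
    print(precision_at(sample, 3), dcg(sample, 3))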
What to hand in for Part B: Email to Professor
LaPaugh and Yiming Liu:
The
pool size, number of relevant results in the pool, and the 5 scores
will be averaged across the class, so please separate them from the
other parts of your email and report each number on a separate
line, clearly labeled as to what it is.
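One possible layout for these numbers (the measure labels below are placeholders; label each of the five measures by its actual name):

    Pool size: ...
    Number of "highly relevant" results in pool: ...
    Number of "simply relevant" results in pool: ...
    Google, measure 1: ...
    Google, measure 2: ...
    (and so on through measure 5)
    Bing, measure 1: ...
    (and so on through measure 5)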
Part C:
What observations do you make about usability issues (user
friendliness) of each search engine - separate from the quality of
results you have been assessing in Part B? You may email your
observations with Part B, but write them after, and clearly
separated from, the Part B results.