COS 435, Spring 2008 - Problem Set 3
Due at 5:00pm, Friday, March 7, 2008.
Collaboration Policy
You may discuss the general methods of solving the problems with other
students in the class. However,
each student must work out the details and write up his or her own
solution to each problem
independently.
Lateness Policy
A late penalty will be applied, unless there
are extraordinary circumstances and/or prior arrangements:
- Penalized 10%
of the earned score if
submitted by 5:00 pm Monday (3/10/08). (Note extended period of
only 10% penalty.)
- Penalized 25% of the earned score if
submitted by 1:30 pm Wednesday(3/12/08).
- Penalized 50% if submitted later than
1:30 pm Wednesday (3/12/08).
Problems
Problem 1:
Part A. Implement the HITS algorithm as an iterative
algorithm
as discussed in class. Use the iterative version
appearing in slide 14 (page 7 of the
posted pdf) of the class presentation. This should
be a straightforward modification of your PageRank program for
Assignment 2. Use the same convergence criterion that
you used in Assignment 2, Problem 2, Part b and print the hub and
authority values after each iteration. Run your HITS
implementation for the same 3 graphs that you used in Assignment 2,
Problem 2.
Part B. Compare your HITS
results to your PageRank results of
Assignment 2. Is the convergence criterion met more or less
quickly? How do the hub and authority values of nodes
compare to their PageRank, and is this more or less in agreement with
your intuition? (Remember that the authority value is the primary
value used for returning ranked results of a search.)
What to hand in: Hand in
your program for Part a and files or printouts containing the output of
this program for each of the graphs. Also hand in your answers to
the questions in Part b.
Problem 2:
This problem is our class experiment with evaluating search
engines.
We will compare the two most popular search engines: Google, and
Yahoo. (See comScore's
January 2008
U.S. Search Engine Rankings.)
This is only meant to be an exercise, so I do not expect we can do a
thorough enough job to call the study valid. But it will have the
components of a full evaluation and hopefully we will get something
interesting.
Part A: Choose an information
need. The information need should require gathering
information
about a subject from several Web sites
with good information. An examples of an activity that would
provide an appropriate information need is doing a report for a
course. (I suggested in class that you record your recent
information searches in anticipation of this exercise.) You
should choose an information need that that you
think is neither too
easy nor too difficult for a search engine. For example,
one expects looking for information on the flu to yeild essentially
100% relevant
pages - too easy; conversely, looking for information on the
history of the LaPaugh family in Europe might (at best) yield one relevant
result in 20 - too hard.
Write a
description of your information need that can be used to judge
whether any given Web search result is relevant or not. (See the
examples of TREC description and narrative sections on slides 14 and 15
of the presentation
on
evaluation of retrieval systems (pdf).) Once you
have your information need described, write one query that
you will use on both search engines to capture the information
need. The query should have the
following properties:
- The query should be constructed to avoid anticipated
ambiguity. Use several search terms if necessary.
- Do not use advanced search operations for either search engine.
Exception: you may use quotes around phrases, e.g. "computer
architecture".
- Make sure you use the same query for both search engines.
Before
proceeding to Part B, submit your description of information need and
your query to Professor LaPaugh by email for approval. This is primarily to make sure
no two people have the same information need or query.
Part B:
Run your query on on each
of Google and Yahoo.
Consider only the regular search
results, not sponsored links. Also, ignore the clustering by site
in Google; count each result returned. If you are
having trouble with several results in languages other than English,
you can go to the advanced search and choose
English only, but then do this for both of the search engines. (In my
trials,
I did not get foreign-language results with a regular search, so this
may not
be an issue.) Record the first 30 results
returned.
Pooling: To get a pool
for hand assessment, take the the first 20 results from
each search engine. Remove duplicates, and visit each result to
decide relevance. Score
each result as relevant or irrelevant.
Record the size of the pool (number of unique results produced by the
combined results 1 - 20 of each search engine). Also record the
number of relevant results in the pool.
Scoring: After
constructing the pool, go back and score each of the first 30
results
returned by each search engine based on your scoring of the pool.
If a result does not appear in the
pool,
it receives a score of irrelevant. If a document appears
twice
under
different URLs in the list for one search engine, count it only for
it's better ranking for that search engine and delete any additional
appearances within the same list. In this case there will be less than
30 distinct documents returned
by the search engine. Do not go back to the search engine to get
more
documents. Keep only what was returned in the
first 30, with their original ranks.
For each search engine, calculate the following measures
- Precision at rank 5
- Precision at rank 10
- Precision at rank 20
- The average over the precisions at each point that a relevant
docment appears in the first 20 (yes 20, not 30). See slide 11 of
the presentation
on
evaluation of retrieval systems (pdf). (I will be computing
the mean average precision over all your query results.)
- If all
the relevant docments in the pool appear in the first
30 returned results, record the rank at which the last relevant result
appears. This is the point of 100% recall of the relevant
documents in the pool. If some relevant documents are missing in
the first 30 returned results, note this and record the recall of
the relevant documents in the pool within these 30 results.
The first 4 measures are ways of capturing the quality of the first 20
results, which is about as far as most people look. The fifth
measure gives credit to one search engine for finding relevant
documents returned earlier by the other search engine.
What to hand in for Part B: Email to me and Joe the
query used, the pool size and number of relevant result in the pool and
for each search engine, the 5 scores described above. We will be
averaging these results across the class, so please report each number
on a separate line, clearly labeled as to what it is.
Part C:
What observations do you make about usability issues (user
friendliness) of each search engine - separate from the quality of
results you have been assessing in Part B?