For the past three decades, most research in information retrieval has assumed a ranked retrieval model, in which a query returns a ranking of corpus documents by their estimated relevance to that query. This model maps to the familiar user interface of most commercial and academic search engines.
Despite its popularity, the ranked retrieval model has a fundamental weakness: it does not draw a clear line between relevant and irrelevant documents. This makes it impossible to perform even basic analyses of the query results, such as counting the relevant documents, let alone more complicated ones, such as assessing result quality.
In contrast, a set retrieval model partitions the corpus into two subsets of documents: those that are considered relevant, and those that are not. A set retrieval model does not rank the retrieved documents; instead, it establishes a clear split between documents that are in and out of the retrieved set. As a result, set retrieval models enable rich analysis of query results, which can then be applied to improve user experience.
Armed with a set retrieval framework, we revisit query clarity, an information gain measure introduced by Cronen-Townsend and Croft in 2002 to predict the ambiguity of a query against an information retrieval system. While Cronen-Townsend and Croft offered evidence in support of query clarity as a measure, subsequent research by Turpin and Hersh in 2004 showed a lack of correlation between clarity scores and user performance.
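The clarity score of Cronen-Townsend and Croft can be sketched as the KL divergence (in bits) between a query language model, estimated from the retrieved documents, and the background corpus model. The following Python sketch is illustrative only: it uses a simple averaged, linearly smoothed query model, and the function name, the smoothing weight `lam`, and the toy tokenized documents are assumptions, not taken from the papers cited here.

```python
import math
from collections import Counter

def clarity_score(retrieved_docs, corpus_docs, lam=0.6):
    """Clarity score: KL divergence (in bits) of a query language model,
    estimated from the retrieved documents, from the corpus model.
    Documents are lists of tokens; lam is an illustrative smoothing weight."""
    # Background (corpus) unigram model.
    corpus_counts = Counter(w for d in corpus_docs for w in d)
    corpus_total = sum(corpus_counts.values())
    p_corpus = {w: c / corpus_total for w, c in corpus_counts.items()}

    # Query model: uniform average of smoothed document models over the
    # retrieved set (a simplified stand-in for a relevance-weighted mixture).
    query_model = Counter()
    for d in retrieved_docs:
        d_counts = Counter(d)
        d_total = len(d)
        for w in p_corpus:
            p_wd = lam * d_counts[w] / d_total + (1 - lam) * p_corpus[w]
            query_model[w] += p_wd / len(retrieved_docs)

    # KL divergence of the query model from the corpus model.
    return sum(p * math.log2(p / p_corpus[w])
               for w, p in query_model.items() if p > 0)
```

Intuitively, a topically focused result set yields a query model far from the corpus model and hence a high clarity score, while a result set that mirrors the whole corpus yields a score near zero.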
We claim that query clarity is an effective measure, but that it needs to be revised to leverage a set retrieval model. We present a normalized clarity score that measures the clarity of a query result set relative to other document sets of its size. We discuss theoretical results about the distribution of normalized clarity, preliminary evidence in favor of normalized clarity as a tool to measure query quality, and applications of normalized clarity to interactive information retrieval.
Cronen-Townsend, S. and Croft, W. B. Quantifying query ambiguity. In Proc. of Human Language Technology 2002, pages 94--98, March 2002.
Turpin, A. and Hersh, W. Do clarity scores for queries correlate with user performance? In Proc. of the 15th Australasian Database Conference (Dunedin, New Zealand), ACM International Conference Proceeding Series, vol. 52, pages 85--91, 2004.