Improved query difficulty prediction for the web

by Claudia Hauff, Vanessa Murdock, Ricardo Baeza-Yates

Query performance prediction aims to predict whether a query will have a high average precision given retrieval from a particular collection, or low average precision. An accurate estimator of the quality of search engine results can allow the search engine to decide to which queries to apply query expansion, for which queries to suggest alternative search terms, to adjust the sponsored results, or to return results from specialized collections. In this paper we present an evaluation of state of the art query prediction algorithms, both post-retrieval and pre-retrieval and we analyze their sensitivity towards the retrieval algorithm. We evaluate query difficulty predictors over three widely different collections and query sets and present an analysis of why prediction algorithms perform significantly worse onWeb data. Finally we introduce Improved Clarity, and demonstrate that it outperforms state-of-the-art predictors on three standard collections, including two large Web collections.

The paper will be presented at the ACM Conference on Information and Knowledge Management CIKM 2008 in Napa Valley, USA

[download pdf]