
Evaluating new search engine configurations with pre-existing judgments and clicks

Proceedings of the 20th International Conference on World Wide Web (WWW '11), pages 397-406. ACM, New York, NY, USA, 2011.
DOI: 10.1145/1963405.1963463

Abstract

We provide a novel method of evaluating search results, which allows us to combine existing editorial judgments with the relevance estimates generated by click-based user browsing models. There are evaluation methods in the literature that use clicks and editorial judgments together, but our approach is novel in the sense that it allows us to predict the impact of unseen search models without online tests to collect clicks and without requesting new editorial data, since we are only re-using existing editorial data and clicks observed for previous result set configurations. Since the user browsing model and the pre-existing editorial data cannot provide relevance estimates for all documents for the selected set of queries, one important challenge is to obtain this performance estimate when many ranked documents have missing relevance values. We introduce query- and rank-based smoothing to overcome this problem. We show that a hybrid of these smoothing techniques performs better than either query- or position-based smoothing alone, and that despite the high percentage of missing judgments, the resulting method is significantly correlated (0.74) with DCG values evaluated using fully judged datasets, approaching inter-annotator agreement. We show that previously published techniques, applicable to frequent queries, degrade when applied to a random sample of queries, with a correlation of only 0.29. While our experiments focus on evaluation using DCG, our method is also applicable to other commonly used metrics.
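The core ideas in the abstract (DCG over a ranked list, with missing judgments backfilled by a hybrid query/rank smoothing) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the `alpha` blending parameter, and the per-query and per-rank mean tables are assumptions made for the example.

```python
import math

def dcg(relevances):
    """Standard DCG: sum of rel / log2(rank + 1), ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def smoothed_relevance(query, rank, judged, query_mean, rank_mean, alpha=0.5):
    """Hybrid query/rank smoothing (illustrative): when a (query, rank)
    slot has no judgment or click-based estimate, blend the mean
    relevance seen for this query with the mean at this rank position."""
    if (query, rank) in judged:
        return judged[(query, rank)]
    return alpha * query_mean.get(query, 0.0) + (1 - alpha) * rank_mean.get(rank, 0.0)

# Toy usage: rank 2 has no judgment, so its relevance is smoothed.
judged = {("q1", 1): 3.0, ("q1", 3): 1.0}   # hypothetical editorial/click data
query_mean = {"q1": 2.0}                     # mean judged relevance per query
rank_mean = {1: 2.5, 2: 1.5, 3: 1.0}         # mean judged relevance per rank
rels = [smoothed_relevance("q1", r, judged, query_mean, rank_mean) for r in (1, 2, 3)]
score = dcg(rels)  # smoothed DCG for query q1's new result configuration
```

In the paper's setting, such smoothed DCG scores for a candidate ranking would then be correlated against DCG computed on fully judged data; the hybrid blend stands in for their reported query/rank smoothing combination.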
