Using navigation data to improve IR functions in the context of web search

CIKM '01: Proceedings of the Tenth International Conference on Information and Knowledge Management, pages 135-142. New York, NY, USA, ACM (2001)
DOI: 10.1145/502585.502609

Abstract

As part of the process of delivering content, devices like proxies and gateways log valuable information about the activities and navigation patterns of users on the Web. In this study, we consider how this navigation data can be used to improve Web search. A query posted to a search engine together with the set of pages accessed during a search task is known as a search session. We develop a mixture model for the observed set of search sessions, and propose variants of the classical EM algorithm for training. The model itself yields a type of navigation-based query clustering. By implicitly borrowing strength between related queries, the mixture formulation allows us to identify the "highly relevant" URLs for each query cluster. Next, we explore methods for incorporating existing labeled data (the Yahoo! directory, for example) to speed convergence and help resolve low-traffic clusters. Finally, the mixture formulation also provides for a simple, hierarchical display of search results based on the query clusters. The effectiveness of our approach is evaluated using proxy access logs for the outgoing Lucent proxy.
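
To make the mixture-and-EM idea in the abstract concrete, here is a minimal sketch. It is not the authors' model or one of their EM variants: it assumes each search session is reduced to a vector of URL access counts and fits a plain mixture of multinomials with textbook EM. The function name `em_session_mixture`, its parameters, and the smoothing constant are all hypothetical choices made for illustration.

```python
import numpy as np


def em_session_mixture(counts, n_clusters, n_iter=50, seed=0, smoothing=1e-2):
    """Fit a mixture of multinomials to session data with standard EM.

    counts: (n_sessions, n_urls) array; counts[i, j] is how often session i
    accessed URL j (an assumed encoding, not taken from the paper).
    Returns mixing weights, per-cluster URL distributions, and the
    per-session cluster responsibilities.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_sessions, n_urls = counts.shape

    # Uniform mixing weights; random, strictly positive URL distributions.
    weights = np.full(n_clusters, 1.0 / n_clusters)
    url_probs = rng.dirichlet(np.ones(n_urls), size=n_clusters)

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster given each session,
        # computed in log space and normalised row by row.
        log_post = np.log(weights) + counts @ np.log(url_probs).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights and URL distributions from the
        # responsibility-weighted counts; smoothing keeps everything positive.
        weights = (resp.sum(axis=0) + smoothing) / (n_sessions + n_clusters * smoothing)
        url_probs = resp.T @ counts + smoothing
        url_probs /= url_probs.sum(axis=1, keepdims=True)

    return weights, url_probs, resp
```

Under these assumptions, the largest entries in each row of `url_probs` play the role of the "highly relevant" URLs for a cluster, and `resp.argmax(axis=1)` assigns each session (and hence its query) to a cluster.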
