Improving web site search using web server logs

CASCON '06: Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research, New York, NY, USA, ACM, (2006)
DOI: 10.1145/1188966.1188996

Abstract

Despite the success of global search engines, web site search engines still perform poorly. A web site differs from the whole web in link structure, access patterns, and data scale, so methods that improve web search do not always succeed when applied to web site search. In this paper, we propose a novel algorithm that improves retrieval performance by using web server logs. Web server logs are grouped into sessions, and the relationships among the web pages in each session are analyzed based on their similarities. From this analysis, a new web page representation is generated. Anchor text is used to create another representation. Both are combined with the original text-based representation in web site search. Two kinds of combination methods are investigated and tested: combination of document representations and combination of ranking scores. Our experimental results show that our algorithm improves retrieval accuracy for all four retrieval models we tested: Inference Network Model, Okapi Model, Cosine Similarity Model, and TFIDF Model. The largest performance increase from web log analysis is for the TFIDF model, and overall, the inference network model with web log information achieves the best result.
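
The abstract names two steps without giving details: grouping server log entries into sessions, and combining ranking scores from several page representations. The following is a minimal sketch of how such steps could look, not the paper's actual method. The 30-minute inactivity timeout, the `(client_id, timestamp, url)` record layout, the function names, and the linear weights are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed inactivity timeout; the abstract does not state how sessions are cut.
SESSION_GAP_SECONDS = 30 * 60


def group_into_sessions(records, gap=SESSION_GAP_SECONDS):
    """Group (client_id, unix_timestamp, url) records into per-client sessions.

    A new session starts whenever the same client is idle for more than `gap`
    seconds. Returns a list of sessions, each a list of URLs in visit order.
    """
    by_client = defaultdict(list)
    for client, ts, url in records:
        by_client[client].append((ts, url))

    sessions = []
    for client in sorted(by_client):
        hits = sorted(by_client[client])  # order each client's hits by time
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > gap:
                sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        sessions.append(current)
    return sessions


def combine_scores(text_score, log_score, anchor_score, weights=(0.6, 0.2, 0.2)):
    """Linearly combine ranking scores from three representations.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_text, w_log, w_anchor = weights
    return w_text * text_score + w_log * log_score + w_anchor * anchor_score
```

In this sketch, pages that co-occur in a session returned by `group_into_sessions` would be the candidates for the similarity analysis the abstract mentions, and `combine_scores` illustrates only the "combination of ranking scores" variant.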
