Clustering the tagged web

Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 54--63. New York, NY, USA, ACM (2009)
DOI: 10.1145/1498759.1498809

Abstract

Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output t…
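The first approach named in the abstract, K-means over a vector space extended with tags, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy pages, the `tag_weight` parameter, the per-channel L2 normalization, and the choice to seed centroids with the first k pages are all assumptions made for illustration.

```python
import math
from collections import Counter


def unit_vector(counts, vocab):
    """Term-frequency vector over vocab, scaled to unit L2 norm (left as zeros if empty)."""
    vec = [float(counts.get(term, 0)) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def combined_vectors(pages, tag_weight=1.0):
    """Concatenate a normalized word vector and a normalized tag vector for each page.

    tag_weight is a hypothetical knob for the relative influence of tags.
    """
    word_vocab = sorted({w for p in pages for w in p["words"]})
    tag_vocab = sorted({t for p in pages for t in p["tags"]})
    vectors = []
    for p in pages:
        words = unit_vector(Counter(p["words"]), word_vocab)
        tags = unit_vector(Counter(p["tags"]), tag_vocab)
        vectors.append(words + [tag_weight * x for x in tags])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each page goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: page text as word lists, bookmarks as tag lists.
pages = [
    {"words": ["python", "tutorial", "code"], "tags": ["programming", "python"]},
    {"words": ["pasta", "recipe", "sauce"], "tags": ["cooking", "food"]},
    {"words": ["python", "library", "code"], "tags": ["programming"]},
    {"words": ["bread", "recipe", "oven"], "tags": ["cooking"]},
]
labels = kmeans(combined_vectors(pages), k=2)
print(labels)
```

Normalizing words and tags separately before concatenating keeps a page with a long text from drowning out its handful of tags; whether that matches the weighting used in the paper cannot be determined from the abstract alone.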
