Scaling distributional similarity to large corpora

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 361-368. Morristown, NJ, USA, Association for Computational Linguistics (2006)
DOI: http://dx.doi.org/10.3115/1220175.1220221

Abstract

Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O(n^2) in the vocabulary size). In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance.
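
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of a brute-force nearest-neighbour search over context vectors. It is an illustration only, not the paper's implementation: the cosine measure, the sparse dict representation, the function names `cosine` and `nearest_neighbours`, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors stored as dicts.

    Cosine is an assumed similarity measure chosen for illustration.
    """
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(vectors, k=1):
    """Return the k most similar words for every word in the vocabulary.

    Every word is compared against every other word, so the number of
    similarity computations grows as O(n^2) in the vocabulary size n.
    """
    result = {}
    for word, vec in vectors.items():
        scored = [
            (cosine(vec, other), candidate)
            for candidate, other in vectors.items()
            if candidate != word
        ]
        # Highest similarity first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[word] = [candidate for _, candidate in scored[:k]]
    return result


# Hypothetical toy data: context feature -> weight.
vectors = {
    "car": {"drive": 3.0, "road": 2.0},
    "truck": {"drive": 2.0, "road": 2.0, "load": 1.0},
    "apple": {"eat": 3.0, "tree": 1.0},
}
neighbours = nearest_neighbours(vectors, k=1)
```

With three words this performs six comparisons; with a vocabulary of 100,000 words the same double loop would perform on the order of 10^10, which is the kind of cost that approximate search methods aim to avoid.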
