
Detecting Text Similarity over Short Passages: Exploring Linguistic Feature Combinations via Machine Learning

, , and . Pages 203-212 (1999).

Abstract

We present a new composite similarity metric that combines information from multiple linguistic indicators to measure the semantic distance between pairs of small textual units. We investigate several candidate features and select an optimal combination of them via machine learning. We discuss a definition of similarity that is more restrictive than traditional document-level, information-retrieval-oriented notions of similarity, and we motivate it by showing its relevance to the problem of multi-document text summarization. We evaluate our system against standard information retrieval techniques and show that the new method is more effective at identifying closely related textual units.
