TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages

Computational Linguistics, 23 (1): 33–64 (March 1997)

Abstract

TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including information retrieval and summarization.
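
The abstract names lexical co-occurrence and distribution as the cue for subtopic shifts but gives no procedure. The following is a minimal sketch of that idea under stated assumptions, not the published algorithm: the function names, the window size `w`, the block size `k`, the cosine measure, the depth scoring, and the cutoff rule are all illustrative choices that do not come from the abstract. It compares the vocabulary on either side of each gap between fixed-size token sequences and reports gaps where similarity drops into a pronounced valley.

```python
import math
import re
import statistics
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_similarities(tokens, w, k):
    """Similarity across each gap between consecutive w-token sequences.

    Each side of a gap is a block of up to k sequences. Both w and k are
    hypothetical parameters chosen for illustration.
    """
    seqs = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    sims = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Depth of each similarity value below the nearest peak on each side."""
    depths = []
    for i, s in enumerate(sims):
        left, j = s, i - 1
        while j >= 0 and sims[j] >= left:        # climb to the left peak
            left, j = sims[j], j - 1
        right, j = s, i + 1
        while j < len(sims) and sims[j] >= right:  # climb to the right peak
            right, j = sims[j], j + 1
        depths.append((left - s) + (right - s))
    return depths


def segment(text, w=20, k=2):
    """Return token offsets of hypothesized subtopic boundaries.

    Assumed rule: a gap is a boundary if its depth is a local maximum and
    exceeds mean(depths) - stdev(depths) / 2.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    depths = depth_scores(gap_similarities(tokens, w, k))
    if not depths:
        return []
    cutoff = statistics.mean(depths) - statistics.pstdev(depths) / 2
    boundaries = []
    for i, d in enumerate(depths):
        prev_d = depths[i - 1] if i > 0 else 0.0
        next_d = depths[i + 1] if i + 1 < len(depths) else 0.0
        if d > cutoff and d >= prev_d and d >= next_d:
            boundaries.append((i + 1) * w)
    return boundaries
```

The sketch omits steps a practical segmenter would likely need, such as stopword removal, smoothing of the similarity curve, and snapping boundaries to paragraph breaks so that the output units are multi-paragraph passages.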
