TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
M. Hearst. Computational Linguistics, 23 (1): 33--64 (March 1997)
Abstract
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages,
or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical
co-occurrence and distribution. The algorithm is fully implemented and is shown to produce
segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts.
Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including
information retrieval and summarization.
%0 Journal Article
%1 Hearst97
%A Hearst, Marti A.
%D 1997
%I MIT Press
%J Computational Linguistics
%K NLP TextSegmentation
%N 1
%P 33--64
%T TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
%U http://acl.ldc.upenn.edu/J/J97/J97-1003.pdf
%V 23
%X TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages,
or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical
co-occurrence and distribution. The algorithm is fully implemented and is shown to produce
segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts.
Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including
information retrieval and summarization.
@article{Hearst97,
abstract = {TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages,
or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical
co-occurrence and distribution. The algorithm is fully implemented and is shown to produce
segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts.
Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including
information retrieval and summarization.},
added-at = {2008-06-17T06:51:21.000+0200},
author = {Hearst, Marti A.},
biburl = {https://www.bibsonomy.org/bibtex/21d157a965286828e36a6fc4f9734e99a/mkroell},
citeulike-article-id = {432471},
comment = {Does not use probabilistic topic models (it predates them), but is one of the first papers in the area of document segmentation.},
issn = {0891-2017},
interhash = {9234c40827860930b18efe3b6c79829a},
intrahash = {1d157a965286828e36a6fc4f9734e99a},
journal = {Computational Linguistics},
keywords = {NLP TextSegmentation},
month = mar,
number = 1,
pages = {33--64},
priority = {2},
publisher = {MIT Press},
timestamp = {2009-03-31T10:25:53.000+0200},
title = {TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages},
url = {http://acl.ldc.upenn.edu/J/J97/J97-1003.pdf},
volume = 23,
year = 1997
}