Narrative text classification for automatic key phrase extraction in web document corpora

, , and . WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management, pages 51--58. New York, NY, USA, ACM, (2005)
DOI: http://doi.acm.org/10.1145/1097047.1097059

Abstract

Automatic key phrase extraction is a useful tool in many text-related applications such as clustering and summarization. State-of-the-art methods are aimed at extracting key phrases from traditional text such as technical papers. Applying these methods to Web documents, which often contain diverse and heterogeneous content, is of particular interest and poses a particular challenge in the information age. In this work, we investigate the significance of narrative text classification in the task of automatic key phrase extraction in Web document corpora. We benchmark three methods, TFIDF, KEA, and Keyterm, each used to extract key phrases from all the plain text and from only the narrative text of Web pages. ANOVA tests are used to analyze the ranking data collected in a user study, using the quantitative measures of acceptable percentage and quality value. The evaluation shows that key phrases extracted from the narrative text only are significantly better than those obtained from all plain text of Web pages. This demonstrates that narrative text classification is indispensable for effective key phrase extraction in Web document corpora.

Description

Narrative text classification for automatic key phrase extraction in web document corpora
