Inproceedings,

Bootstrapping Noun Groups Using Closed-Class Elements Only

K. Eichler and G. Neumann.
Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen & Adaptivitaet, Kassel, Germany, (2010)

Abstract

The identification of noun groups in text is a well-researched task and serves as a preliminary step for other natural language processing tasks, such as the extraction of keyphrases or technical terms. We present a first version of a noun group chunker that, given an unannotated text corpus, adapts itself to the domain at hand in an unsupervised way. Our approach is inspired by findings from cognitive linguistics, in particular the division of language into open-class elements and closed-class elements. Our system extracts noun groups using lists of closed-class elements and one linguistically inspired seed extraction rule for each open class. Supplied with raw text, the system creates an initial validation set for each open class based on the seed rules and applies a bootstrapping procedure to mutually expand the set of extraction rules and the validation sets. Possibly domain-dependent information about open-class elements, such as that provided by a part-of-speech lexicon, is not used by the system, in order to ensure the domain independence of the approach. Instead, the system adapts itself automatically to the domain of the input text by bootstrapping domain-specific validation lists. An evaluation of our system on the Wall Street Journal training corpus used for the CoNLL-2000 shared task on chunking shows that our bootstrapping approach can be successfully applied to the task of noun group chunking.

BibTeX key: kdml8
entry type: inproceedings
address: Kassel, Germany
booktitle: Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen & Adaptivitaet
year: 2010
crossref: lwa2010
presentation_start: 2010-10-05 16:55:00
session: kdml2
track: kdml
presentation_end: 2010-10-05 17:05:00
room: 0446
Document: http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/kdml8.pdf

Cite this publication

@inproceedings{kdml8,
  abstract = {The identification of noun groups in text is a well-researched task and serves as a preliminary step for other natural language processing tasks, such as the extraction of keyphrases or technical terms. We present a first version of a noun group chunker that, given an unannotated text corpus, adapts itself to the domain at hand in an unsupervised way. Our approach is inspired by findings from cognitive linguistics, in particular the division of language into open-class elements and closed-class elements. Our system extracts noun groups using lists of closed-class elements and one linguistically inspired seed extraction rule for each open class. Supplied with raw text, the system creates an initial validation set for each open class based on the seed rules and applies a bootstrapping procedure to mutually expand the set of extraction rules and the validation sets. Possibly domain-dependent information about open-class elements, such as that provided by a part-of-speech lexicon, is not used by the system, in order to ensure the domain independence of the approach. Instead, the system adapts itself automatically to the domain of the input text by bootstrapping domain-specific validation lists. An evaluation of our system on the Wall Street Journal training corpus used for the CoNLL-2000 shared task on chunking shows that our bootstrapping approach can be successfully applied to the task of noun group chunking.},
  added-at = {2010-10-05T14:15:12.000+0200},
  address = {Kassel, Germany},
  author = {Eichler, Kathrin and Neumann, Günter},
  biburl = {https://www.bibsonomy.org/bibtex/234de399e9d651aafdb499499a424ec70/lwa2010},
  booktitle = {Proceedings of LWA2010 - Workshop-Woche: Lernen, Wissen {\&} Adaptivitaet},
  crossref = {lwa2010},
  editor = {Atzmüller, Martin and Benz, Dominik and Hotho, Andreas and Stumme, Gerd},
  interhash = {5d01044ae6183c118b2097992208663c},
  intrahash = {34de399e9d651aafdb499499a424ec70},
  keywords = {bootstrapping chunking closed-class elements extraction group noun room:0446 session:kdml2 term workshop:kdml},
  presentation_end = {2010-10-05 17:05:00},
  presentation_start = {2010-10-05 16:55:00},
  room = {0446},
  session = {kdml2},
  timestamp = {2010-10-05T14:15:14.000+0200},
  title = {Bootstrapping Noun Groups Using Closed-Class Elements Only},
  track = {kdml},
  url = {http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/kdml8.pdf},
  year = 2010
}
