EAGER: extending automatically gazetteers for entity recognition
O. Gunes, C. Schallhart, T. Furche, J. Lehmann, and A. Ngonga. Proceedings of the 3rd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, pages 29--33. Stroudsburg, PA, USA, Association for Computational Linguistics, (2012)
Abstract
Key to named entity recognition, the manual gazetteering of entity lists is a costly, error-prone process that often yields results that are incomplete and suffer from sampling bias. Exploiting current sources of structured information, we propose a novel method for extending minimal seed lists into complete gazetteers. Like previous approaches, we value Wikipedia as a huge, well-curated, and relatively unbiased source of entities. However, in contrast to previous work, we exploit not only its content, but also its structure, as exposed in DBPedia. We extend gazetteers through Wikipedia categories, carefully limiting the impact of noisy categorizations. The resulting gazetteers easily outperform previous approaches on named entity recognition.
%0 Conference Paper
%1 Gunes:2012:EEA:2392793.2392798
%A Gunes, Omer
%A Schallhart, Christian
%A Furche, Tim
%A Lehmann, Jens
%A Ngonga, Axel
%B Proceedings of the 3rd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP
%C Stroudsburg, PA, USA
%D 2012
%I Association for Computational Linguistics
%K eager listgrowing setcompletion
%P 29--33
%T EAGER: extending automatically gazetteers for entity recognition
%U http://dl.acm.org/citation.cfm?id=2392793.2392798
%X Key to named entity recognition, the manual gazetteering of entity lists is a costly, error-prone process that often yields results that are incomplete and suffer from sampling bias. Exploiting current sources of structured information, we propose a novel method for extending minimal seed lists into complete gazetteers. Like previous approaches, we value Wikipedia as a huge, well-curated, and relatively unbiased source of entities. However, in contrast to previous work, we exploit not only its content, but also its structure, as exposed in DBPedia. We extend gazetteers through Wikipedia categories, carefully limiting the impact of noisy categorizations. The resulting gazetteers easily outperform previous approaches on named entity recognition.
@inproceedings{Gunes:2012:EEA:2392793.2392798,
abstract = {Key to named entity recognition, the manual gazetteering of entity lists is a costly, error-prone process that often yields results that are incomplete and suffer from sampling bias. Exploiting current sources of structured information, we propose a novel method for extending minimal seed lists into complete gazetteers. Like previous approaches, we value Wikipedia as a huge, well-curated, and relatively unbiased source of entities. However, in contrast to previous work, we exploit not only its content, but also its structure, as exposed in DBPedia. We extend gazetteers through Wikipedia categories, carefully limiting the impact of noisy categorizations. The resulting gazetteers easily outperform previous approaches on named entity recognition.},
acmid = {2392798},
added-at = {2013-10-24T09:54:12.000+0200},
address = {Stroudsburg, PA, USA},
author = {Gunes, Omer and Schallhart, Christian and Furche, Tim and Lehmann, Jens and Ngonga, Axel},
biburl = {https://www.bibsonomy.org/bibtex/20c8d55d44b6a2a2ae84beb54efe7248c/asmelash},
booktitle = {Proceedings of the 3rd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP},
description = {EAGER},
interhash = {79acb0211864d68b53d54a7db069f066},
intrahash = {0c8d55d44b6a2a2ae84beb54efe7248c},
keywords = {eager listgrowing setcompletion},
location = {Jeju, Republic of Korea},
numpages = {5},
pages = {29--33},
publisher = {Association for Computational Linguistics},
timestamp = {2013-10-24T09:54:12.000+0200},
title = {{EAGER}: extending automatically gazetteers for entity recognition},
url = {http://dl.acm.org/citation.cfm?id=2392793.2392798},
year = 2012
}