Comparison of generality based algorithm variants for automatic taxonomy generation
A. Henschel, W. Woon, T. Wächter, and S. Madnick. IIT'09: Proceedings of the 6th international conference on Innovations in information technology, pages 206--210. Piscataway, NJ, USA, IEEE Press, (2009)
Abstract
We compare a family of algorithms for the automatic generation of taxonomies by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants of the algorithm are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and the complexity of the variants, combined with a systematic threshold evaluation, on a set of seven manually created benchmark sets. As a result, betweenness centrality calculated on unweighted similarity graphs often performs best, but it requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.
%0 Conference Paper
%1 1802317
%A Henschel, Andreas
%A Woon, Wei Lee
%A Wächter, Thomas
%A Madnick, Stuart
%B IIT'09: Proceedings of the 6th international conference on Innovations in information technology
%C Piscataway, NJ, USA
%D 2009
%I IEEE Press
%K generality genta11
%P 206--210
%T Comparison of generality based algorithm variants for automatic taxonomy generation
%U http://portal.acm.org/citation.cfm?id=1802317
%X We compare a family of algorithms for the automatic generation of taxonomies by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants of the algorithm are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and the complexity of the variants, combined with a systematic threshold evaluation, on a set of seven manually created benchmark sets. As a result, betweenness centrality calculated on unweighted similarity graphs often performs best, but it requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.
%@ 978-1-4244-5698-7
@inproceedings{1802317,
abstract = {We compare a family of algorithms for the automatic generation of taxonomies by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants of the algorithm are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and the complexity of the variants, combined with a systematic threshold evaluation, on a set of seven manually created benchmark sets. As a result, betweenness centrality calculated on unweighted similarity graphs often performs best, but it requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.},
added-at = {2010-10-11T09:35:52.000+0200},
address = {Piscataway, NJ, USA},
author = {Henschel, Andreas and Woon, Wei Lee and W\"{a}chter, Thomas and Madnick, Stuart},
biburl = {https://www.bibsonomy.org/bibtex/24951cbb5b0e6cd41a6c7ce318497f0c8/chriskoerner},
booktitle = {IIT'09: Proceedings of the 6th international conference on Innovations in information technology},
description = {Comparison of generality based algorithm variants for automatic taxonomy generation},
interhash = {7dc534500eb274e0844bc216634ffb6a},
intrahash = {4951cbb5b0e6cd41a6c7ce318497f0c8},
isbn = {978-1-4244-5698-7},
keywords = {generality genta11},
location = {Al-Ain, United Arab Emirates},
pages = {206--210},
publisher = {IEEE Press},
timestamp = {2010-10-11T09:47:56.000+0200},
title = {Comparison of generality based algorithm variants for automatic taxonomy generation},
url = {http://portal.acm.org/citation.cfm?id=1802317},
year = 2009
}