German Encyclopedia Alignment Based on Information Retrieval Techniques
R. Kern and M. Granitzer. Research and Advanced Technology for Digital Libraries, volume 6273 of Lecture Notes in Computer Science, pages 315--326. Springer Berlin / Heidelberg, 2010. doi:10.1007/978-3-642-15464-5_32.
Abstract
Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries from different encyclopedic corpora. The base of our system is the alignment algorithm which incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques has been found to deliver a satisfying performance.
%0 Book Section
%1 kern_german_2010
%A Kern, Roman
%A Granitzer, Michael
%B Research and Advanced Technology for Digital Libraries
%D 2010
%E Lalmas, Mounia
%E Jose, Joemon
%E Rauber, Andreas
%E Sebastiani, Fabrizio
%E Frommholz, Ingo
%I Springer Berlin / Heidelberg
%K ECDL2011 EntityMatching
%P 315--326
%T German Encyclopedia Alignment Based on Information Retrieval Techniques
%U http://dx.doi.org/10.1007/978-3-642-15464-5_32
%V 6273
%X Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries from different encyclopedic corpora. The base of our system is the alignment algorithm which incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques has been found to deliver a satisfying performance.
@incollection{kern_german_2010,
abstract = {Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries from different encyclopedic corpora. The base of our system is the alignment algorithm which incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques has been found to deliver a satisfying performance.},
added-at = {2010-10-14T22:17:01.000+0200},
author = {Kern, Roman and Granitzer, Michael},
biburl = {https://www.bibsonomy.org/bibtex/2f0bba408d5243e1af03fe340a0701e5a/datentaste},
booktitle = {Research and Advanced Technology for Digital Libraries},
editor = {Lalmas, Mounia and Jose, Joemon and Rauber, Andreas and Sebastiani, Fabrizio and Frommholz, Ingo},
interhash = {debd4420f16eb595c036fd3fb6e64b9c},
intrahash = {f0bba408d5243e1af03fe340a0701e5a},
keywords = {ECDL2011 EntityMatching},
doi = {10.1007/978-3-642-15464-5_32},
pages = {315--326},
publisher = {Springer Berlin / Heidelberg},
series = {Lecture Notes in Computer Science},
timestamp = {2010-10-14T22:17:01.000+0200},
title = {German Encyclopedia Alignment Based on Information Retrieval Techniques},
url = {http://dx.doi.org/10.1007/978-3-642-15464-5_32},
volume = 6273,
year = 2010
}