Language-independent compound splitting with morphological operations
K. Macherey, A. Dai, D. Talbot, A. Popat, and F. Och. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-11), Portland, OR (2011)
Abstract
Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
%0 Conference Paper
%1 Macherey:EtAl:11
%A Macherey, Klaus
%A Dai, Andrew
%A Talbot, David
%A Popat, Ashok
%A Och, Franz
%B Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-11)
%C Portland, OR
%D 2011
%K bilingual compound corpus detection machine mt smt splitting translation
%T Language-independent compound splitting with morphological operations
%U http://aclweb.org/anthology-new/P/P11/P11-1140.pdf
%X Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.
@inproceedings{Macherey:EtAl:11,
abstract = {Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound into its parts. Furthermore, monolingual corpora are used to learn and filter the set of compound part candidates. We evaluate our method within a machine translation task and show significant improvements for various languages to show the versatility of the approach.},
added-at = {2011-09-13T15:29:23.000+0200},
address = {Portland, OR},
author = {Macherey, Klaus and Dai, Andrew and Talbot, David and Popat, Ashok and Och, Franz},
biburl = {https://www.bibsonomy.org/bibtex/2c51b28f764c780a028fc4a740e078e02/jil},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-11)},
interhash = {00e36426440879fd8c93a9f29f65cc86},
intrahash = {c51b28f764c780a028fc4a740e078e02},
keywords = {bilingual compound corpus detection machine mt smt splitting translation},
timestamp = {2013-11-23T20:11:51.000+0100},
title = {Language-independent compound splitting with morphological operations},
url = {http://aclweb.org/anthology-new/P/P11/P11-1140.pdf},
year = 2011
}