Corpus-Driven Splitting of Compound Words

Proceedings of the Ninth International Conference on Theoretical and Methodological Issues in Machine Translation (2002)

Abstract

This paper presents a method for splitting compound words into their constituents based on cognate words in the other language of a parallel corpus. A minor extension to the method allows the decompounding of words which do not have cognates in the other language. By decompounding the training corpus for an Example-Based MT system, the incidence of word alignment failure can be substantially reduced, yielding a modest improvement in performance.
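The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of cognate-guided splitting, not the paper's method. It assumes that a compound can be split wherever both halves resemble some word in the aligned sentence of the other language, and it uses `difflib.SequenceMatcher` string similarity as a stand-in for real cognate detection. The function names, the 0.7 threshold, the minimum part length, and the German/English example are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def best_similarity(part, other_words):
    """Highest string similarity between `part` and any word on the other side.

    Assumption: plain character similarity approximates cognate detection.
    """
    return max(
        (SequenceMatcher(None, part, w).ratio() for w in other_words),
        default=0.0,
    )


def split_compound(word, other_words, min_part=3, threshold=0.7):
    """Return (left, right) for the best two-way split, or None.

    A split is scored by its weaker half: both halves must look like
    some word in the aligned sentence. The threshold and minimum part
    length are hypothetical tuning values.
    """
    word = word.lower()
    other_words = [w.lower() for w in other_words]
    best_split, best_score = None, threshold
    for i in range(min_part, len(word) - min_part + 1):
        left, right = word[:i], word[i:]
        score = min(
            best_similarity(left, other_words),
            best_similarity(right, other_words),
        )
        if score > best_score:
            best_split, best_score = (left, right), score
    return best_split


if __name__ == "__main__":
    english = ["the", "weather", "station", "is", "closed"]
    print(split_compound("Wetterstation", english))
```

Words with no acceptable split point return `None` and would be left intact. The extension mentioned in the abstract, for compounds without cognates, is not covered by this sketch.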
