Article

A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

Pascale Fung and Kathleen McKeown.
Machine Translation, 12 (1): 53--87 (1997)

Abstract

Technical-term translation represents one of the most difficult tasks for human translators since (1) most translators are not familiar with terms and domain-specific terminology and (2) such terms are not adequately covered by printed dictionaries. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any word which is part of a technical term in the source language, the algorithm produces a ranked candidate match for it in the target language. Potential translations for the term are compiled from the matched words and are also ranked. We show how this ranked list helps translators in technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a sentence-aligned clean parallel corpus without insertion, deletion or OCR noise. Our algorithm is language- and character-set-independent, and is robust to noise in the corpus. We show how our algorithm requires minimum preprocessing and is able to obtain technical-word translations without sentence-boundary identification or sentence alignment, from the English–Japanese awk manual corpus with noise arising from text insertions or deletions and on the English–Chinese HKUST bilingual corpus. We obtain a precision of 55.35% from the awk corpus for word translation including rare words, counting only the best candidate and direct translations. Translation precision of the best-candidate translation is 89.93% from the HKUST corpus. Potential term translations produced by the program help bilingual speakers to get a 47% improvement in translating technical terms.

BibTeX key: Fung1997
entry type: article
year: 1997
journal: Machine Translation
number: 1
pages: 53--87
volume: 12
annote: Language: eng
url: http://dx.doi.org/10.1023/A:1007974605290

Cite this publication

%0 Journal Article
%1 Fung1997
%A Pascale Fung
%A Kathleen McKeown
%D 1997
%J Machine Translation
%K Corpus automática textuales, Terminología, Traduccion
%N 1
%P 53--87
%T A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups
%U http://dx.doi.org/10.1023/A:1007974605290
%V 12
%Z Language: eng
