Abstract

This paper presents an extended, harmonised account of our previous work on integrating controlled language data in an Example-based Machine Translation system. Gough and Way (MT Summit, 2003, pp. 133–140) focused on controlling the output text in a novel manner, while Gough and Way (9th Workshop of the EAMT, 2004a, pp. 73–81) sought to constrain the input strings according to controlled language specifications. Our original sub-sentential alignment algorithm could deal only with 1:1 matches, but subsequent refinements enabled n:m alignments to be captured. A direct consequence was that we were able to populate the system's databases with more than six times as many potentially useful fragments. Together with two simple novel improvements (correcting a small number of mistranslations in the lexicon, and allowing multiple translations in the lexicon), translation quality improves considerably. We provide detailed automatic and human evaluations of a number of experiments carried out to test the quality of the system. We observe that our system outperforms the rule-based on-line system Logomedia on a range of automatic evaluation metrics, and that the 'best' translation candidate is consistently highly ranked by our system. Finally, we note in a number of tests that the BLEU metric gives objectively different results from other automatic evaluation metrics and from a manual evaluation. Despite these conflicting results, we observe a preference for controlling the source data rather than the target translations.
