Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text
Translation
A. Berard, O. Pietquin, C. Servan, and L. Besacier. (2016). arXiv:1612.01744. Comment: accepted to the NIPS workshop on End-to-end Learning for Speech and Audio Processing.
Abstract
This paper proposes a first attempt to build an end-to-end speech-to-text
translation system, which does not use source language transcription during
learning or decoding. We propose a model for direct speech-to-text translation,
which gives promising results on a small French-English synthetic corpus.
Relaxing the need for source language transcription would drastically change
the data collection methodology in speech translation, especially in
under-resourced scenarios. For instance, in the former project DARPA TRANSTAC
(speech translation from spoken Arabic dialects), a large effort was devoted to
the collection of speech transcripts (and a prerequisite to obtain transcripts
was often a detailed transcription guide for languages with little standardized
spelling). Now, if end-to-end approaches for speech-to-text translation are
successful, one might consider collecting data by asking bilingual speakers to
directly utter speech in the source language from target language text
utterances. Such an approach has the advantage of being applicable to any
unwritten (source) language.
@misc{berard2016listen,
abstract = {This paper proposes a first attempt to build an end-to-end speech-to-text
translation system, which does not use source language transcription during
learning or decoding. We propose a model for direct speech-to-text translation,
which gives promising results on a small French-English synthetic corpus.
Relaxing the need for source language transcription would drastically change
the data collection methodology in speech translation, especially in
under-resourced scenarios. For instance, in the former project DARPA TRANSTAC
(speech translation from spoken Arabic dialects), a large effort was devoted to
the collection of speech transcripts (and a prerequisite to obtain transcripts
was often a detailed transcription guide for languages with little standardized
spelling). Now, if end-to-end approaches for speech-to-text translation are
successful, one might consider collecting data by asking bilingual speakers to
directly utter speech in the source language from target language text
utterances. Such an approach has the advantage of being applicable to any
unwritten (source) language.},
added-at = {2020-05-11T06:39:07.000+0200},
author = {Berard, Alexandre and Pietquin, Olivier and Servan, Christophe and Besacier, Laurent},
biburl = {https://www.bibsonomy.org/bibtex/2d4f5537049b45ccc1f691b46c57e5356/ramimanna},
description = {Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation},
interhash = {d8d5b8482580efb068e4482dcb26b136},
intrahash = {d4f5537049b45ccc1f691b46c57e5356},
keywords = {mt speech},
note = {arXiv:1612.01744; accepted to the NIPS workshop on End-to-end Learning for Speech and Audio Processing},
timestamp = {2020-05-11T06:39:07.000+0200},
title = {Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation},
url = {http://arxiv.org/abs/1612.01744},
year = 2016
}