Article,

UWSpeech: Speech to Speech Translation for Unwritten Languages

C. Zhang, X. Tan, Y. Ren, T. Qin, K. Zhang, and T. Liu.
Proceedings of the AAAI Conference on Artificial Intelligence, 35 (16): 14319-14327 (2021, 2020)

Abstract

Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phoneme available. In this paper, we develop a translation system for unwritten languages, named as UWSpeech, which converts target unwritten speech into discrete tokens with a converter, and then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrate the advantages and potentials of UWSpeech.

BibTeX key: zhang20212020uwspeech
entry type: article
year: 2021, 2020
number: 16
pages: 14319-14327
volume: 35
type: Text.Serial.Journal
source: Proceedings of the AAAI Conference on Artificial Intelligence
uri: https://ojs.aaai.org/index.php/AAAI
issn: 2374-3468
id: 17684
url: https://ojs.aaai.org/index.php/AAAI/article/view/17684


Cite this publication

@article{zhang20212020uwspeech,
  abstract = {Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training. However, those methods cannot be applied to unwritten target languages, which have no written text or phoneme available. In this paper, we develop a translation system for unwritten languages, named as UWSpeech, which converts target unwritten speech into discrete tokens with a converter, and then translates source-language speech into target discrete tokens with a translator, and finally synthesizes target speech from target discrete tokens with an inverter. We propose a method called XL-VAE, which enhances vector quantized variational autoencoder (VQ-VAE) with cross-lingual (XL) speech recognition, to train the converter and inverter of UWSpeech jointly. Experiments on Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrate the advantages and potentials of UWSpeech.},
  added-at = {2022-01-02T15:35:54.000+0100},
  author = {Zhang, Chen and Tan, Xu and Ren, Yi and Qin, Tao and Zhang, Kejun and Liu, Tie-Yan},
  biburl = {https://www.bibsonomy.org/bibtex/2fafaa886675a4d23b41a8f91aec8f99e/johnaoga},
  id = {17684},
  interhash = {244f41709324c7aea87835e28b12377a},
  intrahash = {fafaa886675a4d23b41a8f91aec8f99e},
  issn = {2374-3468},
  keywords = {speech translation},
  number = 16,
  pages = {14319-14327},
  source = {Proceedings of the AAAI Conference on Artificial Intelligence},
  timestamp = {2022-01-02T15:35:54.000+0100},
  title = {UWSpeech: Speech to Speech Translation for Unwritten Languages},
  type = {Text.Serial.Journal},
  uri = {https://ojs.aaai.org/index.php/AAAI},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/17684},
  volume = 35,
  year = {2021, 2020}
}
