We propose a Text-to-Speech method to create an unseen expressive style using
one utterance of expressive speech of around one second. Specifically, we
enhance the disentanglement capabilities of a state-of-the-art
sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a
Householder Flow. The proposed system provides a 22% KL-divergence reduction
while jointly improving perceptual metrics over state-of-the-art. At synthesis
time we use one example of expressive style as a reference input to the encoder
for generating any text in the desired style. Perceptual MUSHRA evaluations
show that we can create a voice with a 9% relative naturalness improvement over
standard Neural Text-to-Speech, while also improving the perceived emotional
intensity (59 compared to the 55 of neutral speech).
Description
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
%0 Generic
%1 aggarwal2019using
%A Aggarwal, Vatsal
%A Cotescu, Marius
%A Prateek, Nishant
%A Lorenzo-Trueba, Jaime
%A Barra-Chicote, Roberto
%D 2019
%K disentanglement emotions myown tts
%T Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis
of Expressive Speech
%U http://arxiv.org/abs/1911.12760
%X We propose a Text-to-Speech method to create an unseen expressive style using
one utterance of expressive speech of around one second. Specifically, we
enhance the disentanglement capabilities of a state-of-the-art
sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a
Householder Flow. The proposed system provides a 22% KL-divergence reduction
while jointly improving perceptual metrics over state-of-the-art. At synthesis
time we use one example of expressive style as a reference input to the encoder
for generating any text in the desired style. Perceptual MUSHRA evaluations
show that we can create a voice with a 9% relative naturalness improvement over
standard Neural Text-to-Speech, while also improving the perceived emotional
intensity (59 compared to the 55 of neutral speech).
@misc{aggarwal2019using,
  abstract      = {We propose a Text-to-Speech method to create an unseen expressive style using
one utterance of expressive speech of around one second. Specifically, we
enhance the disentanglement capabilities of a state-of-the-art
sequence-to-sequence based system with a Variational AutoEncoder (VAE) and a
Householder Flow. The proposed system provides a 22\% KL-divergence reduction
while jointly improving perceptual metrics over state-of-the-art. At synthesis
time we use one example of expressive style as a reference input to the encoder
for generating any text in the desired style. Perceptual MUSHRA evaluations
show that we can create a voice with a 9\% relative naturalness improvement over
standard Neural Text-to-Speech, while also improving the perceived emotional
intensity (59 compared to the 55 of neutral speech).},
  added-at      = {2020-11-19T14:24:38.000+0100},
  archiveprefix = {arXiv},
  author        = {Aggarwal, Vatsal and Cotescu, Marius and Prateek, Nishant and Lorenzo-Trueba, Jaime and Barra-Chicote, Roberto},
  biburl        = {https://www.bibsonomy.org/bibtex/2e6cc13458983677829f08f992415a6ff/marius.cotescu},
  description   = {Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech},
  eprint        = {1911.12760},
  interhash     = {9a46dea1ea609ec0d3bcf1ab60cd9c03},
  intrahash     = {e6cc13458983677829f08f992415a6ff},
  keywords      = {disentanglement emotions myown tts},
  note          = {Accepted to ICASSP 2020},
  timestamp     = {2020-11-19T14:24:38.000+0100},
  title         = {Using {VAEs} and Normalizing Flows for One-shot {Text-To-Speech} Synthesis of Expressive Speech},
  url           = {http://arxiv.org/abs/1911.12760},
  year          = {2019},
}