Abstract
We adapt the architectures of previous audio manipulation and generation
neural networks to the task of real-time any-to-one voice conversion. Our
resulting model, LLVC (Low-latency Low-resource Voice Conversion), has a
latency of under 20 ms at a sample rate of 16 kHz and runs nearly 2.8x faster
than real time on a consumer CPU.
LLVC uses both a generative adversarial architecture and knowledge
distillation to attain this performance. To our knowledge, LLVC
achieves both the lowest resource usage and the lowest latency of any
open-source voice conversion model. We provide open-source samples, code, and
pretrained model weights at https://github.com/KoeAI/LLVC.
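To give a sense of scale, the latency and speed figures above can be converted into audio samples and compute time. The sketch below is plain arithmetic. The helper names `latency_in_samples` and `processing_time_per_second` are hypothetical and do not come from the LLVC codebase.

```python
# Back-of-the-envelope arithmetic for the latency and speed figures above.
# Helper names are hypothetical illustrations, not part of the LLVC code.

def latency_in_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Number of audio samples spanned by a latency window, rounded to the nearest sample."""
    return round(latency_ms * sample_rate_hz / 1000)


def processing_time_per_second(speedup: float) -> float:
    """Seconds of compute needed per second of audio at a given speed-up over real time."""
    return 1.0 / speedup


SAMPLE_RATE_HZ = 16_000
print(latency_in_samples(20, SAMPLE_RATE_HZ))        # 320 samples
print(f"{processing_time_per_second(2.8):.3f} s")    # 0.357 s
```

At 16 kHz, a 20 ms budget covers 320 samples. Running 2.8x faster than real time means that each second of audio needs roughly 0.36 s of CPU time, which leaves headroom for streaming.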