

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone

Author(s):
Casanova, Edresson ; Weber, Julian ; Shulby, Christopher ; Candido Junior, Arnaldo ; Goelge, Eren ; Ponti, Moacir Antonelli ; Chaudhuri, K ; Jegelka, S ; Song, L ; Szepesvari, C ; Niu, G ; Sabato, S
Total Authors: 12
Document type: Journal article
Source: INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162; 12 pp., 2022-01-01.
Abstract

YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity with reasonable quality. This is important for enabling synthesis for speakers whose voice or recording characteristics differ greatly from those seen during training. (AU)
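The abstract does not say how "voice similarity" is scored. Results of this kind are commonly reported as a speaker-encoder cosine similarity (SECS): the cosine similarity between a speaker embedding of the reference speaker's audio and one of the synthesized audio, averaged over test pairs. The sketch below shows only that scoring step, assuming the embeddings have already been produced by some pretrained speaker encoder (not included here). The function names `cosine_similarity` and `mean_similarity` and the toy vectors are hypothetical, chosen for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length.

    Assumption: `a` and `b` are fixed-size embeddings from the same
    (hypothetical, not shown) pretrained speaker encoder.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def mean_similarity(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average cosine similarity over (reference, synthesized) embedding pairs."""
    scores = [cosine_similarity(ref, syn) for ref, syn in pairs]
    if not scores:
        raise ValueError("need at least one embedding pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy embeddings standing in for real speaker-encoder outputs.
    pairs = [
        ([1.0, 0.0], [1.0, 0.0]),  # identical direction: similarity 1.0
        ([1.0, 0.0], [0.0, 1.0]),  # orthogonal: similarity 0.0
    ]
    print(mean_similarity(pairs))  # prints 0.5
```

Scores lie in [-1, 1], and higher means the synthesized voice is closer to the reference speaker in the encoder's embedding space. Because the metric depends entirely on which speaker encoder produces the embeddings, values are only comparable when computed with the same encoder.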

FAPESP's process: 19/07316-0 - Singularity theory and its applications to differential geometry, differential equations and computer vision
Grantee: Farid Tari
Support Opportunities: Research Projects - Thematic Grants