YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone

Casanova, Edresson; Weber, Julian; Shulby, Christopher; Candido Junior, Arnaldo; Goelge, Eren; Ponti, Moacir Antonelli; Chaudhuri, K; Jegelka, S; Song, L; Szepesvari, C; Niu, G; Sabato, S

Total number of authors: 12
Document type:	Scientific article
Source:	INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162; v. N/A, 12 pp., 2022-01-01.
Abstract
YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training. (AU)

FAPESP grant:	19/07316-0 - Singularity theory and its applications to differential geometry, differential equations and computer vision
Grantee:	Farid Tari
Support type:	Research Projects - Thematic Grants
