
FACE: Facial Animation with Dynamic Contextual Emotions

Grant number: 24/13098-3
Support Opportunities: Scholarships in Brazil - Master
Start date: March 01, 2025
End date: July 31, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Paula Dornhofer Paro Costa
Grantee: Pedro Rodrigues Corrêa
Host Institution: Faculdade de Engenharia Elétrica e de Computação (FEEC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated research grant: 20/09838-0 - BI0S - Brazilian Institute of Data Science, AP.PCPE
Associated scholarship(s): 25/09875-7 - Expressive Multimodal TTS for Robots, BE.EP.MS

Abstract

Viewers perceive expressive facial animations as more natural, with less rejection of and estrangement from the content, when the animations are in tune with what is said and with how the speaker communicates it. Many existing methods lack flexibility in this respect because they depend on predetermined emotion labels or facial expression models, which limits how faithfully the emotions expressed on the face can be represented. Some speech-driven animation models use natural language processing to control style (expressiveness), but they are restricted to simple textual prompts that do not necessarily track the speaker's state, which varies over the course of an utterance. This project aims to develop a method that uses speech and text to generate expressive facial animations guided by a description of this dynamic state, emphasizing the movements of facial elements (mouth, nose, eyebrows) as well as the emotional content of the speech. From the speech audio, a natural language model based on the Transformer architecture will produce this textual description. The text will then serve as a dynamic expression guide for the facial animations produced by a speech-driven model. In addition, a dataset will be built with automatic annotation methods based on LLMs (Large Language Models), associating facial expressions with varied textual descriptions. This dataset will be used to train a model based on CLIP (Contrastive Language-Image Pretraining) that encodes animation and text in the same semantic space. The goal of this process is to ensure that the expressiveness of the speaker's face agrees, at every moment, with their state, that is, with the way they convey the spoken content.
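
As an illustrative sketch of the CLIP-style alignment step described above: a symmetric contrastive objective over matched (animation, description) pairs could look roughly like the following. PyTorch is assumed, and the module names, encoder architectures, and dimensions (AnimationEncoder, TextEncoder, clip_loss) are hypothetical choices for exposition, not the project's actual design.

# Illustrative sketch only: names, architectures, and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnimationEncoder(nn.Module):
    # Encodes a facial-motion sequence (e.g., per-frame landmark features).
    def __init__(self, feat_dim=128, embed_dim=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        _, h = self.gru(x)                 # h: (num_layers, batch, embed_dim)
        return F.normalize(self.proj(h[-1]), dim=-1)

class TextEncoder(nn.Module):
    # Stand-in for a Transformer text encoder over description tokens.
    def __init__(self, vocab_size=30000, embed_dim=256):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, embed_dim)  # simple pooled embedding
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, tokens):             # tokens: (batch, seq_len) of token ids
        return F.normalize(self.proj(self.emb(tokens)), dim=-1)

def clip_loss(anim_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: the i-th animation should match the i-th description.
    logits = anim_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch of 4 matched (animation, description) pairs:
anim_emb = AnimationEncoder()(torch.randn(4, 50, 128))
text_emb = TextEncoder()(torch.randint(0, 30000, (4, 12)))
loss = clip_loss(anim_emb, text_emb)

Training with such an objective pulls matched animation and text embeddings together in one semantic space, which is what would allow a free-form textual description of the speaker's state to steer the generated expression.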
