| Grant number: | 24/13098-3 |
| Support Opportunities: | Scholarships in Brazil - Master |
| Start date: | March 01, 2025 |
| Status: | Discontinued |
| Field of knowledge: | Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques |
| Principal Investigator: | Paula Dornhofer Paro Costa |
| Grantee: | Pedro Rodrigues Corrêa |
| Host Institution: | Faculdade de Engenharia Elétrica e de Computação (FEEC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil |
| Associated research grant: | 20/09838-0 - BI0S - Brazilian Institute of Data Science, AP.PCPE |
| Associated scholarship(s): | 25/09875-7 - Expressive Multimodal TTS for Robots, BE.EP.MS |
Abstract: Viewers perceive expressive facial animations as more natural (lower rejection of and estrangement from the content) when the animations are in tune with what is said and with how the speaker communicates it. Many existing methods, however, lose flexibility because they depend on predetermined emotional labels or facial expression models, which limits how faithfully the emotions shown on the face can be represented. Some speech-driven animation models use natural language processing to control style (expressivity), but they are restricted to simple textual prompts that do not necessarily match the speaker's state, which varies over the course of an utterance. This project aims to develop a method that uses speech and text to generate expressive facial animations from a description of this dynamic state, emphasizing both the movements of facial elements (mouth, nose, eyebrows) and the emotive content of the speech. A natural language model based on the Transformer architecture will produce this textual description from the speech audio, and the text will then serve as a dynamic expression guide for a speech-driven animation model. In addition, a dataset will be built with automatic annotation methods based on LLMs (Large Language Models), associating facial expressions with varied textual descriptions. This dataset will be used to train a CLIP-based (Contrastive Language-Image Pretraining) model that encodes animation and text in the same semantic space. The goal is to ensure that the expressiveness of the speaker's face matches, at every moment, their state, that is, the way they convey the spoken content.
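The CLIP-style alignment mentioned in the abstract can be illustrated with a minimal contrastive-training sketch. The PyTorch code below is a hypothetical illustration, not the project's actual design: the names `AnimationEncoder` and `TextEncoder`, the GRU-based motion encoder, and all dimensions are assumptions. It shows only the core idea of encoding facial-animation features and expression descriptions into one shared embedding space with a symmetric InfoNCE loss that pulls matched pairs together.

```python
# Minimal sketch of CLIP-style contrastive alignment between facial-animation
# features and text descriptions. Module names, dimensions, and the toy
# encoders are illustrative assumptions, not the project's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnimationEncoder(nn.Module):
    """Encodes a sequence of per-frame facial-motion features (e.g. landmark
    or blendshape coefficients) into a single normalized embedding."""

    def __init__(self, feat_dim: int = 64, embed_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) -> (batch, embed_dim)
        _, h = self.rnn(x)
        return F.normalize(self.proj(h[-1]), dim=-1)


class TextEncoder(nn.Module):
    """Toy stand-in for a Transformer text encoder that yields one embedding
    per expression description (a pretrained LM would be used in practice)."""

    def __init__(self, vocab_size: int = 10_000, embed_dim: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) -> mean-pool, project, normalize
        return F.normalize(self.proj(self.emb(tokens).mean(dim=1)), dim=-1)


def clip_loss(anim_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matched (animation, text) pairs lie on the
    diagonal of the similarity matrix and are pulled together."""
    logits = anim_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


# One toy training step on random data.
anim_enc, text_enc = AnimationEncoder(), TextEncoder()
anim = torch.randn(8, 120, 64)             # 8 clips, 120 frames of features
text = torch.randint(0, 10_000, (8, 16))   # 8 tokenized descriptions
loss = clip_loss(anim_enc(anim), text_enc(text))
loss.backward()
print(f"contrastive loss: {loss.item():.3f}")
```

In the project itself, the text branch would presumably come from the Transformer-based language model that describes the speaker's dynamic state, and the animation branch from the speech-driven generator; the sketch only demonstrates the contrastive objective that places both modalities in the same semantic space.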