
Brazil in focus: a multimodal large language model aware of the Brazilian context for text generation

Grant number: 24/07969-1
Support Opportunities: Scholarships in Brazil - Doctorate (Direct)
Start date: October 01, 2024
End date: December 31, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Sandra Eliza Fontes de Avila
Grantee: Gabriel Oliveira dos Santos
Host Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated scholarship(s): 25/00837-5 - Multilingual Vision Language Model with In-Context Learning Ability, BE.EP.DD

Abstract

The field of Natural Language Processing (NLP) has undergone significant transformations, driven primarily by Large Language Models (LLMs). However, an inherent limitation of these models is their inability to process data modalities other than text. To address this, several Multimodal Large Language Models (MLLMs) have been proposed in recent years to extend LLMs to other modalities. Despite these advances, the existing literature focuses predominantly on high-resource languages and neglects cultural aspects, perpetuating biases towards dominant worldviews. In light of this, this research proposes constructing an MLLM tailored to the Portuguese language and the Brazilian context. Specifically, we aim to develop a framework for building an MLLM that generates image descriptions in Portuguese and whose knowledge of the Brazilian context can be continuously updated by integrating a Retrieval-Augmented Generation (RAG) pipeline into the MLLM. Furthermore, because we are working under data restrictions, we intend to leverage pre-trained LLMs specialized in Portuguese and to propose a block that connects the visual encoder to the LLM so that our MLLM can perform tasks via in-context learning. Existing proposals in the literature are computationally expensive; in contrast, we aim to train our model at low cost. Additionally, we aim to conduct a case study in which our framework is applied to identifying manifestations of Brazilian culture. We hypothesize that conditioning caption generation on Brazil-centered data will enhance our model's capacity to recognize elements of Brazilian culture. In this way, we seek to help advance NLP beyond English-centric paradigms and to provide Brazilians with linguistically accurate and contextually relevant systems. (AU)
