Multilingual Vision Language Model with In-Context Learning Ability

Grant number: 25/00837-5
Support Opportunities: Scholarships abroad - Research Internship - Doctorate (Direct)
Start date: June 30, 2025
End date: June 29, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Sandra Eliza Fontes de Avila
Grantee: Gabriel Oliveira dos Santos
Supervisor: Matthieu Cord
Host Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Institution abroad: Université Paris-Sorbonne (Paris 4), France
Associated to the scholarship: 24/07969-1 - Brazil in focus: a multimodal large language model aware of the brazilian context for text generation, BP.DD

Abstract

Natural Language Processing (NLP) has undergone significant transformations, driven primarily by Large Language Models (LLMs). However, an inherent limitation of these models is their inability to process data modalities beyond text. To address this, several Multimodal LLMs have been proposed in recent years to extend LLMs to other modalities. Despite these advances, the existing literature focuses predominantly on English and a few other high-resource languages, neglecting the rest. In light of this, this BEPE project proposes developing a low-cost multilingual vision language model (VLM) capable of adapting, through in-context learning, to tasks involving text in low-resource languages. Specifically, we propose a multilingual VLM that supports interleaved image-text pairs and that, combined with a retrieval-augmented generation pipeline, can improve performance on vision-language tasks involving text in low-resource languages, thereby mitigating the scarcity of annotated datasets. In addition, we plan to develop a low-cost pipeline for training VLMs by leveraging pre-trained multilingual LLMs and visual encoders and by employing parameter-efficient fine-tuning techniques. In this way, we seek to advance NLP beyond English-centric paradigms and contribute to a more inclusive and diverse technological landscape.
