
Data-Centric Approaches to Leveraging Large Language Models for Missing Data Imputation

Grant number: 24/23791-8
Support Opportunities: Scholarships in Brazil - Doctorate
Start date: July 01, 2025
End date: February 28, 2029
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ana Carolina Lorena
Grantee: Arthur Dantas Mangussi
Host Institution: Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). Ministério da Defesa (Brasil). São José dos Campos, SP, Brazil
Associated research grant: 21/06870-3 - Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding, AP.JP2

Abstract

Real-world data often present challenges such as imbalance, noise, and missing values. Recently, the Machine Learning literature has demonstrated that improving model performance also requires enhancing the quality of the data used for training, giving rise to a new research field known as Data-Centric Artificial Intelligence. Missing values, defined as the absence of information in one or more columns of a dataset, represent a common challenge in this context. The literature proposes various strategies to impute these values, ranging from simpler methods to advanced models based on Deep Learning. With the recent advancements in Generative Artificial Intelligence and Large Language Models (LLMs), new investigations have begun exploring the use of LLMs for missing value imputation. Therefore, this doctoral research proposal aims to investigate methodologies based on LLMs to perform missing data imputation efficiently and robustly from the perspective of Data-Centric AI. (AU)
