
Evaluating different data representations for extracting standard meta-features from unstructured datasets

Grant number: 25/19111-4
Support Opportunities: Scholarships abroad - Research Internship - Scientific Initiation
Start date: December 01, 2025
End date: February 28, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ana Carolina Lorena
Grantee: Douglas Bergamim Fernandes
Supervisor: Telmo de Menezes e Silva Filho
Host Institution: Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). São José dos Campos, SP, Brazil
Institution abroad: University of Bristol, England
Associated to the scholarship: 25/10215-1 - Evaluating different representations for extracting standard meta-features from unstructured datasets, BP.IC

Abstract

Machine Learning on unstructured data, especially text and images, has become central to modern applications, where the standard practice is to adapt pre-trained models to specific tasks via fine-tuning. Deciding which architecture to use and how to adapt it typically demands extensive, costly sweeps over models and hyperparameters. Meta-learning offers a way to anticipate these decisions by relating dataset descriptors (meta-features) to algorithm performance. However, classical meta-features were designed for tabular data and do not apply directly to unstructured data; libraries such as PyMFE support tabular settings, but equivalent support for unstructured data is lacking. This work investigates strategies for "structuring" text and image datasets so that standard meta-features can be meaningfully computed. It compares representations induced by embeddings from pre-trained neural networks, which convert unstructured collections into attribute-value matrices compatible with existing extractors. Within the BEPE scope, the project also proposes to consolidate a research partnership between the Brazilian research team and the University of Bristol to co-develop this methodology and its evaluation protocol. The explicit goal is to turn meta-characterization of unstructured datasets into a reliable, reproducible practice, consolidating shared benchmarks and open-source pipelines that guide model selection and fine-tuning at lower experimental cost. (AU)
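
As a rough illustration of the "structuring" idea described in the abstract, the sketch below turns a tiny text collection into an attribute-value matrix and computes a few simple meta-features from it. It is not the project's method. The `toy_embed` function is a hypothetical stand-in (a hashed bag-of-words) for embeddings from a pre-trained neural network, the example texts and labels are invented, and `simple_meta_features` is a hand-written helper whose keys are only loosely modeled on common general and information-theoretic meta-features, not a call into any existing extractor.

```python
import math
import zlib
from collections import Counter
from statistics import mean, pstdev


def toy_embed(text, dim=8):
    """Hypothetical stand-in for a pre-trained encoder: hashed bag-of-words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def simple_meta_features(X, y):
    """Compute a few simple meta-features from an attribute-value matrix."""
    n, d = len(X), len(X[0])
    counts = Counter(y)
    probs = [c / n for c in counts.values()]
    columns = list(zip(*X))  # one tuple of values per attribute
    return {
        "nr_inst": n,                    # number of instances
        "nr_attr": d,                    # number of attributes
        "nr_class": len(counts),         # number of classes
        "class_ent": -sum(p * math.log2(p) for p in probs),  # class entropy
        "attr_sd_mean": mean([pstdev(col) for col in columns]),
    }


# Invented toy dataset: four short texts with binary labels.
texts = [
    "great movie with a strong cast",
    "boring plot and weak acting",
    "a strong and moving story",
    "weak script and boring pacing",
]
labels = ["pos", "neg", "pos", "neg"]

# Each text becomes one row of the attribute-value matrix.
X = [toy_embed(t) for t in texts]
meta = simple_meta_features(X, labels)
print(meta)
```

In a real pipeline, the rows of `X` would presumably come from a pre-trained model's embeddings, and the resulting matrix could be handed to an existing tabular meta-feature extractor instead of the hand-written helper used here.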
