Evaluating different representations for extracting standard meta-features from unstructured datasets

Grant number: 25/10215-1
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: July 01, 2025
End date: June 30, 2026
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ana Carolina Lorena
Grantee: Douglas Bergamim Fernandes
Host Institution: Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). Ministério da Defesa (Brasil). São José dos Campos, SP, Brazil
Associated research grant: 21/06870-3 - Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding, AP.JP2

Abstract

The growing use of Machine Learning (ML) techniques in areas such as computer vision and natural language processing has intensified the demand for methods capable of handling unstructured data, such as images and text. These data are often high-dimensional and information-rich, which makes identifying the most suitable ML algorithm for each scenario both complex and costly. In this context, Meta-learning (MtL) is a promising approach to support algorithm selection by investigating which intrinsic characteristics of datasets relate to algorithm performance. However, most meta-features in the literature were developed for structured, tabular data, which limits their applicability in more modern settings. To overcome this limitation, previous studies have shown that images and text can be represented through embeddings, that is, numerical vectors obtained from pre-trained deep neural networks, which makes them compatible with meta-feature extraction tools. Each neural network architecture produces a distinct representation that captures different aspects of the original data. This project proposes to investigate how useful different embedding representations are for extracting standard meta-features from unstructured datasets. The PyMFE (Python Meta-Feature Extractor) library already provides a Python implementation for extracting meta-features from datasets, but it is restricted to data in attribute-value format. Public datasets such as CIFAR-10 and CIFAR-100 will be used, and the experiments will assess how the choice of embedding affects the quality of the extracted meta-features. The goal is to broaden the applicability of Meta-learning in response to current demands in Machine Learning. (AU)
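The pipeline the abstract describes (encode each sample with a neural network, then compute tabular meta-features on the resulting embedding matrix) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the project's method. A fixed random projection followed by a ReLU stands in for a pre-trained network. A handful of simple meta-features are computed directly with NumPy rather than through PyMFE. The function names `random_projection_embedding` and `simple_meta_features` are hypothetical. Most dictionary keys borrow PyMFE meta-feature names (`nr_inst`, `nr_attr`, `nr_class`, `attr_to_inst`, `class_ent`), and `cor_mean` is a mean-summarized analogue of PyMFE's `cor`. The values are not guaranteed to match PyMFE's exact definitions or summary functions.

```python
import numpy as np


def random_projection_embedding(images: np.ndarray, dim: int, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained encoder.

    Flattens each image, scales pixels to [0, 1], applies a fixed random
    linear projection to `dim` outputs and then a ReLU. A real study would
    take activations from a pre-trained network instead.
    """
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1).astype(np.float64) / 255.0
    weights = rng.standard_normal((flat.shape[1], dim)) / np.sqrt(flat.shape[1])
    return np.maximum(flat @ weights, 0.0)


def simple_meta_features(X: np.ndarray, y: np.ndarray) -> dict:
    """Compute a few standard tabular meta-features from an (n, d) matrix."""
    n_inst, n_attr = X.shape
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    # Shannon entropy (in bits) of the class distribution.
    class_ent = float(-(probs * np.log2(probs)).sum())
    # Mean absolute Pearson correlation over attribute pairs,
    # skipping constant columns (e.g. units the ReLU always zeroes).
    X_var = X[:, X.std(axis=0) > 0]
    if X_var.shape[1] > 1:
        corr = np.corrcoef(X_var, rowvar=False)
        cor_mean = float(np.abs(corr[np.triu_indices_from(corr, k=1)]).mean())
    else:
        cor_mean = 0.0
    return {
        "nr_inst": n_inst,
        "nr_attr": n_attr,
        "nr_class": len(counts),
        "attr_to_inst": n_attr / n_inst,
        "class_ent": class_ent,
        "cor_mean": cor_mean,
    }


if __name__ == "__main__":
    # Synthetic data shaped like CIFAR-10 (32x32 RGB, 10 classes); not real images.
    rng = np.random.default_rng(42)
    images = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
    labels = np.repeat(np.arange(10), 20)
    # Two "representations" of the same dataset with different embedding sizes.
    for dim in (64, 512):
        emb = random_projection_embedding(images, dim=dim, seed=dim)
        print(dim, simple_meta_features(emb, labels))
```

In this sketch, varying `dim` acts as a rough proxy for switching between architectures with different embedding sizes. It changes dimensionality-dependent meta-features such as `nr_attr` and `attr_to_inst`, while label-only meta-features such as `class_ent` stay fixed. With real encoders, the more informative differences would show up in distribution-dependent meta-features. In the setting the abstract describes, the embedding matrix would instead be passed to PyMFE (its `MFE` class, via `fit(X, y)` and `extract()`).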
