| Grant number: | 25/10215-1 |
| Support Opportunities: | Scholarships in Brazil - Scientific Initiation |
| Start date: | July 01, 2025 |
| Status: | Discontinued |
| Field of knowledge: | Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques |
| Principal Investigator: | Ana Carolina Lorena |
| Grantee: | Douglas Bergamim Fernandes |
| Host Institution: | Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). São José dos Campos, SP, Brazil |
| Associated research grant: | 21/06870-3 - Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding, AP.JP2 |
| Associated scholarship(s): | 25/19111-4 - Evaluating different data representations for extracting standard meta-features from unstructured datasets, BE.EP.IC |
Abstract

The growing use of Machine Learning (ML) techniques in areas such as computer vision and natural language processing has intensified the demand for methods capable of handling unstructured data, such as images and text. These types of data often exhibit high dimensionality and carry large amounts of information, making the task of identifying the most suitable ML algorithms for each scenario both complex and costly. In this context, Meta-learning (MtL) emerges as a promising approach to support the selection process by investigating which intrinsic characteristics of datasets are related to algorithm performance. However, most of the meta-features available in the literature were developed for structured, tabular data, which limits their applicability in more modern settings. To overcome this limitation, previous studies have shown that data such as images and text can be represented through embeddings (numerical vectors obtained from pre-trained deep neural networks), making them compatible with meta-feature extraction tools. Each neural network architecture generates a distinct representation, capturing different aspects of the original data. This project proposes to investigate how useful different embedded representations are for extracting standard meta-features from unstructured datasets. The PyMFE (Python Meta-Feature Extractor) library already provides a Python implementation for extracting meta-features from datasets, but its application is restricted to attribute-value formatted data. Public datasets such as CIFAR-10 and CIFAR-100 will be used, and the experiments will aim to assess the impact of the embedding choice on the quality of the extracted meta-features. The goal is to contribute to expanding the applicability of Meta-learning in response to the current demands of Machine Learning. (AU)
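The idea in the abstract, that the same unstructured dataset yields different meta-features depending on which embedding turns it into an attribute-value table, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two "encoders" are tiny hand-written matrices standing in for embeddings from pre-trained networks, and `simple_meta_features` is a hypothetical helper that computes a handful of simple and statistical meta-features with the standard library. It is not the project's method and not the PyMFE implementation, which covers far more measures.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def simple_meta_features(X, y):
    """Compute a few simple meta-features from an attribute-value table.

    X is a list of rows (one embedding vector per instance) and y holds the
    class label of each row. Hypothetical helper, for illustration only.
    """
    n_inst = len(X)
    n_attr = len(X[0])
    counts = Counter(y)
    # Shannon entropy of the class distribution, in bits.
    class_entropy = -sum(
        (c / n_inst) * math.log2(c / n_inst) for c in counts.values()
    )
    # Transpose rows into columns so each attribute can be summarised.
    columns = list(zip(*X))
    return {
        "nr_inst": n_inst,
        "nr_attr": n_attr,
        "nr_class": len(counts),
        "attr_to_inst": n_attr / n_inst,
        "class_ent": class_entropy,
        "mean_attr_var": mean(pvariance(col) for col in columns),
    }


# Toy stand-ins for embeddings of the same four images produced by two
# different pre-trained networks (assumed values, not real model outputs).
labels = ["cat", "cat", "dog", "dog"]
embeddings = {
    "encoder_a": [[0.0, 1.0], [0.0, 3.0], [2.0, 1.0], [2.0, 3.0]],
    "encoder_b": [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0], [3.0, 4.0, 2.0], [3.0, 4.0, 2.0]],
}

# One meta-feature vector per representation of the same dataset.
meta = {name: simple_meta_features(X, labels) for name, X in embeddings.items()}

for name, feats in meta.items():
    print(name, feats)
```

Label-only measures such as `nr_class` and `class_ent` are identical across the two representations, while dimensionality and variance-based measures differ. That difference is the kind of effect the project sets out to assess on real embeddings of datasets such as CIFAR-10 and CIFAR-100, with PyMFE taking the place of the hand-written helper.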