Beyond algorithm selection: meta-learning for data and algorithm analysis and understanding

Grant number: 21/06870-3
Support Opportunities: Research Grants - Young Investigators Grants - Phase 2
Start date: February 01, 2022
End date: January 31, 2027
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Ana Carolina Lorena
Grantee: Ana Carolina Lorena
Host Institution: Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). Ministério da Defesa (Brasil). São José dos Campos, SP, Brazil
Associated research grant: 12/22608-8 - Use of data complexity measures in the support of supervised machine learning, AP.JP
Associated scholarship(s): 24/23791-8 - Data-Centric Approaches to Leveraging Large Language Models for Missing Data Imputation, BP.DR
25/10215-1 - Evaluating different representations for extracting standard meta-features from unstructured datasets, BP.IC
25/10304-4 - Studying meta-features at an instance-level, BP.IC
24/09091-3 - Transfer learning: where to transfer from and what to transfer?, BP.PD
24/07637-9 - Analyzing meta-data from public repositories, BP.IC
24/07655-7 - Analyzing meta-datasets at an instance-level, BP.IC
23/04911-0 - Gathering meta-data from competitions, BP.IC
23/03958-2 - Gathering meta-data from public repositories, BP.IC
22/10553-6 - An unified approach for dealing with missing and noise data, BP.MS
22/10917-8 - On building ensembles of diverse and competent classifiers, BP.DD
22/10683-7 - Is my benchmark of datasets challenging enough?, BP.PD

Abstract

Meta-learning (MtL) leverages knowledge from problems for which successful Machine Learning (ML) solutions are known in order to support automated algorithm selection for new problems. But far more meta-knowledge can be extracted by relating data properties to algorithmic performance, a topic that remains under-explored compared to the use of MtL for automated algorithm selection. For instance, one may reveal the competences and limitations of different ML algorithms and highlight data quality issues that are worth investigating. The researcher's Young Research Project phase 1 involved the study, proposal, and use of data complexity measures for characterizing the hardness level of classification and regression problems. Building on that experience, this project will go one step further and employ such measures to support algorithm and data understanding from an MtL perspective. By deepening this understanding, we expect to contribute to improving the comprehensibility and reliability of ML models in use. We also expect to generate contributions in three areas that can directly benefit from data and algorithm understanding: data pre-processing, ensemble learning, and transfer learning. The idea is to guide the solution of these tasks using meta-knowledge extracted about the dichotomous relationship between data properties and algorithmic performance. (AU)

Scientific publications (10)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
TORQUETTE, GUSTAVO; BASGALUPP, MARCIO P.; LUDERMIR, TERESA B.; LORENA, ANA CAROLINA. Network-based Instance Hardness Measures for Classification Problems. 40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, v. N/A, p. 8-pg., . (22/07458-1, 21/06870-3, 20/09835-1)
MANGUSSI, ARTHUR DANTAS; SANTOS, MIRIAM SEOANE; LOPES, FILIPE LOYOLA; PEREIRA, RICARDO CARDOSO; LORENA, ANA CAROLINA; ABREU, PEDRO HENRIQUES. mdatagen: A python library for the artificial generation of missing data. Neurocomputing, v. 625, p. 10-pg., . (23/13688-2, 21/06870-3, 22/10553-6)
DOS SANTOS FERNANDES, LUIZ HENRIQUE; SMITH-MILES, KATE; LORENA, ANA CAROLINA; XAVIER-JUNIOR, JC; RIOS, RA. Generating Diverse Clustering Datasets with Targeted Characteristics. INTELLIGENT SYSTEMS, PT I, v. 13653, p. 15-pg., . (21/06870-3)
PEREIRA, JOAO LUIZ JUNHO; SMITH-MILES, KATE; MUNOZ, MARIO ANDRES; LORENA, ANA CAROLINA. Optimal selection of benchmarking datasets for unbiased machine learning algorithm evaluation. DATA MINING AND KNOWLEDGE DISCOVERY, v. 38, n. 2, p. 40-pg., . (21/06870-3, 22/10683-7)
UEDA, PATRICIA S. M.; RIVOLLI, ADRIANO; LORENA, ANA CAROLINA. An Instance Level Analysis of Classification Difficulty for Unlabeled Data. INTELLIGENT SYSTEMS, BRACIS 2024, PT I, v. 15412, p. 15-pg., . (21/06870-3)
MANGUSSI, ARTHUR DANTAS; PEREIRA, RICARDO CARDOSO; ABREU, PEDRO HENRIQUES; LORENA, ANA CAROLINA. Assessing Adversarial Effects of Noise in Missing Data Imputation. INTELLIGENT SYSTEMS, BRACIS 2024, PT I, v. 15412, p. 15-pg., . (21/06870-3, 23/13688-2)
LOYOLA LOPES, FILIPE; DANTAS MANGUSSI, ARTHUR; CARDOSO PEREIRA, RICARDO; SEOANE SANTOS, MIRIAM; HENRIQUES ABREU, PEDRO; CAROLINA LORENA, ANA. A Label Propagation Approach for Missing Data Imputation. IEEE ACCESS, v. 13, p. 14-pg., . (21/06870-3, 22/10553-6, 23/13688-2)
LORENA, ANA C.; PAIVA, PEDRO Y. A.; PRUDENCIO, RICARDO B. C.. Trusting My Predictions: On the Value of Instance-Level Analysis. ACM COMPUTING SURVEYS, v. 56, n. 7, p. 28-pg., . (21/06870-3)
VALERIANO, MARIA GABRIELA; MATRAN-FERNANDEZ, ANA; KIFFER, CARLOS; LORENA, ANA CAROLINA. Understanding the performance of machine learning models from data- to patient-level. ACM JOURNAL OF DATA AND INFORMATION QUALITY, v. 16, n. 4, p. 19-pg., . (21/06870-3)
PAIVA, PEDRO YURI ARBS; MORENO, CAMILA CASTRO; SMITH-MILES, KATE; VALERIANO, MARIA GABRIELA; LORENA, ANA CAROLINA. Relating instance hardness to classification performance in a dataset: a visual approach. MACHINE LEARNING, v. N/A, p. 39-pg., . (21/06870-3)