Research Grants 21/06870-3 - Machine learning, Artificial intelligence

Abstract

The area of Meta-learning (MtL) leverages knowledge from problems for which successful Machine Learning (ML) solutions are known, in order to support automated algorithm selection for new problems. But far more meta-knowledge can be extracted by relating data properties to algorithmic performance, a topic that remains under-explored compared to the use of MtL for automated algorithm selection. For instance, one may reveal the competences and limitations of different ML algorithms and highlight data quality issues that are worth investigating. Building on the researcher's previous experience during phase 1 of her Young Investigator project, which involved the study, proposal and use of data complexity measures for characterizing the hardness level of classification and regression problems, this project will go one step further and employ such measures to support algorithm and data understanding from a MtL perspective. By deepening this understanding, we expect to contribute to improving the comprehensibility and reliability of ML models in use. We also expect to generate contributions in three areas that can directly benefit from data and algorithm understanding: data pre-processing, ensemble learning and transfer learning. The idea is to guide the solution of these tasks using meta-knowledge extracted about the dichotomous relationship between data properties and algorithmic performance. (AU)

Scientific publications (16)

(The scientific publications listed on this page originate from the Web of Science or SciELO databases. Their authors have cited FAPESP grant or fellowship project numbers awarded to Principal Investigators or Fellowship Recipients, whether or not they are among the authors. This information is collected automatically and retrieved directly from those bibliometric databases.)

VALERIANO, MARIA GABRIELA; MARZAGAO, DAVID KOHAN; MONTELONGO, ALFREDO; KIFFER, CARLOS ROBERTO VEIGA; KATZ, NATAN; LORENA, ANA CAROLINA. Filtering Instances and Rejecting Predictions to Obtain Reliable Models in Healthcare. MACHINE LEARNING, v. 115, n. 1, p. 40-pg., 2026-01-06. (21/06870-3)

TORQUETTE, GUSTAVO; BASGALUPP, MARCIO P.; LUDERMIR, TERESA B.; LORENA, ANA CAROLINA. Network-based Instance Hardness Measures for Classification Problems. 40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, v. N/A, p. 8-pg., 2025-01-01. (22/07458-1, 21/06870-3, 20/09835-1)

MANGUSSI, ARTHUR DANTAS; SANTOS, MIRIAM SEOANE; LOPES, FILIPE LOYOLA; PEREIRA, RICARDO CARDOSO; LORENA, ANA CAROLINA; ABREU, PEDRO HENRIQUES. mdatagen: A python library for the artificial generation of missing data. Neurocomputing, v. 625, p. 10-pg., 2025-01-29. (23/13688-2, 21/06870-3, 22/10553-6)

PAIVA, PEDRO YURI ARBS; MORENO, CAMILA CASTRO; SMITH-MILES, KATE; VALERIANO, MARIA GABRIELA; LORENA, ANA CAROLINA. Relating instance hardness to classification performance in a dataset: a visual approach. MACHINE LEARNING, v. N/A, p. 39-pg., 2022-06-22. (21/06870-3)

DOS SANTOS FERNANDES, LUIZ HENRIQUE; SMITH-MILES, KATE; LORENA, ANA CAROLINA; XAVIER-JUNIOR, JC; RIOS, RA. Generating Diverse Clustering Datasets with Targeted Characteristics. INTELLIGENT SYSTEMS, PT I, v. 13653, p. 15-pg., 2022-01-01. (21/06870-3)

PEREIRA, JOAO LUIZ JUNHO; SMITH-MILES, KATE; MUNOZ, MARIO ANDRES; LORENA, ANA CAROLINA. Optimal selection of benchmarking datasets for unbiased machine learning algorithm evaluation. DATA MINING AND KNOWLEDGE DISCOVERY, v. 38, n. 2, p. 40-pg., 2023-10-20. (21/06870-3, 22/10683-7)

LORENA, ANA C.; PAIVA, PEDRO Y. A.; PRUDENCIO, RICARDO B. C. Trusting My Predictions: On the Value of Instance-Level Analysis. ACM COMPUTING SURVEYS, v. 56, n. 7, p. 28-pg., 2024-07-01. (21/06870-3)

MANGUSSI, ARTHUR DANTAS; PEREIRA, RICARDO CARDOSO; LORENA, ANA CAROLINA; SANTOS, MIRIAM SEOANE; ABREU, PEDRO HENRIQUES. Studying the robustness of data imputation methodologies against adversarial attacks. COMPUTERS & SECURITY, v. 157, p. 16-pg., 2025-10-01. (21/06870-3, 22/10553-6, 23/13688-2)

UEDA, PATRICIA S. M.; RIVOLLI, ADRIANO; LORENA, ANA CAROLINA. An Instance Level Analysis of Classification Difficulty for Unlabeled Data. INTELLIGENT SYSTEMS, BRACIS 2024, PT I, v. 15412, p. 15-pg., 2025-01-01. (21/06870-3)

MANGUSSI, ARTHUR DANTAS; PEREIRA, RICARDO CARDOSO; ABREU, PEDRO HENRIQUES; LORENA, ANA CAROLINA. Assessing Adversarial Effects of Noise in Missing Data Imputation. INTELLIGENT SYSTEMS, BRACIS 2024, PT I, v. 15412, p. 15-pg., 2025-01-01. (21/06870-3, 23/13688-2)

LOYOLA LOPES, FILIPE; DANTAS MANGUSSI, ARTHUR; CARDOSO PEREIRA, RICARDO; SEOANE SANTOS, MIRIAM; HENRIQUES ABREU, PEDRO; CAROLINA LORENA, ANA. A Label Propagation Approach for Missing Data Imputation. IEEE ACCESS, v. 13, p. 14-pg., 2025-01-01. (21/06870-3, 22/10553-6, 23/13688-2)

VALERIANO, MARIA GABRIELA; JUNHO PEREIRA, JOAO LUIZ; VEIGA KIFFER, CARLOS ROBERTO; LORENA, ANA CAROLINA. Explaining instances in the health domain based on the exploration of a dataset's hardness embedding. PROCEEDINGS OF THE 2024 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2024 COMPANION, v. N/A, p. 9-pg., 2024-01-01. (21/06870-3, 22/10683-7, 23/10419-0)

CASTRO DOS SANTOS, NIDIA; MANGUSSI, ARTHUR; RIBEIRO, TIAGO; SILVA, RAFAEL NASCIMENTO DE BRITO; SANTAMARIA, MAURO PEDRINE; FERES, MAGDA; VAN DYKE, THOMAS; LORENA, ANA CAROLINA. Factors influencing the response to periodontal therapy in patients with diabetes: post hoc analysis of a randomized clinical trial using machine learning. Journal of Applied Oral Science, v. 33, p. 11-pg., 2025-01-01. (22/10553-6, 21/14439-0, 16/02234-7, 21/06870-3)

INOCENCIO JUNIOR, RONALDO LOPES; BASGALUPP, MARCIO P.; LUDERMIR, TERESA B.; LORENA, ANA CAROLINA. Data Balancing for Mitigating Sampling Bias in Machine Learning. 40TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, v. N/A, p. 9-pg., 2025-01-01. (22/07458-1, 21/06870-3, 20/09835-1)

VALERIANO, MARIA GABRIELA; MATRAN-FERNANDEZ, ANA; KIFFER, CARLOS; LORENA, ANA CAROLINA. Understanding the performance of machine learning models from data- to patient-level. ACM JOURNAL OF DATA AND INFORMATION QUALITY, v. 16, n. 4, p. 19-pg., 2024-12-01. (21/06870-3)

BASGALUPP, MARCIO P.; BARROS, RODRIGO C.; CERRI, RICARDO; NERI, FERRANTE; MIRANDA, PERICLES B. C.; LUDERMIR, TERESA. Grammar-based Evolutionary Approaches for Software Effort Estimation. 2025 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, CEC, v. N/A, p. 4-pg., 2025-01-01. (22/07458-1, 21/06870-3, 20/09835-1)

Grant number:	21/06870-3
Support Opportunities:	Research Grants - Young Investigators Grants - Phase 2
Start date:	February 01, 2022
End date:	January 31, 2027
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques

Principal Investigator:	Ana Carolina Lorena
Grantee:	Ana Carolina Lorena

Host Institution:	Divisão de Ciência da Computação (IEC). Instituto Tecnológico de Aeronáutica (ITA). São José dos Campos, SP, Brazil

City of the host institution:	São José dos Campos

Associated research grant:	12/22608-8 - Use of data complexity measures in the support of supervised machine learning, AP.JP
Associated scholarship(s):
	25/21948-0 - Meta-Learning in time series forecasting: a data complexity perspective, BP.PD
	24/23791-8 - Data-Centric Approaches to Leveraging Large Language Models for Missing Data Imputation, BP.DR
	25/10215-1 - Evaluating different representations for extracting standard meta-features from unstructured datasets, BP.IC
	25/10304-4 - Studying meta-features at an instance-level, BP.IC
	24/09091-3 - Transfer learning: where to transfer from and what to transfer?, BP.PD
	24/07655-7 - Analyzing meta-datasets at an instance-level, BP.IC
	24/07637-9 - Analyzing meta-data from public repositories, BP.IC
	23/04911-0 - Gathering meta-data from competitions, BP.IC
	23/03958-2 - Gathering meta-data from public repositories, BP.IC
	22/10553-6 - An unified approach for dealing with missing and noise data, BP.MS
	22/10917-8 - On building ensembles of diverse and competent classifiers, BP.DD
	22/10683-7 - Is my benchmark of datasets challenging enough?, BP.PD