Handling imbalanced datasets through Optimum-Path Forest

Author(s):
Passos, Leandro Aparecido S. ; Jodas, Danilo S. ; Ribeiro, Luiz C. F. ; Akio, Marco ; De Souza, Andre Nunes ; Papa, Joao Paulo
Total Authors: 6
Document type: Journal article
Source: KNOWLEDGE-BASED SYSTEMS; v. 242, 13 pp., 2022-04-22.
Abstract

In the last decade, machine learning-based approaches have become capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. This advance is partially due to the exponential growth in the amount of available data, which makes it possible to extract trustworthy real-world information from it. However, such data is generally imbalanced, since some phenomena are more likely than others. This imbalance considerably influences a machine learning model's performance, since the model becomes biased toward the more frequent data it receives. Among the many machine learning methods available, a graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance in many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O²PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants of these strategies. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches. © 2022 Elsevier B.V. All rights reserved. (AU)

FAPESP's process: 18/21934-5 - Network statistics: theory, methods, and applications
Grantee: André Fujita
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 14/12236-1 - AnImaLS: Annotation of Images in Large Scale: what can machines and specialists learn from interaction?
Grantee: Alexandre Xavier Falcão
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 20/12101-0 - Support for computational environments and experiments execution: data acquisition, categorization and maintenance
Grantee: Leandro Aparecido Passos Junior
Support Opportunities: Scholarships in Brazil - Technical Training Program - Technical Training
FAPESP's process: 19/18287-0 - Real-time Urban Forest Management Using Machine Learning
Grantee: Danilo Samuel Jodas
Support Opportunities: Scholarships in Brazil - Post-Doctoral
FAPESP's process: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support Opportunities: Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
FAPESP's process: 17/02286-0 - Probabilistic models for commercial losses detection
Grantee: André Nunes de Souza
Support Opportunities: Regular Research Grants
FAPESP's process: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee: Francisco Louzada Neto
Support Opportunities: Research Grants - Research, Innovation and Dissemination Centers - RIDC