

Data Complexity Measures for Imbalanced Classification Tasks

Author(s):
Barella, Victor H. ; Garcia, Luis P. F. ; de Souto, Marcilio P. ; Lorena, Ana C. ; de Carvalho, Andre ; IEEE
Total Authors: 6
Document type: Journal article
Source: 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); v. N/A, p. 8-pg., 2018-01-01.
Abstract

In imbalanced classification tasks, the training datasets may show class overlap and low-density classes. In these scenarios, the predictions for the minority class are impaired. Although assessing the imbalance level of a training set is straightforward, it is hard to measure other aspects that may affect the predictive performance of classification algorithms in imbalanced tasks. This paper presents a set of measures designed to understand the difficulty of imbalanced classification tasks by considering each class individually. They are adapted from popular data complexity measures for classification problems, which are shown to perform poorly in imbalanced scenarios. Experiments on synthetic datasets with different levels of imbalance, class overlap and class density show that the proposed adaptations can better explain the difficulty of imbalanced classification tasks. (AU)
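The abstract does not spell out the adapted measures, so the following is only an illustrative sketch of the general idea of assessing each class separately, not the authors' method. It takes one classic complexity measure, the leave-one-out 1-nearest-neighbour error rate (commonly labelled N3), and splits it into per-class error rates, next to the straightforward imbalance ratio. The function names (`imbalance_ratio`, `per_class_n3`, `global_n3`) and the toy dataset are hypothetical.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Ratio between the majority and minority class sizes (always >= 1)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_n3(points, labels):
    """Leave-one-out 1-NN error rate computed separately for each class.

    Returns a dict mapping each class label to the fraction of that class's
    examples whose nearest other example (Euclidean distance) has a
    different label. Hypothetical illustration, not the paper's measure.
    """
    totals = Counter(labels)
    errors = Counter()
    for i, (p, y) in enumerate(zip(points, labels)):
        # Nearest neighbour of example i, excluding example i itself.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        if labels[nearest] != y:
            errors[y] += 1
    return {c: errors[c] / totals[c] for c in totals}


def global_n3(points, labels):
    """Class-agnostic leave-one-out 1-NN error rate over the whole dataset."""
    per_class = per_class_n3(points, labels)
    counts = Counter(labels)
    # Weighting each class's error by its size recovers the overall error rate.
    return sum(per_class[c] * counts[c] for c in counts) / len(labels)


if __name__ == "__main__":
    # Toy data: 10 majority points on a line, 3 minority points,
    # one of which (4.5, 2.0) lies inside the majority region.
    majority = [(float(x), 0.0) for x in range(10)]
    minority = [(15.0, 0.0), (15.5, 0.0), (4.5, 2.0)]
    X = majority + minority
    y = [0] * len(majority) + [1] * len(minority)

    print(round(imbalance_ratio(y), 2))  # 3.33
    print(round(global_n3(X, y), 3))     # 0.077
    print({c: round(v, 3) for c, v in per_class_n3(X, y).items()})  # {0: 0.0, 1: 0.333}
```

On this toy data the global error rate is low because the majority class dominates the average, while the per-class view shows that one third of the minority examples fall in the majority region. This mirrors, in a simplified way, the abstract's point that class-agnostic measures can understate the difficulty of imbalanced tasks.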

FAPESP's process: 16/18615-0 - Advanced machine learning
Grantee: André Carlos Ponce de Leon Ferreira de Carvalho
Support Opportunities: Research Grants - Research Partnership for Technological Innovation - PITE
FAPESP's process: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee: Francisco Louzada Neto
Support Opportunities: Research Grants - Research, Innovation and Dissemination Centers - RIDC
FAPESP's process: 15/01382-0 - The influence of pre-processing data techniques on classification algorithms
Grantee: Victor Hugo Barella
Support Opportunities: Scholarships in Brazil - Doctorate