
Use of data complexity measures in the support of supervised machine learning

Abstract

Machine Learning (ML) techniques have been successfully employed to solve various data classification problems. Recently, some studies have sought to understand how measures that quantify the complexity of classification data sets, such as the geometric overlap between classes, affect the performance of ML techniques. Among the contributions of this approach is a better understanding of the domains of competence and the limitations of these techniques. This project will initially study different measures for characterizing the complexity of classification problems. Although the literature offers a variety of measures, few studies examine their areas of expertise, namely the types of application and analysis for which each is most appropriate. We then intend to use these measures to support reducing the complexity involved in solving a given classification problem. A first attempt in this direction is to pre-process the data so that the resulting datasets are less complex. Two pre-processing tasks will be investigated: noise identification and feature subset selection. On another front, the complexity of solving a classification problem will be reduced through a divide-and-conquer strategy. In this case, the goal is to find subproblems of lower complexity whose solutions can be combined to solve the original classification problem. (AU)
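
As a rough illustration of what a class-overlap complexity measure can look like, the sketch below computes a maximum Fisher discriminant ratio for a two-class dataset. The abstract does not name this particular measure; it is assumed here only as a familiar example, and the function name `max_fisher_ratio` and the toy data are hypothetical. Higher values suggest that at least one feature separates the classes well, so the problem is less complex along that axis.

```python
import numpy as np


def max_fisher_ratio(X, y):
    """Largest per-feature Fisher discriminant ratio for a two-class dataset.

    Illustrative sketch only: for each feature, the squared difference of the
    class means is divided by the sum of the class variances, and the maximum
    over all features is returned.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")

    a = X[y == classes[0]]
    b = X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)

    # Zero total variance: perfectly separated if the means differ, else no signal.
    safe_den = np.where(den > 0, den, 1.0)
    ratios = np.where(den > 0, num / safe_den, np.where(num > 0, np.inf, 0.0))
    return float(ratios.max())


# Hypothetical toy data: one feature, two classes.
separated = max_fisher_ratio([[0], [1], [10], [11]], [0, 0, 1, 1])
overlapping = max_fisher_ratio([[0], [10], [1], [11]], [0, 0, 1, 1])
print(separated > overlapping)  # the well-separated data scores higher
```

Under these assumptions, a pre-processing step such as feature subset selection could be judged by whether the measure moves toward lower complexity after it is applied.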

Scientific publications (12)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BARELLA, VICTOR H.; GARCIA, LUIS P. F.; DE SOUTO, MARCILIO C. P.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F. Assessing the data complexity of imbalanced datasets. INFORMATION SCIENCES, v. 553, p. 83-109, APR 2021. Web of Science Citations: 0.
RIVOLLI, ADRIANO; READ, JESSE; SOARES, CARLOS; PFAHRINGER, BERNHARD; DE CARVALHO, ANDRE C. P. L. F. An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. MACHINE LEARNING, v. 109, n. 8, JUN 2020. Web of Science Citations: 0.
PIMENTEL, BRUNO ALMEIDA; DE CARVALHO, ANDRE C. P. L. F. A Meta-learning approach for recommending the number of clusters for clustering algorithms. KNOWLEDGE-BASED SYSTEMS, v. 195, MAY 11 2020. Web of Science Citations: 0.
GARCIA, LUIS P. F.; LEHMANN, JENS; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA C. New label noise injection methods for the evaluation of noise filters. KNOWLEDGE-BASED SYSTEMS, v. 163, p. 693-704, JAN 1 2019. Web of Science Citations: 0.
QUITERIO, THAISE M.; LORENA, ANA C. Using complexity measures to determine the structure of directed acyclic graphs in multiclass classification. APPLIED SOFT COMPUTING, v. 65, p. 428-442, APR 2018. Web of Science Citations: 1.
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F. Adaptive Biometric Systems Using Ensembles. IEEE INTELLIGENT SYSTEMS, v. 33, n. 2, p. 19-28, MAR-APR 2018. Web of Science Citations: 1.
LORENA, ANA C.; MACIEL, ARON I.; DE MIRANDA, PERICLES B. C.; COSTA, IVAN G.; PRUDENCIO, RICARDO B. C. Data complexity meta-features for regression problems. MACHINE LEARNING, v. 107, n. 1, SI, p. 209-246, JAN 2018. Web of Science Citations: 5.
TRAMBAIOLLI, L. R.; SPOLAOR, N.; LORENA, A. C.; ANGHINAH, R.; SATO, J. R. Feature selection before EEG classification supports the diagnosis of Alzheimer's disease. CLINICAL NEUROPHYSIOLOGY, v. 128, n. 10, p. 2058-2067, OCT 2017. Web of Science Citations: 10.
PISANI, PAULO HENRIQUE; POH, NORMAN; DE CARVALHO, ANDRE C. P. L. F.; LORENA, ANA CAROLINA. Score normalization applied to adaptive biometric systems. COMPUTERS & SECURITY, v. 70, p. 565-580, SEP 2017. Web of Science Citations: 2.
MORALES, PABLO; LUENGO, JULIAN; GARCIA, LUIS P. F.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.; HERRERA, FRANCISCO. The NoiseFiltersR Package: Label Noise Preprocessing in R. R JOURNAL, v. 9, n. 1, p. 219-228, JUN 2017. Web of Science Citations: 3.
PISANI, PAULO HENRIQUE; LORENA, ANA CAROLINA; DE CARVALHO, ANDRE C. P. L. F. Adaptive algorithms applied to accelerometer biometrics in a data stream context. INTELLIGENT DATA ANALYSIS, v. 21, n. 2, p. 353-370, 2017. Web of Science Citations: 3.
SPOLAOR, NEWTON; LORENA, ANA CAROLINA; LEE, HUEI DIANA. Feature Selection via Pareto Multi-objective Genetic Algorithms. APPLIED ARTIFICIAL INTELLIGENCE, v. 31, n. 9-10, p. 764-791, 2017. Web of Science Citations: 0.
