Abstract
Machine Learning (ML) techniques have been successfully employed to solve a variety of data classification problems. Recent studies have investigated how quantitative measures of the complexity of the data sets used for classification, such as the geometric overlap between classes, affect the performance of ML techniques. Among the contributions of this approach is a better understanding of the domains of competence and the limitations of these techniques. This project will initially study different measures for characterizing the complexity of classification problems. Although the literature offers a variety of measures, few studies examine their areas of expertise, namely the types of application and analysis for which each is most appropriate. We then intend to use these measures to support reducing the complexity involved in solving a given classification problem. A first step in this direction is to pre-process the data so that the resulting datasets have lower complexity. Two pre-processing tasks will be investigated: noise identification and feature subset selection. On another front, the complexity of solving a classification problem will be reduced by employing a divide-and-conquer strategy. In this case, the goal is to find subproblems of lower complexity whose solutions can be combined to solve the original classification problem. (AU)
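The abstract names the geometric overlap between classes as one example of a complexity measure but does not give a formula. As an illustration only, the sketch below computes Fisher's discriminant ratio, a common overlap measure from the data complexity literature, for a two-class problem. The choice of this measure, the function names, and the toy data are all assumptions made for the example and are not taken from the project.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher's discriminant ratio for one feature and two classes.

    Squared distance between the class means divided by the sum of the
    class variances. Larger values indicate less overlap on this feature.
    """
    spread = pvariance(values_a) + pvariance(values_b)
    gap = (mean(values_a) - mean(values_b)) ** 2
    if spread == 0:
        # Both classes are constant on this feature.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def max_fisher_ratio(class_a, class_b):
    """Highest per-feature ratio; rows are samples, columns are features."""
    n_features = len(class_a[0])
    return max(
        fisher_ratio([row[j] for row in class_a], [row[j] for row in class_b])
        for j in range(n_features)
    )


# Hypothetical toy data: two features, three samples per class.
separated_a = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
separated_b = [[10.0, 1.5], [11.0, 2.5], [12.0, 3.5]]   # far apart on feature 0
overlapping_b = [[0.5, 1.5], [1.5, 2.5], [2.5, 3.5]]    # close on both features

print(max_fisher_ratio(separated_a, separated_b))
print(max_fisher_ratio(separated_a, overlapping_b))
```

Under this measure, the well-separated pair of classes scores far higher than the overlapping pair, which is the kind of signal a pre-processing or decomposition step could try to improve.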
Scientific publications (12)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
LORENA, ANA C.; MACIEL, ARON I.; DE MIRANDA, PERICLES B. C.; COSTA, IVAN G.; PRUDENCIO, RICARDO B. C. Data complexity meta-features for regression problems. MACHINE LEARNING, v. 107, n. 1, SI, p. 209-246, JAN 2018. Web of Science Citations: 5.
MORALES, PABLO; LUENGO, JULIAN; GARCIA, LUIS P. F.; LORENA, ANA C.; DE CARVALHO, ANDRE C. P. L. F.; HERRERA, FRANCISCO. The NoiseFiltersR Package: Label Noise Preprocessing in R. R JOURNAL, v. 9, n. 1, p. 219-228, JUN 2017. Web of Science Citations: 3.