

Using Complexity Measures to Evolve Synthetic Classification Datasets

Author(s):
de Melo, Vinicius V. ; Lorena, Ana C. ; IEEE
Total Authors: 3
Document type: Journal article
Source: 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); 8 pp., 2018.
Abstract

Machine Learning studies usually involve a large volume of experimental work. For instance, any new technique or solution to a classification problem has to be evaluated with respect to the predictive performance achieved on many datasets. In order to evaluate the robustness of an algorithm in the face of different class distributions, it is useful to choose a set of datasets that spans different levels of classification difficulty. In this paper, we present a method to generate synthetic classification datasets with varying complexity levels. The idea is to greedily swap the labels of a set of synthetically generated points in order to reach a given level of classification complexity, which is assessed by measures that estimate the difficulty of a classification problem based on the geometrical distribution of the data. (AU)
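The greedy label-swapping idea in the abstract can be sketched in a few lines. The sketch below is illustrative only, not the authors' implementation: it uses a single, simple complexity measure (the fraction of points whose nearest neighbour carries a different label, loosely inspired by neighbourhood-based measures such as N1/N3), whereas the paper relies on a broader set of geometry-based complexity measures. Function names, the target value, and the flip budget are all assumptions.

```python
import numpy as np

def nn_disagreement(X, y):
    """Fraction of points whose nearest neighbour has a different label.

    A simple stand-in for the geometry-based data complexity measures
    used in the paper; higher values indicate a harder problem.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = d.argmin(axis=1)
    return float(np.mean(y != y[nn]))

def evolve_labels(X, y, target, steps=500, seed=0):
    """Greedily flip single labels whenever doing so moves the
    complexity measure closer to the requested target level."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    current = nn_disagreement(X, y)
    for i in rng.integers(0, len(y), size=steps):
        y[i] ^= 1                        # tentatively flip label of point i (binary 0/1)
        candidate = nn_disagreement(X, y)
        if abs(candidate - target) < abs(current - target):
            current = candidate          # improvement: keep the flip
        else:
            y[i] ^= 1                    # no improvement: revert
    return y, current

# Start from an easy, linearly separable labelling and evolve it
# toward a medium-difficulty one (target complexity 0.5).
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 2))
y = (X[:, 0] > 0).astype(int)            # easy problem: low initial complexity
y_hard, c = evolve_labels(X, y, target=0.5)
```

Because each flip is accepted only if it improves the distance to the target, the search is a hill climb over labelings; the paper's evolutionary framing generalizes this to richer search operators and multiple complexity measures.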

FAPESP's process: 17/20844-0 - A metaheuristic with self-construction of operators for global continuous optimization: extensions and applications of the drone squadron optimization algorithm
Grantee: Vinícius Veloso de Melo
Support Opportunities: Regular Research Grants
FAPESP's process: 12/22608-8 - Use of data complexity measures in the support of supervised machine learning
Grantee: Ana Carolina Lorena
Support Opportunities: Research Grants - Young Investigators Grants