Classifier Recommendation Using Data Complexity Measures

Garcia, Luis P. F.; Lorena, Ana C.; de Souto, Marcilio C. P.; Ho, Tin Kam; IEEE

Author(s):	Garcia, Luis P. F. ; Lorena, Ana C. ; de Souto, Marcilio C. P. ; Ho, Tin Kam ; IEEE
Total Authors:	5
Document type:	Journal article
Source:	2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR); v. N/A, p. 6-pg., 2018-01-01.
Abstract
Applying machine learning to new and unfamiliar domains calls for greater automation in choosing a learning algorithm suited to the data arising from each domain. Meta-learning can address this need: in recent years it has been widely used to recommend the most suitable algorithms for a new dataset. Using complexity measures could make the meta-models easier to understand systematically, and could also help differentiate the performance of a set of techniques by taking into account the overlap between classes imposed by feature values, the separability of the classes, and the distribution of the data points. In this paper we compare the effectiveness of several standard regression models in predicting the accuracies of classifiers on classification problems from the OpenML repository. We show that the models can predict the classifiers' accuracies with low mean squared error, and that they identify the best classifier for a problem, yielding statistically significant improvements over a randomly chosen classifier or a fixed classifier believed to be good on average. (AU)
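
The recommendation scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single complexity measure (the maximum Fisher discriminant ratio, a common measure of class overlap along individual features), a k-nearest-neighbour regressor standing in for the "standard regression models", and a tiny made-up meta-dataset. The names `fisher_ratio`, `knn_predict`, `recommend`, and all numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance


def fisher_ratio(X, y):
    """Maximum Fisher discriminant ratio over features (two-class problems).

    Larger values suggest less class overlap along at least one feature.
    Features with zero within-class variance are skipped in this sketch.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles two-class problems only")
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            best = max(best, (mean(a) - mean(b)) ** 2 / denom)
    return best


def knn_predict(meta_X, meta_y, query, k=2):
    """Average the targets of the k meta-examples closest to the query."""
    ranked = sorted(
        (sum((u - v) ** 2 for u, v in zip(row, query)), target)
        for row, target in zip(meta_X, meta_y)
    )
    return mean([target for _, target in ranked[:k]])


def recommend(meta_X, accuracies, query, k=2):
    """Predict each classifier's accuracy and return the best-scoring one."""
    predicted = {
        name: knn_predict(meta_X, accs, query, k)
        for name, accs in accuracies.items()
    }
    return max(predicted, key=predicted.get), predicted


# Hypothetical meta-dataset: one complexity value per previously seen dataset,
# with the accuracy each classifier reached on it (made-up numbers).
meta_X = [[0.2], [0.5], [3.0], [4.0]]
accuracies = {
    "linear": [0.60, 0.65, 0.95, 0.97],
    "knn": [0.80, 0.82, 0.90, 0.91],
}

# A new, well-separated toy dataset (also made up).
X_new = [[0.0, 5.0], [1.0, 6.0], [4.0, 5.5], [5.0, 6.5]]
y_new = [0, 0, 1, 1]

complexity = fisher_ratio(X_new, y_new)
best, predicted = recommend(meta_X, accuracies, [complexity])
print(complexity, best, predicted)
```

One regressor per classifier keeps the predicted accuracies directly comparable, so the recommendation reduces to picking the largest prediction. A fuller setup would use many complexity measures as inputs and evaluate the regressors by mean squared error, as the abstract describes.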

FAPESP's process:	12/22608-8 - Use of data complexity measures in the support of supervised machine learning
Grantee:	Ana Carolina Lorena
Support Opportunities:	Research Grants - Young Investigators Grants
