Boosting meta-learning with simulated data complexity measures

Garcia, Luis P. F.; Rivolli, Adriano; Alcobaca, Edesio; Lorena, Ana C.; de Carvalho, Andre C. P. L. F.

Author(s):	Garcia, Luis P. F. [1]; Rivolli, Adriano [2]; Alcobaca, Edesio [3]; Lorena, Ana C. [4]; de Carvalho, Andre C. P. L. F. [3]
Total Authors: 5
Affiliation:	[1] Univ Brasilia, Dept Comp Sci, BR-70910900 Brasilia, DF - Brazil
	[2] Technol Univ Parana, Comp Dept, Curitiba, Parana - Brazil
	[3] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo - Brazil
	[4] Aeronaut Inst Technol, Praca Marechal Eduardo Gomes, Sao Paulo - Brazil
Total Affiliations: 4
Document type:	Journal article
Source:	Intelligent Data Analysis; v. 24, n. 5, p. 1011-1028, 2020.
Web of Science Citations:	0
Abstract
Meta-Learning has been largely used over the last years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, if one wants the use of Meta-Learning to be computationally efficient, the extraction of the meta-feature values should also show a low computational cost, considering a trade-off between the time spent to run all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes and distribution of the instances within the classes. However, the extraction of these measures from datasets usually presents a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures, while still keeping their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems which use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, but the computational cost involved in their computation is significantly reduced. (AU)
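
The idea in the abstract, predicting an expensive data complexity measure from cheaper meta-features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic two-class Gaussian datasets, the choice of N3 (leave-one-out 1-nearest-neighbour error rate) as the expensive measure, the two cheap meta-features, and the k-nearest-neighbour meta-regressor. All function names are hypothetical.

```python
import math
import random


def make_dataset(sep, n_per_class, rng):
    """Synthetic 2-D problem: two unit-variance Gaussian classes `sep` apart."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * sep, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


def n3_measure(X, y):
    """Expensive target: leave-one-out 1-NN error rate, O(n^2) distances."""
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = math.dist(xi, xj)
            if d < best_d:
                best_j, best_d = j, d
        errors += y[best_j] != y[i]
    return errors / len(X)


def cheap_meta_features(X, y):
    """Cheap descriptors, linear in n (illustrative choice; assumes 2 classes):
    log10 of the number of instances and the centroid gap scaled by the
    average within-class standard deviation."""
    n, d = len(X), len(X[0])
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    centroids, spreads = [], []
    for rows in groups.values():
        cols = list(zip(*rows))
        mean = [sum(col) / len(rows) for col in cols]
        var = [sum((v - m) ** 2 for v in col) / len(rows)
               for col, m in zip(cols, mean)]
        centroids.append(mean)
        spreads.append(sum(math.sqrt(v) for v in var) / d)
    gap = math.dist(centroids[0], centroids[1]) / (sum(spreads) / len(spreads))
    return [math.log10(n), gap]


def predict_measure(meta_features, targets, query, k=3):
    """Meta-regressor: mean target of the k closest meta-base entries."""
    ranked = sorted(range(len(targets)),
                    key=lambda i: math.dist(meta_features[i], query))
    return sum(targets[i] for i in ranked[:k]) / k


# Build the meta-base: the expensive measure is computed once per dataset.
rng = random.Random(0)
meta_features, targets = [], []
for i in range(30):
    X, y = make_dataset(4.0 * i / 29, rng.randint(30, 60), rng)
    meta_features.append(cheap_meta_features(X, y))
    targets.append(n3_measure(X, y))

# For new datasets, only the cheap meta-features are extracted.
X_easy, y_easy = make_dataset(4.0, 40, rng)
X_hard, y_hard = make_dataset(0.0, 40, rng)
pred_easy = predict_measure(meta_features, targets,
                            cheap_meta_features(X_easy, y_easy))
pred_hard = predict_measure(meta_features, targets,
                            cheap_meta_features(X_hard, y_hard))
print("predicted N3 (well-separated classes):", pred_easy)
print("predicted N3 (overlapping classes):", pred_hard)
```

In this sketch the quadratic-cost measure is paid only while building the meta-base; a new dataset needs a single linear pass to obtain its predicted value, which is the kind of trade-off the abstract describes.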

FAPESP's process:	12/22608-8 - Use of data complexity measures in the support of supervised machine learning
Grantee:	Ana Carolina Lorena
Support Opportunities:	Research Grants - Young Investigators Grants


FAPESP's process:	13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee:	Francisco Louzada Neto
Support Opportunities:	Research Grants - Research, Innovation and Dissemination Centers - RIDC


FAPESP's process:	18/14819-5 - Automated machine learning: learning to learn
Grantee:	Edesio Pinto de Souza Alcobaça Neto
Support Opportunities:	Scholarships in Brazil - Doctorate (Direct)


FAPESP's process:	16/18615-0 - Advanced machine learning
Grantee:	André Carlos Ponce de Leon Ferreira de Carvalho
Support Opportunities:	Research Grants - Research Partnership for Technological Innovation - PITE
