(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Boosting meta-learning with simulated data complexity measures

Author(s):
Garcia, Luis P. F. [1] ; Rivolli, Adriano [2] ; Alcobaca, Edesio [3] ; Lorena, Ana C. [4] ; de Carvalho, Andre C. P. L. F. [3]
Total number of authors: 5
Author affiliation(s):
[1] Univ Brasilia, Dept Comp Sci, BR-70910900 Brasilia, DF - Brazil
[2] Technol Univ Parana, Comp Dept, Curitiba, Parana - Brazil
[3] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo - Brazil
[4] Aeronaut Inst Technol, Praca Marechal Eduardo Gomes, Sao Paulo - Brazil
Total number of affiliations: 4
Document type: Scientific article
Source: Intelligent Data Analysis; v. 24, n. 5, p. 1011-1028, 2020.
Web of Science citations: 0
Abstract

Meta-Learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, for Meta-Learning to be computationally efficient, the extraction of the meta-feature values should also have a low computational cost, considering a trade-off between the time spent running all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets estimates the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes, and the distribution of the instances within the classes. However, extracting these measures from datasets usually has a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures while preserving their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems that use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, but the computational cost involved in their computation is significantly reduced. (AU)
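
The core idea of the abstract, predicting a costly data complexity measure from cheaper meta-features, can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic two-class datasets, the two cheap meta-features, the choice of the leave-one-out 1-NN error rate (often called N3) as the costly measure, and the k-NN regressor are all assumptions made for illustration, as are all function names.

```python
import math
import random


def make_dataset(separation, n_per_class, rng):
    """Two Gaussian classes in 2-D whose centers lie `separation` apart."""
    X, y = [], []
    for label, center in ((0, 0.0), (1, separation)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y


def simple_meta_features(X, y):
    """Cheap O(n*d) descriptors: distance between class centroids, class entropy."""
    n, d = len(X), len(X[0])
    centroids = []
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids.append([sum(r[k] for r in rows) / len(rows) for k in range(d)])
    gap = math.dist(centroids[0], centroids[1])
    p = sum(y) / n  # fraction of instances in class 1
    entropy = -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)
    return [gap, entropy]


def n3_complexity(X, y):
    """Costly O(n^2) measure: leave-one-out error rate of a 1-NN classifier."""
    errors = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(xi, X[j]))
        errors += y[nearest] != y[i]
    return errors / len(X)


def predict_measure(meta_base, features, k=3):
    """k-NN regression in meta-feature space: average the k closest targets."""
    ranked = sorted(meta_base, key=lambda row: math.dist(row[0], features))
    return sum(target for _, target in ranked[:k]) / k


rng = random.Random(0)

# Meta-base: the O(n^2) cost is paid once, offline, on the training datasets.
meta_base = []
for step in range(20):
    X, y = make_dataset(separation=0.25 * step, n_per_class=30, rng=rng)
    meta_base.append((simple_meta_features(X, y), n3_complexity(X, y)))

# New datasets: only the cheap meta-features are extracted.
hard = make_dataset(separation=0.2, n_per_class=30, rng=rng)
easy = make_dataset(separation=5.0, n_per_class=30, rng=rng)
predicted_hard = predict_measure(meta_base, simple_meta_features(*hard))
predicted_easy = predict_measure(meta_base, simple_meta_features(*easy))
print(predicted_hard, predicted_easy)
```

In this toy setting the heavily overlapping dataset should receive a higher predicted complexity than the well-separated one, without the quadratic-cost measure ever being computed on either new dataset.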

FAPESP grant: 12/22608-8 - Use of data complexity measures to support supervised machine learning
Grantee: Ana Carolina Lorena
Support type: Research Grants - Young Investigators
FAPESP grant: 13/07375-0 - CeMEAI - Center for Mathematical Sciences Applied to Industry
Grantee: Francisco Louzada Neto
Support type: Research Grants - Research, Innovation and Dissemination Centers (CEPIDs)
FAPESP grant: 18/14819-5 - Automated machine learning: learning to learn
Grantee: Edesio Pinto de Souza Alcobaça Neto
Support type: Scholarships in Brazil - Direct Doctorate
FAPESP grant: 16/18615-0 - Advanced machine learning
Grantee: André Carlos Ponce de Leon Ferreira de Carvalho
Support type: Research Grants - Partnership for Technological Innovation (PITE)