(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection

Author(s):
Estrela, Gustavo [1, 2] ; Gubitoso, Marco Dimas [2] ; Ferreira, Carlos Eduardo [2] ; Barrera, Junior [2] ; Reis, Marcelo S. [1]
Total number of authors: 5
Author affiliation(s):
[1] Inst Butantan, Ctr Toxins Immune Response & Cell Signaling CeT, Lab Ciclo Celular, BR-05503900 Butanta, SP - Brazil
[2] Univ Sao Paulo, Inst Matemat & Estat, BR-05503900 Sao Paulo, SP - Brazil
Total number of affiliations: 2
Document type: Scientific article
Source: Entropy; v. 22, n. 4, APR 2020.
Web of Science citations: 0
Abstract

In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve the feature selection problem is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves. Minimization of such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the time required by UCS in computational assays was exponential in the number of features. Here, we report that this scalability issue arises because the U-curve problem is NP-hard. We then introduce the Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard algorithms in feature selection. (AU)
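
To make the setting in the abstract concrete, the following is a minimal Python sketch of a U-shaped cost on a Boolean lattice and of splitting the search space into smaller Boolean lattices that can be searched independently. It is not UCS or PUCS: each part is searched exhaustively, and the partition (fixing the first `k` features) is only an assumption chosen for illustration, so the scheme in the paper may differ. The names `WEIGHTS`, `PENALTY`, `cost`, `is_u_shaped`, `search_sublattice` and `partitioned_minimum` are hypothetical, and the toy cost stands in for the conditional entropy-based cost named in the title.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product

# Hypothetical toy data: how much "information" each feature carries.
WEIGHTS = (5, 1, 4, 1, 3)
PENALTY = 2  # hypothetical cost paid per selected feature


def cost(subset):
    """Toy cost: the max of a term that never increases and a term that
    never decreases as features are added. Such a max is U-shaped along
    every chain of the lattice. It is NOT the cost used in the paper."""
    missing = sum(w for i, w in enumerate(WEIGHTS) if i not in subset)
    return max(missing, PENALTY * len(subset))


def is_u_shaped(values):
    """True if no value exceeds both an earlier and a later value."""
    return all(
        values[j] <= max(values[i], values[k])
        for i, j, k in combinations(range(len(values)), 3)
    )


def search_sublattice(fixed):
    """Exhaustively minimize the cost over the subsets that agree with
    `fixed` (a tuple of 0/1 flags) on the first len(fixed) features.
    These subsets form a smaller Boolean lattice over the free features."""
    n, k = len(WEIGHTS), len(fixed)
    base = frozenset(i for i, bit in enumerate(fixed) if bit)
    best = None
    for bits in product((0, 1), repeat=n - k):
        subset = base | {k + i for i, bit in enumerate(bits) if bit}
        candidate = (cost(subset), sorted(subset))
        if best is None or candidate < best:
            best = candidate
    return best


def partitioned_minimum(k):
    """Split the lattice into 2**k parts and search them independently.
    Threads are used only to show that the parts do not share state."""
    parts = list(product((0, 1), repeat=k))
    with ThreadPoolExecutor() as pool:
        return min(pool.map(search_sublattice, parts))


if __name__ == "__main__":
    # Costs along the chain {} < {0} < {0,1} < ... < {0,1,2,3,4}.
    chain = [cost(frozenset(range(m))) for m in range(len(WEIGHTS) + 1)]
    print(chain, is_u_shaped(chain))
    print(partitioned_minimum(2))
```

With these toy values, `partitioned_minimum(2)` returns `(5, [0, 2])`, the same minimum as a search over the whole lattice (`partitioned_minimum(0)`). Every part is still searched exhaustively here, so the total work remains exponential in the number of features; the sketch only shows why such a partition lends itself to parallel execution.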

FAPESP grant: 16/25959-7 - Design of algorithms based on poset forests for the U-curve optimization problem
Grantee: Gustavo Estrela de Matos
Support type: Scholarships in Brazil - Scientific Initiation
FAPESP grant: 13/07467-1 - CeTICS - Center of Toxins, Immune-Response and Cell Signaling
Grantee: Hugo Aguirre Armelin
Support type: Research Grants - Research, Innovation and Dissemination Centers - CEPIDs
FAPESP grant: 15/01587-0 - Storage, modeling and analysis of dynamical systems for e-Science applications
Grantee: João Eduardo Ferreira
Support type: Research Grants - eScience and Data Science Program - Thematic Grants