

Data complexity measures in feature selection

Author(s):
Okimoto, Lucas C. ; Lorena, Ana C. ; IEEE
Total number of authors: 3
Document type: Scientific article
Source: 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN); v. N/A, p. 8-pg., 2019-01-01.
Abstract

Feature selection (FS) is a pre-processing step that is often mandatory in data analysis with Machine Learning techniques. Its objective is to reduce data dimensionality by identifying and retaining only the relevant features of a dataset. In this work we evaluate the use of complexity measures of classification problems in FS. These descriptors estimate the intrinsic difficulty of a classification problem from characteristics of the dataset available for learning. We propose a combined univariate-multivariate FS technique which employs two complexity measures: Fisher's maximum discriminant ratio and the sum of intra-extra class distances. The results reveal that the complexity measures are indeed suitable for estimating feature importance in classification datasets. Large reductions in the number of features were obtained while preserving, in general, the predictive accuracy of two strong classification techniques: Support Vector Machines and Random Forests. (AU)
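As an illustration of the univariate side of the idea, the sketch below scores each feature with the two-class Fisher discriminant ratio, (difference of class means)^2 / (sum of class variances), and ranks features by that score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fisher_ratio_per_feature` and `rank_features` are hypothetical, only the two-class case is handled, population variance is assumed, and the combined univariate-multivariate procedure and the intra-extra class distance measure from the abstract are not reproduced here.

```python
import numpy as np


def fisher_ratio_per_feature(X, y):
    """Two-class Fisher discriminant ratio of every feature (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")

    a = X[y == classes[0]]
    b = X[y == classes[1]]

    # Squared distance between the class means, per feature.
    numerator = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    # Sum of the per-class (population) variances, per feature.
    denominator = a.var(axis=0) + b.var(axis=0)
    # Assumption: constant features get a tiny denominator instead of dividing by zero.
    denominator = np.where(denominator > 0, denominator, np.finfo(float).eps)
    return numerator / denominator


def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative, plus scores."""
    scores = fisher_ratio_per_feature(X, y)
    order = np.argsort(scores)[::-1]
    return order, scores


# Toy data: feature 0 separates the classes well, feature 1 does not.
X = np.array([[0.0, 5.0], [1.0, 3.0], [10.0, 4.0], [11.0, 6.0]])
y = np.array([0, 0, 1, 1])
order, scores = rank_features(X, y)
print(order, scores)
```

The maximum of these per-feature scores corresponds to the "maximum discriminant ratio" of the dataset; keeping the top-ranked features is one simple way such a measure could drive selection.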

FAPESP Grant: 12/22608-8 - Use of data complexity measures in the support of supervised machine learning
Grantee: Ana Carolina Lorena
Support type: Research Grants - Young Investigators