Evaluation of Normalization Techniques in Text Classification for Portuguese

Author(s):
Conrado, Merley da Silva ; Laguna Gutierrez, Victor Antonio ; Rezende, Solange Oliveira ; Murgante, B ; Gervasi, O ; Misra, S ; Nedjah, N ; Rocha, AMAC ; Taniar, D ; Apduhan, BO
Total number of authors: 10
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT III; v. 7335, p. 13-pg., 2012-01-01.
Abstract

Text classification is an important task in Artificial Intelligence. It normally involves large textual datasets, whose representation is made feasible by normalization and term selection techniques. The literature describes three normalization techniques: stemming, lemmatization, and nominalization. However, it is difficult to choose the most suitable one for text classification. In this paper, we investigate this question experimentally by applying five different classifiers to four textual datasets in Portuguese. The classification results are also evaluated using unigrams, bigrams, and the combination of unigrams and bigrams. The results indicate that, in general, the number of terms obtained in each case and the comprehensibility required of the classification results can serve as criteria for choosing the most suitable technique for text classification. (AU)
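The abstract's "number of terms" criterion can be illustrated with a small sketch. The Python code below is a minimal illustration, assuming whitespace tokenization and a hypothetical toy suffix-stripping function `toy_stem` standing in for stemming. It is not the paper's pipeline and not a real Portuguese stemmer such as RSLP, and it does not cover lemmatization or nominalization, which require linguistic resources. It counts distinct unigram, bigram, and combined unigram-plus-bigram terms with and without the toy normalization.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical toy suffix list, for illustration only. This is NOT a real
# Portuguese stemmer (such as RSLP) and NOT the procedure used in the paper.
TOY_SUFFIXES = ("mente", "os", "as", "es", "o", "a", "s")
MIN_STEM_LEN = 3  # hypothetical: never cut a token below 3 characters


def toy_stem(token: str) -> str:
    """Strip the first matching suffix in TOY_SUFFIXES (longer ones are listed first)."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def no_normalization(token: str) -> str:
    """Baseline: keep every token exactly as it appears."""
    return token


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams joined by a single space."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary(
    docs: Iterable[str],
    normalize: Callable[[str], str],
    unigrams: bool = True,
    bigrams: bool = False,
) -> Counter:
    """Count terms over all docs; len() of the result is the number of distinct terms."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = [normalize(t) for t in doc.lower().split()]
        if unigrams:
            counts.update(ngrams(tokens, 1))
        if bigrams:
            counts.update(ngrams(tokens, 2))
    return counts


docs = ["textos classificados", "texto classificado"]
for name, norm in [("none", no_normalization), ("toy stem", toy_stem)]:
    for label, uni, bi in [("unigrams", True, False), ("bigrams", False, True), ("both", True, True)]:
        print(name, label, len(vocabulary(docs, norm, uni, bi)))
```

For the two example documents, this prints 4, 2, and 6 distinct terms without normalization and 2, 1, and 3 with the toy stemmer. Conflating inflected forms such as "textos" and "texto" shrinks the vocabulary in every representation, at the cost of terms like "text" and "classificad" that are less readable than the original words.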

FAPESP grant: 09/16142-3 - Hybrid term extraction model applied to text mining
Grantee: Merley da Silva Conrado
Support type: Scholarships in Brazil - Doctorate
FAPESP grant: 11/19850-9 - Hierarchical clustering methods for the automatic organization of search engine results
Grantee: Solange Oliveira Rezende
Support type: Research Grant - Regular