(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Using complex networks for text classification: Discriminating informative and imaginative documents

de Arruda, Henrique F. [1] ; Costa, Luciano da F. [2] ; Amancio, Diego R. [1]
Total number of authors: 3
Author affiliation(s):
[1] Univ Sao Paulo Sao Carlos, Inst Math & Comp Sci, Sao Paulo - Brazil
[2] Univ Sao Paulo Sao Carlos, Sao Carlos Inst Phys, Sao Paulo - Brazil
Total number of affiliations: 2
Document type: Journal article
Source: EPL; v. 113, n. 2 JAN 2016.
Web of Science citations: 6

Statistical methods have been widely employed in recent years to capture many properties of language. The application of such techniques has improved several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case of bag-of-words language models. These approaches have yielded reasonable performance. However, other potentially useful features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts. Copyright (C) EPLA, 2016 (AU)
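
The kind of feature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it assumes a word adjacency network (undirected links between neighbouring tokens), a simple random walk, and accessibility taken as the exponential of the Shannon entropy of the walker's position after a fixed number of steps. The function names, the choice of function words, and the `steps` parameter are hypothetical, and the symmetry measurements mentioned in the abstract are not covered.

```python
import math
from collections import defaultdict


def build_adjacency_network(tokens):
    """Link every pair of adjacent, distinct tokens with an undirected edge."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def accessibility(graph, node, steps=2):
    """exp(entropy) of a simple random walk's position after `steps` steps.

    Assumption: a plain (not self-avoiding) random walk with uniform
    transition probabilities over neighbours.
    """
    probs = {node: 1.0}
    for _ in range(steps):
        next_probs = defaultdict(float)
        for current, p in probs.items():
            neighbours = graph.get(current, ())
            for nb in neighbours:
                next_probs[nb] += p / len(neighbours)
        probs = next_probs
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.exp(entropy)


def function_word_features(tokens, function_words, steps=2):
    """Return [degree, accessibility] for each function word, in order.

    Words absent from the text contribute [0.0, 0.0].
    """
    graph = build_adjacency_network(tokens)
    features = []
    for word in function_words:
        if word in graph:
            features.append(float(len(graph[word])))
            features.append(accessibility(graph, word, steps))
        else:
            features.extend([0.0, 0.0])
    return features


tokens = "the cat sat on the mat and the dog sat on the rug".split()
vector = function_word_features(tokens, ("the", "on"), steps=2)
```

A vector like this, computed per document over a fixed list of function words, could then be passed to any supervised classifier to separate the two document classes.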

FAPESP grant: 14/20830-0 - Modeling and pattern recognition in texts with complex networks
Grantee: Diego Raphael Amancio
Funding line: Research Grant - Regular
FAPESP grant: 12/50986-7 - Graph spectra and complex network evolution
Grantee: Luciano da Fontoura Costa
Funding line: Research Grant - Regular
FAPESP grant: 11/50761-2 - Models and methods of e-Science for life and agricultural sciences
Grantee: Roberto Marcondes Cesar Junior
Funding line: Research Grant - Thematic