(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Complex networks analysis of language complexity

Amancio, Diego R. [1] ; Aluisio, Sandra M. [2] ; Oliveira, Jr., Osvaldo N. [1] ; Costa, Luciano da F. [1]
Total number of authors: 4
Author affiliation(s):
[1] Univ Sao Paulo, Inst Phys Sao Carlos, BR-13560970 Sao Paulo - Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, BR-13560970 Sao Paulo - Brazil
Total number of affiliations: 2
Document type: Journal article
Source: EPL; v. 100, n. 5 DEC 2012.
Web of Science citations: 19

Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. Copyright (c) EPLA, 2012 (AU)
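
As a rough illustration of the kind of representation the abstract describes, the sketch below builds a word co-occurrence network from a token list and computes two of the metrics it names: node strength and shortest-path length. It is not the authors' implementation. The abstract does not specify the preprocessing or network construction, so linking only adjacent words, treating edges as undirected, and skipping stopword removal and lemmatization are all assumptions made here, and the function names are hypothetical. Diversity and the hierarchical metrics are omitted.

```python
from collections import defaultdict, deque


def cooccurrence_network(tokens):
    """Build a weighted, undirected graph linking adjacent tokens.

    Assumption: adjacency-based co-occurrence with no preprocessing.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            graph[a][b] += 1
            graph[b][a] += 1
    # Convert to plain dicts so later lookups cannot create new nodes.
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def node_strength(graph):
    """Strength of a node: the sum of the weights of its edges."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def mean_shortest_path(graph):
    """Mean unweighted shortest-path length over reachable ordered pairs."""
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search from each node gives hop distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


tokens = "the cat sat on the mat".split()
graph = cooccurrence_network(tokens)
print(node_strength(graph))
print(mean_shortest_path(graph))
```

In a pipeline like the one the abstract outlines, per-text values such as these would form a feature vector that is then passed to a pattern recognition method to separate original texts from simplified ones.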

FAPESP grant: 11/50761-2 - Models and methods of e-Science for life and agricultural sciences
Grantee: Roberto Marcondes Cesar Junior
Funding line: Research Grants - Thematic
FAPESP grant: 10/00927-9 - Classification of texts with complex networks
Grantee: Diego Raphael Amancio
Funding line: Scholarships in Brazil - Direct Doctorate