Virtual Library - FAPESP Documentation and Information Center

(Reference retrieved automatically from Web of Science, using the FAPESP funding information and the corresponding grant number that the authors included in the publication.)

Author(s):	de Arruda, Henrique F. [1]; Marinho, Vanessa Q. [1]; Costa, Luciano da F. [2]; Amancio, Diego R. [1, 3]
Total number of authors: 4
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP - Brazil
[2] Univ Sao Paulo, Sao Carlos Inst Phys, Sao Carlos, SP - Brazil
[3] Indiana Univ, Sch Informat Comp & Engn, Bloomington, IN 47408 - USA
Total number of affiliations: 3
Document type:	Scientific article
Source:	INFORMATION PROCESSING & MANAGEMENT; v. 56, n. 3, p. 479-494, MAY 2019.
Web of Science citations:	1
Abstract
An interesting model to represent texts as a graph (also called network) is the word adjacency (co-occurrence) representation, which is known to capture mainly syntactical features of texts. In this study, we propose a novel network model, which is based on the similarity between the content of the paragraphs of the text. By considering this representation, we characterized the networks with respect to measurements developed in the network science area. We characterized these measurements according to their properties regarding their ability to discriminate between real and shuffled texts, and to capture information regarding the content similarity of chunks of text. In order to compare the results with a more sophisticated approach, we employed a methodology based on word2vec. When comparing real and shuffled texts, the results revealed that real texts tend to have a more well-defined community structure. This characteristic can be related to the organization of subjects in real texts. The network-based measurements that were found to be able to discriminate real from shuffled texts were used as features in a classifier. As a result, the obtained accuracy was 98.72%. In order to compare with a different methodology, we used doc2vec-based features in the classifier, yielding an accuracy rate of 70.8%. The proposed network-based features were employed to analyze the Voynich manuscript, which was found to be compatible with real texts according to the considered characteristics. (AU)
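
The abstract does not say how paragraph similarity is computed or how edges are chosen, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words cosine similarity and a fixed threshold (`threshold=0.3` is an arbitrary illustrative value), and all function names are hypothetical. It builds a paragraph network and produces a word-shuffled version of the same text for comparison.

```python
import math
import random
import re
from collections import Counter


def split_paragraphs(text):
    """Split text on blank lines and tokenise each paragraph into lowercase words."""
    chunks = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [re.findall(r"\w+", p.lower()) for p in chunks]


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def paragraph_network(text, threshold=0.3):
    """Return edges (i, j), i < j, between paragraphs whose similarity meets the threshold.

    Assumption: bag-of-words cosine similarity with a fixed cut-off.
    """
    vectors = [Counter(words) for words in split_paragraphs(text)]
    return {
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine(vectors[i], vectors[j]) >= threshold
    }


def shuffle_words(text, seed=0):
    """Shuffle all words across the text while keeping each paragraph's length."""
    paragraphs = split_paragraphs(text)
    words = [w for p in paragraphs for w in p]
    random.Random(seed).shuffle(words)
    shuffled, pos = [], 0
    for p in paragraphs:
        shuffled.append(" ".join(words[pos:pos + len(p)]))
        pos += len(p)
    return "\n\n".join(shuffled)
```

Calling `paragraph_network(text)` and `paragraph_network(shuffle_words(text))` gives two edge sets on which network measurements such as community structure could then be compared.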

FAPESP grant:	17/13464-6 - Modeling citation and information graphs: an approach based on complex networks
Grantee:	Diego Raphael Amancio
Support type:	Scholarships Abroad - Research


FAPESP grant:	15/22308-2 - Intermediate representations in computational science for knowledge discovery
Grantee:	Roberto Marcondes Cesar Junior
Support type:	Research Grants - Thematic


FAPESP grant:	16/19069-9 - Document classification using semantic information in complex networks
Grantee:	Diego Raphael Amancio
Support type:	Research Grants - Regular


FAPESP grant:	11/50761-2 - Models and methods of e-Science for the life and agricultural sciences (FAPESP-MCT/CNPq/PRONEX-2011)
Grantee:	Roberto Marcondes Cesar Junior
Support type:	Research Grants - Thematic


FAPESP grant:	15/05676-8 - Development of new models for authorship recognition using complex networks
Grantee:	Vanessa Queiroz Marinho
Support type:	Scholarships in Brazil - Master's