Labelled network subgraphs reveal stylistic subtleties in written texts

Marinho, Vanessa Queiroz; Hirst, Graeme; Amancio, Diego Raphael

Author(s):	Marinho, Vanessa Queiroz ^[1] ; Hirst, Graeme ^[2] ; Amancio, Diego Raphael ^[1] Total number of authors: 3
Author affiliation(s):	^[1] Univ Sao Paulo, Inst Math & Comp Sci, Ave Trabalhador Sancarlense, 400 Ctr, BR-13566590 Sao Carlos, SP - Brazil ^[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4 - Canada Total number of affiliations: 2
Document type:	Scientific article
Source:	JOURNAL OF COMPLEX NETWORKS; v. 6, n. 4, p. 620-638, AUG 2018.
Web of Science citations:	0
Abstract
The vast amount of data and the increase in computational capacity have allowed texts to be analysed from several perspectives, including their representation as complex networks. Nodes of the network represent words, and edges represent some relationship between them, usually word co-occurrence. Even though networked representations have been applied to some tasks, such approaches are not usually combined with traditional models that rely on statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled subgraphs, that combines the frequency of common words with small structures found in the topology of the network. Our approach is illustrated in two contexts: authorship attribution and translationese identification. In the former, a set of novels written by different authors is analysed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated. Our results suggest that labelled subgraphs are able to represent texts and should be further explored in other tasks, such as the analysis of text complexity, language proficiency and machine translation. (AU)
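
The idea in the abstract, word frequency combined with small network structures, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tokenisation rule, the choice of adjacent-word co-occurrence, and the use of three-node chains centred on a frequent word are all assumptions made here for illustration.

```python
import re
from collections import Counter, defaultdict


def cooccurrence_edges(text):
    """Tokenise and return (tokens, set of directed edges between adjacent words).

    Assumption: co-occurrence means two words appearing next to each other.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens, set(zip(tokens, tokens[1:]))


def labelled_chain_counts(text, top_k=3):
    """For each of the top_k most frequent words w, count distinct chains
    u -> w -> v in the co-occurrence network, with u, v and w all different.

    The central node keeps its word label; u and v are left unlabelled.
    This is a hypothetical stand-in for a "labelled subgraph" feature.
    """
    tokens, edges = cooccurrence_edges(text)
    labels = [word for word, _ in Counter(tokens).most_common(top_k)]

    preds, succs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops such as "very very"
            succs[u].add(v)
            preds[v].add(u)

    return {
        w: sum(1 for u in preds[w] for v in succs[w] if u != v)
        for w in labels
    }


features = labelled_chain_counts(
    "the cat saw the dog and the dog saw the cat", top_k=1
)
print(features)
```

In this toy sentence the word "the" has two distinct predecessors and two distinct successors, so four chains pass through it. A feature vector of this kind, one entry per frequent word and subgraph type, could then be fed to any standard classifier.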

FAPESP grant:	15/05676-8 - Development of new models for authorship recognition using complex networks
Grantee:	Vanessa Queiroz Marinho
Support type:	Scholarships in Brazil - Master's


FAPESP grant:	14/20830-0 - Modelling and recognition of patterns in texts with complex networks
Grantee:	Diego Raphael Amancio
Support type:	Research Grants - Regular


FAPESP grant:	15/23803-7 - Authorship attribution using traditional methods and complex networks
Grantee:	Vanessa Queiroz Marinho
Support type:	Scholarships abroad - Research Internship - Master's


FAPESP grant:	16/19069-9 - Document classification using semantic information in complex networks
Grantee:	Diego Raphael Amancio
Support type:	Research Grants - Regular
