Quantifying the hierarchical adherence of modular documents

Author(s):
Benatti, Alexandre ; Brito, Ana C. M. ; Amancio, Diego R. ; Costa, Luciano da F.
Total number of authors: 4
Document type: Scientific article
Source: JOURNAL OF PHYSICS-COMPLEXITY; v. 4, n. 4, 18 pp., 2023-12-01.
Abstract

Several natural and artificial structures are characterized by an intrinsic hierarchical organization. The present work describes a methodology for quantifying the degree of adherence between a given hierarchical template and a modular document (e.g. books or homepages with content organized into modules) represented as a content network. The original document, which in the present work is a Wikipedia page, is transformed into a content network by first dividing it into parts or modules. Then, the contents (words) of each pair of modules are compared in terms of the coincidence similarity index, yielding the weight of the corresponding link. The adherence between the hierarchical template and the content network can then be measured by the coincidence similarity between their adjacency matrices, which gives the hierarchical adherence index. To provide additional information about this adherence, four specific indices are also proposed, quantifying the number of links between non-adjacent levels, links between nodes in the same level, converging links between adjacent levels, and missing links. The potential of the approach is illustrated on model-theoretical networks as well as on real-world data obtained from Wikipedia. In addition to confirming the effectiveness of the suggested concepts and methods, the results suggest that real-world documents do not tend to substantially adhere to hierarchical templates. (AU)
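
The pipeline summarized in the abstract (modules, pairwise word comparison, comparison of adjacency matrices) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes that the coincidence similarity index is the product of the Jaccard index and the interiority (overlap) index for non-negative vectors, a definition the abstract itself does not state. The whitespace tokenization, the use of word counts, and the names `coincidence`, `content_network` and `hierarchical_adherence` are likewise hypothetical.

```python
from collections import Counter


def coincidence(x, y):
    """Coincidence similarity of two equal-length non-negative vectors.

    Assumption: Jaccard index times interiority (overlap) index.
    """
    inter = sum(min(a, b) for a, b in zip(x, y))
    union = sum(max(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    if union == 0 or smaller == 0:
        return 0.0  # at least one vector is empty: nothing to compare
    return (inter / union) * (inter / smaller)


def content_network(modules):
    """Weighted adjacency matrix with one node per module.

    Each weight is the coincidence similarity between the word counts
    of two modules (assumption: lowercase, whitespace tokenization).
    """
    counts = [Counter(m.lower().split()) for m in modules]
    vocab = sorted(set().union(*counts))
    vecs = [[c[w] for w in vocab] for c in counts]
    n = len(modules)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = coincidence(vecs[i], vecs[j])
    return W


def hierarchical_adherence(template, W):
    """Coincidence similarity between two flattened adjacency matrices."""
    flat_t = [v for row in template for v in row]
    flat_w = [v for row in W for v in row]
    return coincidence(flat_t, flat_w)


# Toy document with three modules and a two-level template
# (module 0 is the root, modules 1 and 2 are its children).
modules = ["networks and graphs", "graphs and trees", "cooking pasta"]
template = [[0, 1, 1],
            [1, 0, 0],
            [1, 0, 0]]
print(hierarchical_adherence(template, content_network(modules)))
```

Under these assumptions the index lies between 0 and 1, and it equals 1 only when the content network reproduces the template exactly. The four auxiliary indices mentioned in the abstract are not sketched here because the abstract does not define them.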

FAPESP grant: 20/14817-2 - Using complex networks and natural language processing to characterize and predict success in science
Grantee: Ana Caroline Medeiros Brito
Type of support: Scholarships in Brazil - Doctorate
FAPESP grant: 20/06271-0 - Combining complex networks and word embeddings in text classification tasks
Grantee: Diego Raphael Amancio
Type of support: Research Grants - Regular
FAPESP grant: 15/22308-2 - Intermediate representations in Computational Science for knowledge discovery
Grantee: Roberto Marcondes Cesar Junior
Type of support: Research Grants - Thematic