Quantifying the hierarchical adherence of modular documents

Author(s):
Benatti, Alexandre ; Brito, Ana C. M. ; Amancio, Diego R. ; Costa, Luciano da F.
Total Authors: 4
Document type: Journal article
Source: JOURNAL OF PHYSICS-COMPLEXITY; v. 4, n. 4, p. 18-pg., 2023-12-01.
Abstract

Several natural and artificial structures are characterized by an intrinsic hierarchical organization. The present work describes a methodology for quantifying the degree of adherence between a given hierarchical template and a modular document (e.g. a book or homepage with content organized into modules) represented as a content network. The original document, which in the present work is a Wikipedia page, is transformed into a content network by first dividing it into parts or modules. Then, the contents (words) of each pair of modules are compared using the coincidence similarity index, yielding the weight of the link between them. The adherence between the hierarchical template and the content network can then be measured by the coincidence similarity between their adjacency matrices, which gives the hierarchical adherence index. To provide additional information about this adherence, four specific indices are also proposed, quantifying the number of links between non-adjacent levels, links between nodes in the same level, converging links between adjacent levels, and missing links. The potential of the approach is illustrated on model-theoretical networks as well as on real-world data obtained from Wikipedia. In addition to confirming the effectiveness of the suggested concepts and methods, the results suggest that real-world documents tend not to adhere substantially to their hierarchical templates. (AU)
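
The pipeline outlined in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the coincidence similarity index is the product of the Jaccard and interiority (overlap) indices over word-count multisets, that modules are tokenized by whitespace, and that the weighted content network is compared to the template without thresholding. The abstract specifies none of these details, and the function names (`coincidence`, `content_network`, `adherence`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coincidence(x, y):
    """Coincidence similarity of two non-negative multisets (dict-like).

    Assumed definition: Jaccard index times interiority (overlap) index.
    """
    keys = set(x) | set(y)
    s_min = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    s_max = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    smaller = min(sum(x.values()), sum(y.values()))
    if s_max == 0 or smaller == 0:
        return 0.0  # at least one multiset is empty: no similarity
    jaccard = s_min / s_max
    interiority = s_min / smaller
    return jaccard * interiority


def content_network(modules):
    """Weighted adjacency matrix: one node per module, weights from word overlap."""
    bags = [Counter(text.lower().split()) for text in modules]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = coincidence(bags[i], bags[j])
    return weights


def adherence(template, weights):
    """Coincidence similarity between two symmetric adjacency matrices.

    Only the upper triangle is compared, since both matrices are undirected.
    """
    n = len(template)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    t = {p: template[p[0]][p[1]] for p in pairs}
    w = {p: weights[p[0]][p[1]] for p in pairs}
    return coincidence(t, w)


# Toy document with three modules and a two-level template (module 0 is the
# root, modules 1 and 2 are its children). Both are made-up illustrations.
modules = ["graph theory networks", "graph networks nodes", "cooking recipes pasta"]
template = [[0, 1, 1],
            [1, 0, 0],
            [1, 0, 0]]
score = adherence(template, content_network(modules))
```

Under these assumptions, a content network identical to the template yields an adherence of 1, and missing or spurious links pull the value toward 0. The four complementary indices mentioned in the abstract are not sketched here because the abstract does not define them.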

FAPESP's process: 20/14817-2 - Using complex networks and natural language processing to characterize and predict academic success
Grantee: Ana Caroline Medeiros Brito
Support Opportunities: Scholarships in Brazil - Doctorate
FAPESP's process: 20/06271-0 - Combining complex networks and word embeddings in text classification tasks
Grantee: Diego Raphael Amancio
Support Opportunities: Regular Research Grants
FAPESP's process: 15/22308-2 - Intermediate representations in Computational Science for knowledge discovery
Grantee: Roberto Marcondes Cesar Junior
Support Opportunities: Research Projects - Thematic Grants