Using semantical information to classify texts modelled as complex networks

Grant number: 16/19069-9
Support type: Regular Research Grants
Duration: February 01, 2017 - December 31, 2017
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Diego Raphael Amancio
Grantee: Diego Raphael Amancio
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Complex networks have been used to model many complex systems, owing to their versatility in representing systems as associations of nodes. Even though network science has been used to analyze written texts in recent years, most works exploring the topological information of networks have emphasized only the stylistic and structural properties of documents. Here, we propose an extension of traditional models to grasp semantic aspects of texts. Examples of proposed enhancements include the representation of texts in a multi-scale fashion, where nodes may represent words, sentences, paragraphs, sets of paragraphs, etc. We also intend to improve the semantic characterization of texts by including semantic links. To do so, we intend to implement recent advances in textual similarity research, which include vector representations of words using word embeddings. In this context, we plan to tackle two tasks related to text classification, namely topic segmentation and multi-document extractive summarization. To address these natural language processing tasks, modifications to community detection methods and multi-layer models are proposed as the main tools designed to include semantic information in traditional representations. Owing to the generality of the proposed methods, we believe that the tools proposed here could be easily extended to analyze similar natural language processing tasks. (AU)
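
As an illustration only, the idea of enriching a word network with semantic links can be sketched as follows. This is a minimal sketch, not the project's implementation: the function names, the co-occurrence window, the similarity threshold, and the two-dimensional toy vectors are all assumptions made for the example, and a real study would use pretrained word embeddings instead.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens, window=1):
    """Structural layer: link each word to the words that follow it
    within `window` positions in the text."""
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                edges.add(frozenset((w, v)))
    return edges


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_edges(embeddings, threshold=0.95):
    """Semantic layer: link every pair of words whose embedding
    similarity reaches `threshold` (a hypothetical cut-off)."""
    edges = set()
    for w, v in combinations(sorted(embeddings), 2):
        if cosine(embeddings[w], embeddings[v]) >= threshold:
            edges.add(frozenset((w, v)))
    return edges


# Toy data: hand-made vectors standing in for real word embeddings.
tokens = ["cat", "chases", "mouse", "dog", "chases", "cat"]
toy_embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "mouse": [0.7, 0.4],
    "chases": [0.0, 1.0],
}

structural = cooccurrence_edges(tokens)
# Keep only the links that the co-occurrence layer does not already contain.
semantic_only = semantic_edges(toy_embeddings) - structural

for edge in sorted(tuple(sorted(e)) for e in semantic_only):
    print("semantic link:", edge)
```

In this toy text, "cat" and "dog" never appear next to each other, so only the semantic layer connects them. Keeping the two edge sets separate mirrors the multi-layer view mentioned in the abstract, where structural and semantic relations can be analyzed apart or together.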

Scientific publications (15)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
CORREA, JR., EDILSON A.; AMANCIO, DIEGO R. Word sense induction using word embeddings and community detection in complex networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 523, p. 180-190, JUN 1 2019. Web of Science Citations: 0.
DE ARRUDA, HENRIQUE F.; MARINHO, VANESSA Q.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R. Paragraph-based representation of texts: A complex networks approach. INFORMATION PROCESSING & MANAGEMENT, v. 56, n. 3, p. 479-494, MAY 2019. Web of Science Citations: 1.
DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F. Connecting network science and information theory. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 515, p. 641-648, FEB 1 2019. Web of Science Citations: 0.
RODRIGUEZ, MAYRA Z.; COMIN, CESAR H.; CASANOVA, DALCIMAR; BRUNO, ODEMIR M.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.; RODRIGUES, FRANCISCO A. Clustering algorithms: A comparative approach. PLoS One, v. 14, n. 1, JAN 15 2019. Web of Science Citations: 2.
DE ARRUDA, HENRIQUE F.; MARINHO, VANESSA Q.; LIMA, THALES S.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F. An image analysis approach to text analytics based on complex networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 510, p. 110-120, NOV 15 2018. Web of Science Citations: 2.
TOHALINO, JORGE V.; AMANCIO, DIEGO R. Extractive multi-document summarization using multilayer networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 503, p. 526-539, AUG 1 2018. Web of Science Citations: 2.
LIMA, THALES S.; DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F. The dynamics of knowledge acquisition via self-learning in complex networks. Chaos, v. 28, n. 8, AUG 2018. Web of Science Citations: 1.
MARINHO, VANESSA QUEIROZ; HIRST, GRAEME; AMANCIO, DIEGO RAPHAEL. Labelled network subgraphs reveal stylistic subtleties in written texts. JOURNAL OF COMPLEX NETWORKS, v. 6, n. 4, p. 620-638, AUG 2018. Web of Science Citations: 0.
CORREA, JR., EDILSON A.; LOPES, ALNEU A.; AMANCIO, DIEGO R. Word sense disambiguation: A complex network approach. INFORMATION SCIENCES, v. 442, p. 103-113, MAY 2018. Web of Science Citations: 4.
AKIMUSHKIN, CAMILO; AMANCIO, DIEGO R.; OLIVEIRA, JR., OSVALDO N. On the role of words in the network structure of texts: Application to authorship attribution. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 495, p. 49-58, APR 1 2018. Web of Science Citations: 1.
MACHICAO, JEANETH; CORREA, JR., EDILSON A.; MIRANDA, GISELE H. B.; AMANCIO, DIEGO R.; BRUNO, ODEMIR M. Authorship attribution based on Life-Like Network Automata. PLoS One, v. 13, n. 3, MAR 22 2018. Web of Science Citations: 0.
DE ARRUDA, HENRIQUE FERRAZ; SILVA, FILIPI NASCIMENTO; MARINHO, VANESSA QUEIROZ; AMANCIO, DIEGO RAPHAEL; COSTA, LUCIANO DA FONTOURA. Representation of texts as complex networks: a mesoscopic approach. JOURNAL OF COMPLEX NETWORKS, v. 6, n. 1, p. 125-144, FEB 2018. Web of Science Citations: 3.
DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R. Knowledge acquisition: A Complex networks approach. INFORMATION SCIENCES, v. 421, p. 154-166, DEC 2017. Web of Science Citations: 9.
CORREA, JR., EDILSON A.; SILVA, FILIPI N.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R. Patterns of authors contribution in scientific manuscripts. Journal of Informetrics, v. 11, n. 2, p. 498-510, MAY 2017. Web of Science Citations: 5.
AKIMUSHKIN, CAMILO; AMANCIO, DIEGO RAPHAEL; OLIVEIRA, JR., OSVALDO NOVAIS. Text Authorship Identified Using the Dynamics of Word Co-Occurrence Networks. PLoS One, v. 12, n. 1, JAN 26 2017. Web of Science Citations: 15.