Using semantical information to classify texts modelled as complex networks

Grant number: 16/19069-9
Support Opportunities: Regular Research Grants
Start date: February 01, 2017
End date: December 31, 2017
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Diego Raphael Amancio
Grantee: Diego Raphael Amancio
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Complex networks have been used to model many complex systems, owing to their versatility in representing systems as an association of nodes. Even though network science has already been used to analyze written texts in recent years, the majority of works exploring the topological information of networks have emphasized only the stylistic/structural properties of documents. Here, we propose an extension of traditional models to capture semantic aspects of texts. Examples of proposed enhancements include the representation of texts in a multi-scale fashion, where nodes may represent words, sentences, paragraphs, sets of paragraphs, etc. We also intend to improve the semantic characterization of texts by including semantic links. To do so, we intend to implement recent advances in textual similarity research, which include vector representations of words using word embeddings. In this context, we plan to tackle two tasks related to text classification, namely topic segmentation and multi-document extractive summarization. To address these natural language processing tasks, modifications to community detection methods and multi-layer models are proposed as the main tools designed to include semantic information in traditional representations. Owing to the generality of the proposed methods, we believe that the tools proposed here could be easily extended to analyze similar natural language processing tasks. (AU)
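
As a rough illustration of the idea of semantic links between text units, the following minimal Python sketch builds a sentence-level network in which two sentences are connected when the cosine similarity of their averaged word vectors exceeds a threshold. It is not the project's implementation: the toy two-dimensional vectors in TOY_EMBEDDINGS, the function names, and the threshold value of 0.8 are all assumptions made for the example, and a real system would load trained word embeddings instead.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors (assumption); real embeddings would be
# trained on a corpus and have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "network": (1.0, 0.0),
    "graph": (0.9, 0.1),
    "node": (0.8, 0.2),
    "poem": (0.0, 1.0),
    "verse": (0.1, 0.9),
}


def sentence_vector(sentence):
    """Average the vectors of the words found in TOY_EMBEDDINGS."""
    vectors = [TOY_EMBEDDINGS[w] for w in sentence.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return None  # no known word, so no semantic representation
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_sentence_network(sentences, threshold=0.8):
    """Return weighted edges {(i, j): similarity} between sentence indices.

    Nodes are sentences; an edge is a semantic link added when the
    similarity of the two sentence vectors reaches the threshold.
    """
    vectors = [sentence_vector(s) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        if vectors[i] is None or vectors[j] is None:
            continue
        similarity = cosine(vectors[i], vectors[j])
        if similarity >= threshold:
            edges[(i, j)] = similarity
    return edges


sentences = ["the network has a node", "a graph node", "a poem in verse"]
edges = build_sentence_network(sentences)
print(edges)
```

With these toy vectors, only the first two sentences end up linked. The resulting weighted edge list could then be passed to a community detection method, for example to group sentences by topic, which is one plausible reading of how such a network would support topic segmentation.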

Scientific publications (24)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
MARINHO, VANESSA QUEIROZ; HIRST, GRAEME; AMANCIO, DIEGO RAPHAEL. Labelled network subgraphs reveal stylistic subtleties in written texts. JOURNAL OF COMPLEX NETWORKS, v. 6, n. 4, p. 620-638, . (15/05676-8, 14/20830-0, 15/23803-7, 16/19069-9)
GEWERS, FELIPE L.; FERREIRA, GUSTAVO R.; DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. Principal Component Analysis: A Natural Approach to Data Exploration. ACM COMPUTING SURVEYS, v. 54, n. 4, . (17/13464-6, 18/09125-4, 15/22308-2, 16/19069-9, 18/10489-0, 11/50761-2, 19/16223-5)
DE ARRUDA, HENRIQUE FERRAZ; SILVA, FILIPI NASCIMENTO; MARINHO, VANESSA QUEIROZ; AMANCIO, DIEGO RAPHAEL; COSTA, LUCIANO DA FONTOURA. Representation of texts as complex networks: a mesoscopic approach. JOURNAL OF COMPLEX NETWORKS, v. 6, n. 1, p. 125-144, . (16/19069-9, 11/50761-2, 15/05676-8, 14/20830-0, 15/08003-4)
DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R.. Knowledge acquisition: A Complex networks approach. INFORMATION SCIENCES, v. 421, p. 154-166, . (14/20830-0, 16/19069-9, 11/50761-2, 15/08003-4)
DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. Connecting network science and information theory. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 515, p. 641-648, . (15/18942-8, 16/19069-9, 14/20830-0, 15/08003-4, 11/50761-2)
CORREA, JR., EDILSON A.; SILVA, FILIPI N.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R.. Patterns of authors contribution in scientific manuscripts. Journal of Informetrics, v. 11, n. 2, p. 498-510, . (14/20830-0, 16/19069-9, 11/50761-2, 15/08003-4)
DE ARRUDA, HENRIQUE F.; MARINHO, VANESSA Q.; COSTA, LUCIANO DA F.; AMANCIO, DIEGO R.. Paragraph-based representation of texts: A complex networks approach. INFORMATION PROCESSING & MANAGEMENT, v. 56, n. 3, p. 479-494, . (17/13464-6, 15/22308-2, 16/19069-9, 11/50761-2, 15/05676-8)
CORREA, JR., EDILSON A.; AMANCIO, DIEGO R.. Word sense induction using word embeddings and community detection in complex networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 523, p. 180-190, . (14/20830-0, 17/13464-6, 16/19069-9)
MACHICAO, JEANETH; CORREA, JR., EDILSON A.; MIRANDA, GISELE H. B.; AMANCIO, DIEGO R.; BRUNO, ODEMIR M.. Authorship attribution based on Life-Like Network Automata. PLoS One, v. 13, n. 3, . (17/13464-6, 14/20830-0, 15/05899-7, 16/19069-9, 14/08026-1)
COMIN, CESAR H.; PERON, THOMAS; SILVA, FILIPI N.; AMANCIO, DIEGO R.; RODRIGUES, FRANCISCO A.; COSTA, LUCIANO DA F.. Complex systems: Features, similarity and connectivity. PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS, v. 861, p. 1-41, . (15/22308-2, 15/08003-4, 16/23827-6, 18/09125-4, 16/19069-9, 14/20830-0, 13/26416-9)
AKIMUSHKIN, CAMILO; AMANCIO, DIEGO RAPHAEL; OLIVEIRA, JR., OSVALDO NOVAIS. Text Authorship Identified Using the Dynamics of Word Co-Occurrence Networks. PLoS One, v. 12, n. 1, . (14/20830-0, 16/19069-9)
CORREA JR, EDILSON A.; MARINHO, VANESSA Q.; AMANCIO, DIEGO R.. Semantic flow in language networks discriminates texts by genre and publication date. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 557, . (15/05676-8, 16/19069-9)
MEDEIROS BRITO, ANA CAROLINE; SILVA, FILIPI NASCIMENTO; AMANCIO, DIEGO RAPHAEL. A complex network approach to political analysis: Application to the Brazilian Chamber of Deputies. PLoS One, v. 15, n. 3, . (16/19069-9)
BRITO, ANA C. M.; SILVA, FILIPI N.; DE ARRUDA, HENRIQUE F.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. Classification of abrupt changes along viewing profiles of scientific articles. Journal of Informetrics, v. 15, n. 2, p. 15-pg., . (18/09125-4, 18/10489-0, 15/22308-2, 16/19069-9, 15/08003-4)
BRITO, ANA C. M.; SILVA, FILIPI N.; DE ARRUDA, HENRIQUE F.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. Classification of abrupt changes along viewing profiles of scientific articles. Journal of Informetrics, v. 15, n. 2, . (18/10489-0, 19/16223-5, 15/08003-4, 16/19069-9, 18/09125-4, 15/22308-2)
TOHALINO, JORGE V.; AMANCIO, DIEGO R.. Extractive multi-document summarization using multilayer networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 503, p. 526-539, . (17/13464-6, 16/19069-9)
RODRIGUEZ, MAYRA Z.; COMIN, CESAR H.; CASANOVA, DALCIMAR; BRUNO, ODEMIR M.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.; RODRIGUES, FRANCISCO A.. Clustering algorithms: A comparative approach. PLoS One, v. 14, n. 1, . (16/19069-9, 14/20830-0, 15/18942-8, 15/22308-2, 14/08026-1, 18/09125-4, 11/50761-2)
TOHALINO, JORGE VALVERDE; AMANCIO, DIEGO RAPHAEL; IEEE. Extractive Multi Document Summarization using Dynamical Measurements of Complex Networks. 2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), v. N/A, p. 6-pg., . (16/19069-9)
DA SILVA, EDUARDO BORGES; SILVA, THIAGO CHRISTIANO; CONSTANTINO, MICHEL; AMANCIO, DIEGO RAPHAEL; TABAK, BENJAMIN MIRANDA. Overconfidence and the 2D:4D ratio. JOURNAL OF BEHAVIORAL AND EXPERIMENTAL FINANCE, v. 25, . (16/19069-9)
CORREA, JR., EDILSON A.; LOPES, ALNEU A.; AMANCIO, DIEGO R.. Word sense disambiguation: A complex network approach. INFORMATION SCIENCES, v. 442, p. 103-113, . (17/13464-6, 16/19069-9, 15/14228-9, 14/20830-0, 11/22749-8)
LIMA, THALES S.; DE ARRUDA, HENRIQUE F.; SILVA, FILIPI N.; COMIN, CESAR H.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. The dynamics of knowledge acquisition via self-learning in complex networks. Chaos, v. 28, n. 8, . (17/13464-6, 17/09280-7, 15/22308-2, 16/19069-9, 11/50761-2, 15/08003-4, 15/18942-8)
DE ARRUDA, HENRIQUE F.; MARINHO, VANESSA Q.; LIMA, THALES S.; AMANCIO, DIEGO R.; COSTA, LUCIANO DA F.. An image analysis approach to text analytics based on complex networks. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 510, p. 110-120, . (16/19069-9, 11/50761-2, 15/22308-2, 15/05676-8)
AKIMUSHKIN, CAMILO; AMANCIO, DIEGO R.; OLIVEIRA, JR., OSVALDO N.. On the role of words in the network structure of texts: Application to authorship attribution. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, v. 495, p. 49-58, . (14/20830-0, 13/14262-7, 16/19069-9)
SILVA, THIAGO CHRISTIANO; AMANCIO, DIEGO RAPHAEL; TABAK, BENJAMIN MIRANDA. Modeling supply-chain networks with firm-to-firm wire transfers. EXPERT SYSTEMS WITH APPLICATIONS, v. 190, . (16/19069-9)