Distributed text representation model with online learning

Grant number: 18/02146-6
Support Opportunities: Scholarships in Brazil - Post-Doctoral
Start date: August 01, 2018
End date: July 31, 2021
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Tiago Agostinho de Almeida
Grantee: Renato Moraes Silva
Host Institution: Centro de Ciências em Gestão e Tecnologia (CCGT). Universidade Federal de São Carlos (UFSCAR). Campus de Sorocaba. Sorocaba, SP, Brazil

Abstract

The amount of digital information stored as text has grown sharply over the last decade, driven by digital inclusion and the popularization of smartphones. As a result, demand for automatic systems that can extract knowledge from text has increased, and such systems have become increasingly fundamental. Their quality is highly influenced by the computational model used to represent the texts. The most traditional model, the "bag of words", does not capture context or semantic relations, is highly sparse, and cannot reflect, within an acceptable time, the constant changes in textual patterns generated by applications such as social networks and instant messaging systems. Even recent distributed text representation models have limitations in these scenarios: they lack incremental learning, which is needed because new terms, such as slang, symbols, and abbreviations, arise very frequently. Therefore, this research project aims to propose and develop a distributed representation of texts that can be updated online. To this end, unsupervised clustering techniques and recurrent neural networks can be combined to associate new terms with groups of known terms, enabling the model to appropriately represent terms not seen before. (AU)
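
The idea of associating a new term with a group of known terms can be illustrated with a minimal sketch. It is not the project's method: the abstract proposes combining unsupervised clustering with recurrent neural networks, whereas this sketch substitutes a much simpler character n-gram cosine similarity. The cluster names, the example terms, the `threshold` value, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(term, n=3):
    """Count the character n-grams of a term padded with boundary markers."""
    padded = f"<{term}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_cluster(new_term, clusters):
    """Return (cluster_name, score) of the cluster holding the most similar known term."""
    grams = char_ngrams(new_term)
    scored = (
        (name, max(cosine(grams, char_ngrams(t)) for t in terms))
        for name, terms in clusters.items()
    )
    return max(scored, key=lambda pair: pair[1])


def update_online(new_term, clusters, threshold=0.3):
    """Add an unseen term to its closest cluster, or open a new cluster for it.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    name, score = assign_cluster(new_term, clusters)
    if score < threshold:
        name = f"new:{new_term}"
        clusters[name] = []
    clusters[name].append(new_term)
    return name


# Hypothetical groups of already known terms.
clusters = {
    "greeting": ["hello", "hi", "hey"],
    "laughter": ["haha", "lol", "hehe"],
    "thanks": ["thanks", "thank", "thx"],
}

print(update_online("hahaha", clusters))   # joins the "laughter" group
print(update_online("thankss", clusters))  # joins the "thanks" group
print(update_online("brb", clusters))      # no similar known term: opens a new group
```

Because each unseen term is placed in a group without retraining anything, the vocabulary can grow one term at a time, which is the online-update property the abstract asks for. A learned model would replace the surface-form similarity used here with representations that also reflect the context in which the term appears.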

Scientific publications (6)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
SILVA, RENATO M.; SANTOS, RONEY L. S.; ALMEIDA, TIAGO A.; PARDO, THIAGO A. S. Towards automatically filtering fake news in Portuguese. EXPERT SYSTEMS WITH APPLICATIONS, v. 146. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A.; YAMAKAMI, AKEBO; WANI, MA; KANTARDZIC, M; SAYEDMOUCHAWEH, M; GAMA, J; LUGHOFER, E. Semantic indexing-based data augmentation for filtering undesired short text messages. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 6 pp. (17/09387-6, 18/02146-6)
SILVA, RENATO M.; LOCHTER, JOHANNES V.; ALMEIDA, TIAGO A.; YAMAKAMI, AKEBO; XAVIER-JUNIOR, JC; RIOS, RA. FastContext: Handling Out-of-Vocabulary Words Using the Word Structure and Context. INTELLIGENT SYSTEMS, PT II, v. 13654, 19 pp. (18/02146-6)
FREITAS, BRENO L.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Gaussian Mixture Descriptors Learner. KNOWLEDGE-BASED SYSTEMS, v. 188. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Multi-level out-of-vocabulary words handling approach. KNOWLEDGE-BASED SYSTEMS, v. 251, 11 pp. (18/02146-6, 17/09387-6)
BITTENCOURT, MARCIELE M.; SILVA, RENATO M.; ALMEIDA, TIAGO A. ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning. APPLIED SOFT COMPUTING, v. 96. (18/02146-6, 17/09387-6)