A continuously evolving distributed text representation model

Grant number: 17/09387-6
Support Opportunities: Regular Research Grants
Start date: September 01, 2017
End date: February 29, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Tiago Agostinho de Almeida
Grantee: Tiago Agostinho de Almeida
Host Institution: Centro de Ciências em Gestão e Tecnologia (CCGT). Universidade Federal de São Carlos (UFSCAR). Campus de Sorocaba. Sorocaba, SP, Brazil
Associated researchers: Renato Moraes Silva

Abstract

The increasing volume of unstructured data produced by humanity has motivated the use of machines to perform tasks traditionally carried out by humans, such as translation, transcription, and opinion mining. Although there are many methods for text categorization, it remains a challenge to find a computational text representation that captures semantic meaning, continuously expands its vocabulary, and evolves its knowledge of the relations between terms and sentences. With existing representation models, changes in text patterns are not readily reflected in the computational model. Therefore, in scenarios where the textual pattern is dynamic and changes frequently, the available models take a long time and are costly to adapt. Short and noisy texts, commonly found in communication over the web and smartphones, are one application that demands incremental models, since new terms such as symbols, slang, and abbreviations can arise at any time. This research project therefore proposes to combine unsupervised clustering techniques with state-of-the-art recurrent neural networks to create a computational text representation model able to learn continuously, associating new terms with groups of known terms so that terms not yet seen by the existing model can still carry relevance. (AU)
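
The idea of associating a new term with a group of known terms can be illustrated with a minimal sketch. It is not the project's method: the toy 2-D embeddings, the plain k-means routine, the use of context words to place the unseen term, and the names `known`, `kmeans`, and `embed_unseen` are all assumptions made for illustration, and the recurrent neural network component mentioned in the abstract is left out entirely.

```python
from math import dist

# Toy 2-D embeddings for known terms (hypothetical values, illustration only).
known = {
    "good": (0.9, 0.1),
    "great": (1.0, 0.2),
    "nice": (0.8, 0.0),
    "bad": (-0.9, 0.1),
    "awful": (-1.0, 0.0),
    "poor": (-0.8, 0.2),
}

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def kmeans(vectors, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def embed_unseen(context, centroids):
    """Return the centroid nearest to the mean vector of the known context words."""
    vectors = [known[w] for w in context if w in known]
    if not vectors:
        return None  # no known context word, so nothing anchors the new term
    ctx = mean(vectors)
    return min(centroids, key=lambda c: dist(ctx, c))

# Cluster the known vocabulary, seeding one centroid per assumed group.
centroids = kmeans(list(known.values()), [known["good"], known["bad"]])

# "gr8" is out of vocabulary; its context contains the known words "nice" and "good".
known["gr8"] = embed_unseen(["that", "was", "nice", "and", "good"], centroids)
```

In this sketch the unseen term inherits the centroid of the cluster its context falls into, so it immediately gets a usable vector without retraining; a term whose context contains no known word receives no vector at all.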

Scientific publications (6)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BITTENCOURT, MARCIELE M.; SILVA, RENATO M.; ALMEIDA, TIAGO A. ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning. APPLIED SOFT COMPUTING, v. 96. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; PIRES, PEDRO R.; BOSSOLANI, CARLOS; YAMAKAMI, AKEBO; ALMEIDA, TIAGO A. Evaluating the impact of corpora used to train distributed text representation models for noisy and short texts. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 8 p. (17/06495-2, 17/09387-6)
SILVA, RENATO M.; SANTOS, RONEY L. S.; ALMEIDA, TIAGO A.; PARDO, THIAGO A. S. Towards automatically filtering fake news in Portuguese. EXPERT SYSTEMS WITH APPLICATIONS, v. 146. (18/02146-6, 17/09387-6)
FREITAS, BRENO L.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Gaussian Mixture Descriptors Learner. KNOWLEDGE-BASED SYSTEMS, v. 188. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Multi-level out-of-vocabulary words handling approach. KNOWLEDGE-BASED SYSTEMS, v. 251, 11 p. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A.; YAMAKAMI, AKEBO; WANI, MA; KANTARDZIC, M; SAYEDMOUCHAWEH, M; GAMA, J; LUGHOFER, E. Semantic indexing-based data augmentation for filtering undesired short text messages. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 6 p. (17/09387-6, 18/02146-6)