
A continuously evolving distributed text representation model


The increasing volume of unstructured data produced by humanity has motivated the use of machines for tasks traditionally performed by humans, such as translation, transcription, and opinion mining. Although there are many methods for text categorization, it remains a challenge to find a computational text representation that captures semantic meaning, continuously expands its vocabulary, and keeps evolving its knowledge of the relations between terms and sentences. With existing representation models, changes in text patterns are not readily reflected in the computational model. Therefore, in scenarios in which the textual pattern is dynamic and changes frequently, the available models are slow and costly to adapt. Short and noisy texts, commonly found in web and smartphone communication, are one such scenario that demands incremental models, since new terms such as symbols, slang, and abbreviations arise all the time. This research project thus proposes combining unsupervised clustering techniques with state-of-the-art recurrent neural networks to create a text representation model that learns continuously, associating new terms with groups of known terms so that the existing model can make use of terms it has not yet seen. (AU)
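The central idea of the proposal, associating a new term with a group of known terms, can be illustrated with a minimal sketch. It assumes pre-trained term vectors are already available, groups them with plain k-means, and, as a simple stand-in for the recurrent neural network the project proposes, estimates a vector for an unseen term by averaging the vectors of the known words around it. The class `IncrementalVocabulary` and its methods are hypothetical names chosen for illustration; they are not the project's code.

```python
import math
import random

Vector = list[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors: list[Vector], k: int, iterations: int = 20, seed: int = 0) -> list[Vector]:
    """Plain k-means (Lloyd's algorithm); returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iterations):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: distance(v, centroids[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


class IncrementalVocabulary:
    """Hypothetical container: known term vectors plus their cluster centroids."""

    def __init__(self, embeddings: dict[str, Vector], k: int):
        self.embeddings = dict(embeddings)
        self.centroids = kmeans(list(self.embeddings.values()), k)

    def cluster_of(self, vector: Vector) -> int:
        """Index of the centroid closest to `vector`."""
        return min(range(len(self.centroids)),
                   key=lambda i: distance(vector, self.centroids[i]))

    def add_term(self, term: str, context: list[str]) -> int:
        """Give an unseen term a vector and attach it to a group of known terms.

        Assumption: averaging the vectors of known context words stands in
        for the recurrent network proposed in the project.
        """
        known = [self.embeddings[w] for w in context if w in self.embeddings]
        if not known:
            raise ValueError(f"no known context words for {term!r}")
        estimate = mean(known)
        self.embeddings[term] = estimate  # the term is now usable by the model
        return self.cluster_of(estimate)
```

For example, a slang greeting that appears next to "hello" and "hi" is placed in the same cluster as those words and receives a vector at once, without retraining on the whole corpus. Unknown context words are simply skipped.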


Scientific publications (6)
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
LOCHTER, JOHANNES V.; PIRES, PEDRO R.; BOSSOLANI, CARLOS; YAMAKAMI, AKEBO; ALMEIDA, TIAGO A. Evaluating the impact of corpora used to train distributed text representation models for noisy and short texts. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), IEEE, 8 p. (17/06495-2, 17/09387-6)
BITTENCOURT, MARCIELE M.; SILVA, RENATO M.; ALMEIDA, TIAGO A. ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning. APPLIED SOFT COMPUTING, v. 96. (18/02146-6, 17/09387-6)
SILVA, RENATO M.; SANTOS, RONEY L. S.; ALMEIDA, TIAGO A.; PARDO, THIAGO A. S. Towards automatically filtering fake news in Portuguese. EXPERT SYSTEMS WITH APPLICATIONS, v. 146. (18/02146-6, 17/09387-6)
FREITAS, BRENO L.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Gaussian Mixture Descriptors Learner. KNOWLEDGE-BASED SYSTEMS, v. 188. (18/02146-6, 17/09387-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A.; YAMAKAMI, AKEBO; WANI, MA; KANTARDZIC, M; SAYEDMOUCHAWEH, M; GAMA, J; LUGHOFER, E. Semantic indexing-based data augmentation for filtering undesired short text messages. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 6 p. (17/09387-6, 18/02146-6)
LOCHTER, JOHANNES V.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Multi-level out-of-vocabulary words handling approach. KNOWLEDGE-BASED SYSTEMS, v. 251, 11 p. (18/02146-6, 17/09387-6)
