A continuously evolving distributed text representation model

Abstract

The increasing volume of unstructured data produced by humanity has motivated the use of machines to perform tasks traditionally performed by humans, such as translation, transcription, and opinion mining. Although there are many methods for text categorization, it remains a challenge to find a computational text representation that captures semantic meaning, continuously expands its vocabulary, and evolves its knowledge of the relations between terms and sentences. With existing representation models, changes in text patterns are not readily reflected in the computational model. Therefore, in scenarios where the textual pattern is dynamic and changes frequently, the available models take a long time and incur a high cost to adapt. In this context, short and noisy texts, commonly found in web and smartphone communication, are one of the applications that demand incremental models, since new terms such as symbols, slang, and abbreviations can arise at any time. This research project therefore proposes to combine unsupervised clustering techniques with state-of-the-art recurrent neural networks to create a computational text representation model that learns continuously, associating new terms with groups of known terms so that previously unseen terms carry relevance in the existing model. (AU)
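
The idea of associating a new term with a group of known terms can be illustrated with a minimal sketch. It is not the project's method: it swaps the proposed recurrent neural networks for plain character n-gram counts and uses a nearest-centroid rule in place of a full clustering algorithm. The function names, the bigram features, and the example groups are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=2):
    """Represent a term as counts of its character n-grams (assumed feature choice)."""
    padded = f"<{term.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(terms):
    """Sum the n-gram vectors of a group of known terms."""
    total = Counter()
    for t in terms:
        total.update(char_ngrams(t))
    return total


def assign_group(new_term, groups):
    """Return the name of the known group whose centroid is most similar to new_term."""
    vec = char_ngrams(new_term)
    return max(groups, key=lambda name: cosine(vec, centroid(groups[name])))


# Hypothetical groups of already-known terms.
known_groups = {
    "thanks": ["thanks", "thank", "thankful"],
    "tomorrow": ["tomorrow", "tomorrows"],
}

# An unseen abbreviation is mapped to the closest known group.
print(assign_group("thanx", known_groups))
```

Because the unseen term inherits the group of its nearest known terms, it can contribute to a downstream representation without retraining the whole model, which is the kind of incremental behavior the abstract describes.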

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BITTENCOURT, MARCIELE M.; SILVA, RENATO M.; ALMEIDA, TIAGO A. ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning. APPLIED SOFT COMPUTING, v. 96. (18/02146-6, 17/09387-6)
FREITAS, BRENO L.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Gaussian Mixture Descriptors Learner. KNOWLEDGE-BASED SYSTEMS, v. 188. (18/02146-6, 17/09387-6)
SILVA, RENATO M.; SANTOS, RONEY L. S.; ALMEIDA, TIAGO A.; PARDO, THIAGO A. S. Towards automatically filtering fake news in Portuguese. EXPERT SYSTEMS WITH APPLICATIONS, v. 146. (18/02146-6, 17/09387-6)