A continuously evolving distributed text representation model

Grant number: 17/09387-6
Support type: Regular Research Grants
Duration: September 01, 2017 - February 29, 2020
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Tiago Agostinho de Almeida
Grantee: Tiago Agostinho de Almeida
Home Institution: Centro de Ciências em Gestão e Tecnologia (CCGT). Universidade Federal de São Carlos (UFSCAR). Campus de Sorocaba. Sorocaba, SP, Brazil
Assoc. researchers: Renato Moraes Silva

Abstract

The growing volume of unstructured data produced by humanity has motivated the use of machines for tasks traditionally carried out by humans, such as translation, transcription, and opinion mining. Although there are many methods for text categorization, it is still a challenge to find a computational text representation that captures semantic meaning, continuously expands its vocabulary, and evolves its knowledge of the relations between terms and sentences. With existing representation models, changes in text patterns are not readily reflected in the computational model. Therefore, in scenarios where the textual pattern is dynamic and changes frequently, the available models require considerable time and cost to adapt. Short and noisy texts, commonly found in web and smartphone communication, are one such scenario that demands incremental models, since new terms such as symbols, slang, and abbreviations arise all the time. To address this, this research project proposes combining unsupervised clustering techniques with state-of-the-art recurrent neural networks to create a text representation model able to learn continuously, associating new terms with groups of known terms so that terms not yet seen can still be given relevance by the existing model. (AU)
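The central idea of associating new terms with groups of known terms can be illustrated with a small, hypothetical sketch. It is not the project's method: the project proposes recurrent neural networks for the term representations, whereas this sketch assumes simple character-trigram vectors as a stand-in and a leader-style online clustering rule with an arbitrary similarity threshold (`threshold=0.3`). The class name `IncrementalTermClusters` and all helper functions are illustrative only.

```python
from collections import Counter
import math


def char_ngrams(term, n=3):
    """Bag of character n-grams with boundary markers.
    Assumed stand-in for learned (e.g. RNN-based) term embeddings."""
    padded = f"<{term.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalTermClusters:
    """Hypothetical online clustering of terms: each new term joins the most
    similar existing group, or starts a new group if none is similar enough."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # summed n-gram counts per cluster
        self.members = []           # terms belonging to each cluster
        self.assignment = {}        # term -> cluster index

    def add(self, term):
        if term in self.assignment:
            return self.assignment[term]
        vec = char_ngrams(term)
        # Find the closest existing cluster centroid.
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Unseen term inherits the group of similar known terms;
            # the centroid is updated so the model keeps evolving.
            self.centroids[best].update(vec)
            self.members[best].append(term)
        else:
            # No sufficiently similar group: open a new one.
            self.centroids.append(Counter(vec))
            self.members.append([term])
            best = len(self.centroids) - 1
        self.assignment[term] = best
        return best
```

For example, after adding the known terms `"tomorrow"`, `"today"`, and `"message"`, the noisy spellings `"tomoro"` and `"mesage"` are placed in the groups of `"tomorrow"` and `"message"`, respectively, without retraining anything. Because cosine similarity is scale-invariant, keeping summed counts as the centroid gives the same direction as averaging.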

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BITTENCOURT, MARCIELE M.; SILVA, RENATO M.; ALMEIDA, TIAGO A. ML-MDLText: An efficient and lightweight multilabel text classifier with incremental learning. APPLIED SOFT COMPUTING, v. 96, NOV 2020. Web of Science Citations: 0.
SILVA, RENATO M.; SANTOS, RONEY L. S.; ALMEIDA, TIAGO A.; PARDO, THIAGO A. S. Towards automatically filtering fake news in Portuguese. EXPERT SYSTEMS WITH APPLICATIONS, v. 146, MAY 15 2020. Web of Science Citations: 0.
FREITAS, BRENO L.; SILVA, RENATO M.; ALMEIDA, TIAGO A. Gaussian Mixture Descriptors Learner. KNOWLEDGE-BASED SYSTEMS, v. 188, JAN 5 2020. Web of Science Citations: 0.

Please report errors in the scientific publications list by writing to cdi@fapesp.br.