Incorporating the semantics into the websensors construction process

Grant number: 13/14757-6
Support type: Scholarships in Brazil - Doctorate
Effective date (Start): December 01, 2013
Effective date (End): May 31, 2018
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Solange Oliveira Rezende
Grantee: Roberta Akemi Sinoara
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated scholarship(s): 16/07620-2 - Semantic Representation for Text Classification, BE.EP.DR


Text Mining techniques that support knowledge discovery become essential as the volume and variety of digital text documents increase, whether on social networks, on the web, or inside organizations. Like the text sources themselves, the possible applications of Text Mining are varied. Applications and research have been developed with the goal of using the web as a powerful social sensor. In this context, websensors arise as sensors that monitor the publication of text documents and keep a time series of certain topics. The applicability of websensors is wide: depending on the documents monitored, the activity of a websensor can support the understanding, explanation, or prediction of a fact. Websensors can be built from text clustering, thereby avoiding the need for large amounts of labeled data or intense effort from a domain specialist to define the sensors' parameters. However, the semantic aspects of the texts can be crucial to the quality and effective use of the extracted clusters. Learning good websensors, for instance, may require a text organization that distinguishes documents which, despite using the same vocabulary, present different ideas about the same subject. Text Mining research has shown several advances in recent years; however, semantics remains a challenge in the field. Motivated by this gap, this PhD research project aims to incorporate semantics into the websensor construction process, achieving a more refined organization that considers the ideas expressed in the documents. A new text data representation format will be developed to represent semantic aspects. In addition, clustering algorithms will be developed or adapted to make effective use of the semantic representation.
Although this project focuses on the inclusion of semantics in the construction of websensors through clustering methods, its results can later be extended to other Text Mining tasks, such as document classification and sentiment analysis. (AU)

Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
SINOARA, ROBERTA A.; CAMACHO-COLLADOS, JOSE; ROSSI, RAFAEL G.; NAVIGLI, ROBERTO; REZENDE, SOLANGE O. Knowledge-enhanced document embeddings for text classification. KNOWLEDGE-BASED SYSTEMS, v. 163, p. 955-971, JAN 1 2019. Web of Science Citations: 4.
Academic Publications
(References retrieved automatically from State of São Paulo Research Institutions)
SINOARA, Roberta Akemi. Semantic aspects in the representation of texts for automatic classification. 2018. Doctoral Thesis - Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação São Carlos.
