
Pre-processing of social network's texts for emerging topics identification

Grant number: 10/19546-5
Support Opportunities: Scholarships in Brazil - Scientific Initiation
Start date: March 01, 2011
End date: February 29, 2012
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Solange Oliveira Rezende
Grantee: Maíra Machado Ladeira
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Social networks consist of groups of users with common interests who come together to exchange ideas and knowledge. They are an emergent phenomenon that generates dynamic information, covering topics from many areas as events unfold in real time. Topics in a given area that attract growing interest and usefulness at a given time are known as emerging topics. Extracting knowledge from these topics has proven important and useful. However, the amount of data generated for a given subject is usually very large, which prevents efficient knowledge extraction through manual analysis. Automatic or semi-automatic text mining techniques make it possible to detect emerging topics and how often they appear in text collections from social networks. Because these texts are short and informal, classical text mining techniques face three main problems when preparing and pre-processing them: the elimination of common words, the use of slang and abbreviations, and spelling errors and neologisms. These occurrences are very common in writing on social networks, and traditional text mining usually does not handle them. Only by addressing these issues during pre-processing can the textual data be analyzed to extract the desired emerging topics, which makes pre-processing an important part and the focus of this project. The project therefore proposes to identify and develop text mining techniques for pre-processing short text collections from social networks in order to identify emerging topics.
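
As an illustration only, the three pre-processing problems named in the abstract (common words, slang and abbreviations, spelling errors and neologisms) could be approached along the following lines. This is a minimal sketch, not the project's method: the `STOPWORDS`, `SLANG`, and `VOCABULARY` tables and the `preprocess` function are hypothetical, and the spelling step uses the fuzzy matching from Python's standard `difflib` module as a stand-in for whatever technique the project actually adopted.

```python
import re
from difflib import get_close_matches

# Hypothetical toy resources; a real system would use much larger lists.
STOPWORDS = {"the", "a", "is", "to", "of", "and", "you", "are"}
SLANG = {"u": "you", "r": "are", "gr8": "great", "2day": "today"}
VOCABULARY = {"great", "today", "concert", "tickets", "amazing"}


def preprocess(text):
    """Return cleaned tokens for one short, informal message."""
    # Lowercase and split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Expand slang and abbreviations before removing common words,
    # so that expansions such as "u" -> "you" are also filtered.
    tokens = [SLANG.get(token, token) for token in tokens]
    tokens = [token for token in tokens if token not in STOPWORDS]

    cleaned = []
    for token in tokens:
        if token in VOCABULARY:
            cleaned.append(token)
            continue
        # Treat a close vocabulary match as a spelling error and fix it;
        # keep anything else unchanged, since it may be a neologism.
        match = get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
        cleaned.append(match[0] if match else token)
    return cleaned
```

Calling `preprocess("The concrt 2day is gr8")` would expand the abbreviations, drop the common words, and map the misspelled token to its closest vocabulary entry. Unknown tokens with no close match are kept unchanged, because in emerging-topic detection a new word may be exactly the signal of interest.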
