Hybrid model of term extraction applied in text mining

Grant number: 09/16142-3
Support Opportunities: Scholarships in Brazil - Doctorate
Start date: July 01, 2011
End date: May 31, 2014
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: Solange Oliveira Rezende
Grantee: Merley da Silva Conrado
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated scholarship(s): 12/09375-4 - Word features for term extraction in online forums, BE.EP.DR

Abstract

Because of the huge amount of textual information currently available in digital form, it must be organized, automatically or semi-automatically, into useful knowledge. The text mining process has been used for this purpose. One of the most important steps of this process is term extraction. The extracted terms strongly influence the outcome of the process, since they represent the area of knowledge being explored. Ensuring the quality of the terms obtained is therefore vital to the efficiency of the process. Terms can be extracted using a statistical approach, which is generally computationally cheap, or a linguistic approach, which usually yields better results than the statistical one. In this context, an interesting option is to adopt hybrid models for the treatment of terms, since they combine the statistical and linguistic approaches, along with the advantages and disadvantages of each. Given this, and given the significant gap in research devoted exclusively to collections of unlabeled texts in Portuguese, this work aims to propose a hybrid term extraction model for Text Mining focused on the Portuguese language. (AU)
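
As a rough illustration of the hybrid idea in the abstract, the sketch below pairs a crude linguistic-style filter with a simple statistical ranking. It is not the model proposed in this project. The stopword list, the function names, and the `max_len` and `min_freq` parameters are all assumptions made for the example. A real system would likely rely on a part-of-speech tagger and richer statistical measures.

```python
import re
from collections import Counter

# Hypothetical, tiny Portuguese stopword list standing in for real
# linguistic knowledge (e.g. part-of-speech patterns from a tagger).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "e", "em",
             "para", "com", "um", "uma"}


def tokenize(text):
    """Lowercase the text and keep runs of letters, accented ones included."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidate_terms(tokens, max_len=3):
    """Linguistic-style filter: yield n-grams (up to max_len words) that
    neither start nor end with a stopword."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] not in STOPWORDS and gram[-1] not in STOPWORDS:
                yield " ".join(gram)


def extract_terms(documents, min_freq=2):
    """Statistical step: count the filtered candidates over the corpus and
    keep those occurring at least min_freq times, most frequent first."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_terms(tokenize(doc)))
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]


docs = [
    "A mineração de textos organiza a informação.",
    "A extração de termos é uma etapa da mineração de textos.",
]
for term, freq in extract_terms(docs):
    print(term, freq)
```

The filter lets a multiword candidate such as "mineração de textos" through because only its first and last words are checked against the stopword list, while the frequency threshold removes candidates seen only once.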


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
CONRADO, MERLEY DA SILVA; LAGUNA GUTIERREZ, VICTOR ANTONIO; REZENDE, SOLANGE OLIVEIRA; MURGANTE, B; GERVASI, O; MISRA, S; NEDJAH, N; ROCHA, AMAC; TANIAR, D; APDUHAN, BO. Evaluation of Normalization Techniques in Text Classification for Portuguese. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT III, v. 7335, p. 13-pg., . (09/16142-3, 11/19850-9)