| Grant number: | 08/02091-5 |
| Support Opportunities: | Scholarships in Brazil - Master |
| Start date: | March 01, 2009 |
| End date: | February 28, 2010 |
| Field of knowledge: | Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques |
| Principal Investigator: | Maria Carolina Monard |
| Grantee: | Ígor Assis Braga |
| Host Institution: | Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil |
Abstract Text Mining (TM) is of great practical importance due to the massive volume of documents available online. Nevertheless, the pattern recognition stage of TM still depends heavily on the availability of labeled texts. Solving this problem is the research topic of Semi-supervised (Ss) Learning, which has the potential to reduce the need for expensive labeled data acquisition. Some Ss learning approaches require that more than one view (or description) of the data be available. Previous work has not dealt in depth with the extraction of two descriptions from textual data. In this work, we intend to fill this gap. To construct the second view of textual data, we propose a hybrid linguistic/statistical terminology extraction approach. The underlying assumption of this approach is that specialized documents are characterized by the repeated use of certain lexical units or morphosyntactic constructions. (AU) | |