| Grant number: | 11/12823-6 |
| Support Opportunities: | Scholarships in Brazil - Doctorate |
| Start date: | October 01, 2011 |
| End date: | September 30, 2015 |
| Field of knowledge: | Physical Sciences and Mathematics - Computer Science |
| Principal Investigator: | Solange Oliveira Rezende |
| Grantee: | Rafael Geraldeli Rossi |
| Host Institution: | Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil |
Abstract Given the large number of textual document collections available today, techniques are needed for automatically extracting knowledge from these collections and organizing them. Documents are normally represented in a vector space model, in which each document is a vector and each position of the vector corresponds to a feature of the document, for example the frequency of a word. Pattern extraction methods that use this representation assume that the documents in a collection, as well as their features, are independent. However, this assumption can lead to erroneous results. To avoid this error, some representations model textual documents as networks. In this type of representation, however, traditional algorithms assume that the network is composed of objects of a single type connected by relations of a single type, i.e., that the network is homogeneous. This limitation can be overcome by representing text with heterogeneous networks, in which documents are modeled using different types of objects, such as document terms or authors, along with different types of relationships among these objects. However, relationships between objects of the same type are rarely used in heterogeneous networks. Our hypothesis is that this kind of relationship can also help pattern extraction. To test this hypothesis, this PhD project proposes a representation of textual document collections using heterogeneous networks and will study which ways of relating objects of the same type in a heterogeneous network produce better results in the classification and clustering of textual documents. Algorithms will be adapted or developed for pattern extraction using the proposed representation. (AU) | |
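As a rough illustration of the two representations contrasted in the abstract, the sketch below builds term-frequency vectors and a small heterogeneous network for a toy collection. It is not the project's method: the documents, the `add_edge` helper, and the choice of shared-document counts as the weight for term-term relations are all assumptions made only for this example.

```python
from collections import Counter, defaultdict

# Toy collection; the documents and their contents are illustrative assumptions.
docs = {
    "d1": "network text mining",
    "d2": "text classification network",
    "d3": "image classification",
}
counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Vector space model: one term-frequency vector per document,
# with one position per vocabulary term.
vocab = sorted({term for c in counts.values() for term in c})
vectors = {doc_id: [c[term] for term in vocab] for doc_id, c in counts.items()}

# Heterogeneous network: nodes are (type, name) pairs, edges are weighted
# and undirected (stored in both directions).
edges = defaultdict(dict)


def add_edge(u, v, weight):
    """Hypothetical helper: store an undirected weighted edge."""
    edges[u][v] = weight
    edges[v][u] = weight


# Relations between objects of different types: document-term edges
# weighted by term frequency.
for doc_id, c in counts.items():
    for term, freq in c.items():
        add_edge(("doc", doc_id), ("term", term), freq)

# Relations between objects of the same type: term-term edges weighted by
# the number of documents in which both terms occur (an assumed weighting).
for i, a in enumerate(vocab):
    for b in vocab[i + 1:]:
        shared = sum(1 for c in counts.values() if a in c and b in c)
        if shared:
            add_edge(("term", a), ("term", b), shared)
```

In this sketch the vectors treat every term as an independent dimension, while the network makes the link between, say, "network" and "text" explicit through a term-term edge.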