In text mining, traditional text representations are based on the frequency of words in documents. Although this bag-of-words representation can achieve good results in automatic text classification, it is not suitable for every classification problem, and richer text representations may be required. The objective of this internship project is to develop a semantic text representation based on the NASARI approach. NASARI is a concept representation for measuring semantic similarity that has achieved good results in word similarity and sense clustering tasks. It draws on knowledge from both WordNet and Wikipedia. This project therefore aims to enrich document representation with NASARI's semantically rich concept representations. The proposed text representation will be evaluated in text classification tasks, where it is expected to improve classification performance. This project is closely related to the student's ongoing doctoral project at the Universidade de São Paulo. The internship will be carried out at Sapienza - Università di Roma under the supervision of Professor Roberto Navigli, one of the authors of the NASARI approach.