VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil

Carolina Evaristo Scarton

Full text
Author(s):	Carolina Evaristo Scarton Total Authors: 1
Document type:	Master's Dissertation
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2013-01-28
Examining board members:	Sandra Maria Aluisio; Maria José Bocorny Finatto; Thiago Alexandre Salgueiro Pardo
Advisor:	Sandra Maria Aluisio
Abstract
Building computational-linguistic base resources, like computational lexical resources (CLR), is one of the goals of Natural Language Processing (NLP). However, most computational lexicons are specific to English. One of the resources already developed for English is the VerbNet, a lexicon with domain-independent semantic and syntactic information of English verbs. It is based on Levin\'s verb classification, with mappings to Princeton\'s WordNet (WordNet). Since only a few computational studies for languages other than English have been made about Levin\'s classification, and given the lack of a Portuguese CLR similar to VerbNet, the goal of this research was to create a CLR for Brazilian Portuguese (called VerbNet.Br). The manual building of these resources is usually unfeasible because it is time consuming and it can include many human-made errors. Therefore, great efforts have been made to build such resources with the aid of computational techniques. One of these techniques is machine learning, a widely known and used method for extracting linguistic information from corpora. Another one is the use of pre-existing resources for other languages, most commonly English, to support the building of new aligned resources, taking advantage of some multilingual/cross-linguistic features (like the ones in Levin\'s verb classification). The method proposed here for the creation of VerbNet.Br is generic, therefore it may be used to build similar resources for languages other than Brazilian Portuguese. Moreover, the proposed method also allows for a future extension of the resource via subclasses of concepts. The VerbNet.Br has a four-step method: three automatic and one manual. However, experiments were also carried out without the manual step, which can be discarded without affecting precision and recall. The evaluation of the resource was intrinsic, both qualitative and quantitative. The qualitative evaluation consisted in: (a) manual analysis of some VerbNet classes, resulting in a Brazilian Portuguese gold standard; (b) comparison of this gold standard with the VerbNet.Br results, presenting promising results (almost 60% of f-measure); and (c), comparison of the VerbNet.Br results to verb clustering results, showing that both methods achieved similar results. The quantitative evaluation considered the acceptance rate of candidate members of VerbNet.Br, showing results around 90% of acceptance. One of the contributions of this research is to present the first version of VerbNet.Br. Although it still requires linguistic validation, it already provides information to be used in NLP tasks, with precision and recall of 44% and 92.89%, respectively (AU)

FAPESP's process:	10/03785-0 - VerbNet.Br: semiautomatic building of an online and domain-independent verb lexicon for the Brazilian Portuguese Language
Grantee:	Carolina Evaristo Scarton
Support Opportunities:	Scholarships in Brazil - Master

Short URL