
Word features for term extraction in online forums

Grant number: 12/09375-4
Support Opportunities: Scholarships abroad - Research Internship - Doctorate
Start date: September 01, 2012
End date: February 28, 2013
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: Solange Oliveira Rezende
Grantee: Merley da Silva Conrado
Supervisor: Marilyn Walker
Host Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Institution abroad: University of California, Santa Cruz (UC Santa Cruz), United States
Associated with the scholarship: 09/16142-3 - Hybrid model of term extraction applied in text mining, BP.DR

Abstract

Text mining (TM) is used to transform large textual databases, automatically or semi-automatically, into useful knowledge. One of the most important steps in this process is term extraction. Drawing on previous work, the proposed project aims to explore and evaluate measures and features that can be used for term extraction in online forums, i.e., user-generated content in social media discussions about social and political issues. The approach takes a textual database as input. For each candidate term in the database, we will automatically calculate values for selected features and measures, such as frequency, TF-IDF, C-value, NC-value, noun phrase, and part-of-speech (POS) tag. Additionally, measures proposed by Prof. Dr. Marilyn A. Walker and her colleagues, such as Syntactic Dependency, Opinion Generalized Dependency, and Context Features, will be investigated and possibly used or adapted for this purpose. Once the features are extracted, the feature values of each candidate will be submitted to classifiers in order to identify the candidates that are actually terms of the database domain. Furthermore, we intend to identify which of these measures, features, or subsets of them improve performance the most on our term extraction task. This work will contribute to the candidate's research, which is on term extraction, as well as to Professor Walker's research, which focuses on identifying persuasion in texts from online forums and debates. We hypothesize that term extraction could help identify the overall position of a debate participant (stance) and recognize the different sub-arguments or propositions (argument facets) related to arguments for or against a particular topic.
For example, in debates on capital punishment, sub-arguments include whether capital punishment is a deterrent to crime, whether it is moral, whether the cost of executing someone is worthwhile compared with life in prison, and whether capital punishment is applied differentially, resulting in racial discrimination. We expect that the extracted terms can be associated with different argument facets. Extracting terms in this domain should therefore lead to insights that may improve our current results for stance and argument identification, as well as current results for term extraction in the field of NLP overall. (AU)
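
The feature-calculation step described in the abstract can be illustrated with a minimal sketch. It covers only two of the listed measures (frequency and TF-IDF) and treats every whitespace-separated token as a candidate term. The function name `candidate_features`, the tokenization, the TF-IDF variant (collection frequency times log inverse document frequency), and the toy posts are all assumptions made for illustration, not the project's actual implementation. C-value, NC-value, the dependency-based features, and the classification step are left out.

```python
import math
from collections import Counter


def candidate_features(docs):
    """Compute simple per-candidate features over a list of documents.

    Hypothetical helper: each lowercase whitespace token is a candidate term.
    Returns {term: {"tf": collection frequency,
                    "df": document frequency,
                    "tfidf": tf * log(N / df)}}.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Total occurrences of each candidate across the whole collection.
    tf = Counter(tok for doc in tokenized for tok in doc)
    # Number of documents in which each candidate occurs at least once.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    features = {}
    for term, freq in tf.items():
        idf = math.log(n_docs / df[term])
        features[term] = {"tf": freq, "df": df[term], "tfidf": freq * idf}
    return features


# Invented toy posts, loosely based on the capital punishment example.
posts = [
    "capital punishment is a deterrent to crime",
    "the cost of capital punishment is too high",
    "life in prison is cheaper",
]
feats = candidate_features(posts)
# Candidates ordered from highest to lowest TF-IDF.
ranked = sorted(feats, key=lambda t: feats[t]["tfidf"], reverse=True)
```

In a fuller pipeline, each candidate's feature values would form one row of the input given to a classifier. Under this TF-IDF variant, a token that occurs in every post (such as "is" here) receives a score of zero, which illustrates why the measure tends to favor domain-specific candidates over common words.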
