

Automatic identification of multidocument relations

Author(s):
Erick Galani Maziero
Total Authors: 1
Document type: Master's Dissertation
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Thiago Alexandre Salgueiro Pardo; André Carlos Ponce de Leon Ferreira de Carvalho; Ariani Di Felippo
Advisor: Thiago Alexandre Salgueiro Pardo
Abstract

Multi-document processing is essential in the current scenario of electronic media, in which many documents are produced about the same topic, especially given the explosion of information enabled by the web. Both readers and computational applications benefit from multi-document discourse analysis, which reveals the relations (for example, equivalence, contradiction or background relations) among portions of text. To perform this analysis automatically, this work adopts CST (Cross-document Structure Theory; Radev, 2000). This kind of knowledge allows (i) the appropriate treatment of phenomena such as redundancy, complementarity and contradiction of information and, consequently, (ii) the production of better text processing systems, such as more intelligent web search engines and automatic summarizers. This work presents a methodology for identifying these relations that explores machine learning techniques from the traditional and hierarchical paradigms. For relations with low frequency in the corpus, handcrafted rules were developed. Finally, a parser combining the classifiers and the rules was built. (AU)

FAPESP's process: 09/12256-4 - Multidocument relations parsing and visualization for Brazilian Portuguese
Grantee: Erick Galani Maziero
Support Opportunities: Scholarships in Brazil - Master