Methods for Automatic Discourse Analysis

Author(s):
Thiago Alexandre Salgueiro Pardo
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos. Illustrations, tables.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Maria das Graças Volpe Nunes; Sandra Maria Aluisio; Ariadne Maria Brito Rizzoni Carvalho; Rita Maria da Silva Julia; Celso Antônio Alves Kaestner
Advisor: Maria das Graças Volpe Nunes
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Indexed in: Banco de Dados Bibliográficos da USP-DEDALUS; Biblioteca Digital de Teses e Dissertações - USP
Location: Universidade de São Paulo. Instituto de Ciências Matemáticas e de Computação. Biblioteca Prof. Achille Bassi; ICMSC/T; P226ma
Abstract

Research in Linguistics and Computational Linguistics has shown that a text is more than a simple sequence of juxtaposed sentences. Every text has a highly elaborate underlying structure that relates its content and gives the text coherence. This structure is called discourse structure and is the object of study of the research area known as Discourse Analysis. Given the usefulness of this kind of knowledge for several Natural Language Processing tasks, e.g., automatic text summarization and anaphora resolution, automatic discourse analysis has become a very important research topic. For Brazilian Portuguese in particular, few resources and little research exist on the subject. In this scenario, this thesis aims to investigate, develop and implement methods for automatic discourse analysis, mainly following Rhetorical Structure Theory, one of the most widely used discourse theories today. Based on the rhetorical annotation and analysis of a corpus of scientific texts from the computing domain, the first rhetorical analyzer for Brazilian Portuguese, called DiZer (DIscourse analyZER), was produced, together with a large amount of discourse knowledge. Novel statistical models for detecting discourse relations are presented, based on content units of increasing complexity, namely words, concepts and argument structures. For the latter, a model for unsupervised learning of verb argument structures is presented and applied to the 1,500 most frequent English verbs, resulting in a repository called ArgBank. DiZer and the proposed models are evaluated, producing satisfactory results. (AU)