
Knowledge-based summary revision: turning multi-document extracts into abstracts

Grant number: 15/01450-5
Support type: Scholarships abroad - Research
Effective date (Start): September 15, 2015
Effective date (End): September 14, 2016
Field of knowledge: Linguistics, Literature and Arts - Linguistics - Linguistic Theory and Analysis
Principal Investigator: Ariani Di Felippo
Grantee: Ariani Di Felippo
Host: Ani Nenkova
Home Institution: Centro de Educação e Ciências Humanas (CECH). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil
Research place: University of Pennsylvania, United States

Abstract

Given the large amount of information available in several languages, especially online, Multi-Document Summarization (MDS) has become an important tool for managing information overload. With origins in the mid-1990s, MDS is a subarea of Natural Language Processing (NLP) that aims to automatically produce a single summary from a group of texts on the same topic. For Portuguese, research in this area started only in 2011, but the resulting methods and systems have matched and, in some cases, exceeded state-of-the-art performance in MDS. In general, MDS applications produce extracts, i.e., summaries composed of a selection of sentences (or phrases, paragraphs, etc.) taken from the original texts. Even though extractive methods have evolved, extracts still have problems of informativeness and linguistic quality. Based on that, we propose a corpus-based investigation to delineate abstraction or rewriting strategies for the revision of automatic summaries. The rewriting rules will be proposed to generalize and specialize the content of extracts in a post-editing process. Generalization and specialization are condensation operations that vary widely in scope (inter- and intra-sentential), level (lexical, syntagmatic, clausal, and sentential), and linguistic mechanism (e.g., lexical substitution or insertion, syntactic transformations). Given its aim, this project represents a shift away from purely extractive MDS toward a partially abstractive approach, producing automatic summaries that are more natural, more informative, and of better linguistic quality. (AU)