Linguistic analysis of generalization in human multi-document summarization

Grant number: 13/12629-0
Support type: Scholarships in Brazil - Scientific Initiation
Effective date (Start): September 01, 2013
Effective date (End): August 31, 2014
Field of knowledge: Linguistics, Literature and Arts - Linguistics
Principal Investigator: Ariani Di Felippo
Grantee: Marina Delege
Home Institution: Centro de Educação e Ciências Humanas (CECH). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil

Abstract

To produce a summary from a collection of texts from different sources on the same topic, humans commonly reduce the content of the source texts through condensation operations such as deletion, union, intersection, and generalization. To express the condensed content linguistically, they commonly cut and paste material from the source texts. These cut-and-paste, or rewriting, operations include (i) sentence reduction, (ii) sentence combination, (iii) syntactic transformation, (iv) lexical paraphrase, and (v) reorganization. In this undergraduate research project, we will investigate generalization in Human Multi-document Summarization (SHM), the process by which a single summary is manually produced from a collection of texts from different sources on the same topic. We will analyze the 50 human multi-document summaries in Portuguese from the CSTNews corpus. In CSTNews, the summaries were manually aligned at the sentence level to their source texts based on content overlap. In 82 (8.1%) of the 1,007 alignments, the content of the source texts was generalized in the summaries. This project is part of the SUSTENTO project (FAPESP 2012/13246-5 / CNPq 483231/2012-6), which aims to generate linguistic knowledge for Automatic Multi-document Summarization (SAM) of Portuguese; it aims to describe the rewriting operations involved in generalization and to systematize them in support of SAM systems.