
Generation of linguistic knowledge for multi-document summarization

Grant number: 12/13246-5
Support type: Regular Research Grants
Duration: October 01, 2012 - September 30, 2014
Field of knowledge: Linguistics, Literature and Arts - Linguistics
Principal Investigator: Ariani Di Felippo
Grantee: Ariani Di Felippo
Home Institution: Centro de Educação e Ciências Humanas (CECH). Universidade Federal de São Carlos (UFSCAR). São Carlos, SP, Brazil

Abstract

Given the large amount of information available in many languages, especially online, Multi-Document Summarization (MDS) has become an important tool for managing information overload. With origins in the mid-1990s, MDS is a subarea of Natural Language Processing (NLP) that aims to automatically produce a single summary from a group of texts on the same topic. For Brazilian Portuguese (BP), research in this area began only in recent years, but the resulting methods and systems have matched and, in some cases, exceeded state-of-the-art performance in MDS. Despite this promising scenario, MDS in general does not rely on linguistic knowledge to simulate the human task more accurately. This proposal therefore aims to generate linguistic knowledge to advance the state of the art in MDS, mainly for BP. Specifically, the goal is to investigate three correlated research fronts: (i) the linguistic characterization of multi-document summaries produced by humans; (ii) the investigation of multi-document phenomena such as redundancy and contradiction, among others; and (iii) the description and formal representation of semantic-conceptual knowledge. Fronts (i) and (ii) are justified by the fact that MDS, unlike single-document summarization, is based exclusively on clues about human (multi-document) summarization and on superficial studies of its phenomena. Front (iii) is justified by the fact that methods for BP summarization may be enriched by, or entirely based on, that kind of knowledge. Given the description and formalization of linguistic knowledge to be generated by the three fronts, this project has the potential to make significant contributions to MDS and Descriptive Linguistics. It would also result in the training and qualification of human resources in the NLP research field, which is still very small in Brazil. (AU)