

Multilingual Extractive Summarization: Investigating State-of-the-Art Methods for English and Brazilian Portuguese

Author(s):
Jorge, Germano Antonio Zani ; Bezerra, Davi Alves ; Xavier, Clarissa Castella ; Pardo, Thiago Alexandre Salgueiro
Total Authors: 4
Document type: Journal article
Source: INTELLIGENT SYSTEMS, BRACIS 2024, PT II; v. 15413, p. 12-pg., 2025-01-01.
Abstract

Automatic Text Summarization (ATS) is a Natural Language Processing (NLP) task essential for handling large volumes of information. ATS can be classified into two main types: extractive and abstractive. Extractive summarization selects sentences or phrases directly from the source text(s), while abstractive summarization generates new sentences that try to capture the original meaning of the source text(s). This paper describes our efforts to perform extractive single-document summarization in multilingual contexts. Although various summarization methods, such as PreSumm and HiStruct+, have shown promising results on English corpora like CNN/DM, there is a significant gap in applying these methods to other languages, especially Brazilian Portuguese. Additionally, these summarizers were evaluated with traditional metrics like ROUGE, which has limitations as it primarily measures superficial text overlap. To fill these gaps, we evaluate the effectiveness of these state-of-the-art methods on the CSTNews corpus (with news texts in Brazilian Portuguese) employing ROUGE and the recent BLANC metric, which measures how much the generated summary aids a pre-trained language model (like BERT) in understanding the document. Our contributions include the results and comparison of adapted models, the discussion of the BLANC metric in contrast to ROUGE, and the expansion of resources available to the Portuguese and multilingual NLP community. (AU)
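The abstract notes that ROUGE "primarily measures superficial text overlap." The sketch below computes ROUGE-N precision, recall, and F1 from clipped n-gram counts, which makes that limitation concrete. It is a minimal illustration that assumes lowercase whitespace tokenization. The function name `rouge_n` is hypothetical. This is not the official ROUGE toolkit, which adds options such as stemming, and it is not the evaluation pipeline used in the paper.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Hypothetical minimal ROUGE-N: n-gram overlap between two texts.

    Assumes lowercase whitespace tokenization (no stemming or stopword removal).
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps min(count_cand, count_ref) per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "prices rose sharply"
# A shuffled copy gets a perfect ROUGE-1 score, because unigram overlap ignores word order.
print(rouge_n("sharply rose prices", reference, n=1)["f1"])
# A faithful paraphrase with no shared words scores zero.
print(rouge_n("costs increased steeply", reference, n=1)["f1"])
```

The two calls show why surface-overlap metrics can mislead. A meaning-preserving paraphrase scores 0.0, while a scrambled sentence scores 1.0 on ROUGE-1. Only higher-order n-grams, such as ROUGE-2, penalize the reordering. Model-based metrics such as BLANC aim to address this gap.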

FAPESP's process: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support Opportunities: Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program