

Multilingual Extractive Summarization: Investigating State-of-the-Art Methods for English and Brazilian Portuguese

Author(s):
Jorge, Germano Antonio Zani ; Bezerra, Davi Alves ; Xavier, Clarissa Castella ; Pardo, Thiago Alexandre Salgueiro
Total number of authors: 4
Document type: Scientific article
Source: INTELLIGENT SYSTEMS, BRACIS 2024, PT II; v. 15413, 12 pp., 2025-01-01.
Abstract

Automatic Text Summarization (ATS) is a Natural Language Processing (NLP) task essential for handling large volumes of information. ATS can be classified into two main types: extractive and abstractive. Extractive summarization selects sentences or phrases directly from the source text(s), while abstractive summarization generates new sentences that try to capture the original meaning of the source text(s). This paper describes our efforts to perform extractive single-document summarization in multilingual contexts. Although various summarization methods, such as PreSumm and HiStruct+, have shown promising results on English corpora like CNN/DM, there is a significant gap in applying these methods to other languages, especially Brazilian Portuguese. Additionally, these summarizers were evaluated with traditional metrics like ROUGE, which has limitations as it primarily measures superficial text overlap. To fill these gaps, we evaluate the effectiveness of these state-of-the-art methods on the CSTNews corpus (with news texts in Brazilian Portuguese) employing ROUGE and the recent BLANC metric, which measures how much the generated summary aids a pre-trained language model (like BERT) in understanding the document. Our contributions include the results and comparison of adapted models, the discussion of the BLANC metric in contrast to ROUGE, and the expansion of resources available to the Portuguese and multilingual NLP community. (AU)
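
The abstract's remark that ROUGE primarily measures superficial text overlap can be illustrated with a minimal sketch. The function below is not the authors' evaluation code and not the official ROUGE toolkit. It is a simplified ROUGE-N that assumes lowercased whitespace tokenization and no stemming, and the name `rouge_n` is hypothetical.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1):
    """Simplified ROUGE-N: clipped n-gram overlap as (precision, recall, f1).

    Assumption: whitespace tokenization and lowercasing only; real ROUGE
    implementations typically add stemming and other preprocessing.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
# An extract that copies words from the reference scores well.
print(rouge_n("the cat sat", reference))
# A paraphrase with no shared words scores zero despite similar meaning.
print(rouge_n("a feline rested", reference))
```

The second call shows the limitation the abstract mentions: a summary that preserves meaning but shares no surface tokens with the reference receives a score of zero, which is part of the motivation for complementing ROUGE with a metric such as BLANC.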

FAPESP grant: 19/07665-4 - Centro de Inteligência Artificial
Grantee: Fabio Gagliardi Cozman
Support type: Research Grant - eScience and Data Science Program - Engineering Research Centers