PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing

Author(s):
Amaral Orosco Pellicer, Lucas Francisco ; Pirozelli, Paulo ; Reali Costa, Anna Helena ; Inoue, Alexandre ; Pinheiro, V ; Gamallo, P ; Amaro, R ; Scarton, C ; Batista, F ; Silva, D ; Magro, C ; Pinto, H
Total number of authors: 12
Document type: Scientific article
Source: COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022; v. 13208, 11 pages, 2022-01-01.
Abstract

Paraphrasing is a fundamental technique for many text applications. Typically, this task is performed by models that apply lexical and translation operations, which tend to present a trade-off between meaning preservation and diversity. In this paper, we present a transformer-based approach called PTT5-Paraphraser, a PTT5 model fine-tuned on TaPaCo, a large corpus of paraphrases. PTT5-Paraphraser achieves good results on several metrics, showing a good compromise between diversity and fidelity to the original meaning. We conduct two human evaluations of the paraphrases produced by our model: the first assesses how well they preserve meaning and how diverse they are, while the second compares automatically generated paraphrases with human-written ones. Finally, a classification experiment shows that augmenting datasets with paraphrases can substantially improve classifier performance. (AU)
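The abstract frames paraphrasing as a trade-off between diversity and meaning fidelity, and reports gains from paraphrase-based data augmentation. The Python sketch below illustrates both ideas under stated assumptions. `lexical_diversity` is a hypothetical, deliberately simple surface measure (one minus the Jaccard overlap of token sets); it is not one of the metrics used in the paper, and it says nothing about whether meaning is preserved. `augment` takes a hypothetical `paraphrase` callable standing in for a generator such as PTT5-Paraphraser, whose actual interface is not described in this record.

```python
from typing import Callable, List, Tuple


def lexical_diversity(source: str, paraphrase: str) -> float:
    """Hypothetical surface-diversity score: 1 - Jaccard similarity of
    lowercase whitespace-token sets. 0.0 means the same words are used,
    1.0 means no words are shared. It does NOT measure meaning fidelity."""
    a = set(source.lower().split())
    b = set(paraphrase.lower().split())
    if not a and not b:
        return 0.0  # two empty strings: treat as identical
    return 1.0 - len(a & b) / len(a | b)


def augment(dataset: List[Tuple[str, str]],
            paraphrase: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs plus one new pair per
    generated paraphrase, copying the label of the source sentence.
    `paraphrase` is an assumed stand-in for a paraphrase generator."""
    augmented = list(dataset)
    for text, label in dataset:
        for p in paraphrase(text):
            if p != text:  # skip verbatim copies, which add no new examples
                augmented.append((p, label))
    return augmented


if __name__ == "__main__":
    # Toy paraphraser (hypothetical): a fixed lookup table, not a real model.
    table = {"bom filme": ["filme bom", "ótimo filme"]}
    data = [("bom filme", "pos"), ("filme ruim", "neg")]
    print(augment(data, lambda t: table.get(t, [])))
    print(lexical_diversity("o gato dorme", "o gato está dormindo"))
```

For example, `lexical_diversity("o gato dorme", "o gato está dormindo")` returns about 0.6 (two shared tokens out of five distinct ones). Copying the source label onto each paraphrase assumes the paraphrase preserves meaning, which is why fidelity matters for augmentation and not only diversity.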

FAPESP grant: 19/07665-4 - Artificial Intelligence Center (Centro de Inteligência Artificial)
Grantee: Fabio Gagliardi Cozman
Support type: Research Grants - eScience and Data Science Program - Engineering Research Centers