PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing

Author(s):
Amaral Orosco Pellicer, Lucas Francisco ; Pirozelli, Paulo ; Reali Costa, Anna Helena ; Inoue, Alexandre ; Pinheiro, V ; Gamallo, P ; Amaro, R ; Scarton, C ; Batista, F ; Silva, D ; Magro, C ; Pinto, H
Total Authors: 12
Document type: Journal article
Source: COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022; v. 13208, p. 11-pg., 2022-01-01.
Abstract

Paraphrasing is a fundamental technique for many text applications. Typically, this task is carried out by models that apply lexical and translation operations, which tend to present a trade-off between meaning preservation and diversity. In this paper, we present a transformer-based approach called PTT5-Paraphraser, a PTT5 model fine-tuned on TaPaCo, a large corpus of paraphrases. PTT5-Paraphraser achieves good results according to a number of metrics, showing a good compromise between diversity and fidelity to the original meaning. Two human evaluations are conducted to explore the paraphrases produced by our model: the first analyzes their quality in terms of preserving the meaning and diversity of sentences, while the second compares automatically generated paraphrases with human-made ones. Finally, we perform a classification task, which shows that datasets augmented with paraphrases can substantially increase the performance of classifiers. (AU)

FAPESP's process: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support Opportunities: Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program