PTT5-Paraphraser: Diversity and Meaning Fidelity in Automatic Portuguese Paraphrasing

Amaral Orosco Pellicer, Lucas Francisco; Pirozelli, Paulo; Reali Costa, Anna Helena; Inoue, Alexandre; Pinheiro, V; Gamallo, P; Amaro, R; Scarton, C; Batista, F; Silva, D; Magro, C; Pinto, H

Author(s):	Amaral Orosco Pellicer, Lucas Francisco; Pirozelli, Paulo; Reali Costa, Anna Helena; Inoue, Alexandre; Pinheiro, V; Gamallo, P; Amaro, R; Scarton, C; Batista, F; Silva, D; Magro, C; Pinto, H
Total Authors:	12
Document type:	Journal article
Source:	COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022; v. 13208, 11 pp., 2022-01-01.
Abstract
Paraphrasing is a fundamental technique for many text applications. Typically, this task is carried out by models that apply lexical and translation operations, which tend to present a trade-off between meaning preservation and diversity. In this paper, we present a transformer-based approach called PTT5-Paraphraser, a PTT5 model fine-tuned on TaPaCo, a large corpus of paraphrases. PTT5-Paraphraser achieves good results on a number of metrics, showing a good compromise between diversity and fidelity to the original meaning. Two human evaluations are conducted to explore the paraphrases produced by our model: the first analyzes their quality in terms of meaning preservation and diversity, while the second compares automatically generated paraphrases with human-made ones. Finally, we perform a classification task, which shows that datasets augmented with paraphrases can substantially increase the performance of classifiers. (AU)

FAPESP's process:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support Opportunities:	Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
