| Author(s): | Amaral Orosco Pellicer, Lucas Francisco; Pirozelli, Paulo; Reali Costa, Anna Helena; Inoue, Alexandre; Pinheiro, V.; Gamallo, P.; Amaro, R.; Scarton, C.; Batista, F.; Silva, D.; Magro, C.; Pinto, H. |
| Total Authors: | 12 |
| Document type: | Journal article |
| Source: | COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022; v. 13208, 11 pp., 2022-01-01. |
| Abstract: | Paraphrasing is a fundamental technique for many text applications. The task is typically performed by models that apply lexical or translation operations, which tend to trade off meaning preservation against diversity. In this paper, we present a transformer-based approach called PTT5-Paraphraser, a PTT5 model fine-tuned on TaPaCo, a large corpus of paraphrases. PTT5-Paraphraser achieves good results on a number of metrics, striking a good compromise between diversity and fidelity to the original meaning. Two human evaluations explore the paraphrases produced by our model: the first analyzes their quality in terms of meaning preservation and sentence diversity, while the second compares automatically generated paraphrases with human-made ones. Finally, we perform a classification task, which shows that datasets augmented with paraphrases can substantially increase classifier performance. (AU) |
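The classification experiment mentioned in the abstract rests on a simple idea: each generated paraphrase is added to the training set with the label of the sentence it came from. A minimal sketch of that augmentation step, where a toy `paraphrase` stub (hypothetical, illustration only) stands in for the actual PTT5-Paraphraser model:

```python
def paraphrase(sentence: str) -> list[str]:
    """Stand-in for PTT5-Paraphraser: return candidate paraphrases.

    A real implementation would decode from the fine-tuned seq2seq model
    (e.g. via beam search or sampling); this stub only illustrates the
    interface with hand-written rewrites.
    """
    rewrites = {
        "o filme foi otimo": ["adorei o filme", "que filme excelente"],
        "o filme foi pessimo": ["detestei o filme", "que filme horrivel"],
    }
    return rewrites.get(sentence, [])


def augment(dataset: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Extend a labeled dataset: paraphrases inherit their source's label."""
    augmented = list(dataset)
    for sentence, label in dataset:
        for alternative in paraphrase(sentence):
            augmented.append((alternative, label))
    return augmented


train = [("o filme foi otimo", "pos"), ("o filme foi pessimo", "neg")]
bigger = augment(train)
# Two seed examples plus two paraphrases each yield six training pairs,
# all labels carried over unchanged.
```

A classifier is then trained on the augmented set instead of the original one; the paper's point is that this enlarged, more lexically diverse training data can substantially improve its performance.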
| FAPESP's process: | 19/07665-4 - Center for Artificial Intelligence |
| Grantee: | Fabio Gagliardi Cozman |
| Support Opportunities: | Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program |