Sentence Similarity Recognition in Portuguese from Multiple Embedding Models

Rodrigues, Ana Carolina; Marcacini, Ricardo M.; Wani, MA; Kantardzic, M; Palade, V; Neagu, D; Yang, L; Chan, KY

Author(s):	Rodrigues, Ana Carolina ; Marcacini, Ricardo M. ; Wani, MA ; Kantardzic, M ; Palade, V ; Neagu, D ; Yang, L ; Chan, KY
Total Authors:	8
Document type:	Journal article
Source:	2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA; v. N/A, p. 6-pg., 2022-01-01.
Abstract
Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features because of differences in algorithm design and in the characteristics of the datasets used during pre-training. The prospect of exploiting these different encoded features to generate more suitable representations motivated the combination of multiple embedding models, known as meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformer-based architectures have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and can be a promising alternative for improving performance on sentence similarity. Moreover, we discuss the results using our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score. (AU)
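
The general idea of meta-embedding can be illustrated with a minimal sketch. This record does not describe the architecture of the paper's supervised network, so the example below is not the authors' method: it assumes the simplest unsupervised variant, in which sentence vectors from several source models are L2-normalized and concatenated, and the similarity of two sentences is taken as the cosine of their concatenated vectors. The function names and the toy vectors are hypothetical stand-ins for real model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_meta_embedding(embeddings):
    """Concatenate the L2-normalized vectors produced by several source models.

    Assumption: plain concatenation is used here as an unsupervised baseline,
    not the supervised neural combination described in the abstract.
    """
    meta = []
    for vec in embeddings:
        meta.extend(l2_normalize(vec))
    return meta


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy vectors standing in for the outputs of two hypothetical source models
# (dimensions 2 and 3) for each of two sentences.
sentence_a = [[1.0, 0.0], [0.0, 2.0, 0.0]]
sentence_b = [[1.0, 0.0], [0.0, 0.0, 3.0]]

meta_a = concat_meta_embedding(sentence_a)
meta_b = concat_meta_embedding(sentence_b)
score = cosine_similarity(meta_a, meta_b)
```

Normalizing each source vector before concatenation keeps a model with larger vector norms from dominating the combined representation. A supervised approach would instead learn how to weight or transform each source.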

FAPESP's process:	19/07665-4 - Center for Artificial Intelligence
Grantee:	Fabio Gagliardi Cozman
Support Opportunities:	Research Grants - Research Program in eScience and Data Science - Research Centers in Engineering Program
