Sentence Similarity Recognition in Portuguese from Multiple Embedding Models

Rodrigues, Ana Carolina; Marcacini, Ricardo M.; Wani, MA; Kantardzic, M; Palade, V; Neagu, D; Yang, L; Chan, KY

Total number of authors:	8
Document type:	Scientific article
Source:	2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA; v. N/A, p. 6-pg., 2022-01-01.
Abstract
Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features because of differences in algorithm design and in the characteristics of the datasets used during pre-training. The prospect of exploiting these different encoded features to generate more suitable representations motivated combining multiple embedding models, an approach known as meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformer-based architectures have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and can be a promising alternative for improving sentence similarity performance. Moreover, we discuss the results in light of our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score.
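The abstract does not describe the architecture of the proposed network. As a point of reference only, the simplest unsupervised meta-embedding baseline (concatenating the L2-normalized sentence vectors that each pre-trained model produces, then comparing sentences by cosine similarity) can be sketched as follows. This is not the supervised network the paper proposes; the function names and toy vectors are hypothetical, and the embeddings are assumed to have already been computed by each model.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def meta_embed(per_model: Sequence[Vector]) -> list[float]:
    """Hypothetical baseline: concatenate the normalized embeddings that
    one sentence receives from each pre-trained model (dimensions may differ)."""
    out: list[float] = []
    for emb in per_model:
        out.extend(l2_normalize(emb))
    return out


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


if __name__ == "__main__":
    # Toy vectors standing in for two models' outputs (3-d and 2-d).
    sent_a = meta_embed([[1.0, 0.0, 0.0], [0.0, 1.0]])
    sent_b = meta_embed([[1.0, 0.0, 0.0], [1.0, 0.0]])
    # Model 1 says "identical" (cosine 1), model 2 says "unrelated" (cosine 0).
    print(round(cosine(sent_a, sent_b), 6))  # prints 0.5
```

Because every per-model vector is unit length, the cosine of the concatenation equals the plain average of the per-model cosines, so each model receives equal weight. A supervised meta-embedding network, such as the one the paper introduces, can instead learn from labeled sentence pairs how much each model should contribute.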

FAPESP grant:	19/07665-4 - Centro de Inteligência Artificial (Center for Artificial Intelligence)
Grantee:	Fabio Gagliardi Cozman
Support type:	Research Grant - eScience and Data Science Program - Engineering Research Centers
