A resampling-based method to evaluate NLI models

Author(s):
Salvatore, Felipe de Souza ; Finger, Marcelo ; Hirata Jr, Roberto ; Patriota, Alexandre G.
Total number of authors: 4
Document type: Scientific article
Source: NATURAL LANGUAGE ENGINEERING; v. N/A, 28 pp., 2023-06-09.
Abstract

Recent progress in deep learning has produced models that achieve high scores on traditional Natural Language Inference (NLI) datasets. To understand the generalization limits of these powerful models, a growing number of adversarial evaluation schemes have appeared. These works share a similar evaluation method: they construct a new NLI test set from sentences with known logical and semantic properties (the adversarial set), train a model on a benchmark NLI dataset, and evaluate it on the new set. Poor performance on the adversarial set is then interpreted as a limitation of the model. The problem with this procedure is that poor performance may only indicate a sampling problem: a machine learning model can perform poorly on a new test set simply because the text patterns present in the adversarial set are not well represented in the training sample. To address this problem, we present a new evaluation method, the Invariance under Equivalence test (IE test). The IE test trains a model with sufficient adversarial examples and checks the model's performance on two equivalent datasets. As a case study, we apply the IE test to state-of-the-art NLI models, using synonym substitution as the form of adversarial examples. The experiment shows that, despite their high predictive power, these models usually produce different inference outputs for equivalent inputs and, more importantly, that this deficiency cannot be solved by adding adversarial observations to the training data.
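
The comparison step described in the abstract (one model, two equivalent test sets) can be illustrated with a small resampling sketch. The abstract does not specify the statistic or the resampling scheme, so everything below is an assumption made for illustration: the function name `invariance_report`, the accuracy-gap statistic, and the percentile bootstrap are hypothetical choices, not the authors' procedure. The inputs are the gold labels, the model's predictions on the original test set, and its predictions on the equivalent (for example, synonym-substituted) version of the same examples.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match the gold labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def invariance_report(labels, preds_original, preds_transformed,
                      n_resamples=2000, seed=0):
    """Compare one model's predictions on two equivalent test sets.

    Hypothetical helper (not from the paper): reports the accuracy gap,
    the fraction of examples whose prediction changed, and a percentile
    bootstrap interval for the gap, resampling examples in pairs.
    """
    n = len(labels)
    if n == 0 or not (n == len(preds_original) == len(preds_transformed)):
        raise ValueError("inputs must be non-empty and of equal length")

    rng = random.Random(seed)
    gap = accuracy(labels, preds_original) - accuracy(labels, preds_transformed)
    changed = sum(a != b for a, b in zip(preds_original, preds_transformed)) / n

    gaps = []
    for _ in range(n_resamples):
        # Draw example indices with replacement; the same indices are used
        # for both prediction lists so each pair stays aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_o = sum(labels[i] == preds_original[i] for i in idx) / n
        acc_t = sum(labels[i] == preds_transformed[i] for i in idx) / n
        gaps.append(acc_o - acc_t)

    gaps.sort()
    lower = gaps[int(0.025 * (n_resamples - 1))]
    upper = gaps[int(0.975 * (n_resamples - 1))]
    return {"accuracy_gap": gap, "changed_fraction": changed,
            "gap_ci": (lower, upper)}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = ["entailment", "contradiction", "neutral", "entailment"]
    on_original = ["entailment", "contradiction", "neutral", "neutral"]
    on_equivalent = ["entailment", "neutral", "neutral", "neutral"]
    print(invariance_report(gold, on_original, on_equivalent, n_resamples=500))
```

Under these assumptions, a model that is invariant under the equivalence transformation would show a changed fraction near zero and a gap interval that contains zero. The changed fraction is reported separately because two prediction lists can have equal accuracy while still disagreeing on individual examples.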

FAPESP grant: 15/24485-9 - Future internet applied to smart cities
Grantee: Fabio Kon
Support type: Research Grants - Thematic
FAPESP grant: 14/12236-1 - AnImaLS: Annotation of Images in Large Scale: what can machines and specialists learn from interaction?
Grantee: Alexandre Xavier Falcão
Support type: Research Grants - Thematic
FAPESP grant: 19/07665-4 - Center for Artificial Intelligence
Grantee: Fabio Gagliardi Cozman
Support type: Research Grants - eScience and Data Science Program - Engineering Research Centers
FAPESP grant: 15/21880-4 - PROVERBS -- Probabilistic Overconstrained Boolean Systems: reasoning tools and applications
Grantee: Marcelo Finger
Support type: Research Grants - Regular
FAPESP grant: 18/21934-5 - Network statistics: theory, methods, and applications
Grantee: André Fujita
Support type: Research Grants - Thematic
FAPESP grant: 15/22308-2 - Intermediate representations in Computational Science for knowledge discovery
Grantee: Roberto Marcondes Cesar Junior
Support type: Research Grants - Thematic