(Reference retrieved automatically from Web of Science, using the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Distance assessment and analysis of high-dimensional samples using variational autoencoders

Author(s):
Inacio, Marco [1, 2, 3] ; Izbicki, Rafael [2] ; Gyires-Toth, Balint [3]
Total number of authors: 3
Author affiliation(s):
[1] Univ Sao Paulo, Sao Paulo - Brazil
[2] Univ Fed Sao Carlos, Sao Carlos - Brazil
[3] Budapest Univ Technol & Econ, Budapest - Hungary
Total number of affiliations: 3
Document type: Scientific article
Source: INFORMATION SCIENCES; v. 557, p. 407-420, MAY 2021.
Web of Science citations: 0
Abstract

An important question in many machine learning applications is whether two samples arise from the same generating distribution. Although an old topic in Statistics, simple accept/reject decisions given by most hypothesis tests are often not enough: it is well known that the rejection of the null hypothesis does not imply that differences between the two groups are meaningful from a practical perspective. In this work, we present a novel nonparametric approach to visually assess the dissimilarity between the datasets that goes beyond two-sample testing. The key idea of our approach is to measure the distance between two (possibly) high-dimensional datasets using variational autoencoders. We also show how this framework can be used to create a formal statistical test of the hypothesis that both samples arise from the same distribution. We evaluate both the distance measurement and hypothesis testing approaches on simulated and real-world datasets. The results show that our approach is useful for data exploration (as it, for instance, allows for quantification of the discrepancy/separability between categories of images), which can be particularly helpful in early phases of a machine learning pipeline. (C) 2020 The Author(s). Published by Elsevier Inc. (AU)
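
The abstract does not say how the distance is turned into a formal test. As a minimal sketch only, one common way to do this is a permutation test: compute a distance between the two samples, then recompute it many times after randomly reassigning the pooled points to the two groups. The permutation scheme, the function names `mean_distance` and `permutation_test`, and the stand-in statistic (Euclidean distance between sample means, used here in place of a variational-autoencoder-based distance) are all assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence

Sample = List[Sequence[float]]


def mean_distance(a: Sample, b: Sample) -> float:
    """Stand-in statistic (an assumption, not the paper's VAE-based distance):
    Euclidean distance between the mean vectors of the two samples."""
    dim = len(a[0])
    mean_a = [sum(row[j] for row in a) / len(a) for j in range(dim)]
    mean_b = [sum(row[j] for row in b) / len(b) for j in range(dim)]
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)) ** 0.5


def permutation_test(
    a: Sample,
    b: Sample,
    distance: Callable[[Sample, Sample], float] = mean_distance,
    n_permutations: int = 500,
    seed: int = 0,
) -> float:
    """Return a permutation p-value for H0: both samples share one distribution."""
    rng = random.Random(seed)
    observed = distance(a, b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if distance(pooled[: len(a)], pooled[len(a):]) >= observed:
            exceed += 1
    # The +1 terms count the observed split itself, so the p-value is never zero.
    return (exceed + 1) / (n_permutations + 1)
```

Any other distance between datasets can be passed through the `distance` argument; a small p-value suggests that the observed distance is unusually large compared with what random relabeling produces.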

FAPESP grant: 19/11321-9 - Neural networks in statistical inference problems
Grantee: Rafael Izbicki
Support type: Research Grant - Regular
FAPESP grant: 17/03363-8 - Interpretability and efficiency in hypothesis tests
Grantee: Rafael Izbicki
Support type: Research Grant - Regular