(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Distance assessment and analysis of high-dimensional samples using variational autoencoders

Author(s):
Inacio, Marco [1, 2, 3] ; Izbicki, Rafael [2] ; Gyires-Toth, Balint [3]
Total Authors: 3
Affiliation:
[1] Univ Sao Paulo, Sao Paulo - Brazil
[2] Univ Fed Sao Carlos, Sao Carlos - Brazil
[3] Budapest Univ Technol & Econ, Budapest - Hungary
Total Affiliations: 3
Document type: Journal article
Source: INFORMATION SCIENCES; v. 557, p. 407-420, MAY 2021.
Web of Science Citations: 0
Abstract

An important question in many machine learning applications is whether two samples arise from the same generating distribution. Although this is an old topic in statistics, the simple accept/reject decisions given by most hypothesis tests are often not enough: it is well known that the rejection of the null hypothesis does not imply that differences between the two groups are meaningful from a practical perspective. In this work, we present a novel nonparametric approach to visually assess the dissimilarity between datasets that goes beyond two-sample testing. The key idea of our approach is to measure the distance between two (possibly) high-dimensional datasets using variational autoencoders. We also show how this framework can be used to create a formal statistical test of the hypothesis that both samples arise from the same distribution. We evaluate both the distance measurement and hypothesis testing approaches on simulated and real-world datasets. The results show that our approach is useful for data exploration (as it, for instance, allows for quantification of the discrepancy/separability between categories of images), which can be particularly helpful in the early phases of a machine learning pipeline. (C) 2020 The Author(s). Published by Elsevier Inc. (AU)
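
As a rough illustration of how a distance between two samples can be turned into a formal two-sample test, the sketch below runs a generic permutation test. It is not the authors' method: the abstract does not specify how the variational-autoencoder distance is computed, so a simple stand-in (the Euclidean distance between sample means) is used instead. The names mean_distance, permutation_test, and gaussian_sample are hypothetical helpers introduced only for this example.

```python
import random


def mean_distance(xs, ys):
    """Stand-in distance (assumption): Euclidean distance between sample means.

    The paper uses a distance built on variational autoencoders; this simple
    statistic only marks the place where such a distance would be plugged in.
    """
    dim = len(xs[0])
    mx = [sum(p[d] for p in xs) / len(xs) for d in range(dim)]
    my = [sum(p[d] for p in ys) / len(ys) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mx, my)) ** 0.5


def permutation_test(xs, ys, distance=mean_distance, n_perm=500, seed=0):
    """Return (observed distance, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    observed = distance(xs, ys)
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if distance(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # The +1 correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def gaussian_sample(n, dim, shift, rng):
    """Draw n points with independent N(shift, 1) coordinates (toy data)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
same_a = gaussian_sample(60, 5, 0.0, rng)
same_b = gaussian_sample(60, 5, 0.0, rng)
shifted = gaussian_sample(60, 5, 1.0, rng)

d_same, p_same = permutation_test(same_a, same_b)
d_diff, p_diff = permutation_test(same_a, shifted)
print(f"same distribution:    distance={d_same:.3f}, p-value={p_same:.3f}")
print(f"shifted distribution: distance={d_diff:.3f}, p-value={p_diff:.3f}")
```

Reporting the distance alongside the p-value reflects the point made in the abstract: a rejection alone does not say whether the difference is large enough to matter in practice, whereas the distance gives a sense of its magnitude.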

FAPESP's process: 19/11321-9 - Neural networks in statistical inference problems
Grantee: Rafael Izbicki
Support Opportunities: Regular Research Grants
FAPESP's process: 17/03363-8 - Interpretability and efficiency in hypothesis tests
Grantee: Rafael Izbicki
Support Opportunities: Regular Research Grants