Fast and Scalable Outlier Detection with Metric Access Methods

Author(s):
Bispo Junior, Altamir Gomes ; Ferreira Cordeiro, Robson Leonardo ; Rodrigues, JMF ; Cardoso, PJS ; Monteiro, J ; Lam, R ; Krzhizhanovskaya, VV ; Lees, MH ; Dongarra, JJ ; Sloot, PMA
Total number of authors: 10
Document type: Scientific article
Source: COMPUTATIONAL SCIENCE - ICCS 2019, PT II; v. 11537, p. 15-pg., 2019-01-01.
Abstract

It is well known that the existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. With that in mind, this paper describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets drawn from a variety of high-impact real-world tasks, such as spotting cyberattacks, clinical pathologies and abnormalities in nature. We report the results in detail, pointing out the strengths and weaknesses of each technique from the application specialist's point of view, which is a shift from the commonly adopted designer's point of view. Interestingly, many of the techniques had unfeasibly high runtime requirements or failed to spot what the specialists consider to be outliers in their own data. To tackle this issue, we propose MetricABOD: a novel angle-based outlier detection algorithm that makes the analysis up to thousands of times faster while being, on average, 26% more accurate than the most accurate related work. This improvement is essential to enable outlier detection in many real-world applications for which the existing methods lead to unexpected results or unfeasible runtime requirements. Finally, we studied two real collections of text data to show that MetricABOD also works for non-dimensional, purely metric data. (AU)

FAPESP grant: 16/17078-0 - Mining, indexing and visualization of Big Data in the context of clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Type of support: Research Grant - Thematic
FAPESP grant: 18/05714-5 - Mining frequent and high-dimensional data streams: a case study in digital games
Grantee: Robson Leonardo Ferreira Cordeiro
Type of support: Research Grant - Regular