

Fast and Scalable Outlier Detection with Metric Access Methods

Author(s):
Bispo Junior, Altamir Gomes ; Ferreira Cordeiro, Robson Leonardo ; Rodrigues, JMF ; Cardoso, PJS ; Monteiro, J ; Lam, R ; Krzhizhanovskaya, VV ; Lees, MH ; Dongarra, JJ ; Sloot, PMA
Total Authors: 10
Document type: Journal article
Source: COMPUTATIONAL SCIENCE - ICCS 2019, PT II; v. 11537, p. 15-pg., 2019-01-01.
Abstract

It is well known that existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. With that in mind, this paper describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets drawn from a variety of high-impact real-world tasks, such as spotting cyberattacks, clinical pathologies, and abnormalities in nature. We present the lowdown on the results obtained, pointing out the strengths and weaknesses of each technique from the application specialist's point of view, a shift from the designer-based point of view that is commonly considered. Interestingly, many of the techniques had infeasibly high runtime requirements or failed to spot what the specialists consider outliers in their own data. To tackle this issue, we propose MetricABOD: a novel angle-based outlier detection algorithm that makes the analysis up to thousands of times faster while being on average 26% more accurate than the most accurate related work. This improvement is essential to enable outlier detection in many real-world applications for which the existing methods lead to unexpected results or infeasible runtime requirements. Finally, we studied two real collections of text data to show that MetricABOD also works for adimensional, purely metric data. (AU)
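
The abstract does not describe how MetricABOD works internally, so the following is only a minimal sketch of the general angle-based idea, not the authors' algorithm. It assumes a simplified, unweighted angle-variance score and recovers inner products from pairwise distances alone through the law of cosines, which is one plausible way such a score could be evaluated on data that has a distance function but no coordinates. The function name `angle_variance_scores` and its `metric` parameter are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import pvariance


def angle_variance_scores(points, metric=dist):
    """Return one score per point; a LOWER score suggests an outlier.

    Simplified, unweighted angle-variance score (an assumption made for
    illustration, not the MetricABOD algorithm). Only pairwise distances
    are used, so `metric` may be any distance function.
    """
    n = len(points)
    # Full pairwise distance matrix: O(n^2) distance computations.
    d = [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

    scores = []
    for a in range(n):
        # Skip duplicates of point a to avoid division by zero.
        others = [i for i in range(n) if i != a and d[a][i] > 0]
        values = []
        for b, c in combinations(others, 2):
            # Law of cosines: <AB, AC> = (|AB|^2 + |AC|^2 - |BC|^2) / 2
            dot = (d[a][b] ** 2 + d[a][c] ** 2 - d[b][c] ** 2) / 2
            # Distance-scaled cosine term: far-away pairs contribute less.
            values.append(dot / (d[a][b] ** 2 * d[a][c] ** 2))
        # With no usable pairs, fall back to 0.0 (treated as most outlying).
        scores.append(pvariance(values) if values else 0.0)
    return scores


# Five clustered points and one distant point.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
scores = angle_variance_scores(pts)
print(scores.index(min(scores)))
```

A point far from the rest sees all other points within a narrow range of directions, so its score varies little, whereas a point inside a cluster sees neighbours in many directions. This naive version examines every pair of points for every point, which is cubic in the number of points; that cost illustrates why runtime is a practical concern for angle-based methods.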

FAPESP's process: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 18/05714-5 - Mining Frequent Data Streams of High Dimensionality with a Case Study in Digital Games
Grantee: Robson Leonardo Ferreira Cordeiro
Support Opportunities: Regular Research Grants