Fast and Scalable Outlier Detection with Sorted Hypercubes

Author(s):
Cabral, Eugenio F. ; Cordeiro, Robson L. F. ; ASSOC COMP MACHINERY
Total number of authors: 3
Document type: Scientific article
Source: CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT; v. N/A, p. 10-pg., 2020-01-01.
Abstract

Outlier detection is the task responsible for finding novel or rare phenomena that provide valuable insights in many areas of the industry. The neighborhood-based algorithms are largely used to tackle this problem due to the intuitive interpretation and wide applicability in different domains. Their major drawback is the intensive neighborhood search that takes hours or even days to complete in large data, thus being impractical in many real-world scenarios. This paper proposes HySortOD - a novel algorithm that uses an efficient hypercube-ordering-and-searching strategy for fast outlier detection. Its main focus is the analysis of data with many instances and a low-to-moderate number of dimensions. We performed comprehensive experiments using real data with up to approximately 500k instances and approximately 120 dimensions, where our new algorithm outperformed 7 state-of-the-art competitors in runtime, being up to 4 orders of magnitude faster in large data. Specifically, 12 well-known benchmark datasets were deeply investigated and one case study in the crucial task of breast cancer detection was also performed to demonstrate that our approach can be successfully used as an out-of-the-box solution for real-world, non-benchmark problems. Based on our experiments, we also identified default parameter values that allow us to be parameter-free and yet report high-quality results. (AU)
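
The abstract names a hypercube-ordering-and-searching strategy without giving details. The sketch below is not the authors' HySortOD implementation. It is a minimal illustration of the general idea of grid-based density scoring, under these assumptions: points are min-max normalised, each point is assigned to an equal-width hypercube, the occupied hypercubes are sorted lexicographically, and a binary search on the first coordinate narrows the neighbour scan. The function name `hypercube_outlier_scores`, the `bins` parameter and the score formula are all hypothetical.

```python
import bisect
from collections import Counter


def hypercube_outlier_scores(points, bins=5):
    """Return one score in [0, 1] per point; higher means more outlying.

    Illustrative only: names, `bins`, and the scoring rule are assumptions,
    not taken from the paper.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell(p):
        # Map a point to the integer coordinates of its hypercube.
        coords = []
        for d in range(dims):
            span = highs[d] - lows[d]
            x = (p[d] - lows[d]) / span if span > 0 else 0.0
            coords.append(min(int(x * bins), bins - 1))
        return tuple(coords)

    cells = [cell(p) for p in points]
    counts = Counter(cells)

    # Sort the occupied hypercubes so that neighbours along the first
    # dimension form a contiguous range that binary search can locate.
    ordered = sorted(counts)
    firsts = [c[0] for c in ordered]

    density = {}
    for c in ordered:
        lo = bisect.bisect_left(firsts, c[0] - 1)
        hi = bisect.bisect_right(firsts, c[0] + 1)
        # Within the candidate range, keep hypercubes that are adjacent
        # (or identical) in every dimension and sum their point counts.
        density[c] = sum(
            counts[o]
            for o in ordered[lo:hi]
            if all(abs(a - b) <= 1 for a, b in zip(c, o))
        )

    max_density = max(density.values())
    return [1.0 - density[c] / max_density for c in cells]


# Usage: a tight cluster plus one distant point.
data = [(i * 0.01, i * 0.01) for i in range(20)] + [(10.0, 10.0)]
scores = hypercube_outlier_scores(data)
```

In this usage example the distant point falls in a sparsely populated hypercube and therefore receives the highest score, while the clustered points share the densest neighbourhood and score 0.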

FAPESP grant: 18/05714-5 - Mining frequent and high-dimensional data streams: a case study in digital games
Grantee: Robson Leonardo Ferreira Cordeiro
Support type: Research Grant - Regular
FAPESP grant: 16/17078-0 - Mining, indexing and visualizing Big Data in clinical decision support systems (MIVisBD)
Grantee: Agma Juci Machado Traina
Support type: Research Grant - Thematic