(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

Author(s):
Campos, Guilherme O. [1] ; Zimek, Arthur [2] ; Sander, Jorg [3] ; Campello, Ricardo J. G. B. [1] ; Micenkova, Barbora [4] ; Schubert, Erich [2] ; Assent, Ira [4] ; Houle, Michael E. [5]
Total number of authors: 8
Author affiliation(s):
[1] Univ Sao Paulo, SCC ICMC USP, CP 668, BR-13566590 Sao Carlos, SP - Brazil
[2] Univ Munich, D-80538 Munich - Germany
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8 - Canada
[4] Aarhus Univ, Dept Comp Sci, Aabogade 34, DK-8200 Aarhus - Denmark
[5] Natl Inst Informat, Chiyoda Ku, 2-1-2 Hitotsubashi, Tokyo 1018430 - Japan
Total number of affiliations: 5
Document type: Journal article
Source: DATA MINING AND KNOWLEDGE DISCOVERY; v. 30, n. 4, p. 891-927, JUL 2016.
Web of Science citations: 71
Abstract

The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results. (AU)

FAPESP grant: 13/18698-4 - Methods and algorithms in unsupervised and semi-supervised machine learning
Grantee: Ricardo José Gabrielli Barreto Campello
Type of support: Research Grant - Regular