Evaluation, model selection and unsupervised outlier detection in data spaces and subspaces

Grant number: 15/06019-0
Support type: Scholarships in Brazil - Doctorate
Effective date (Start): July 01, 2015
Effective date (End): April 01, 2019
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Cooperation agreement: Coordination of Improvement of Higher Education Personnel (CAPES)
Principal researcher: Ricardo José Gabrielli Barreto Campello
Grantee: Henrique Oliveira Marques
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil
Associated scholarship(s): 17/04161-0 - Evaluation, Model Selection and Unsupervised Outlier Detection in Subspaces, BE.EP.DR


Outlier detection plays an important role in discovering patterns in data that can be considered exceptional in some sense. Detecting such patterns is relevant because, in many data mining applications, they represent extraordinary behaviors that are worth further analysis. An important distinction is that between supervised and unsupervised techniques. In this project we focus on unsupervised outlier detection techniques. There are dozens of algorithms in this category in the literature; however, each uses its own intuition to judge what should be considered an outlier, which is naturally a subjective concept. This substantially complicates the selection of a particular algorithm, as well as the choice of an appropriate parameter configuration for a given algorithm, in a practical application. It also makes it highly complex to evaluate the quality of the solution obtained by the algorithm or configuration adopted by the analyst, especially given the difficulty of defining a quality measure that is not tied to the criterion used by the algorithm itself. These issues are interrelated and refer, respectively, to the problems of model selection and evaluation (or validation) of results in unsupervised learning. These problems have been investigated for decades in the area of unsupervised data clustering, but only in the candidate's master's research was a pioneering internal and relative measure for the unsupervised evaluation of binary outlier detection solutions proposed, called IREOS (Internal, Relative Evaluation of Outlier Solutions).
Although the measure represents an important step forward in the state of the art in this area, two problems remain open: measures for solutions that provide scorings of the observations instead of labels (the type of solution produced by the vast majority of well-known unsupervised outlier detection algorithms), and measures for solutions consisting of outliers detected in subspaces (an area that, due to the problem of high dimensionality, has recently received considerable attention). The main objectives of this research project are to extend IREOS to evaluate the results produced by both categories of outlier detection algorithms, and to investigate improvements and applications that go beyond evaluation and model selection, such as the automatic determination of the number of outliers in a dataset. As a secondary objective, we intend to investigate whether the original principles used in the development of the IREOS index can be adapted to the development of new outlier detection algorithms, particularly in the context of subspaces. (AU)


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
MARQUES, HENRIQUE O.; CAMPELLO, RICARDO J. G. B.; SANDER, JORG; ZIMEK, ARTHUR. Internal Evaluation of Unsupervised Outlier Detection. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, v. 14, n. 4, JUL 2020. Web of Science Citations: 0.
Academic Publications
(References retrieved automatically from State of São Paulo Research Institutions)
MARQUES, Henrique Oliveira. Evaluation and model selection for unsupervised outlier detection and one-class classification. 2019. Doctoral Thesis - Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB) São Carlos.
