(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Comparing Hard and Overlapping Clusterings

Author(s):
Horta, Danilo [1] ; Campello, Ricardo J. G. B. [1]
Total number of authors: 2
Author affiliation(s):
[1] Univ Sao Paulo, Inst Ciencias Matemat & Computacao, Campus Sao Carlos Caixa Postal 668, BR-13560970 Sao Carlos, SP - Brazil
Total number of affiliations: 1
Document type: Scientific article
Source: JOURNAL OF MACHINE LEARNING RESEARCH; v. 16, p. 2949-2997, DEC 2015.
Web of Science citations: 3
Abstract

Similarity measures for comparing clusterings are an important component, e.g., of evaluating clustering algorithms, for consensus clustering, and for clustering stability assessment. These measures have been studied for over 40 years in the domain of exclusive hard clusterings (exhaustive and mutually exclusive object sets). In the past years, the literature has proposed measures to handle more general clusterings (e.g., fuzzy/probabilistic clusterings). This paper provides an overview of these new measures and discusses their drawbacks. We ultimately develop a corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings. We prove that 13AGRI and the adjusted Rand index (ARI, by Hubert and Arabie) are equivalent in the exclusive hard domain. The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all the four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture. Implementation and supplementary figures can be found at http://sn.im/25a9h8u. (AU)
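
Illustrative note (not part of the original record): the abstract states that 13AGRI coincides with the adjusted Rand index (ARI) on exclusive hard clusterings. The sketch below shows only the standard ARI of Hubert and Arabie, computed from pair counts; it is not the authors' 13AGRI implementation and does not handle fuzzy, non-exclusive, or possibilistic clusterings. The function name `adjusted_rand_index` and the convention of returning 1.0 in the degenerate case are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI for two exclusive hard clusterings given as label lists.

    Hypothetical helper for illustration; not the paper's 13AGRI.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        raise ValueError("need at least two objects")

    # Pairs of objects placed together in both clusterings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately (marginals).
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Expected agreement under the permutation (hypergeometric) null model.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        # Degenerate case (e.g. both clusterings are a single cluster):
        # assumed convention for this sketch is to report perfect agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same partition under different label names.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Two partitions that never agree on a co-clustered pair.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The correction for chance is the subtraction of `expected` in both numerator and denominator: it is what keeps the score near zero for unrelated clusterings, the "constant evaluation between random clusterings" property the abstract mentions.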

FAPESP grant: 09/17469-6 - Data Clustering Approaches Based on Subspaces and Semi-Supervision
Grantee: Danilo Horta
Support type: Scholarships in Brazil - Doctorate
FAPESP grant: 13/18698-4 - Methods and algorithms in unsupervised and semi-supervised machine learning
Grantee: Ricardo José Gabrielli Barreto Campello
Support type: Research Grants - Regular