(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

A non-parametric method to estimate the number of clusters

Author(s):
Fujita, Andre [1] ; Takahashi, Daniel Y. [2, 3] ; Patriota, Alexandre G. [4]
Total Authors: 3
Affiliation:
[1] Univ Sao Paulo, Inst Math & Stat, Dept Comp Sci, BR-05508 Sao Paulo - Brazil
[2] Princeton Univ, Dept Psychol, Princeton, NJ 08544 - USA
[3] Princeton Univ, Inst Neurosci, Princeton, NJ 08544 - USA
[4] Univ Sao Paulo, Inst Math & Stat, Dept Stat, BR-05508 Sao Paulo - Brazil
Total Affiliations: 4
Document type: Journal article
Source: COMPUTATIONAL STATISTICS & DATA ANALYSIS; v. 73, p. 27-39, MAY 2014.
Web of Science Citations: 24
Abstract

An important and yet unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric and data-driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications to graph clustering and to the iris and breast cancer datasets are shown. (C) 2013 Elsevier B.V. All rights reserved. (AU)

FAPESP's process: 11/07762-8 - Granger causality for sets of time series: development of methodologies to model selection and extensions in the frequency domain with applications to molecular biology and neuroscience
Grantee: André Fujita
Support Opportunities: Regular Research Grants