Selection of the number of clusters in functional data analysis

Zambom, Adriano Zanin; Alfonso Collazos, Julian; Dias, Ronaldo

Author(s):	Zambom, Adriano Zanin; Alfonso Collazos, Julian; Dias, Ronaldo
Total Authors:	3
Document type:	Journal article
Source:	JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION; v. 92, n. 14, p. 19-pg., 2022-03-23.
Abstract
Identifying the number K of clusters in a dataset is one of the most difficult problems in cluster analysis. A choice of K that correctly characterizes the features of the data is essential for building meaningful clusters. In this paper we tackle the problem of estimating the number of clusters in functional data analysis by introducing a new measure that can be used with different procedures for selecting the optimal K. The main idea is to combine two test statistics, one measuring the lack of parallelism and the other the mean distance between curves, and to use the combination to compute criteria such as the within-cluster and between-cluster sums of squares. Simulations in challenging scenarios suggest that procedures using this measure detect the correct number of clusters more frequently than existing methods in the literature. The proposed method is illustrated on several real datasets.
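
The abstract does not state the two test statistics themselves, so the following is only a rough sketch of the general idea, not the authors' method. It assumes curves observed on a common grid and uses two stand-in quantities: the variance of the pointwise difference between two curves (zero when the curves are parallel) and the squared mean of that difference (the average gap). The function names and the `weight` parameter are hypothetical, introduced here for illustration. The combined dissimilarity then feeds a standard within-cluster dispersion, which is compared for a one-cluster and a two-cluster partition.

```python
import numpy as np

def pair_dissimilarity(x, y, weight=0.5):
    """Hypothetical stand-in for a combined statistic (not the paper's)."""
    d = x - y
    level = np.mean(d) ** 2   # mean distance: squared average gap between curves
    shape = np.var(d)         # lack of parallelism: 0 when x - y is constant
    return weight * level + (1.0 - weight) * shape

def dissimilarity_matrix(curves, weight=0.5):
    """Symmetric matrix of pairwise dissimilarities between curves."""
    n = len(curves)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = pair_dissimilarity(curves[i], curves[j], weight)
    return D

def within_cluster_dispersion(D, labels):
    """Sum over clusters of (pairwise dissimilarities) / (2 * cluster size)."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        total += D[np.ix_(idx, idx)].sum() / (2.0 * len(idx))
    return total

# Toy data: two groups of vertically shifted (parallel) curves.
t = np.linspace(0.0, 1.0, 50)
shifts = [0.0, 0.1, 0.2]
curves = np.array(
    [np.sin(2 * np.pi * t) + s for s in shifts]
    + [np.cos(2 * np.pi * t) + s for s in shifts]
)

D = dissimilarity_matrix(curves)
w_one = within_cluster_dispersion(D, [0, 0, 0, 0, 0, 0])  # K = 1
w_two = within_cluster_dispersion(D, [0, 0, 0, 1, 1, 1])  # K = 2
print("within-cluster dispersion, K=1:", w_one)
print("within-cluster dispersion, K=2:", w_two)
```

In this toy setting the dispersion drops sharply when the sine-shaped and cosine-shaped curves are separated, which is the kind of signal a K-selection procedure would look for across candidate values of K.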

FAPESP's process:	19/04535-2 - Covariate enabled variable length Markov Chains
Grantee:	Nancy Lopes Garcia
Support Opportunities:	Research Grants - Visiting Researcher Grant - International


FAPESP's process:	17/15306-9 - Incorporating Functional covariates into nonparametric regression models
Grantee:	Nancy Lopes Garcia
Support Opportunities:	Regular Research Grants


FAPESP's process:	18/04654-9 - Time series, wavelets and high dimensional data
Grantee:	Pedro Alberto Morettin
Support Opportunities:	Research Projects - Thematic Grants
