(Reference obtained automatically from Web of Science, using the information on FAPESP funding and the corresponding grant number included in the publication by the authors.)

Coarse-refinement dilemma: On generalization bounds for data clustering

Author(s):
Vaz, Yule [1] ; de Mello, Rodrigo Fernandes [1] ; Grossi Ferreira, Carlos Henrique [1]
Total number of authors: 3
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, Trabalhador Saocarlense Ave 400, BR-13560970 Sao Carlos, SP - Brazil
Total number of affiliations: 1
Document type: Scientific article
Source: EXPERT SYSTEMS WITH APPLICATIONS; v. 184, DEC 1 2021.
Web of Science citations: 0
Abstract

The data clustering problem is of central importance to machine learning, given its usefulness in representing structural similarities among data from input spaces. However, the literature offers few theoretical frameworks that provide generalization guarantees for data clustering. In this context, this manuscript introduces a new concept, based on multidimensional persistent homology, to analyze the conditions under which a clustering model is capable of generalizing data. As a first step, we propose a more general definition of the data clustering (DC) problem that relies on topological spaces instead of the metric spaces typically used in the literature. From that, we show that the data clustering problem presents a dilemma analogous to the bias-variance one, here referred to as the coarse-refinement dilemma, which associates: (i) highly refined partitions with clustering instability (overfitting); and (ii) overly coarse partitions with a lack of representativeness (underfitting). The coarse-refinement dilemma suggests the need for a relaxation of Kleinberg's richness axiom, since that axiom allows the production of unstable or unrepresentative partitions. Experiments that consider different clustering refinements can then reveal such partitions. (AU)
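To make the coarse-to-refined axis of the dilemma concrete, the following is a minimal sketch under simplifying assumptions. It is not the authors' method: it uses a single scale parameter `eps` and single-linkage partitions, rather than the multidimensional persistent homology described in the abstract. At each scale, points joined by a chain of distances no greater than `eps` share a cluster. In 0-dimensional persistent homology, the corresponding classes "die" exactly at the scales where clusters merge. The helper names (`partition_at_scale`, `zero_dim_deaths`) and the toy points are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def partition_at_scale(points, eps):
    """Hypothetical helper: single-linkage partition at scale eps, i.e. the
    connected components of the graph whose edges have length <= eps."""
    n = len(points)
    parent = list(range(n))
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= eps:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


def zero_dim_deaths(points):
    """Hypothetical helper: scales at which 0-dimensional homology classes
    die (clusters merge), found Kruskal-style over edges sorted by length."""
    n = len(points)
    parent = list(range(n))
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 finite deaths; one class persists forever


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for eps in (0.5, 1.0, 20.0):
        print(eps, partition_at_scale(pts, eps))
    print(zero_dim_deaths(pts))
```

On this toy data, `eps = 0.5` yields six singletons, which is the highly refined extreme. `eps = 20.0` yields one block, which is the over-coarse extreme. `eps = 1.0` recovers the two visible groups. That two-group partition persists between the death scales 1.0 and about 13.45 (the square root of 181). This long persistence interval is one informal way to read where a partition is neither over-refined nor over-coarse. The paper's actual stability criterion may differ.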

FAPESP grant: 17/16548-6 - Proposal of an approach with theoretical guarantees for concept drift detection in data streams
Grantee: Rodrigo Fernandes de Mello
Support type: Scholarships abroad - Research