Data Stream Clustering: Introducing Recursively Extendable Aggregation Functions for Incremental Cluster Fusion Processes

Urio-Larrea, A.; Camargo, H.; Lucca, G.; Asmus, T.; Marco-Detchart, C.; Schick, L.; Lopez-Molina, C.; Andreu-Perez, J.; Bustince, H.; Dimuro, G. P.

Author(s):	Urio-Larrea, A.; Camargo, H.; Lucca, G.; Asmus, T.; Marco-Detchart, C.; Schick, L.; Lopez-Molina, C.; Andreu-Perez, J.; Bustince, H.; Dimuro, G. P. Total number of authors: 10
Document type:	Scientific article
Source:	IEEE TRANSACTIONS ON CYBERNETICS; v. 55, n. 3, p. 15-pg., 2025-03-01.
Abstract
In data stream (DS) learning, the system has to extract knowledge from data generated continuously, usually at high speed and in large volumes, making it impossible to store the entire set of data to be processed in batch mode. Hence, machine learning models must be built incrementally by processing the incoming examples, as data arrive, while updating the model to be compatible with the current data. In fuzzy DS clustering, the model can either absorb incoming data into existing clusters or initiate a new cluster. As the volume of data increases, there is a possibility that the clusters will overlap to the point where it is convenient to merge two or more clusters into one. Then, a cluster comparison measure (CM) should be applied, to decide whether such clusters should be combined, also in an incremental manner. This defines an incremental fusion process based on aggregation functions that can aggregate the incoming inputs without storing all the previous inputs. The objective of this article is to solve the fuzzy DS clustering problem of incrementally comparing fuzzy clusters on a formal basis. First, we formalize and operationalize incremental fusion processes of fuzzy clusters by introducing recursively extendable (RE) aggregation functions, studying construction methods and different classes of such functions. Second, we propose two approaches to compare clusters: 1) similarity and 2) overlapping between clusters, based on RE aggregation functions. Finally, we analyze the effect of those incremental CMs on the online and offline phases of the well-known fuzzy clustering algorithm d-FuzzStream, showing that our new approach outperforms the original algorithm and presents better or comparable performance to other state-of-the-art DS clustering algorithms found in the literature. (AU)
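
The idea of aggregating incoming inputs without storing the previous ones can be illustrated with two familiar aggregation functions, the arithmetic mean and the maximum. The sketch below is only an illustration under stated assumptions: the class names `RunningMean` and `RunningMax`, the `demo` function, and the sample values are hypothetical, and the code is not the article's formal definition of RE aggregation functions nor part of d-FuzzStream. It assumes inputs are membership-like values in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Arithmetic mean updated one input at a time (hypothetical helper)."""
    count: int = 0
    value: float = 0.0

    def update(self, x: float) -> float:
        # Fold the new input into the aggregate; earlier inputs are never revisited.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


@dataclass
class RunningMax:
    """Maximum updated one input at a time (hypothetical helper)."""
    value: float = 0.0  # assumption: inputs lie in [0, 1], so 0 is neutral for max

    def update(self, x: float) -> float:
        # Only the current aggregate and the new input are needed.
        self.value = max(self.value, x)
        return self.value


def demo():
    # Illustrative stream of values in [0, 1]; not data from the article.
    stream = [0.2, 0.4, 0.9, 0.5]
    mean, peak = RunningMean(), RunningMax()
    for x in stream:
        mean.update(x)
        peak.update(x)
    return mean.value, peak.value
```

In both cases the memory used is constant in the length of the stream: the mean keeps a count and a current value, and the maximum keeps only the current value.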

FAPESP grant:	22/09136-1 - Achieving the Sustainable Development Goals in the context of disaster response (CODeS): computational tools for integrated data analysis
Grantee:	Norma Felicidade Lopes da Silva Valencio
Support type:	Research Grant - eScience and Data Science Program - Regular
