(Reference obtained automatically from the Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

A Support System for Clustering Data Streams with a Variable Number of Clusters

Author(s):
Silva, Jonathan de Andrade ; Hruschka, Eduardo Raul
Total number of authors: 2
Document type: Scientific article
Source: ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS; v. 11, n. 2 JUL 2016.
Web of Science citations: 2
Abstract

Many algorithms for clustering data streams that are based on the widely used k-Means have been proposed in the literature. Most of these algorithms assume that the number of clusters, k, is known and fixed a priori by the user. Aimed at relaxing this assumption, which is often unrealistic in practical applications, we propose a support system that allows not only estimating the number of clusters automatically from data but also monitoring the process of the data-stream clustering. We illustrate the potential of the proposed system by means of a prototype that implements eight algorithms for clustering data streams, namely, Stream LSearch-OMRk, Stream LSearch-BkM, Stream LSearch-IOMRk, Stream LSearch-IBkM, CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk, and StreamKM++-BkM. These algorithms are combinations of three state-of-the-art algorithms for clustering data streams with fixed k, namely, Stream LSearch, CluStream, and StreamKM++, with two algorithms for estimating the number of clusters, which are Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). We experimentally compare the performance of these algorithms using both synthetic and real-world data streams. Analyses of statistical significance suggest that the algorithms that are based on OMRk yield the best data partitions, while the algorithms that are based on BkM are more computationally efficient. Additionally, StreamKM++-OMRk and Stream LSearch-IBkM provide the best tradeoff relationship between accuracy and efficiency. (AU)
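
The idea of estimating k by running k-Means repeatedly over an ordered range of candidate values can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a static batch rather than a stream, and the selection criterion (a centroid-based simplified silhouette), the function names, and the parameters `k_min`, `k_max`, and `runs` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-Means with random initial centers drawn from the data."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def simplified_silhouette(points, centers, labels):
    """Assumed quality criterion: silhouette computed from distances to centroids."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.sqrt(dist2(p, centers[lab]))
        b = min(math.sqrt(dist2(p, c))
                for j, c in enumerate(centers) if j != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_min=2, k_max=6, runs=10, seed=0):
    """Try k = k_min..k_max in order, several runs each; keep the best partition."""
    rng = random.Random(seed)
    best_score, best_k, best_centers = -math.inf, None, None
    for k in range(k_min, k_max + 1):
        for _ in range(runs):
            centers, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centers, labels)
            if score > best_score:
                best_score, best_k, best_centers = score, k, centers
    return best_k, best_centers, best_score


def make_blobs(seed=1, per_blob=30):
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
            for mx, my in means for _ in range(per_blob)]


if __name__ == "__main__":
    k, centers, score = estimate_k(make_blobs())
    print("estimated k:", k, "score:", round(score, 3))
```

In a streaming setting, a procedure of this kind would presumably be applied to a compact summary of the stream (for example, the micro-clusters or coresets maintained by the underlying stream algorithm) rather than to the raw points; that coupling is not shown here.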

FAPESP grant: 10/15049-7 - Clustering of Continuous Data Streams with Automatic Estimation of the Number of Clusters
Grantee: Jonathan de Andrade Silva
Type of support: Scholarships in Brazil - Doctorate