
High dimensional data streams clustering

Grant number: 13/04453-0
Support type: Scholarships in Brazil - Post-Doctorate
Effective date (Start): November 01, 2013
Effective date (End): March 25, 2016
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal researcher: Rodrigo Fernandes de Mello
Grantee: Cássio Martini Martins Pereira
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

In 2009, the Brazilian Computer Society (SBC) met to define the grand challenges of computing in Brazil for the period up to 2020. One of the challenges identified was "how to increase our capacity to extract relevant information from data streams". One of the most attractive subareas of data stream mining is clustering, as it does not require a specialist to label every example in the database. Scientific experiments in the most diverse fields typically produce databases with many attributes, which makes analysis harder. Most of the time, however, the desired clusters reside in a low-dimensional subspace, or manifold, embedded in the original high-dimensional space. The difficulty of analyzing data in such high-dimensional spaces, referred to as the curse of dimensionality, has limited the success of many machine learning techniques. Few papers in the data streams area have addressed the task of clustering in high-dimensional spaces. All of them, up to now, have used variance to determine the relevance of dimensions, relying on a fixed threshold supplied by the user a priori. This approach imposes a severe limitation, given the volatile nature of data streams. This project aims to study and propose information quantification measures to determine feature relevance in the context of high-dimensional data stream clustering. Such measures do not suffer from the problems of variance, since they are based on the probabilities of the data and not on their scale. Furthermore, this project aims to propose mechanisms for adapting the parameters used to determine feature relevance, which is essential given the volatile nature of data streams. The results of this project are expected to make it possible to find clusters in scenarios not supported by current techniques.
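
The contrast the abstract draws between variance and probability-based measures can be illustrated with a small sketch. It is not the project's method: the abstract does not name a specific measure, so a histogram-based Shannon entropy is assumed here as one possible information quantification measure, and the function names, the bin count, and the synthetic data are all hypothetical. The sketch shows that rescaling a feature changes its variance but leaves its entropy essentially unchanged.

```python
import numpy as np


def variance_relevance(x):
    """Variance of one feature; depends on the feature's scale."""
    return float(np.var(x))


def entropy_relevance(x, bins=16):
    """Shannon entropy (bits) of an equal-width histogram of one feature.

    Assumption: bins span the observed min..max, so the result depends on
    how the values are distributed, not on their units or scale.
    """
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
# Hypothetical feature with two tight clusters (low entropy).
clustered = np.concatenate([rng.normal(-5, 0.1, 500), rng.normal(5, 0.1, 500)])
# Hypothetical feature with no cluster structure (high entropy).
noise = rng.uniform(-5, 5, 1000)

scale = 1024.0  # e.g. a change of measurement units
for name, feature in [("clustered", clustered), ("noise", noise)]:
    print(
        name,
        "variance:", variance_relevance(feature),
        "-> scaled:", variance_relevance(feature * scale),
        "| entropy:", entropy_relevance(feature),
        "-> scaled:", entropy_relevance(feature * scale),
    )
```

Under these assumptions, a fixed variance threshold chosen for one scale stops being meaningful once the stream's scale drifts, whereas the entropy values stay comparable. Note also that the clustered feature has the larger variance of the two but the lower entropy, so the two criteria can rank features differently.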


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
PEREIRA, CASSIO M. M.; DE MELLO, RODRIGO F. Persistent homology for time series and spatial data clustering. EXPERT SYSTEMS WITH APPLICATIONS, v. 42, n. 15-16, p. 6026-6038. (13/04453-0, 14/13323-5)
PEREIRA, CASSIO M. M.; DE MELLO, RODRIGO F. PTS: Projected Topological Stream clustering algorithm. Neurocomputing, v. 180, n. SI, p. 16-26. (13/04453-0, 14/13323-5)

Please report errors in scientific publications list by writing to: cdi@fapesp.br.