An approach based on the stability of clustering algorithms to ensure concept drif...
Scalable descriptive models over extensive volumes of distributed data
Classifying stationary and non-stationary distributed data with the scarcity of la...
Author(s): | Marcelo Keese Albertini |
Document type: | Doctoral Thesis |
Press: | São Carlos. |
Institution: | Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB) |
Defense date: | 2012-04-11 |
Examining board members: | Rodrigo Fernandes de Mello; Alexandre Cláudio Botazzo Delbem; Estevam Rafael Hruschka Júnior; Ana Carolina Lorena; Ivan Nunes da Silva |
Advisor: | Rodrigo Fernandes de Mello |
Abstract | |
Several research fields have described phenomena that produce endless sequences of samples, referred to as data streams. These phenomena usually exhibit behavior variations and are studied by means of unsupervised induction based on data clustering. To cope with the characteristics of data streams, researchers have designed clustering algorithms with low time and space complexity. However, the predefined, static parameters (thresholds, number of clusters, and learning rates) found in current algorithms still limit the application of clustering to data streams. This limitation motivated this thesis, which proposes a continuous approach to evaluate behavior variations and adapt an algorithm's inductive bias by changing its parameters. The main contribution of this thesis is the proposal of three approaches to adapt the inductive bias: i) an approach based on the design of an adaptive artificial self-organizing neural network architecture that enables behavior evaluation by means of Markov chain and Shannon entropy estimations; ii) an approach to adapt traditional data clustering algorithms according to behavior variations in sequences of data chunks; and iii) an approach based on the proposed neural network architecture to continuously adapt parameters by means of the evaluation of data stability. Additionally, in order to analyze the essential characteristics of data streams, this thesis presents a formalization of the problem of data stream clustering and a taxonomy of approaches to detect behavior variations.
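The abstract mentions behavior evaluation by means of Markov chain and Shannon entropy estimations but does not give the procedure. The following is only a minimal illustrative sketch of that general idea, not the thesis's actual method: it assumes each stream sample has already been assigned to a cluster label, estimates first-order transition probabilities between labels, and reports the visit-weighted Shannon entropy of those transitions per fixed-size window. All function names and the windowing scheme are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate first-order Markov transition probabilities from a label sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def chain_entropy(states):
    """Entropy of outgoing transitions, weighted by how often each state is left."""
    n = len(states) - 1  # number of observed transitions
    if n <= 0:
        return 0.0
    visits = Counter(states[:-1])
    probs = transition_probabilities(states)
    return sum(
        visits[s] / n * shannon_entropy(row.values()) for s, row in probs.items()
    )


def windowed_entropy(states, window):
    """Chain entropy over consecutive non-overlapping windows (assumed scheme)."""
    return [
        chain_entropy(states[i:i + window])
        for i in range(0, len(states) - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical cluster labels: a regular alternation, then a less regular phase.
    labels = [0, 1] * 4 + [0, 0, 1, 1, 0, 0, 1, 1]
    print(windowed_entropy(labels, window=8))
```

Under these assumptions, a rise in entropy from one window to the next would suggest that transitions between clusters have become less predictable, which could be read as a behavior variation in the stream.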