(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Using dynamical systems tools to detect concept drift in data streams

Author(s):
da Costa, F. G. [1] ; Rios, R. A. [2] ; de Mello, R. F. [1]
Total number of authors: 3
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13566590 Sao Carlos, SP - Brazil
[2] Univ Fed Bahia, Dept Comp Sci, BR-40170110 Salvador, BA - Brazil
Total number of affiliations: 2
Document type: Review article
Source: EXPERT SYSTEMS WITH APPLICATIONS; v. 60, p. 39-50, OCT 30 2016.
Web of Science citations: 6
Abstract

Real-world data streams may change their behavior over time, which is referred to as concept drift. By detecting those changes, researchers obtain relevant information about the phenomena that produced such streams (e.g., temperatures in a region, bacteria populations, disease occurrence). Many concept drift detection algorithms rely on supervised or semi-supervised approaches, which tend to be infeasible when data are collected at high frequencies, due to the difficulties involved in labeling. In addition, current studies usually assume data to be statistically independent and identically distributed, disregarding any temporal relationship among observations and, consequently, risking the quality of data modeling. To tackle both aspects, we employ dynamical system modeling to represent the temporal relationships among data observations and how they change over time, in an attempt to detect concept drift. This approach uses Takens' embedding theorem to unfold consecutive windows of data observations into phase space in order to represent and compare time dependencies. From this perspective, we propose two new concept drift detection algorithms based on the unsupervised machine learning paradigm. The first algorithm builds dendrograms of consecutive phase spaces (every phase space represents the time relationships for the observations contained in a particular data window) and compares them using the Gromov-Hausdorff distance, providing enough guarantees to detect concept drifts. The second algorithm employs the Cross Recurrence Plot and Recurrence Quantification Analysis to detect relevant changes in consecutive phase spaces and warn about relevant data modifications. We also preprocess data windows using the Empirical Mode Decomposition method and Mutual Information in an attempt to take only the deterministic stream behavior into account. All algorithms were implemented as plugins for the Massive Online Analysis (MOA) software and then compared to well-known algorithms from the literature. Results confirm that the proposed algorithms were capable of detecting most of the behavior changes while raising few false alarms. (C) 2016 Elsevier Ltd. All rights reserved. (AU)
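
As an illustration of the phase-space idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation (their algorithms were MOA plugins and use full Recurrence Quantification Analysis). It unfolds consecutive windows with a time-delay embedding and compares them through the recurrence rate of a cross recurrence plot. The function names, the window size, and the parameters `dim`, `tau`, and `eps` are all assumptions chosen for the example, and the Empirical Mode Decomposition and Mutual Information preprocessing is omitted.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Unfold a 1-D window into phase-space points
    (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("window too short for this dim/tau")
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def cross_recurrence_rate(a, b, eps):
    """Fraction of point pairs (a_i, b_j) closer than eps, i.e. the
    density of the thresholded cross recurrence plot of a and b."""
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(np.mean(dists < eps))

def drift_scores(stream, window, dim=2, tau=1, eps=0.2):
    """Cross recurrence rate for each pair of consecutive,
    non-overlapping windows (hypothetical default parameters)."""
    stream = np.asarray(stream, dtype=float)
    starts = range(0, len(stream) - window + 1, window)
    embedded = [delay_embed(stream[s : s + window], dim, tau) for s in starts]
    return [cross_recurrence_rate(p, q, eps)
            for p, q in zip(embedded, embedded[1:])]

# Synthetic stream: a sine wave whose level shifts by 5 halfway through.
t = 0.1 * np.arange(800)
stream = np.sin(t) + np.where(np.arange(800) >= 400, 5.0, 0.0)
scores = drift_scores(stream, window=200)
print(scores)
```

In this synthetic case the score for the window pair that straddles the level shift falls to zero, because the two phase-space trajectories no longer come within `eps` of each other, while pairs drawn from the same regime keep a positive score. A real detector would still need a decision rule on these scores, which this sketch does not attempt.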

FAPESP grant: 14/13323-5 - An approach based on the stability of data clustering algorithms to guarantee the detection of concept drift in data streams
Grantee: Rodrigo Fernandes de Mello
Support type: Research Grant - Regular