(Reference retrieved automatically from Web of Science through information on FAPESP grant and its corresponding number as mentioned in the publication by the authors.)

Using dynamical systems tools to detect concept drift in data streams

Author(s):
da Costa, F. G. [1] ; Rios, R. A. [2] ; de Mello, R. F. [1]
Total Authors: 3
Affiliation:
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13566590 Sao Carlos, SP - Brazil
[2] Univ Fed Bahia, Dept Comp Sci, BR-40170110 Salvador, BA - Brazil
Total Affiliations: 2
Document type: Review article
Source: EXPERT SYSTEMS WITH APPLICATIONS; v. 60, p. 39-50, OCT 30 2016.
Web of Science Citations: 6
Abstract

Real-world data streams may change their behavior over time, a phenomenon referred to as concept drift. By detecting those changes, researchers obtain relevant information about the phenomena that produced such streams (e.g., temperatures in a region, bacteria populations, disease occurrence). Many concept drift detection algorithms take supervised or semi-supervised approaches, which tend to be infeasible when data are collected at high frequencies because of the difficulty of labeling. In addition, current studies usually assume data to be statistically independent and identically distributed, disregarding any temporal relationship among observations and, consequently, risking the quality of data modeling. To tackle both aspects, we employ dynamical system modeling to represent the temporal relationships among data observations and how they change over time, in an attempt to detect concept drift. This approach uses Takens' immersion theorem to unfold consecutive windows of data observations into the phase space in order to represent and compare time dependencies. From this perspective, we propose four new concept drift detection algorithms based on the unsupervised machine learning paradigm. The first algorithm builds dendrograms of consecutive phase spaces (every phase space represents the time relationships among the observations contained in a particular data window) and compares them using the Gromov-Hausdorff distance, providing enough guarantees to detect concept drifts. The second algorithm employs the Cross Recurrence Plot and Recurrence Quantification Analysis to detect relevant changes in consecutive phase spaces and warn about relevant data modifications. We also preprocess data windows using the Empirical Mode Decomposition method and Mutual Information in an attempt to take only the deterministic stream behavior into account.
All algorithms were implemented as plugins for the Massive Online Analysis (MOA) software and then compared to well-known algorithms from the literature. Results confirm that the proposed algorithms were capable of detecting most of the behavior changes while raising few false alarms. (C) 2016 Elsevier Ltd. All rights reserved. (AU)
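
As an illustration of the phase-space idea described in the abstract, the following is a minimal Python sketch, not the authors' MOA plugins. It unfolds two consecutive windows with a time-delay embedding and compares them with a cross recurrence rate, one of the standard Recurrence Quantification Analysis measures. The function names, the embedding dimension `dim`, the delay `tau`, the `radius` threshold, and the synthetic sine windows are all assumptions chosen for the example; the abstract does not state which parameters or measures the paper uses.

```python
import numpy as np


def delay_embed(series, dim, tau):
    """Unfold a 1-D window into a dim-dimensional phase space with delay tau."""
    x = np.asarray(series, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("window too short for the chosen dim and tau")
    # Row i is the state (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    return np.column_stack([x[k * tau: k * tau + n] for k in range(dim)])


def cross_recurrence_rate(a, b, radius):
    """Fraction of state pairs (one from each phase space) within radius."""
    # Pairwise Euclidean distances between every state in a and every state in b.
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(np.mean(dists <= radius))


# Hypothetical usage: two consecutive windows, the second with a changed frequency.
t = np.arange(200)
window_a = np.sin(0.1 * t)
window_b = np.sin(0.35 * t)

space_a = delay_embed(window_a, dim=2, tau=5)
space_b = delay_embed(window_b, dim=2, tau=5)

# Comparing a window with itself gives a baseline; a clearly different rate
# against the next window hints that the dynamics may have changed.
baseline = cross_recurrence_rate(space_a, space_a, radius=0.2)
shifted = cross_recurrence_rate(space_a, space_b, radius=0.2)
print(baseline, shifted)
```

In a streaming setting one would presumably slide the windows forward and flag a drift when the measure departs from its recent values by more than some threshold; how that threshold is set is left open here.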

FAPESP's process: 14/13323-5 - An approach based on the stability of clustering algorithms to ensure concept drift detection on data streams
Grantee: Rodrigo Fernandes de Mello
Support Opportunities: Regular Research Grants