(Reference retrieved automatically from Web of Science using the FAPESP grant information and grant number mentioned by the authors in the publication.)

Multidimensional surrogate stability to detect data stream concept drift

Author(s):
da Costa, Fausto G. ; Duarte, Felipe S. L. G. ; Vallim, Rosane M. M. ; de Mello, Rodrigo F.
Total Authors: 4
Document type: Journal article
Source: EXPERT SYSTEMS WITH APPLICATIONS; v. 87, p. 15-29, NOV 30 2017.
Web of Science Citations: 6
Abstract

Concept drift detection plays a very important role in the context of data streams. It allows one to point out changes in data behavior over time, which are intrinsically associated with the phenomena responsible for producing such sequences of observations. By detecting such changes, one can better understand those phenomena and make better decisions in different application domains, e.g. stock market, climate change, population growth, etc. Despite several proposals, most studies lack formal guarantees for concept drift detection. More recently, Vallim and Mello proposed 1DFT (Unidimensional Fourier Transform), an algorithm that detects drifts on unidimensional streams while holding a stability property based on surrogate series. Motivated by their work, we propose the multidimensional surrogate stability concept, which extends their approach to multidimensional data streams. In addition, our approach, named MDFT (Multidimensional Fourier Transform), employs a different and more robust measurement to analyze drifts, based on Shannon and von Neumann entropies, to quantify variations in data spaces. As a final contribution, MDFT allows unidimensional streams to be reconstructed in phase spaces so that their data dependencies can also be analyzed to draw conclusions about concept drifts over time. Experiments considered seven synthetic data streams of 120,000 observations each. Synthetic data was used because it allows us to define the exact points of change, using the largest Lyapunov exponent, at which our approach should trigger concept drift events. Experiments compared MDFT against the main concept drift detection algorithms from Machine Learning (Page-Hinkley Test - PHT, Adaptive Windowing - ADWIN, and Cumulative Sum Control Chart - CUSUM) and from Dynamical Systems (Recurrence Quantification Analysis using different measurements - RQA, and Permutation Entropy - PE). 
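The abstract names several building blocks without showing them: phase-space reconstruction of a unidimensional stream, Shannon and von Neumann entropies, and Fourier-based surrogate series. The Python sketch below illustrates generic textbook versions of these pieces with NumPy; it is not the authors' MDFT or 1DFT. The function names, the histogram binning for Shannon entropy, the use of a trace-normalized covariance matrix as the "density matrix" for von Neumann entropy, and the z-score comparison against phase-randomized surrogates are all assumptions made for illustration.

```python
import numpy as np


def delay_embed(x, m=2, tau=1):
    """Time-delay (phase-space) embedding: row t is (x[t], x[t+tau], ..., x[t+(m-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding dimension and delay")
    return np.column_stack([x[i * tau:i * tau + n] for i in range(m)])


def shannon_entropy(x, bins=16):
    """Shannon entropy (bits) of a histogram estimate of the value distribution."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def von_neumann_entropy(points):
    """Von Neumann entropy (bits) of the covariance matrix normalized to unit trace.

    Assumption: rho = C / tr(C) plays the role of a density matrix; the result lies
    in [0, log2(m)] and is maximal when variance is spread evenly over all m axes.
    """
    c = np.atleast_2d(np.cov(points, rowvar=False))
    rho = c / np.trace(c)
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]  # drop numerically zero (or tiny negative) eigenvalues
    return float(-(eig * np.log2(eig)).sum())


def phase_randomized_surrogate(x, rng):
    """Fourier surrogate: same amplitude spectrum as x, randomized phases."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = 0.0  # leave the DC term (the mean) unchanged
    if len(x) % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real for even lengths
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=len(x))


def surrogate_zscore(window, stat, n_surrogates=19, seed=0):
    """Hypothetical check: how far stat(window) lies from its surrogate ensemble."""
    rng = np.random.default_rng(seed)
    ref = np.array([stat(phase_randomized_surrogate(window, rng))
                    for _ in range(n_surrogates)])
    sd = ref.std()
    return float((stat(window) - ref.mean()) / sd) if sd > 1e-12 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    t = np.arange(1000)
    # Toy stream: a sine wave followed by Gaussian noise (a change at index 1000).
    stream = np.concatenate([np.sin(2 * np.pi * t / 50), rng.normal(size=1000)])

    def vn_stat(w):
        return von_neumann_entropy(delay_embed(w, m=3, tau=5))

    for start in range(0, len(stream), 500):
        w = stream[start:start + 500]
        print(start, round(shannon_entropy(w), 3), round(vn_stat(w), 3),
              round(surrogate_zscore(w, vn_stat), 2))
```

Printing the per-window statistics lets one see how they shift across the toy change point; how MDFT turns such quantities into a drift decision with stability guarantees is described in the paper itself, not here.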
Results confirm that MDFT outperforms the other algorithms in terms of an average measurement (using the Euclidean distance) based on the Missed Detection Rate (MDR), the Mean Time for Detection (MTD) and the Mean Time between False Alarms (MTFA). (C) 2017 Elsevier Ltd. All rights reserved. (AU)
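The three evaluation metrics can be sketched in plain Python as follows. The abstract does not say how detections are matched to true changes or how the metrics are combined, so the matching rule (the first alarm at or after a true change and before the next one is a hit; every other alarm is false), MTFA computed as stream length divided by the number of false alarms, and the normalization applied before taking the Euclidean distance to the ideal point are all hypothetical choices.

```python
import math


def drift_metrics(true_changes, detections, n):
    """Return (MDR, MTD, MTFA) for one stream of length n.

    Hypothetical matching rule: the first detection in [c_i, c_{i+1}) counts as a
    hit for true change c_i (the last change runs to n); every other detection is
    a false alarm.
    """
    changes = sorted(true_changes)
    if not changes:
        raise ValueError("need at least one true change point")
    bounds = changes[1:] + [n]
    delays, hits = [], set()
    for c, end in zip(changes, bounds):
        in_range = [d for d in detections if c <= d < end]
        if in_range:
            first = min(in_range)
            delays.append(first - c)
            hits.add(first)
    false_alarms = sum(1 for d in detections if d not in hits)
    mdr = 1 - len(delays) / len(changes)                     # Missed Detection Rate
    mtd = sum(delays) / len(delays) if delays else math.inf  # Mean Time for Detection
    mtfa = n / false_alarms if false_alarms else math.inf    # simplified Mean Time between False Alarms
    return mdr, mtd, mtfa


def euclidean_score(mdr, mtd, mtfa, n):
    """Hypothetical aggregate: distance to the ideal point (0, 0, 0), lower is better.

    MTD is divided by n, and MTFA is inverted (1 / MTFA = false alarms per
    observation), so every coordinate lies in [0, 1].
    """
    mtd_term = 1.0 if math.isinf(mtd) else min(mtd / n, 1.0)
    fa_term = 0.0 if math.isinf(mtfa) else min(1.0 / mtfa, 1.0)
    return math.sqrt(mdr ** 2 + mtd_term ** 2 + fa_term ** 2)


m = drift_metrics([250, 500, 750], [260, 520, 600, 900], n=1000)
print(m)  # (0.0, 60.0, 1000.0)
print(round(euclidean_score(*m, n=1000), 4))  # 0.06
```

In the usage example every true change is caught (MDR = 0) with delays of 10, 20 and 150 observations, while the alarm at 600 is the single false alarm.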

FAPESP's process: 14/21636-3 - Time series decomposition preserving deterministic influences
Grantee: Felipe Simões Lage Gomes Duarte
Support Opportunities: Scholarships in Brazil - Doctorate
FAPESP's process: 14/13323-5 - An approach based on the stability of clustering algorithms to ensure concept drift detection on data streams
Grantee: Rodrigo Fernandes de Mello
Support Opportunities: Regular Research Grants