(Reference retrieved automatically from Web of Science, through the information on FAPESP funding and the corresponding grant number included in the publication by the authors.)

Multidimensional surrogate stability to detect data stream concept drift

Author(s):
da Costa, Fausto G. ; Duarte, Felipe S. L. G. ; Vallim, Rosane M. M. ; de Mello, Rodrigo F.
Total number of authors: 4
Document type: Journal article
Source: EXPERT SYSTEMS WITH APPLICATIONS; v. 87, p. 15-29, NOV 30 2017.
Web of Science citations: 6
Abstract

Concept drift detection plays a very important role in the context of data streams. It makes it possible to point out modifications in data behavior over time, which are intrinsically associated with the phenomena responsible for producing such sequences of observations. By detecting such modifications, one can better understand those phenomena and make better decisions in different application domains, e.g. stock market, climate change, population growth, etc. Despite several proposals, most studies lack formal guarantees that concept drift will be detected. More recently, Vallim and Mello proposed 1DFT (Unidimensional Fourier Transform), an algorithm that detects drifts on unidimensional streams while holding a stability property based on surrogate series. Motivated by their work, we here propose the multidimensional surrogate stability concept, which extends their approach to multidimensional data streams. In addition, our approach, named MDFT (Multidimensional Fourier Transform), employs a different and more robust measurement to analyze drifts, which is based on the Shannon and von Neumann entropies to quantify variations in data spaces. As a final contribution, MDFT allows unidimensional streams to be reconstructed in phase spaces so that their data dependencies can also be analyzed to draw conclusions about concept drift over time. Experiments considered seven 120,000-observation synthetic data streams. Synthetic data was taken into account as it allows us to define the exact points of change, using the largest Lyapunov exponent, for which our approach should trigger the concept drift events. Experiments compared MDFT against the main algorithms to detect concept drift in the context of Machine Learning (Page-Hinkley Test - PHT, Adaptive Windowing - ADWIN, and Cumulative Sum Control Chart - CUSUM) and Dynamical Systems (Recurrence Quantification Analysis using different measurements - RQA, and Permutation Entropy - PE). Results confirm that MDFT outperforms the other algorithms in terms of an average measurement (using the Euclidean distance) based on: the Missed Detection Rate (MDR), the Mean Time for Detection (MTD) and the Mean Time between False Alarms (MTFA). (C) 2017 Elsevier Ltd. All rights reserved. (AU)
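
The abstract mentions reconstructing unidimensional streams in phase spaces. As a rough illustration only, the sketch below shows a standard time-delay embedding, which is one common way to do such a reconstruction. It is not taken from the paper and is not the MDFT algorithm. The function name `delay_embed` and the parameters `m` (embedding dimension) and `tau` (time delay) are assumptions introduced here for illustration.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Time-delay embedding of a 1-D series (illustrative sketch, not MDFT).

    Each output vector is [x(i), x(i + tau), ..., x(i + (m - 1) * tau)].
    `m` and `tau` are hypothetical parameter names chosen for this example.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    # Number of complete delay vectors that fit inside the series.
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for the requested m and tau")
    return [[series[i + j * tau] for j in range(m)] for i in range(n_vectors)]


if __name__ == "__main__":
    # A short toy series; each row is one point in a 3-dimensional phase space.
    for point in delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=1):
        print(point)
```

Each row of the result is a point in an `m`-dimensional space, so a multidimensional method can then be applied to data that started as a single stream. How the paper chooses the embedding parameters is not stated in the abstract.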

FAPESP Grant: 14/21636-3 - Decomposition of time series preserving the deterministic bias
Grantee: Felipe Simões Lage Gomes Duarte
Support type: Scholarships in Brazil - Doctorate
FAPESP Grant: 14/13323-5 - An approach based on the stability of data clustering algorithms to guarantee concept drift detection in data streams
Grantee: Rodrigo Fernandes de Mello
Support type: Research Grants - Regular