Author(s): |
Total Authors: 2
|
Affiliation: | [1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13566590 Sao Carlos, SP - Brazil
Total Affiliations: 1
|
Document type: | Journal article |
Source: | JOURNAL OF INTELLIGENT INFORMATION SYSTEMS; v. 42, n. 3, p. 531-566, JUN 2014. |
Web of Science Citations: | 9 |
Abstract | |
The current ability to produce massive amounts of data and the impossibility of storing it have motivated the development of data stream mining strategies. Although many techniques have been proposed, this research area still lacks approaches for mining data streams composed of multiple time series, a setting with applications in finance, medicine and science. Most current techniques for clustering streaming time series have a serious limitation in their similarity measures, which are based on the Pearson correlation. In this paper, we show that the Pearson correlation is not capable of detecting similarities even for classic time series models, such as those by Box and Jenkins. This limitation motivated our proposal to cluster streaming time series based on their generating functions, which is achieved by considering features obtained using descriptive measures, such as Auto Mutual Information, the Hurst Exponent and several others. We present a new tree-based clustering algorithm, entitled TS-Stream, which uses the extracted features to produce partitions in better accordance with the time series generating functions. Experiments with synthetic data sets confirm that TS-Stream outperforms ODAC, currently the most popular technique, in terms of clustering quality. Using real financial time series from the NYSE and NASDAQ, we conducted stock trading simulations employing TS-Stream to support the creation of diversified investment portfolios. Results confirmed that TS-Stream increased monetary returns by several orders of magnitude when compared to trading strategies based simply on the Moving Average Convergence Divergence financial indicator. (AU) | |
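The abstract's point about the Pearson correlation can be illustrated with a minimal sketch. It is not the TS-Stream algorithm and does not reproduce the paper's experiments. It assumes a simple AR(1) process as a stand-in for a Box and Jenkins model, and uses lag-1 autocorrelation as a stand-in for the descriptive features the abstract names (Auto Mutual Information, the Hurst Exponent). The helper names `simulate_ar1` and `lag1_autocorr` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def simulate_ar1(phi, n, rng):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e."""
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


def lag1_autocorr(x):
    """Lag-1 autocorrelation, a simple descriptive feature of one series."""
    centered = x - x.mean()
    return float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))


rng = np.random.default_rng(0)

# Series a and b share a generating function (phi = 0.8) but use
# independent noise; series c comes from a different one (phi = -0.8).
a = simulate_ar1(0.8, 5000, rng)
b = simulate_ar1(0.8, 5000, rng)
c = simulate_ar1(-0.8, 5000, rng)

# Pearson correlation between two independent realizations is close to
# zero, so it does not reveal that a and b share a generating function.
pearson_ab = float(np.corrcoef(a, b)[0, 1])

# The per-series feature is close for a and b, and far apart for a and c.
feat_a, feat_b, feat_c = lag1_autocorr(a), lag1_autocorr(b), lag1_autocorr(c)

print(f"Pearson(a, b)        = {pearson_ab:+.3f}")
print(f"|feat(a) - feat(b)|  = {abs(feat_a - feat_b):.3f}")
print(f"|feat(a) - feat(c)|  = {abs(feat_a - feat_c):.3f}")
```

In this sketch, a distance computed on the feature values groups `a` with `b` and separates both from `c`, while a Pearson-based similarity between the raw series treats `a` and `b` as unrelated. A feature-based method would typically combine several such descriptors rather than rely on one.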
FAPESP's process: | 10/05062-6 - Wavelet-based clustering for data streams. |
Grantee: | Cássio Martini Martins Pereira |
Support Opportunities: | Scholarships in Brazil - Doctorate |