

Metalearning for algorithm selection in data streams

Author(s):
Andre Luís Debiaso Rossi
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
André Carlos Ponce de Leon Ferreira de Carvalho; George Darmiton da Cunha Cavalcanti; Estevam Rafael Hruschka Júnior; Ronaldo Cristiano Prati; Carlos Manuel Milheiro de Oliveira Pinto Soares
Advisor: André Carlos Ponce de Leon Ferreira de Carvalho; Carlos Manuel Milheiro de Oliveira Pinto Soares
Abstract

Machine learning algorithms are widely employed to induce models for knowledge discovery in databases. Since most of these algorithms assume that the underlying distribution of the data is stationary, a model is induced only once and is then applied indefinitely to predict the label of new data. However, many real applications, such as transportation management systems and sensor network monitoring, now generate data streams that can change over time. Consequently, the effectiveness of the algorithm chosen for these problems may deteriorate, or other algorithms may become more suitable for the new data characteristics. This thesis proposes a metalearning-based method for managing the learning process in dynamic data stream environments, aiming to improve the overall predictive performance of the learning system. This method, named MetaStream, regularly selects the most promising algorithm for arriving data according to the characteristics of the data and past experience. The proposed method employs machine learning techniques to generate metaknowledge, which relates the characteristics extracted from data at different time points to the predictive performance of the algorithms. The measures applied to extract relevant information include those commonly used in conventional metalearning across different data sets, adapted to the particularities of data streams, and measures from related areas that take the order of the data stream into account. We evaluate MetaStream on three real data stream problems with six different learning algorithms. The results show the applicability of MetaStream and its capability to improve the overall predictive performance of the learning system compared with a baseline method in the majority of the cases investigated. It must be noted that an ensemble of models is usually superior to MetaStream. We therefore analyze the main factors that may have influenced the results and indicate possible improvements to the proposed method. (AU)
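
The selection loop described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the two toy base learners (`fit_mean`, `fit_last`), the two meta-features, the 1-nearest-neighbour meta-learner, the window sizes, the synthetic stream, and every function name are assumptions chosen only to show the idea of relating window characteristics to the algorithm that performed best on the data that followed.

```python
import math
from statistics import mean, pstdev

# Hypothetical base learners: each is fitted on a window and returns a predictor.
def fit_mean(window):
    m = mean(window)
    return lambda: m          # always predicts the window mean

def fit_last(window):
    last = window[-1]
    return lambda: last       # always predicts the last observed value

BASE_LEARNERS = {"mean": fit_mean, "last": fit_last}

def meta_features(window):
    """Illustrative characteristics of a window: spread and mean step size."""
    steps = [abs(b - a) for a, b in zip(window, window[1:])]
    return (pstdev(window), mean(steps))

def best_learner(window, targets):
    """Meta-label: the base learner with the lowest absolute error on targets."""
    errors = {}
    for name, fit in BASE_LEARNERS.items():
        predict = fit(window)
        errors[name] = sum(abs(predict() - y) for y in targets)
    return min(errors, key=errors.get)

def recommend(meta_base, features, default="mean"):
    """1-nearest-neighbour meta-learner over past (features, label) pairs."""
    if not meta_base:
        return default        # no past experience yet
    nearest = min(meta_base, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

def run_selection_sketch(stream, window=20, horizon=5):
    """Recommend a learner per batch, then store what actually worked best."""
    meta_base, chosen = [], []
    for t in range(window, len(stream) - horizon + 1, horizon):
        train = stream[t - window:t]
        future = stream[t:t + horizon]
        feats = meta_features(train)
        chosen.append(recommend(meta_base, feats))
        # Once the true values have arrived, record a new meta-example.
        meta_base.append((feats, best_learner(train, future)))
    return chosen, meta_base

# Synthetic stream (an assumption): an oscillating regime followed by a ramp.
stream = [float(i % 2) for i in range(100)] + [float(i) for i in range(100)]
chosen, meta_base = run_selection_sketch(stream)
print(chosen[0], chosen[-1])
```

In this toy setting the recommendation can only change after the meta-base contains examples from the new regime, which mirrors the abstract's point that selection draws on past experience as well as on the characteristics of the arriving data.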

FAPESP's process: 08/11569-6 - Meta-learning applied to data streams problems
Grantee: André Luis Debiaso Rossi
Support Opportunities: Scholarships in Brazil - Doctorate