Incremental and efficient algorithms for decision trees and decision rules and proximity-based algorithms

Saulo Martiello Mastelini

Author(s):	Saulo Martiello Mastelini
Document type:	Doctoral Thesis
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2023-05-03
Examining board members:	André Carlos Ponce de Leon Ferreira de Carvalho; Gustavo Enrique de Almeida Prado Alves Batista; Elaine Ribeiro de Faria Paiva; Anderson de Rezende Rocha
Advisor:	André Carlos Ponce de Leon Ferreira de Carvalho
Abstract
The fast development of digital technologies has given rise to the constant production of data in different forms and from different sources. While data scarcity was a relevant problem for many application domains in the early days of machine learning (ML), nowadays we may have too much information to handle with traditional ML algorithms. Besides, changes in the underlying distributions that govern data generation might render traditional ML solutions useless in real-world applications. Online ML (OML) aims to create solutions able to process data incrementally, with limited computational resource usage, and to deal with time-changing data distributions. Although OML has produced efficient solutions applied in diverse domains, there is a recent growing trend of creating OML algorithms that focus only on predictive performance and overlook computational costs. This trend is even more pronounced in regression tasks that use decision trees, decision rules, and ensembles thereof, which are among the most popular OML solutions. From a real-world application standpoint, decreasing the computational costs of OML solutions could be more relevant than a slight increase in predictive performance. Hence, this thesis focuses on creating improved and efficient OML algorithms whose primary goal is to decrease the time and memory costs of decision tree-based, decision rule-based, and ensemble-based regressors. The desired by-product is to improve predictive performance or, at least, leave it unchanged. We also explore an efficient algorithm to perform incremental nearest-neighbor searches. This thesis is organized as an article collection, comprising our most relevant publications on the presented theme.
We tackle strategies to create low-error ensemble-based regressors and efficient strategies to build incremental decision tree regressors, propose a fast and accurate decision tree-based ensemble regressor, and explore an efficient and versatile algorithm to perform nearest-neighbor search in sliding windows. (AU)

FAPESP's process:	18/07319-6 - Multi-target data stream mining
Grantee:	Saulo Martiello Mastelini
Support Opportunities:	Scholarships in Brazil - Doctorate
