Approaches for combining classifiers and clusterers in classification problems (original title: Abordagens para combinar classificadores e agrupadores em problemas de classificação)

Luiz Fernando Sommaggio Coletta

Author(s):	Luiz Fernando Sommaggio Coletta
Document type:	Doctoral Thesis
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2015-11-23
Examining board members:	Eduardo Raul Hruschka; Gustavo Enrique de Almeida Prado Alves Batista; Gisele Lobo Pappa; Anderson de Rezende Rocha; Ivan Nunes da Silva
Advisor:	Eduardo Raul Hruschka; Moacir Antonelli Ponti
Abstract
Unsupervised learning models can provide a variety of supplementary constraints to improve the generalization capability of classifiers. Based on this assumption, an existing algorithm, named C3E (from Consensus between Classification and Clustering Ensembles), receives as inputs class probability distribution estimates for objects in a target set as well as a similarity matrix. Such a similarity matrix is typically built from clusterers induced on the target set, whereas the class probability distributions are obtained by an ensemble of classifiers induced from a training set. As a result, C3E provides refined estimates of the class probability distributions, from the consensus between classifiers and clusterers. The underlying idea is that similar new objects in the target set are more likely to share the same class label. In this thesis, a simpler version of the C3E algorithm, based on a Squared Loss function (C3E-SL), was investigated from an approach that enables the automatic estimation (from data) of its critical parameters. This approach uses a new evolutionary strategy designed to make C3E-SL more practical and flexible, making room for the development of variants of the algorithm. To address the scarcity of labeled data, a new algorithm that performs semi-supervised learning was proposed. Its mechanism exploits the intrinsic structure of the data by using the C3E-SL algorithm in a self-training procedure. Such a notion inspired the development of another algorithm based on active learning, which is able to self-adapt to learn new classes that may emerge when classifying new data. An extensive experimental analysis, focused on real-world problems, showed that the proposed algorithms are quite useful and promising. The combination of supervised and unsupervised learning yielded classifiers of great practical value that are less dependent on user-defined parameters. The achieved results were typically better than those obtained by traditional classifiers.
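
The refinement step described in the abstract (class probability estimates adjusted using a similarity matrix) can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a squared-loss objective of the form sum_i ||y_i - pi_i||^2 + alpha * sum_{i,j} s_ij ||y_i - y_j||^2 with a symmetric similarity matrix, and iterates the resulting fixed-point update. The function name `refine_class_probabilities`, the parameter names `alpha` and `n_iter`, and the toy data are all hypothetical.

```python
def refine_class_probabilities(pi, S, alpha=0.5, n_iter=50):
    """Illustrative squared-loss consensus between classifier outputs and similarities.

    pi    : list of per-object class probability rows (from a classifier ensemble)
    S     : symmetric similarity matrix between objects (e.g. from clusterers)
    alpha : assumed weight given to the similarity information
    n_iter: assumed fixed number of update sweeps
    """
    n, k = len(pi), len(pi[0])
    y = [row[:] for row in pi]  # start from the classifier estimates
    for _ in range(n_iter):
        new_y = []
        for i in range(n):
            # Total similarity of object i to all other objects.
            weight = sum(S[i][j] for j in range(n) if j != i)
            row = []
            for c in range(k):
                # Similarity-weighted sum of the neighbours' current estimates.
                neighbour = sum(S[i][j] * y[j][c] for j in range(n) if j != i)
                row.append((pi[i][c] + 2.0 * alpha * neighbour)
                           / (1.0 + 2.0 * alpha * weight))
            new_y.append(row)
        y = new_y  # all objects are updated together in each sweep
    return y


# Toy data (made up): object 1 is undecided but similar to object 0.
pi = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
refined = refine_class_probabilities(pi, S)
for row in refined:
    print([round(p, 3) for p in row])
```

In this toy setting the undecided object is pulled toward the class of its similar neighbour, while the object with no similar neighbours keeps its original estimate. Each refined row still sums to one because the update is a weighted average of probability vectors.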

FAPESP's process:	10/20830-0 - Evolutionary Algorithms for Aggregating Ensembles of Classifiers and Clusterers
Grantee:	Luiz Fernando Sommaggio Coletta
Support Opportunities:	Scholarships in Brazil - Doctorate
