Relações entre ranking, análise ROC e calibração em aprendizado de máquina (Relations between ranking, ROC analysis and calibration in machine learning)

Edson Takashi Matsubara

Author(s):	Edson Takashi Matsubara
Document type:	Doctoral Thesis
Press:	São Carlos.
Institution:	Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:	2008-10-21
Examining board members:	Maria Carolina Monard; Marcelo Ladeira; Wagner Meira Junior; Solange Oliveira Rezende; Bianca Zadrozny
Advisor:	Maria Carolina Monard
Abstract
Supervised learning has been used mostly for classification. In this work we show the benefits of a welcome shift in attention from classification to ranking. A ranker is an algorithm that sorts a set of instances from highest to lowest expectation that the instance is positive, and a ranking is the outcome of this sorting. Usually a ranking is obtained by sorting the scores given by classifiers. In this work, we are concerned with novel approaches that promote the use of ranking. We therefore present the differences and relations between ranking and classification, followed by a proposal of a novel ranking algorithm called LEXRANK, whose rankings are derived not from scores but from a simple ranking of attribute values obtained from the training data. One very important field that uses rankings as its main input is ROC analysis. The study of decision trees and ROC analysis suggested an interesting way to visualize tree construction in ROC graphs, which has been implemented in a system called PROGROC. Focusing on ROC analysis, we observed that the slope of the segments of the ROC convex hull is equivalent to the likelihood ratio, which can be converted into probabilities. Interestingly, this ROC convex hull calibration method is equivalent to Pool Adjacent Violators (PAV). Furthermore, the ROC convex hull calibration method optimizes the Brier score, and the exploration of this measure led us to an interesting connection between the Brier score and ROC curves. Finally, we also investigate the rankings built by the selection method that increments the labelled set in CO-TRAINING, a semi-supervised multi-view learning algorithm.
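The link the abstract draws between ROC convex hull slopes, PAV and the Brier score can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names (`pav_calibrate`, `slope_to_probability`, `brier_score`) and the toy labels are assumptions made for illustration, and the input is assumed to be binary labels already ordered from highest to lowest score.

```python
import math


def pav_calibrate(labels):
    """Pool Adjacent Violators on 0/1 labels ordered from highest to lowest score.

    Returns one calibrated probability per instance; the sequence is
    non-increasing along the ranking. (Illustrative helper, not the thesis code.)
    """
    blocks = []  # each block holds [number of positives, number of instances]
    for y in labels:
        blocks.append([y, 1])
        # Merge while the previous block has a lower positive rate than the
        # last one, i.e. while the non-increasing constraint is violated.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]):
            positives, count = blocks.pop()
            blocks[-1][0] += positives
            blocks[-1][1] += count
    probs = []
    for positives, count in blocks:
        probs.extend([positives / count] * count)
    return probs


def slope_to_probability(slope, total_pos, total_neg):
    """Convert a hull segment slope (a likelihood ratio) into a posterior.

    Posterior odds = likelihood ratio * prior odds (total_pos / total_neg).
    """
    if math.isinf(slope):
        return 1.0  # a vertical segment contains positives only
    odds = slope * total_pos / total_neg
    return odds / (1.0 + odds)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical ranking: labels of eight instances, best-scored first.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
probs = pav_calibrate(labels)
```

In this toy ranking the pooled block with one positive and two negatives corresponds to a hull segment of slope (1/4)/(2/4) = 0.5, and `slope_to_probability(0.5, 4, 4)` gives the same value that PAV assigns to that block, which is the equivalence the abstract describes.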

FAPESP's process:	05/03792-9 - Semi-supervised multi-vision learning
Grantee:	Edson Takashi Matsubara
Support Opportunities:	Scholarships in Brazil - Doctorate