Stochastic density ratio estimation and its application to feature selection

Author(s):
Ígor Assis Braga
Total Authors: 1
Document type: Doctoral Thesis
Press: São Carlos.
Institution: Universidade de São Paulo (USP). Instituto de Ciências Matemáticas e de Computação (ICMC/SB)
Defense date:
Examining board members:
Maria Carolina Monard; Fabio Gagliardi Cozman; Ronaldo Dias; Rodrigo Fernandes de Mello; Bianca Zadrozny
Advisor: Maria Carolina Monard; Vladimir Naumovich Vapnik
Abstract

The estimation of the ratio of two probability densities is an important statistical tool in supervised machine learning. In this work, we introduce new methods of density ratio estimation based on the solution of a multidimensional integral equation involving cumulative distribution functions. The resulting methods use the novel V-matrix, a concept that does not appear in previous density ratio estimation methods. Experiments indicate that the new approach has good potential compared with previous methods.

Mutual Information (MI) estimation is a key component in feature selection and essentially depends on density ratio estimation. Using one of the density ratio estimation methods proposed in this work, we derive a new estimator (VMI) and compare it experimentally to previously proposed MI estimators. Experiments conducted solely on mutual information estimation show that VMI compares favorably to previous estimators. Experiments applying MI estimation to feature selection in classification tasks indicate that better MI estimation leads to better feature selection performance.

Parameter selection greatly impacts the classification accuracy of kernel-based Support Vector Machines (SVM). However, this step is often overlooked in experimental comparisons because it is time-consuming and requires familiarity with the inner workings of SVM. In this work, we propose procedures for SVM parameter selection that are economical in running time. In addition, we propose the use of a non-linear kernel function (the min kernel) that can be applied to both low- and high-dimensional cases without adding another parameter to the selection process. The combination of the proposed parameter selection procedures and the min kernel yields a convenient way of economically extracting good classification performance from SVM.

The Regularized Least Squares (RLS) regression method is another kernel method that depends on proper selection of its parameters. When training data are scarce, traditional parameter selection often leads to poor regression estimation. To mitigate this issue, we explore a kernel that is less susceptible to overfitting: the additive INK-splines kernel. We then consider parameter selection methods, alternative to cross-validation, that have been shown to perform well for other regression methods. Experiments conducted on real-world datasets show that the additive INK-splines kernel outperforms both the RBF kernel and the previously proposed multiplicative INK-splines kernel. They also show that the alternative parameter selection procedures fail to consistently improve performance. Still, we find that the Finite Prediction Error method with the additive INK-splines kernel performs comparably to cross-validation. (AU)
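
The abstract notes that the min kernel adds no parameter to the selection process. A minimal sketch of why that can hold, assuming the common definition of a min kernel as the sum of coordinate-wise minima over non-negative features (the thesis's exact formulation may differ); the function name `min_kernel` and the sample data are hypothetical:

```python
import numpy as np


def min_kernel(X, Z):
    """Gram matrix with K[i, j] = sum_k min(X[i, k], Z[j, k]).

    Assumed definition, not taken from the thesis. It has no width or
    degree parameter to tune, and it requires non-negative features.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if (X < 0).any() or (Z < 0).any():
        raise ValueError("min kernel assumes non-negative features")
    # Broadcast to shape (n_X, n_Z, n_features), then sum over features.
    return np.minimum(X[:, None, :], Z[None, :, :]).sum(axis=2)


# Hypothetical data: three points with two features scaled to [0, 1].
X = np.array([[0.2, 0.9],
              [0.5, 0.1],
              [1.0, 0.4]])
K = min_kernel(X, X)
```

Under this definition the only SVM parameter left to select is the regularization constant. A function with this signature can be passed as a callable kernel to libraries that accept one, for example scikit-learn's `SVC(kernel=min_kernel)`.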