

Dimensionality reduction using mean conditional entropy applied for bioinformatics and image processing problems

Author(s):
David Correa Martins Junior
Total Authors: 1
Document type: Master's Dissertation
Press: São Paulo.
Institution: Universidade de São Paulo (USP). Instituto de Matemática e Estatística (IME/SBI)
Defense date:
Examining board members:
Roberto Marcondes Cesar Junior; Junior Barrera; Maria Carolina Monard
Advisor: Roberto Marcondes Cesar Junior
Abstract

Dimensionality reduction is an important pattern recognition problem with many applications. Among dimensionality reduction techniques, feature selection was the main focus of this research. Most dimensionality reduction methods found in the literature favor cases in which the data are linearly separable and have only two classes. To cover more general cases, this work proposes a criterion function, based on the information-theoretic principles of entropy and mutual information, to be embedded in existing feature selection algorithms. This approach makes it possible to classify data, whether linearly separable or not, into two or more classes using a small feature subspace. Results obtained with synthetic and real data corroborate the usefulness of this technique. This work addressed two bioinformatics problems. The first is distinguishing two biological phenomena by selecting an appropriate subset of genes. We studied a strong-gene selection technique based on support vector machines (SVM) that had been applied to SAGE data from the human genome. Most of the strong genes that this technique found for distinguishing brain tumors (glioblastoma and astrocytoma) were validated by the methodology proposed in this work. The second problem is the identification of gene regulatory networks, using the proposed methodology, from microarray data produced by DeRisi et al. for the genome of Plasmodium falciparum, the malaria agent, during 48 hours of its life cycle. This text presents evidence that using mean conditional entropy to estimate a probabilistic genetic network (PGN) may be very promising. In the image processing context, it is shown that this technique can be applied to obtain minimal W-operators that perform image filtering and texture recognition. (AU)
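To illustrate how an entropy-based criterion can be embedded in an existing feature selection algorithm, here is a minimal sketch of the mean conditional entropy H(Y | X_S) = Σ_x P(x) H(Y | X_S = x), where X_S is the candidate feature subset, driving a sequential forward selection (SFS) search. It assumes discrete (already quantized) features and plain frequency-count estimates of the probabilities; the function names `mean_conditional_entropy` and `sfs` are hypothetical, and the dissertation's own estimator and search procedure may differ from this simplified version. The toy data use an XOR target, which is not linearly separable and which no single feature explains on its own.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2
from typing import Iterable, List, Sequence


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def mean_conditional_entropy(X: Sequence[Sequence[int]], y: Sequence[int],
                             features: Sequence[int]) -> float:
    """H(Y | X_S): entropy of the class in each observed instance x of the
    selected features, weighted by the empirical frequency P(x)."""
    groups = defaultdict(Counter)  # instance tuple -> class label counts
    for row, label in zip(X, y):
        groups[tuple(row[f] for f in features)][label] += 1
    n = len(y)
    return sum(sum(c.values()) / n * entropy(c.values()) for c in groups.values())


def sfs(X: Sequence[Sequence[int]], y: Sequence[int], k: int) -> List[int]:
    """Sequential forward selection: greedily add the feature that yields the
    lowest mean conditional entropy (ties go to the lowest feature index)."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = min(remaining,
                   key=lambda f: mean_conditional_entropy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: y = a XOR b, feature c is irrelevant noise.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [a ^ b for a, b, _ in X]
print(sfs(X, y, 2))  # [0, 1]
```

Every single feature leaves H(Y | X_S) at 1 bit here, but the pair {a, b} drives it to 0. This shows why a criterion that evaluates features jointly can capture nonlinear class structure that a linear separability assumption would miss.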