Investigation of hierarchical classification techniques for bioinformatics problems

Grant number: 06/02356-3
Support type: Scholarships in Brazil - Master
Effective date (Start): September 01, 2006
Effective date (End): February 29, 2008
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computing Methodologies and Techniques
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Eduardo de Paula Costa
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil


In Machine Learning and Data Mining, most work on classification problems deals with flat classification, in which each instance is assigned to one of n classes, all at the same level. There are, however, more complex classification problems in which the classes are arranged in a hierarchical structure. For these problems, hierarchical classification techniques and concepts have proven very useful. Bioinformatics is one of the research areas with great potential for the use of such techniques. Hierarchical classification in this setting is a relatively new and little-explored area, which leaves many opportunities for research. Most of the work developed in this area has used predictive accuracy measures adopted from flat classification, without considering that, in hierarchical classification problems, the costs of misclassification errors tend to differ significantly among classes at different levels, or even among classes at the same level. Therefore, this project intends to investigate the main methods and techniques for hierarchical classification problems in bioinformatics, considering misclassification cost measures that are more specific to this approach. As a case study, a current bioinformatics problem, such as the prediction of protein functional classes, will be used to validate the classification models and to analyze the cost measures, comparing the results obtained. Based on this comparison, a new cost measure may be proposed.
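To illustrate why flat accuracy can be misleading in this setting, the following is a minimal sketch of one possible distance-based misclassification cost over a class tree. It is not the measure investigated or proposed by this project: the hierarchy (`PARENT`), its labels, and the rule that edge weights halve at each deeper level are hypothetical assumptions chosen only for illustration. Under flat 0/1 loss every error costs the same; here, confusing two top-level classes costs more than confusing two sibling subclasses.

```python
from typing import Dict, List

# Hypothetical class hierarchy (child -> parent); "root" is the implicit top node.
PARENT: Dict[str, str] = {
    "1": "root", "2": "root",
    "1.1": "1", "1.2": "1",
    "2.1": "2",
}


def path_to_root(label: str, parent: Dict[str, str]) -> List[str]:
    """Return [label, parent(label), ..., "root"]."""
    path = [label]
    while path[-1] != "root":
        path.append(parent[path[-1]])
    return path


def depth(label: str, parent: Dict[str, str]) -> int:
    """Depth of a node; top-level classes have depth 1."""
    return len(path_to_root(label, parent)) - 1


def edge_weight(child: str, parent: Dict[str, str]) -> float:
    """Assumed weighting: the edge into a top-level class weighs 1.0,
    and the weight halves at each deeper level."""
    return 0.5 ** (depth(child, parent) - 1)


def hierarchical_cost(true_label: str, pred_label: str,
                      parent: Dict[str, str] = PARENT) -> float:
    """Sum of edge weights on the tree path between the true and the
    predicted class, passing through their lowest common ancestor."""
    true_path = path_to_root(true_label, parent)
    pred_path = path_to_root(pred_label, parent)
    common = set(true_path) & set(pred_path)
    cost = 0.0
    for path in (true_path, pred_path):
        for node in path:
            if node in common:  # reached the lowest common ancestor
                break
            cost += edge_weight(node, parent)
    return cost


if __name__ == "__main__":
    print(hierarchical_cost("1.1", "1.1"))  # correct prediction
    print(hierarchical_cost("1.1", "1"))    # prediction stopped one level too early
    print(hierarchical_cost("1.1", "1.2"))  # sibling subclasses confused
    print(hierarchical_cost("1.1", "2.1"))  # different top-level branches
```

With this assumed weighting the four calls return 0.0, 0.5, 1.0 and 3.0, so an error between distant branches is penalized three times more than an error between siblings, whereas flat 0/1 loss would score the last three cases identically.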

