In Machine Learning and Data Mining, most work on classification problems deals with flat classification, where each instance is assigned to one of n classes, all at the same level. There are, however, more complex classification problems in which the classes to be predicted are arranged in a hierarchical structure. For these problems, hierarchical classification techniques and concepts have proven very useful. Bioinformatics is one of the research areas with great potential for the use of such techniques. Hierarchical classification in this setting is a relatively new and little-explored area, and it therefore offers many research opportunities. Most works developed in this area have used predictive accuracy measures adopted from flat classification problems. These measures do not consider that, in hierarchical classification, the costs of misclassification errors tend to differ significantly between classes at different levels, or even between classes at the same level. This project therefore intends to investigate the main methods and techniques for hierarchical classification problems in bioinformatics, using misclassification cost measures that are more specific to this approach. As a case study, a current bioinformatics problem, such as predicting protein functional classes, will be used to validate the classification models and to analyze the cost measures, comparing the results obtained. Based on this comparison, a new cost measure may be proposed.
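To make the contrast between flat and hierarchy-aware costs concrete, here is a minimal Python sketch. It assumes a toy hierarchy with hypothetical class labels and uses the number of edges between the true and predicted classes as the misclassification cost. This tree-distance cost is only one illustrative option discussed in the hierarchical classification literature. It is not the measure this project will adopt or propose.

```python
# Toy class hierarchy: each class maps to its parent (None marks a top-level class).
# The labels are HYPOTHETICAL, loosely styled after hierarchical protein
# function codes; they are not taken from any real dataset.
PARENT = {
    "1": None,
    "1.1": "1",
    "1.2": "1",
    "2": None,
    "2.1": "2",
}


def path_to_root(label):
    """Return the classes from `label` up to its top-level ancestor."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def tree_distance(true_label, predicted_label):
    """Number of edges between two classes in the hierarchy.

    A virtual root is assumed to join all top-level classes, so classes
    in different top-level branches are still connected.
    """
    true_path = path_to_root(true_label)
    pred_path = path_to_root(predicted_label)
    true_ancestors = set(true_path)
    # Walk up from the prediction until reaching an ancestor shared with the true class.
    for steps_up, node in enumerate(pred_path):
        if node in true_ancestors:
            return steps_up + true_path.index(node)
    # No shared real ancestor: both paths meet at the virtual root.
    return len(pred_path) + len(true_path)


def flat_cost(true_label, predicted_label):
    """0/1 cost implied by flat accuracy: every error counts the same."""
    return 0 if true_label == predicted_label else 1


if __name__ == "__main__":
    # Both wrong predictions are at the same level, but one stays in the right branch.
    for pred in ["1.2", "2.1"]:
        print(pred, flat_cost("1.1", pred), tree_distance("1.1", pred))
```

With true class "1.1", the flat cost is 1 for both "1.2" and "2.1". The tree distance is 2 for "1.2", a sibling in the same branch, and 4 for "2.1", which sits at the same level in a different branch. This reflects the point above that errors between classes at the same level need not be equally costly.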