Analysis and implementation of semi-supervised algorithms for software fault prediction

Grant number: 12/18030-0
Support type: Scholarships in Brazil - Post-Doctorate
Effective date (Start): November 01, 2012
Effective date (End): April 30, 2013
Field of knowledge: Physical Sciences and Mathematics - Computer Science
Principal Investigator: André Carlos Ponce de Leon Ferreira de Carvalho
Grantee: Tiago Silva da Silva
Home Institution: Instituto de Ciências Matemáticas e de Computação (ICMC). Universidade de São Paulo (USP). São Carlos, SP, Brazil

Abstract

Due to the heterogeneity of software engineering (SE) data, applications of machine learning (ML) to software engineering face unique challenges compared with other fields where ML is used every day. In SE, no two real-world software systems are much alike, even within the same application domain: what we learn about one system and one process usually does not apply to another. Moreover, software engineers are not experts in ML, and ML researchers are not experts in SE data. A common and critical problem with ML solutions to SE tasks is that most algorithms depend on a variety of parameters that are domain- and data-specific. Reusing a set of parameter values from one application in another usually leads to poor results. This customization problem is one of the main reasons why many ML solutions to SE problems fail to migrate from research labs to industry. Therefore, new research must address this generic algorithm customization problem. To do so, in this project we argue that a widely used supervised learning approach for building software fault prediction models is not useful when the available fault data are scarce. We believe that, under such circumstances, modules whose fault-proneness is unknown but whose descriptive software metrics are available should be properly exploited for accurate fault prediction. More specifically, a classification scheme supported by unlabeled data (semi-supervised learning) may be a better approach in these cases. We go further and argue for the applicability of transductive learning to software fault prediction. Transductive learning is a special case of semi-supervised learning in which one does not need to build a generalizable model, but only to predict the outcomes of the cases whose class is unknown.
Transductive learning might be the solution for software metrics prediction in general, and software fault prediction in particular, especially considering the lack of models that generalize across different software development projects.
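The abstract does not say which transductive algorithm the project adopts. To make the idea concrete, the following is a minimal sketch of graph-based label propagation, a well-known transductive technique swapped in here purely for illustration. It assumes every module is described by a vector of already-normalized software metrics and that fault labels are encoded as 1 (faulty), 0 (fault-free) or None (unknown). The function name `label_propagation`, the Gaussian width `sigma`, and the example data are hypothetical. No reusable model is produced: the known labels are simply spread over a similarity graph to score the specific unlabeled modules at hand.

```python
import math


def label_propagation(features, labels, sigma=1.0, max_iter=1000, tol=1e-6):
    """Transductively score unlabeled modules by propagating known labels.

    features: one metric vector per module (labeled and unlabeled alike).
    labels:   1 (faulty), 0 (fault-free) or None (unknown) for each module.
    Returns one score in [0, 1] per module; labeled modules keep their label.
    """
    n = len(features)
    # Fully connected graph with Gaussian (RBF) edge weights between modules.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                weights[i][j] = math.exp(-dist2 / (2.0 * sigma ** 2))

    # Unknown modules start at the neutral score 0.5.
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(max_iter):
        new_scores = list(scores)
        max_change = 0.0
        for i in range(n):
            if labels[i] is not None:
                continue  # clamp: known labels never change
            total = sum(weights[i])
            if total == 0.0:
                continue  # isolated module: no neighbours to learn from
            # Each unknown score becomes the weighted mean of its neighbours.
            new_scores[i] = sum(w * s for w, s in zip(weights[i], scores)) / total
            max_change = max(max_change, abs(new_scores[i] - scores[i]))
        scores = new_scores
        if max_change < tol:
            break
    return scores


# Hypothetical, already-normalized values of two metrics per module.
modules = [[0.1, 0.2], [0.2, 0.1], [3.0, 3.1], [3.1, 2.9], [0.15, 0.15], [3.05, 3.0]]
known = [0, 0, 1, 1, None, None]  # the last two modules are unlabeled
for idx, score in enumerate(label_propagation(modules, known)):
    if known[idx] is None:
        print(f"module {idx}: fault score {score:.3f}")
```

In this toy example the unlabeled module that sits close to the fault-free modules receives a score near 0, and the one close to the faulty modules a score near 1. Because nothing is generalized, scoring a newly added module means re-running the propagation over the whole graph. That is the transductive trade-off described above, paid for with quadratic cost in the number of modules.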