BERT- and TF-IDF-based feature extraction for long-lived bug prediction in FLOSS: A comparative study

Author(s):
Gomes, Luiz ; Torres, Ricardo da Silva ; Cortes, Mario Lucio
Total Authors: 3
Document type: Journal article
Source: INFORMATION AND SOFTWARE TECHNOLOGY; v. 160, 12 pages, 2023-04-20.
Abstract

Context: The correct prediction of long-lived bugs could help maintenance teams to build their plan and to fix more bugs that often adversely affect software quality and disturb the user experience across versions in Free/Libre Open-Source Software (FLOSS). Machine Learning and Text Mining methods have been applied to solve many real-world prediction problems, including bug report handling.
Objective: Our research aims to compare the accuracy of ML classifiers on long-lived bug prediction in FLOSS using Bidirectional Encoder Representations from Transformers (BERT)- and Term Frequency-Inverse Document Frequency (TF-IDF)-based feature extraction. Besides that, we aim to investigate BERT variants on the same task.
Method: We collected bug reports from six popular FLOSS and used the Machine Learning classifiers to predict long-lived bugs. Furthermore, we compare different feature extractors, based on BERT and TF-IDF methods, in long-lived bug prediction.
Results: We found that long-lived bug prediction using BERT-based feature extraction systematically outperformed the TF-IDF. The SVM and Random Forest outperformed other classifiers in almost all datasets using BERT. Furthermore, smaller BERT architectures show themselves as competitive.
Conclusion: Our results demonstrated a promising avenue to predict long-lived bugs based on BERT contextual embedding features and fine-tuning procedures. (AU)
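
For readers unfamiliar with the TF-IDF baseline named in the abstract, the following is a minimal sketch of how bug-report text can be turned into TF-IDF feature vectors. It is not the authors' pipeline: the whitespace tokenizer, the unsmoothed weighting (term frequency times log(N / document frequency)), the function names and the three sample reports are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and split on whitespace.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors), with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Invented sample bug-report summaries (not data from the study).
reports = [
    "crash when opening large file",
    "crash on startup after update",
    "typo in settings dialog label",
]
vocabulary, vectors = tfidf_vectors(reports)
```

Each resulting vector could then be passed to a classifier such as an SVM or Random Forest. A BERT-based extractor would instead replace `tfidf_vectors` with contextual embeddings from a pretrained model.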

FAPESP's process: 13/50155-0 - Combining new technologies to monitor phenology from leaves to ecosystems
Grantee: Leonor Patricia Cerdeira Morellato
Support Opportunities: Research Program on Global Climate Change - University-Industry Cooperative Research (PITE)
FAPESP's process: 14/12236-1 - AnImaLS: Annotation of Images in Large Scale: what can machines and specialists learn from interaction?
Grantee: Alexandre Xavier Falcão
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 16/50250-1 - The secret of playing football: Brazil versus the Netherlands
Grantee: Sergio Augusto Cunha
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 15/24494-8 - Communications and processing of big data in cloud and fog computing
Grantee: Nelson Luis Saldanha da Fonseca
Support Opportunities: Research Projects - Thematic Grants
FAPESP's process: 14/50715-9 - Characterizing and predicting biomass production in sugarcane and eucalyptus plantations in Brazil
Grantee: Rubens Augusto Camargo Lamparelli
Support Opportunities: Research Grants - Research Partnership for Technological Innovation - PITE
FAPESP's process: 17/20945-0 - Multi-user equipment approved in grant 16/50250-1: local positioning system
Grantee: Sergio Augusto Cunha
Support Opportunities: Multi-user Equipment Program