BERT- and TF-IDF-based feature extraction for long-lived bug prediction in FLOSS: A comparative study

Author(s):
Gomes, Luiz ; Torres, Ricardo da Silva ; Cortes, Mario Lucio
Total number of authors: 3
Document type: Scientific article
Source: INFORMATION AND SOFTWARE TECHNOLOGY; v. 160, 12 pp., 2023-04-20.
Abstract

Context: The correct prediction of long-lived bugs could help maintenance teams to build their plan and to fix more bugs that often adversely affect software quality and disturb the user experience across versions in Free/Libre Open-Source Software (FLOSS). Machine Learning and Text Mining methods have been applied to solve many real-world prediction problems, including bug report handling.

Objective: Our research aims to compare the accuracy of ML classifiers on long-lived bug prediction in FLOSS using Bidirectional Encoder Representations from Transformers (BERT)- and Term Frequency-Inverse Document Frequency (TF-IDF)-based feature extraction. Besides that, we aim to investigate BERT variants on the same task.

Method: We collected bug reports from six popular FLOSS and used the Machine Learning classifiers to predict long-lived bugs. Furthermore, we compare different feature extractors, based on BERT and TF-IDF methods, in long-lived bug prediction.

Results: We found that long-lived bug prediction using BERT-based feature extraction systematically outperformed the TF-IDF. The SVM and Random Forest outperformed other classifiers in almost all datasets using BERT. Furthermore, smaller BERT architectures show themselves as competitive.

Conclusion: Our results demonstrated a promising avenue to predict long-lived bugs based on BERT contextual embedding features and fine-tuning procedures. (AU)

FAPESP grant: 13/50155-0 - Combining new technologies to monitor phenology from leaves to ecosystems
Grantee: Leonor Patricia Cerdeira Morellato
Support type: Research Grant - Research Program on Global Climate Change - PITE
FAPESP grant: 14/12236-1 - AnImaLS: Large-scale image annotation: what can machines and specialists learn by interacting?
Grantee: Alexandre Xavier Falcão
Support type: Research Grant - Thematic
FAPESP grant: 16/50250-1 - The secret of playing football: Brazil versus the Netherlands
Grantee: Sergio Augusto Cunha
Support type: Research Grant - Thematic
FAPESP grant: 15/24494-8 - Communication and processing of big data in cloud and fog computing
Grantee: Nelson Luis Saldanha da Fonseca
Support type: Research Grant - Thematic
FAPESP grant: 14/50715-9 - Characterizing and predicting biomass production in sugarcane and eucalyptus plantations in Brazil
Grantee: Rubens Augusto Camargo Lamparelli
Support type: Research Grant - Partnership for Technological Innovation - PITE
FAPESP grant: 17/20945-0 - EMU granted under grant 16/50250-1: local positioning system
Grantee: Sergio Augusto Cunha
Support type: Research Grant - Multi-User Equipment Program