Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects

Bug triaging–the process of classifying and assigning software issues to appropriate developers–is a critical yet challenging task in large-scale software development. Manual triaging is time-consuming, inconsistent, and prone to human bias, which often delays issue resolution...

Bibliographic Details
Main Authors: Nitanta Adhikari, Rabindra Bista, Joao Carlos Ferreira
Format: Article
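Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Bug triaging; natural language processing (NLP); multi-label classification; model evaluation metrics
Online Access: https://ieeexplore.ieee.org/document/11106424/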
author Nitanta Adhikari
Rabindra Bista
Joao Carlos Ferreira
collection DOAJ
description Bug triaging–the process of classifying and assigning software issues to appropriate developers–is a critical yet challenging task in large-scale software development. Manual triaging is time-consuming, inconsistent, and prone to human bias, which often delays issue resolution and misallocates developer resources. This study explores the application of machine learning to automate and improve bug triaging efficiency and accuracy. Using a dataset of over 122,000 issues from the microsoft/vscode GitHub repository, we evaluate several machine learning models including Bidirectional LSTM, CNN-LSTM, Random Forest, and Multinomial Naive Bayes. Our primary contribution is the development of an Augmented Bidirectional LSTM model that integrates enriched textual features and contextual metadata. This model, optimized using Optuna, outperforms traditional baselines, achieving a Micro F1-score of 0.6469 and Hamming Loss of 0.0133 for label prediction, and a Micro F1-score of 0.5974 with Hamming Loss of 0.0062 for assignee recommendation. In addition to demonstrating strong predictive performance, we present a robust end-to-end pipeline for data preprocessing, augmentation, model training, and evaluation using multi-label classification techniques. The study highlights how deep learning architectures, in combination with feature engineering and hyperparameter tuning, can provide scalable and generalizable components to support the automation of bug triaging. These findings contribute to the growing field of intelligent software maintenance by offering data-driven approaches that can support developer workflows and improve issue management efficiency in open-source environments.
format Article
id doaj-art-2d231003b7da4ea59c8dd2a51088f9e9
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-2d231003b7da4ea59c8dd2a51088f9e9
2025-08-20T04:01:15Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, volume 13, pages 136237-136254
DOI 10.1109/ACCESS.2025.3595011
document number 11106424
Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects
Nitanta Adhikari, https://orcid.org/0009-0005-9048-1577, Department of Computer Science and Engineering, Kathmandu University, Kavre, Dhulikhel, Nepal
Rabindra Bista, https://orcid.org/0000-0002-0638-5840, Department of Computer Science and Engineering, Kathmandu University, Kavre, Dhulikhel, Nepal
Joao Carlos Ferreira, https://orcid.org/0000-0002-6662-0806, Faculty of Logistics, Molde University College, Molde, Norway
https://ieeexplore.ieee.org/document/11106424/
Bug triaging
natural language processing (NLP)
multi-label classification
model evaluation metrics
title Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects
topic Bug triaging
natural language processing (NLP)
multi-label classification
model evaluation metrics
url https://ieeexplore.ieee.org/document/11106424/
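
The description above reports Micro F1-score and Hamming Loss for multi-label prediction. As a minimal sketch of how these two metrics are commonly defined (not the authors' evaluation code), the Python below computes both on a toy binary indicator matrix. The function names, the four-label layout, and the sample data are assumptions made purely for illustration and do not come from the article.

```python
from typing import Sequence

# Each row is one issue; each column is one candidate label (1 = label applies).
Matrix = Sequence[Sequence[int]]


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Micro-averaged F1: pool TP, FP and FN over every (issue, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (issue, label) cells where prediction and truth disagree."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total if total else 0.0


# Hypothetical toy data: 3 issues, 4 labels (illustrative only).
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 0]]

print(f"micro F1     = {micro_f1(y_true, y_pred):.4f}")
print(f"Hamming loss = {hamming_loss(y_true, y_pred):.4f}")
```

Because Hamming loss counts every correctly predicted absent label as a success, it can be very small when the label set is large and each issue carries only a few labels, even when the Micro F1-score is moderate. This may be one reason the two reported figures differ so much in scale.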