Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects
Bug triaging–the process of classifying and assigning software issues to appropriate developers–is a critical yet challenging task in large-scale software development. Manual triaging is time-consuming, inconsistent, and prone to human bias, which often delays issue resolution...
| Main Authors: | Nitanta Adhikari, Rabindra Bista, Joao Carlos Ferreira |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
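| Subjects: | Bug triaging; natural language processing (NLP); multi-label classification; model evaluation metrics |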
| Online Access: | https://ieeexplore.ieee.org/document/11106424/ |
| _version_ | 1849239082257874944 |
|---|---|
| author | Nitanta Adhikari; Rabindra Bista; Joao Carlos Ferreira |
| author_facet | Nitanta Adhikari; Rabindra Bista; Joao Carlos Ferreira |
| author_sort | Nitanta Adhikari |
| collection | DOAJ |
| description | Bug triaging–the process of classifying and assigning software issues to appropriate developers–is a critical yet challenging task in large-scale software development. Manual triaging is time-consuming, inconsistent, and prone to human bias, which often delays issue resolution and misallocates developer resources. This study explores the application of machine learning to automate and improve bug triaging efficiency and accuracy. Using a dataset of over 122,000 issues from the microsoft/vscode GitHub repository, we evaluate several machine learning models including Bidirectional LSTM, CNN-LSTM, Random Forest, and Multinomial Naive Bayes. Our primary contribution is the development of an Augmented Bidirectional LSTM model that integrates enriched textual features and contextual metadata. This model, optimized using Optuna, outperforms traditional baselines, achieving a Micro F1-score of 0.6469 and Hamming Loss of 0.0133 for label prediction, and a Micro F1-score of 0.5974 with Hamming Loss of 0.0062 for assignee recommendation. In addition to demonstrating strong predictive performance, we present a robust end-to-end pipeline for data preprocessing, augmentation, model training, and evaluation using multi-label classification techniques. The study highlights how deep learning architectures, in combination with feature engineering and hyperparameter tuning, can provide scalable and generalizable components to support the automation of bug triaging. These findings contribute to the growing field of intelligent software maintenance by offering data-driven approaches that can support developer workflows and improve issue management efficiency in open-source environments. |
| format | Article |
| id | doaj-art-2d231003b7da4ea59c8dd2a51088f9e9 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-2d231003b7da4ea59c8dd2a51088f9e9; 2025-08-20T04:01:15Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13; pp. 136237-136254; doi:10.1109/ACCESS.2025.3595011; document 11106424; Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects; Nitanta Adhikari (0), https://orcid.org/0009-0005-9048-1577; Rabindra Bista (1), https://orcid.org/0000-0002-0638-5840; Joao Carlos Ferreira (2), https://orcid.org/0000-0002-6662-0806; Department of Computer Science and Engineering, Kathmandu University, Kavre, Dhulikhel, Nepal; Department of Computer Science and Engineering, Kathmandu University, Kavre, Dhulikhel, Nepal; Faculty of Logistics, Molde University College, Molde, Norway; Bug triaging–the process of classifying and assigning software issues to appropriate developers–is a critical yet challenging task in large-scale software development. Manual triaging is time-consuming, inconsistent, and prone to human bias, which often delays issue resolution and misallocates developer resources. This study explores the application of machine learning to automate and improve bug triaging efficiency and accuracy. Using a dataset of over 122,000 issues from the microsoft/vscode GitHub repository, we evaluate several machine learning models including Bidirectional LSTM, CNN-LSTM, Random Forest, and Multinomial Naive Bayes. Our primary contribution is the development of an Augmented Bidirectional LSTM model that integrates enriched textual features and contextual metadata. This model, optimized using Optuna, outperforms traditional baselines, achieving a Micro F1-score of 0.6469 and Hamming Loss of 0.0133 for label prediction, and a Micro F1-score of 0.5974 with Hamming Loss of 0.0062 for assignee recommendation. In addition to demonstrating strong predictive performance, we present a robust end-to-end pipeline for data preprocessing, augmentation, model training, and evaluation using multi-label classification techniques. The study highlights how deep learning architectures, in combination with feature engineering and hyperparameter tuning, can provide scalable and generalizable components to support the automation of bug triaging. These findings contribute to the growing field of intelligent software maintenance by offering data-driven approaches that can support developer workflows and improve issue management efficiency in open-source environments.; https://ieeexplore.ieee.org/document/11106424/; Bug triaging; natural language processing (NLP); multi-label classification; model evaluation metrics |
| spellingShingle | Nitanta Adhikari; Rabindra Bista; Joao Carlos Ferreira; Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects; IEEE Access; Bug triaging; natural language processing (NLP); multi-label classification; model evaluation metrics |
| title | Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects |
| title_full | Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects |
| title_fullStr | Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects |
| title_full_unstemmed | Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects |
| title_short | Leveraging Machine Learning for Enhanced Bug Triaging in Open-Source Software Projects |
| title_sort | leveraging machine learning for enhanced bug triaging in open source software projects |
| topic | Bug triaging; natural language processing (NLP); multi-label classification; model evaluation metrics |
| url | https://ieeexplore.ieee.org/document/11106424/ |
| work_keys_str_mv | AT nitantaadhikari leveragingmachinelearningforenhancedbugtriaginginopensourcesoftwareprojects AT rabindrabista leveragingmachinelearningforenhancedbugtriaginginopensourcesoftwareprojects AT joaocarlosferreira leveragingmachinelearningforenhancedbugtriaginginopensourcesoftwareprojects |
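
The description field above reports results as Micro F1-score and Hamming Loss, the usual pair of metrics for multi-label classification. As a rough illustration of what those numbers measure, here is a minimal sketch in plain Python. It is not the authors' code: the function name `micro_f1_and_hamming` and the toy label matrices are assumptions made up for this example, and the paper's actual evaluation pipeline may differ.

```python
from typing import List, Tuple


def micro_f1_and_hamming(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (micro F1, Hamming loss) for binary label-indicator matrices.

    Each row is one issue; each column is one label (1 = label applies).
    """
    tp = fp = fn = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    # Micro averaging pools the counts over all labels before computing F1.
    denom = 2 * tp + fp + fn
    micro_f1 = (2 * tp / denom) if denom else 0.0
    # Hamming loss is the fraction of individual label slots that are wrong.
    hamming = ((fp + fn) / total) if total else 0.0
    return micro_f1, hamming


# Hypothetical toy data: 3 issues, 4 candidate labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 0]]

f1, hl = micro_f1_and_hamming(y_true, y_pred)
print(f"micro F1 = {f1:.4f}, Hamming loss = {hl:.4f}")
```

In this toy case 4 labels are predicted correctly, 1 is spurious, and 1 is missed. Because Hamming loss divides by every label slot, it tends to be very small when the label set is large and sparse, which is why a low Hamming Loss can sit alongside a moderate Micro F1-score.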