Cross-language refactoring detection method based on edit sequence

Bibliographic Details
Main Authors: Tao LI, Dongwen ZHANG, Yang ZHANG, Kun ZHENG
Format: Article
Language: zho
Published: Hebei University of Science and Technology 2024-12-01
Series: Journal of Hebei University of Science and Technology
Online Access: https://xuebao.hebust.edu.cn/hbkjdx/article/pdf/b202406007?st=article_issue
Description
Summary: To address two problems, namely that commit messages are unreliable because developers do not consistently record refactoring operations, and that deep learning-based refactoring detection methods are limited to a single language, a cross-language refactoring detection method, RefCode, is proposed. First, refactoring collection tools were used to collect commit messages, code change information, and refactoring types for different programming languages; edit sequences were generated from the code change information, and all the data were combined into a dataset. Second, the CodeBERT pre-trained model was combined with a BiLSTM-attention model, which was trained and tested on the dataset. Finally, the effectiveness of the proposed method was evaluated from six perspectives. The results show that RefCode improves both precision and recall by about 50% compared with a refactoring detection method that uses only commit messages as input to an LSTM model. The method achieves cross-language refactoring detection and effectively compensates for unreliable commit messages, providing a reference for detection in other programming languages and for other refactoring types.
ISSN: 1008-1542
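
Illustration: The summary states that edit sequences are generated from code change information, but this record does not say how RefCode constructs them. The following is only a minimal sketch of one common way to build such a sequence, a token-level diff using Python's standard `difflib` module. The function names `tokenize` and `edit_sequence`, the `KEEP`/`DEL`/`ADD` labels, and the crude regex lexer are all assumptions made for illustration, not the authors' implementation.

```python
import difflib
import re


def tokenize(code):
    """Hypothetical, language-agnostic lexer: identifiers, integers,
    or any single non-space symbol. A real pipeline would likely use
    a proper lexer for each programming language."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def edit_sequence(before, after):
    """Return a list of (label, token) pairs describing how `before`
    is turned into `after`. Labels (assumed, not from the paper):
    KEEP = unchanged token, DEL = removed token, ADD = inserted token."""
    a, b = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    sequence = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            sequence.extend(("KEEP", tok) for tok in a[i1:i2])
        elif tag == "delete":
            sequence.extend(("DEL", tok) for tok in a[i1:i2])
        elif tag == "insert":
            sequence.extend(("ADD", tok) for tok in b[j1:j2])
        else:  # "replace": emit the removed tokens, then the added ones
            sequence.extend(("DEL", tok) for tok in a[i1:i2])
            sequence.extend(("ADD", tok) for tok in b[j1:j2])
    return sequence


if __name__ == "__main__":
    # A rename-variable change, shown token by token.
    for label, token in edit_sequence("int total = a + b;", "int sum = a + b;"):
        print(label, token)
```

Because the sketch works on tokens rather than on a language-specific syntax tree, the same function can be applied to changes in any programming language, which is the property a cross-language detector needs from its input. In a setup like the one the summary describes, such a sequence would be fed to the model together with the commit message.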