Feature selection method for software defect number prediction based on maximum information coefficient

Bibliographic Details
Main Authors: Guoqing LIU, Xingqi WANG, Dan WEI, Jinglong FANG, Yanli SHAO
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2021-05-01
Series: Dianxin kexue
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2021025/
Description
Summary: Traditional feature selection methods consider only the linear correlation between variables and ignore nonlinear correlation, which makes it difficult to select an effective feature subset for building a model that predicts the number of faults in software modules. To account for both linear and nonlinear relationships, a feature selection method based on the maximum information coefficient (MIC) was proposed. The proposed method separates redundancy analysis and correlation analysis into two phases. In the first phase, a clustering algorithm based on the correlation between features groups redundant features into the same cluster. In the second phase, the features in each cluster are sorted in descending order of their correlation with the number of software defects, and the top-ranked features are selected to form the feature subset. The experimental results show that the proposed method improves the performance of software defect number prediction models by effectively removing redundant and irrelevant features.
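The two phases described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not name the clustering algorithm, so a simple greedy threshold clustering is swapped in, and `grid_dependence` is a crude stand-in for MIC (it maximizes normalized mutual information over equal-frequency grids only, whereas true MIC also optimizes the grid partition). The names `grid_dependence`, `select_features`, `redundancy_threshold`, and `top_per_cluster`, as well as the demo data, are all hypothetical.

```python
import math
import random
from collections import Counter


def equal_freq_bins(values, k):
    """Map each value to a bin index in [0, k) by rank; tied values share a bin."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    prev_val, prev_bin = None, 0
    for rank, i in enumerate(order):
        b = rank * k // n
        if rank > 0 and values[i] == prev_val:
            b = prev_bin  # keep equal values in the same bin
        bins[i] = b
        prev_val, prev_bin = values[i], b
    return bins


def grid_dependence(x, y, exponent=0.6):
    """Stand-in for MIC (assumption): max normalized mutual information
    over equal-frequency a-by-b grids with a * b <= n ** exponent."""
    n = len(x)
    budget = max(4, int(n ** exponent))
    best = 0.0
    for a in range(2, budget // 2 + 1):
        bx = equal_freq_bins(x, a)
        px = Counter(bx)
        for b in range(2, budget // a + 1):
            by = equal_freq_bins(y, b)
            py = Counter(by)
            joint = Counter(zip(bx, by))
            mi = sum(c / n * math.log(c * n / (px[i] * py[j]))
                     for (i, j), c in joint.items())
            best = max(best, mi / math.log(min(a, b)))
    return min(best, 1.0)


def select_features(columns, target, redundancy_threshold=0.8, top_per_cluster=1):
    """Phase 1: cluster redundant features. Phase 2: rank each cluster by
    dependence on the defect count and keep the top features."""
    # Phase 1 (hypothetical greedy clustering): a feature joins the first
    # cluster whose first member it depends on strongly enough.
    clusters = []
    for name in columns:
        for cluster in clusters:
            if grid_dependence(columns[name], columns[cluster[0]]) >= redundancy_threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    # Phase 2: sort each cluster in descending order of relevance to the target.
    selected = []
    for cluster in clusters:
        ranked = sorted(cluster,
                        key=lambda f: grid_dependence(columns[f], target),
                        reverse=True)
        selected.extend(ranked[:top_per_cluster])
    return clusters, selected


# Hypothetical demo data: two redundant size metrics and one shuffled column.
rng = random.Random(0)
loc = list(range(60))
noise = loc[:]
rng.shuffle(noise)
demo = {"loc": loc, "loc_scaled": [2 * v + 3 for v in loc], "noise": noise}
defects = [abs(v - 30) // 5 for v in loc]  # nonlinear (V-shaped) in loc

clusters, selected = select_features(demo, defects)
print(clusters, selected)
```

Here `loc` and `loc_scaled` carry the same information and should fall into one cluster, from which only one representative is kept. With a real MIC implementation, `grid_dependence` would be replaced by that library's score while the two-phase structure stays the same.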
ISSN: 1000-0801