Research on medical small sample data classification based on SMOTE and gcForest

Bibliographic Details
Main Authors: Wenchang LIU, Yun WEI, Haoxuan YUAN, Yue GAO
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2023-06-01
Series: 物联网学报 (Chinese Journal on Internet of Things)
Subjects:
Online Access: http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00337/
Description
Summary: On small-sample medical data, traditional machine learning models classify poorly because of their shallow structure and the complex characteristics of the data. To address this, a combined multi-grained improved cascade forest (cgicForest) model was proposed. It strengthens the model's representation learning by adding random sampling to multi-grained scanning and optimizing the transformed features, and it strengthens the model's classification ability by updating the hierarchical structure of the cascade forest. To handle class imbalance in the datasets, a safe-borderline-SMOTE (SBS) algorithm was proposed that dynamically interpolates around minority-class samples lying on the safe borderline, improving the quality of the training data. Training cgicForest on the resampled data yields the SBS-cgicForest classification model, which supports imbalanced small-sample medical data. Classification experiments were run on three medical datasets. Compared with the multi-grained cascade forest (gcForest) model, cgicForest improved the performance metrics by 4.1 to 5.4 percentage points on small-sample medical data with complex characteristics. Combined with the SBS algorithm, the performance metrics improved by 6.6 to 11.2 percentage points, and the F1 score was 2 to 2.5 percentage points higher than with traditional sampling methods. The work provides a reference for classifying small-sample medical data and offers support for Internet of Things applications in smart medical scenarios.
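The abstract does not spell out how random sampling is added to multi-grained scanning. The Python sketch below shows one plausible reading, under stated assumptions: each feature vector is scanned with sliding windows of several sizes, as in gcForest, and only a random subset of window positions is kept, with the same positions reused for every sample so the transformed features stay aligned. The function names and the `ratio` parameter are hypothetical. In gcForest, each retained sub-vector would then be passed to forests whose class-probability vectors are concatenated into the transformed feature.

```python
import random
from typing import Dict, List, Sequence


def scan_windows(x: Sequence[float], window: int, stride: int = 1) -> List[List[float]]:
    """Slide a window of `window` consecutive features over a 1-D feature vector."""
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    return [list(x[s:s + window]) for s in range(0, len(x) - window + 1, stride)]


def sample_window_positions(n_positions: int, ratio: float, seed: int = 0) -> List[int]:
    """Randomly keep a fraction `ratio` of the window positions (at least one).

    Assumption (hypothetical rule): the same positions are reused for every
    sample, so the transformed feature vectors of different samples align.
    """
    k = min(n_positions, max(1, round(n_positions * ratio)))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_positions), k))


def multi_grained_instances(x: Sequence[float], windows: Sequence[int],
                            ratio: float, seed: int = 0) -> Dict[int, List[List[float]]]:
    """For each window size, return the randomly retained sub-vectors of x."""
    out: Dict[int, List[List[float]]] = {}
    for w in windows:
        instances = scan_windows(x, w)
        keep = sample_window_positions(len(instances), ratio, seed + w)
        out[w] = [instances[i] for i in keep]
    return out
```

For a record with 10 features, window sizes 3 and 5 give 8 and 6 window positions; with `ratio=0.5`, `multi_grained_instances(list(range(10)), [3, 5], 0.5)` keeps 4 and 3 of them respectively, which shrinks the transformed feature vector in the same proportion.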
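The cascade part can be sketched in the same spirit. In gcForest, each cascade level receives the original features concatenated with the class vectors produced by the previous level, and the cascade stops growing when validation performance no longer improves. The sketch below shows only that growth loop; `make_level` is a hypothetical stand-in for training one level of forests, and it is an assumption that the "updated hierarchical structure" of cgicForest still follows this basic loop. For brevity, the training class vectors are in-sample predictions, whereas gcForest generates them with k-fold cross-validation to reduce overfitting.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[Row], List[float]]
# Hypothetical interface: trains one cascade level on (rows, labels) and
# returns a function mapping a row to its class-probability vector.
LevelFactory = Callable[[List[Row], Sequence[int]], Predictor]


def _argmax(p: Sequence[float]) -> int:
    return max(range(len(p)), key=p.__getitem__)


def grow_cascade(X_train: Sequence[Row], y_train: Sequence[int],
                 X_val: Sequence[Row], y_val: Sequence[int],
                 make_level: LevelFactory, max_levels: int = 10,
                 tol: float = 0.0) -> Tuple[List[Predictor], float]:
    """Add levels while validation accuracy improves by more than `tol`."""
    levels: List[Predictor] = []
    best_acc = -1.0
    aug_train = [list(r) for r in X_train]  # level 1 sees the raw features
    aug_val = [list(r) for r in X_val]
    for _ in range(max_levels):
        predict = make_level(aug_train, y_train)
        val_probs = [predict(r) for r in aug_val]
        acc = sum(_argmax(p) == t for p, t in zip(val_probs, y_val)) / len(y_val)
        if acc <= best_acc + tol:
            break  # no improvement: discard this level and stop growing
        best_acc = acc
        levels.append(predict)
        # Simplification: in-sample class vectors (gcForest uses k-fold CV here).
        train_probs = [predict(r) for r in aug_train]
        # Next level input = original features + this level's class vector.
        aug_train = [list(x) + list(p) for x, p in zip(X_train, train_probs)]
        aug_val = [list(x) + list(p) for x, p in zip(X_val, val_probs)]
    return levels, best_acc
```

To classify a new row, pass it through the kept levels in order, appending each level's class vector to the original features before the next level, and take the argmax of the last level's output.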
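For the SBS step, the abstract says only that synthetic samples are interpolated dynamically around minority-class samples on the safe borderline. The sketch below borrows the well-known Borderline-SMOTE labelling (a minority sample is noise if all of its m nearest neighbours are majority-class, borderline if at least half are, otherwise safe), drops the noise samples, and interpolates from the safe and borderline ones toward minority neighbours as in SMOTE. Halving the maximum interpolation gap for borderline seeds is a hypothetical stand-in for the "dynamic" interpolation, not the authors' rule, and the function names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def _neighbours(i: int, X: Sequence[Sequence[float]], k: int,
                pool: Sequence[int]) -> List[int]:
    """Indices in `pool` (excluding i) of the k rows nearest to X[i]."""
    cand = [j for j in pool if j != i]
    cand.sort(key=lambda j: (math.dist(X[i], X[j]), j))
    return cand[:k]


def label_minority(X: Sequence[Sequence[float]], y: Sequence[int],
                   minority: int, m: int = 5) -> Dict[int, str]:
    """Label each minority sample by the majority count among its m neighbours."""
    labels: Dict[int, str] = {}
    everyone = range(len(X))
    for i in everyone:
        if y[i] != minority:
            continue
        n_major = sum(1 for j in _neighbours(i, X, m, everyone) if y[j] != minority)
        if n_major == m:
            labels[i] = "noise"
        elif n_major >= m / 2:
            labels[i] = "borderline"
        else:
            labels[i] = "safe"
    return labels


def oversample(X: Sequence[Sequence[float]], y: Sequence[int], minority: int,
               n_new: int, m: int = 5, k: int = 5,
               seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Create n_new synthetic minority rows by SMOTE-style interpolation."""
    rng = random.Random(seed)
    labels = label_minority(X, y, minority, m)
    # Assumption: noise samples are neither seeds nor interpolation partners.
    seeds = [i for i, lab in labels.items() if lab != "noise"]
    if len(seeds) < 2:
        return [], []
    new_X: List[List[float]] = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(_neighbours(i, X, k, seeds))
        # Hypothetical "dynamic" rule: borderline seeds stay closer to themselves.
        max_gap = 0.5 if labels[i] == "borderline" else 1.0
        gap = rng.uniform(0.0, max_gap)
        new_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return new_X, [minority] * n_new
```

Because every synthetic row is a convex combination of two non-noise minority rows, the generated data stays inside the region the minority class already occupies, which is the property SMOTE-family methods rely on to avoid creating samples deep in majority territory.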
ISSN: 2096-3750