Research on medical small sample data classification based on SMOTE and gcForest
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | zho |
Published: | China InfoCom Media Group, 2023-06-01 |
Series: | 物联网学报 (Chinese Journal on Internet of Things) |
Subjects: | |
Online Access: | http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00337/ |
Summary: | Traditional machine learning models classify small medical sample data poorly because their structure is shallow and the data characteristics are complex. To address this, a combined multi-grained improved cascade forest (cgicForest) model is proposed. It strengthens the model's representation learning ability by adding random sampling to the multi-grained scanning stage and optimizing the transformed features, and it strengthens classification ability by updating the hierarchical structure of the cascade forest. To handle class imbalance in the datasets, the safe-borderline-SMOTE (SBS) algorithm is proposed, which dynamically interpolates around minority-class samples lying on the safe boundary and thereby improves the quality of the training data. Training cgicForest on the resampled data yields the SBS-cgicForest classification model, which supports imbalanced small medical sample data. The model was evaluated in classification experiments on three medical datasets. The results show that, on small medical sample data with complex characteristics, the performance indexes of the cgicForest model are 4.1 to 5.4 percentage points higher than those of the multi-grained cascade forest (gcForest) model. After combination with the SBS algorithm, the performance indexes increase by 6.6 to 11.2 percentage points, and the F1 score is 2 to 2.5 percentage points higher than that obtained with traditional sampling methods. The work provides a reference for classifying small medical sample data and supports Internet of Things applications in smart medical scenarios. |
---|---|
ISSN: | 2096-3750 |
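
The summary describes interpolating new samples around minority-class points near the class boundary. The following is a minimal sketch of that general idea in the style of Borderline-SMOTE, not the paper's SBS algorithm, whose exact selection rule is not given in this record. The function name `borderline_smote`, the parameter `k`, and the "at least half but not all neighbours are majority" rule are assumptions made for illustration only.

```python
import math
import random


def borderline_smote(minority, majority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Assumed borderline rule (illustrative, not the paper's SBS rule):
    a minority sample is used as a seed when at least half, but not all,
    of its k nearest neighbours in the whole data set are majority samples.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    # Label 1 = minority, 0 = majority; minority samples come first.
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    seeds = []
    for i, p in enumerate(minority):
        others = [labelled[j] for j in range(len(labelled)) if j != i]
        others.sort(key=lambda item: math.dist(p, item[0]))
        n_major = sum(1 for _, label in others[:k] if label == 0)
        if k / 2 <= n_major < k:
            seeds.append(i)
    if not seeds:
        # No borderline sample found: fall back to plain SMOTE seeds.
        seeds = list(range(len(minority)))

    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        p = minority[i]
        # k nearest minority neighbours of the chosen seed.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(p, minority[j]),
        )[:k]
        q = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (2.0, 2.0)]
majority = [(2.1, 2.2), (2.3, 1.9), (3.0, 3.0), (3.2, 2.8), (2.8, 3.1)]
new_points = borderline_smote(minority, majority, n_new=5, k=3, seed=42)
```

Each synthetic point lies on a line segment between a borderline minority sample and one of its minority neighbours, so the new data stays inside the region already occupied by the minority class. In this toy data only `(2.0, 2.0)` qualifies as a seed under the assumed rule.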