IMPLEMENTATION OF BALANCING DATA METHOD USING SMOTETOMEK IN DIABETES CLASSIFICATION USING XGBOOST

Bibliographic Details
Main Authors: Fatwa Ratantja Kusumajati, Basuki Rahmat, Achmad Junaidi
Format: Article
Language: English
Published: Informatics Department, Engineering Faculty, 2024-12-01
Series: Jurnal Ilmiah Kursor: Menuju Solusi Teknologi Informasi
Subjects: Balancing Data, Diabetes Classification, SMOTETomek, XGBOOST
Online Access: http://www.kursorjournal.org/index.php/kursor/article/view/410
Collection: DOAJ

Abstract
Diabetes is a significant public health concern in Indonesia. This study applies the XGBoost algorithm together with the SMOTETomek resampling method to improve the accuracy of diabetes classification. It uses 2,000 patient records containing demographic and medical information, obtained from Kaggle. The input variables are pregnancies, glucose level, blood pressure, skin thickness, insulin level, Body Mass Index (BMI), diabetes pedigree function, and age; the outcome variable is a binary label in which 0 indicates that the patient does not have diabetes and 1 indicates that the patient has diabetes. A key challenge was class imbalance: the dataset contains disproportionately more non-diabetic than diabetic samples. To address this, the authors applied SMOTETomek, which combines SMOTE (Synthetic Minority Over-sampling Technique), used to oversample the minority class, with Tomek links, used to remove borderline samples, so that the class distribution becomes balanced. With SMOTETomek, XGBoost reached an accuracy of 95.01%, compared with 92.13% both with SMOTE alone and on the original data, which indicates a benefit from combining SMOTE with Tomek links. During testing, SMOTETomek slightly reduced the minority class accuracy (0.97, versus 0.99 for SMOTE and the original data) but maintained a strong F1-score and precision, indicating that it handles the imbalance effectively despite minor trade-offs.
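The abstract describes SMOTETomek only in outline. The following is a minimal sketch of its two stages in plain Python, not the authors' implementation: SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours, and Tomek links (pairs of opposite-class samples that are each other's nearest neighbour) are then removed. The function names, the Euclidean distance, the default k=5, balancing to a 1:1 ratio, and dropping both members of each link are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE step: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng if rng is not None else random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority samples, closest first (Euclidean).
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def nearest_neighbour(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Return Tomek links as a set of frozenset index pairs.

    Samples i and j form a Tomek link when their labels differ and each
    is the other's nearest neighbour.
    """
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    return {frozenset((i, j)) for i, j in enumerate(nn)
            if y[i] != y[j] and nn[j] == i}


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """SMOTE oversampling followed by Tomek-link cleaning (binary labels)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Assumption: oversample until both classes have the same size.
    n_new = max(n_majority - len(minority), 0)
    synthetic = smote(minority, n_new, k, rng) if n_new else []
    X_res = list(X) + synthetic
    y_res = list(y) + [minority_label] * len(synthetic)
    # Assumption: drop both members of every Tomek link found after SMOTE.
    drop = {i for link in tomek_links(X_res, y_res) for i in link}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]
```

In practice, an existing library is the more likely choice. For example, imbalanced-learn provides `imblearn.combine.SMOTETomek` with a `fit_resample(X, y)` method, and the resampled training data can then be passed to `xgboost.XGBClassifier`. The abstract does not say which implementation or settings the authors used. Resampling is normally applied to the training split only, so that the test metrics reflect the original class distribution.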

ISSN: 0216-0544, 2301-6914
Record ID: doaj-art-b5fe845077e94968bddf2defa74f071c
Volume/Issue: Vol. 12, No. 4
DOI: 10.21107/kursor.v12i4.410
Affiliation: UPN "Veteran" Jawa Timur (all authors)
Keywords: Balancing Data, Diabetes Classification, SMOTETomek, XGBOOST