Comparative Analysis of Feature Selection Methods with XGBoost for Malware Detection on the Drebin Dataset

Bibliographic Details
Main Authors: Ines Aulia Latifah, Fauzi Adi Rafrastara, Jevan Bintoro, Wildanil Ghozi, Waleed Mahgoub Osman
Format: Article
Language: English
Published: LPPM ISB Atma Luhur, 2024-11-01
Series: Jurnal Sisfokom
Online Access: https://jurnal.atmaluhur.ac.id/index.php/sisfokom/article/view/2294
Collection: DOAJ
Description: Malware, or malicious software, continues to evolve alongside increasing cyberattacks targeting individual devices and critical infrastructure. Traditional detection methods, such as signature-based detection, are often ineffective against new or polymorphic malware. Therefore, advanced malware detection methods are increasingly needed to counter these evolving threats. This study aims to compare the performance of various feature selection methods combined with the XGBoost algorithm for malware detection using the Drebin dataset, and to identify the best feature selection method to enhance accuracy and efficiency. The experimental results show that XGBoost with the Information Gain method achieves the highest accuracy of 98.7%, with faster training times than other methods like Chi-Squared and ANOVA, which each achieved an accuracy of 98.3%. Information Gain yielded the best performance in accuracy and training time efficiency, while Chi-Squared and ANOVA offered competitive but slightly lower results. This study highlights that appropriate feature selection within machine learning algorithms can significantly improve malware detection accuracy, potentially aiding in real-world cybersecurity applications to prevent harmful cyberattacks.
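The description names three filter-style feature selection scores. As a rough illustration only (not the authors' code; this record does not state how many features were kept or how XGBoost was configured), the sketch below scores binary present/absent features, the kind Drebin-style feature vectors typically use, with information gain, the Pearson chi-squared statistic, and the one-way ANOVA F statistic, then keeps the top k columns. The toy matrix, labels, and function names are hypothetical. A real pipeline would more likely use scikit-learn's `SelectKBest` with `mutual_info_classif`, `chi2`, or `f_classif` and then train `xgboost.XGBClassifier` on the selected columns; note that scikit-learn's `chi2` builds its statistic from per-class feature counts, so its scores can differ from the full contingency-table version shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(column, labels):
    """Pearson chi-squared statistic of the feature-value x class contingency table."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals = Counter(column)
    class_totals = Counter(labels)
    stat = 0.0
    for v, v_total in value_totals.items():
        for c, c_total in class_totals.items():
            expected = v_total * c_total / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def anova_f(column, labels):
    """One-way ANOVA F statistic of the feature values grouped by class."""
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(column), len(groups)
    grand_mean = sum(column) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    within = sum((x - means[c]) ** 2 for c, g in groups.items() for x in g)
    if within == 0:
        # Zero within-class variance: infinite F if the class means differ, else 0.
        return math.inf if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_top_k(X, labels, score_fn, k):
    """Indices of the k columns of X with the highest score_fn score."""
    columns = [list(col) for col in zip(*X)]
    scores = [score_fn(col, labels) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data (not from Drebin): 8 apps x 3 binary features.
# Column 0 matches the label exactly, column 1 is weakly related, column 2 is noise.
X = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 1, 1], [0, 0, 0], [0, 0, 1], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = malware, 0 = benign

for name, fn in [("Information Gain", information_gain),
                 ("Chi-Squared", chi_squared),
                 ("ANOVA F", anova_f)]:
    print(name, select_top_k(X, y, fn, k=2))  # each line ends with [0, 1]
```

On this toy data all three scores agree on the ranking; on real feature sets they can disagree, which is what makes comparing the selection methods worthwhile.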
Record ID: doaj-art-a62bbeebf7a242c084d7939df2d9bc23
Institution: Kabale University
ISSN: 2301-7988, 2581-0588
Citation: Jurnal Sisfokom, vol. 13, no. 3, pp. 403-409, 2024
DOI: 10.32736/sisfokom.v13i3.2294
Affiliations:
Ines Aulia Latifah, Fauzi Adi Rafrastara, Jevan Bintoro, Wildanil Ghozi: Department of Informatics Engineering, Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia
Waleed Mahgoub Osman: Mathematics Department, College of Education, Sudan University of Science and Technology
Subjects: android malware detection, drebin, information gain, xgboost, machine learning