LogiTriBlend: A Novel Hybrid Stacking Approach for Enhanced Phishing Email Detection Using ML Models and Vectorization Approach

Email phishing remains a prevalent and sophisticated cyber threat, targeting individuals and organizations by disguising malicious intent in seemingly legitimate communications. Effective classification of phishing and legitimate emails is crucial for cybersecurity. In this study, we investigated va...

Bibliographic Details
Main Authors:	Aqsa Khalid, Maria Hanif, Abdul Hameed, Zeeshan Ashraf, Mrim M. Alnfiai, Salma M. Mohsen Alnefaie
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	TF-IDF; Word2Vec; Doc2Vec; LogiTriBlend; SVM; XGBoost
Online Access:	https://ieeexplore.ieee.org/document/10804110/

_version_	1846107051883757568
author	Aqsa Khalid Maria Hanif Abdul Hameed Zeeshan Ashraf Mrim M. Alnfiai Salma M. Mohsen Alnefaie
author_sort	Aqsa Khalid
collection	DOAJ
description	Email phishing remains a prevalent and sophisticated cyber threat, targeting individuals and organizations by disguising malicious intent in seemingly legitimate communications. Effective classification of phishing and legitimate emails is crucial for cybersecurity. In this study, we investigated various text vectorization techniques and machine learning models to address the challenge of email classification. We utilized three vectorization techniques: TF-IDF, Word2Vec, and Doc2Vec. These techniques were applied to traditional machine learning algorithms, and their performance was evaluated against a proposed stacking model, LogiTriBlend. The dataset comprised 501 phishing and 4090 legitimate emails, which underwent preprocessing steps such as stemming, lemmatization, and noise removal. To handle the dataset’s imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was employed. The model combines multiple base learners, including Support Vector Machine (SVM), Logistic Regression, Random Forest, and XGBoost, with a Logistic Regression meta-learner. The experimental results indicated that the LogiTriBlend model achieved an accuracy of 99.34% using Doc2Vec, compared with 99.12% using Word2Vec and 98.80% using TF-IDF. The Doc2Vec method thus resulted in superior email classification performance. Among the models tested, the proposed stacking model, LogiTriBlend, demonstrated robust results; however, the highest accuracy was consistently achieved using Doc2Vec.
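The SMOTE step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state the sampling ratio, the neighbour count, or the library used, so the function name `smote_oversample`, the parameter `k`, and the toy 2-D points below are all assumptions chosen for illustration. The sketch shows only the core idea of SMOTE, which is to create each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D feature vectors standing in for email embeddings (made-up data).
phishing = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
legitimate = [(5.0 + i, 5.0) for i in range(10)]

# Generate enough synthetic phishing points to match the majority class.
new_points = smote_oversample(phishing, n_new=len(legitimate) - len(phishing))
balanced_phishing = phishing + new_points
print(len(balanced_phishing), len(legitimate))  # prints: 10 10
```

With the class counts given in the description (501 phishing, 4090 legitimate), fully balancing the classes this way would require 4090 - 501 = 3589 synthetic phishing samples; whether the authors balanced to that ratio is not stated in this record.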
format	Article
id	doaj-art-3bd4a7ca1cc94ff5bf5cf9a8b2adb97b
institution	Kabale University
issn	2169-3536
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-3bd4a7ca1cc94ff5bf5cf9a8b2adb97b; 2024-12-27T00:00:52Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 193807-193821; DOI 10.1109/ACCESS.2024.3518923; document 10804110
	Aqsa Khalid: School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
	Maria Hanif: School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
	Abdul Hameed (https://orcid.org/0000-0002-6842-8631): Department of Computer Science, The University of Chenab Gujrat, Gujrat, Punjab, Pakistan
	Zeeshan Ashraf (https://orcid.org/0000-0002-2700-5982): Department of Computer Science, Faculty of Computing and IT, IISAT, Gujranwala, Punjab, Pakistan
	Mrim M. Alnfiai (https://orcid.org/0000-0003-3837-6313): Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
	Salma M. Mohsen Alnefaie: Physics Department, Taif University, Taif, Saudi Arabia
title	LogiTriBlend: A Novel Hybrid Stacking Approach for Enhanced Phishing Email Detection Using ML Models and Vectorization Approach
topic	TF-IDF; Word2Vec; Doc2Vec; LogiTriBlend; SVM; XGBoost
url	https://ieeexplore.ieee.org/document/10804110/
