Proactive Detection of Malicious Webpages Using Hybrid Natural Language Processing and Ensemble Learning Techniques

The proliferation of malicious webpages presents a growing threat to online security, necessitating advanced detection methods to mitigate risks. This paper proposes a novel approach that integrates Natural Language Processing (NLP) techniques with an ensemble of machine learning models for the proa...

Bibliographic Details
Main Authors:	Althaf Ali A, Rama Devi K, Syed Siraj Ahmed N, Ramchandran P, Parvathi S
Format:	Article
Language:	English
Published:	University of Zagreb, Faculty of organization and informatics 2024-01-01
Series:	Journal of Information and Organizational Sciences
Subjects:	Count; Term frequency and Inverse document frequency; Machine learning model; Phishing; Malicious webpages
Online Access:	https://hrcak.srce.hr/file/471912

_version_	1841550069695774720
author	Althaf Ali A, Rama Devi K, Syed Siraj Ahmed N, Ramchandran P, Parvathi S
author_sort	Althaf Ali A
collection	DOAJ
description	The proliferation of malicious webpages presents a growing threat to online security, necessitating advanced detection methods to mitigate risks. This paper proposes a novel approach that integrates Natural Language Processing (NLP) techniques with an ensemble of machine learning models for the proactive detection of malicious web content. By leveraging semantic analysis, lexical patterns, and metadata extraction, the proposed framework enhances the identification of suspicious patterns in web page content. The ensemble model combines decision trees, random forests, and gradient boosting methods, optimizing classification accuracy and reducing false positives. A comprehensive evaluation using a large dataset of web pages, including both benign and malicious examples, demonstrates the superiority of the proposed method over traditional single-model approaches. With accuracy rates exceeding 98%, this framework achieves a robust, scalable solution for real-time web content analysis, providing a critical tool for cybersecurity professionals to detect and block malicious webpages before they can cause harm. Future directions include the integration of deep learning architectures and adaptive filtering techniques to further refine detection capabilities.
format	Article
id	doaj-art-f7868609338142deb1f2d3c54b404751
institution	Kabale University
issn	1846-3312 1846-9418
language	English
publishDate	2024-01-01
publisher	University of Zagreb, Faculty of organization and informatics
record_format	Article
series	Journal of Information and Organizational Sciences
spelling	doaj-art-f7868609338142deb1f2d3c54b404751 | 2025-01-10T10:06:03Z | eng | University of Zagreb, Faculty of organization and informatics | Journal of Information and Organizational Sciences | ISSN 1846-3312, 1846-9418 | 2024-01-01 | vol. 48, no. 2, pp. 295-309 | DOI 10.31341/jios.48.2.4 | Proactive Detection of Malicious Webpages Using Hybrid Natural Language Processing and Ensemble Learning Techniques | Althaf Ali A (Department of Computer Application, Madanapalle Institute of Technology & Science, Madanapalle, India); Rama Devi K (Department of Information Technology, Panimalar Engineering College, Chennai, India); Syed Siraj Ahmed N (School of Computer Science Engineering and Information Science, Presidency University, Bangalore, India); Ramchandran P (Department of Computer Application, Parul Institute of Engineering and Technology, Parul University, P.O. Limda, Tal. Waghodia, Dist. Vadodara, India); Parvathi S (Department of Computer Science and Engineering, Erode Sengunthar Engineering College, Erode, India) | https://hrcak.srce.hr/file/471912 | Count; Term frequency and Inverse document frequency; Machine learning model; Phishing; Malicious webpages
title	Proactive Detection of Malicious Webpages Using Hybrid Natural Language Processing and Ensemble Learning Techniques
topic	Count; Term frequency and Inverse document frequency; Machine learning model; Phishing; Malicious webpages
url	https://hrcak.srce.hr/file/471912


Similar Items