A hybrid Hadoop-based sentiment analysis classifier for tweets associated with COVID-19 utilizing two machine learning algorithms: CNN, and fuzzy C4.5

Bibliographic Details
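Main Authors: Fatima Es-sabery, Ibrahim Es-sabery, Junaid Qadir, Beatriz Sainz-de-Abajo, Begonya Garcia-Zapirain
Format: Article
Language: English
Published: SpringerOpen, 2024-12-01
Series: Journal of Big Data
Subjects: Fuzzy version of C4.5 procedure; Convolutional neural network; Fuzzy rule pattern; Hadoop framework; X opinion mining; Sentiment analysis
Online Access: https://doi.org/10.1186/s40537-024-01014-4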

Abstract
In recent years, research on opinion mining from X (formerly Twitter) has advanced rapidly, focusing on processing tweets to determine users' sentiments about events. Many researchers prefer machine learning and deep learning techniques for this analysis. This work proposes a novel approach that integrates the C4.5 procedure, fuzzy rule patterns, and convolutional neural networks (CNNs). The approach involves six steps: (1) pre-processing to remove noisy data; (2) vectorizing tweets with word embeddings; (3) extracting sentiment and contextual features with CNNs; (4) fuzzifying the CNN outputs with a Gaussian fuzzifier to handle ambiguity; (5) constructing a fuzzy decision tree and rule base with a fuzzy version of C4.5; and (6) classifying tweets with fuzzy General Reasoning. The method combines the strengths of CNNs and C4.5 while handling imprecise data with fuzzy logic. It was implemented on a Hadoop-based cluster of five computing units and tested extensively. On the COVID-19_Sentiments dataset, the model outperformed the other classification algorithms it was compared against, with a precision of 94.56%, recall of 94.72%, F1-score of 94.63%, classification rate of 95.15%, specificity of 95.74%, kappa statistic of 95.12%, false-positive rate of 4.26%, false-negative rate of 5.28%, error rate of 4.26%, and execution time of 11.81 s. It was also stable (mean standard deviation of 0.09%), began converging around the 75th round, and significantly reduced time and space complexity.
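The fuzzification and fuzzy C4.5 steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the authors' setup: a single CNN output feature scaled to [0, 1], three hypothetical Gaussian fuzzy sets (`low`, `medium`, `high`) with made-up centers and widths, membership degrees normalized per tweet, and split scoring by membership-weighted entropy and gain ratio, which is one common way to fuzzify C4.5's criterion. The point it shows is that each tweet is sent down every branch with a partial weight instead of down exactly one branch.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical fuzzy partition of one CNN output feature scaled to [0, 1]:
# three Gaussian sets given as (center, sigma). The authors' actual terms,
# centers, and widths are not stated in this record.
FUZZY_SETS: Dict[str, Tuple[float, float]] = {
    "low": (0.0, 0.2),
    "medium": (0.5, 0.2),
    "high": (1.0, 0.2),
}


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    """Gaussian membership degree exp(-(x - c)^2 / (2 sigma^2)), in (0, 1]."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzify(x: float) -> Dict[str, float]:
    """Membership of x in every fuzzy set, normalized to sum to 1 (an assumption).

    x is expected to lie in [0, 1]; far outside that range all degrees underflow.
    """
    raw = {term: gaussian_membership(x, c, s) for term, (c, s) in FUZZY_SETS.items()}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError(f"feature value {x} is outside the fuzzy partition")
    return {term: mu / total for term, mu in raw.items()}


def fuzzy_entropy(weights: Sequence[float], labels: Sequence[str]) -> float:
    """Class entropy (bits) where each sample counts with its membership weight."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    per_class: Dict[str, float] = {}
    for w, y in zip(weights, labels):
        per_class[y] = per_class.get(y, 0.0) + w
    h = 0.0
    for mass in per_class.values():
        p = mass / total
        if p > 0.0:
            h -= p * math.log2(p)
    return h


def fuzzy_gain_ratio(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Return (information gain, gain ratio) of splitting on one fuzzified attribute.

    Unlike crisp C4.5, every sample goes down every branch (one per fuzzy set),
    weighted by its membership degree in that set.
    """
    if not values:
        raise ValueError("need at least one sample")
    memberships: List[Dict[str, float]] = [fuzzify(v) for v in values]
    parent = fuzzy_entropy([1.0] * len(values), labels)
    total_mass = sum(sum(m.values()) for m in memberships)
    children = 0.0
    split_info = 0.0
    for term in FUZZY_SETS:
        branch = [m[term] for m in memberships]
        share = sum(branch) / total_mass
        children += share * fuzzy_entropy(branch, labels)
        if share > 0.0:
            split_info -= share * math.log2(share)
    gain = parent - children
    ratio = gain / split_info if split_info > 0.0 else 0.0
    return gain, ratio


if __name__ == "__main__":
    # Toy data only: low feature values are negative tweets, high values positive.
    scores = [0.05, 0.10, 0.90, 0.95]
    classes = ["neg", "neg", "pos", "pos"]
    gain, ratio = fuzzy_gain_ratio(scores, classes)
    print(f"fuzzy gain = {gain:.3f}, gain ratio = {ratio:.3f}")
```

Normalizing the memberships so they sum to 1 keeps the fuzzy gain non-negative, because the branch class distributions then mix back exactly into the parent distribution; with raw Gaussian degrees that guarantee is lost.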

ISSN: 2196-1115
Volume: 11, Issue: 1, Article: 155
Affiliations:
Fatima Es-sabery: Department of Computer Science, Sultan Moulay Slimane University
Ibrahim Es-sabery: Department of Computer Science, Sultan Moulay Slimane University
Junaid Qadir: Department of Electrical, Electronic, and Telecommunications Engineering and Naval Architecture, University of Genoa
Beatriz Sainz-de-Abajo: Department of Signal Theory, Communications, and Telematics Engineering, Universidad de Valladolid
Begonya Garcia-Zapirain: eVIDA Research Group, University of Deusto