Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection

Bibliographic Details
Main Authors: Md. Rashadur Rahman, Rezaul Karim, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Gahangir Hossain, Tetsuya Shimamura
Format: Article
Language: English
Published: Springer 2025-01-01
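Series: Discover Applied Sciences
Subjects: Check-worthiness; Fact-checking; Claim detection; NLP for low-resource language; Computational journalism
Online Access: https://doi.org/10.1007/s42452-024-06444-6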
collection DOAJ
description Abstract The rapid digitization of media, driven by technological advancements, has accelerated the spread of information through online platforms. This has heightened the need for robust fact-checking mechanisms to counter misinformation. The prevalence of misinformation necessitates the development of automated claim detection systems to support efficient automated or semi-automated fact-checking processes. Existing claim detection systems predominantly focus on the English language, with limited resources available for other regional languages like Bangla. This paper proposes a novel ensemble machine learning framework for the effective detection of claims in a low-resource language like Bangla, a critical initial step in the automated fact-checking process. The proposed weighted ensemble technique combines Support Vector Machines, Bernoulli Naive Bayes, and Decision Trees as base classifiers to effectively detect claims. An annotated text dataset comprising 5010 sentences sourced from various online platforms, including several online fact-checking sites, was developed. To determine the optimal model and feature representation for claim detection, various machine learning algorithms were evaluated using BoW, TF-IDF, Word2Vec, and FastText features. The efficacy of ensemble models was examined by investigating both averaging and weighting strategies. Evaluation metrics showcased that the proposed weighted ensemble approach outperformed all baseline models, achieving a maximum F1 score of 0.87. To the best of our knowledge, this study is the first and only approach to claim detection in the Bangla language, with the potential for extension to other resource-constrained languages. Our work aspires to serve as a crucial tool in the fight against misinformation by advancing the accuracy and transparency of information.
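The weighted ensemble idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the probability values, the weights, and the function names `weighted_soft_vote` and `predict` are all hypothetical, and soft voting over class probabilities is assumed here as one common way to weight base classifiers. The example contrasts a weighted combination with plain averaging for three base classifiers (standing in for SVM, Bernoulli Naive Bayes, and Decision Tree).

```python
from typing import List, Sequence


def weighted_soft_vote(probas: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Combine per-classifier class probabilities with normalized weights."""
    if len(probas) != len(weights):
        raise ValueError("one weight is required per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for proba, weight in zip(probas, weights):
        for k in range(n_classes):
            combined[k] += (weight / total) * proba[k]
    return combined


def predict(probas: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest combined probability."""
    combined = weighted_soft_vote(probas, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical outputs for one sentence: [P(not a claim), P(claim)].
svm_proba = [0.40, 0.60]
nb_proba = [0.70, 0.30]
dt_proba = [0.45, 0.55]
all_probas = [svm_proba, nb_proba, dt_proba]

# Hypothetical weights, e.g. derived from each model's validation score.
weighted = predict(all_probas, [0.5, 0.2, 0.3])
# Equal weights reduce the scheme to simple averaging.
averaged = predict(all_probas, [1.0, 1.0, 1.0])

print("weighted ensemble label:", weighted)
print("averaging ensemble label:", averaged)
```

With these made-up numbers the two strategies disagree on the label, which is the kind of difference a comparison of averaging and weighting strategies would surface. How the weights are actually chosen in the paper is not stated in this record.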
id doaj-art-766dfbebb1834bfca7e96a2b86ebf741
institution Kabale University
issn 3004-9261
spelling doaj-art-766dfbebb1834bfca7e96a2b86ebf741
record_timestamp 2025-01-12T12:35:09Z
citation Discover Applied Sciences (Springer), ISSN 3004-9261, 2025-01-01, volume 7, issue 1, pages 1-25, https://doi.org/10.1007/s42452-024-06444-6
author_affiliation Md. Rashadur Rahman: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology
Rezaul Karim: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology
Mohammad Shamsul Arefin: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology
Pranab Kumar Dhar: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology
Gahangir Hossain: University of North Texas
Tetsuya Shimamura: Department of Information and Computer Sciences, Saitama University
topic Check-worthiness
Fact-checking
Claim detection
NLP for low-resource language
Computational journalism
url https://doi.org/10.1007/s42452-024-06444-6