Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection

Abstract The rapid digitization of media, driven by technological advancements, has accelerated the spread of information through online platforms. This has heightened the need for robust fact-checking mechanisms to counter misinformation. The prevalence of misinformation necessitates the developmen...


Bibliographic Details
Main Authors:	Md. Rashadur Rahman, Rezaul Karim, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Gahangir Hossain, Tetsuya Shimamura
Format:	Article
Language:	English
Published:	Springer 2025-01-01
Series:	Discover Applied Sciences
Subjects:	Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism
Online Access:	https://doi.org/10.1007/s42452-024-06444-6

_version_	1841544365857570816
author	Md. Rashadur Rahman, Rezaul Karim, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Gahangir Hossain, Tetsuya Shimamura
author_facet	Md. Rashadur Rahman, Rezaul Karim, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Gahangir Hossain, Tetsuya Shimamura
author_sort	Md. Rashadur Rahman
collection	DOAJ
description	Abstract The rapid digitization of media, driven by technological advancements, has accelerated the spread of information through online platforms. This has heightened the need for robust fact-checking mechanisms to counter misinformation. The prevalence of misinformation necessitates the development of automated claim detection systems to support efficient automated or semi-automated fact-checking processes. Existing claim detection systems predominantly focus on the English language, with limited resources available for other regional languages like Bangla. This paper proposes a novel ensemble machine learning framework for the effective detection of claims in a low-resource language like Bangla, a critical initial step in the automated fact-checking process. The proposed weighted ensemble technique combines Support Vector Machines, Bernoulli Naive Bayes, and Decision Trees as base classifiers to effectively detect claims. An annotated text dataset comprising 5010 sentences sourced from various online platforms, including several online fact-checking sites, was developed. To determine the optimal model and feature representation for claim detection, various machine learning algorithms were evaluated using BoW, TF-IDF, Word2Vec, and FastText features. The efficacy of ensemble models was examined by investigating both averaging and weighting strategies. Evaluation metrics showcased that the proposed weighted ensemble approach outperformed all baseline models, achieving a maximum F1 score of 0.87. To the best of our knowledge, this study is the first and only approach to claim detection in the Bangla language, with the potential for extension to other resource-constrained languages. Our work aspires to serve as a crucial tool in the fight against misinformation by advancing the accuracy and transparency of information.
format	Article
id	doaj-art-766dfbebb1834bfca7e96a2b86ebf741
institution	Kabale University
issn	3004-9261
language	English
publishDate	2025-01-01
publisher	Springer
record_format	Article
series	Discover Applied Sciences
spelling	doaj-art-766dfbebb1834bfca7e96a2b86ebf741; 2025-01-12T12:35:09Z; eng; Springer; Discover Applied Sciences; 3004-9261; 2025-01-01; 71125; 10.1007/s42452-024-06444-6; Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection; Md. Rashadur Rahman (0), Rezaul Karim (1), Mohammad Shamsul Arefin (2), Pranab Kumar Dhar (3), Gahangir Hossain (4), Tetsuya Shimamura (5); (0)-(3) Department of Computer Science and Engineering, Chittagong University of Engineering and Technology; (4) University of North Texas; (5) Department of Information and Computer Sciences, Saitama University; https://doi.org/10.1007/s42452-024-06444-6; Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism
title	Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection
topic	Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism
url	https://doi.org/10.1007/s42452-024-06444-6
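
The description field above mentions a weighted ensemble over three base classifiers (Support Vector Machines, Bernoulli Naive Bayes, and Decision Trees) but this record does not give the weighting scheme. The sketch below shows one common form of the idea, weighted soft voting, in which each classifier's class probabilities are averaged using per-classifier weights. The function names, the probability values, and the weights are all hypothetical and chosen only for illustration; they are not taken from the article.

```python
from typing import List, Sequence


def weighted_soft_vote(probabilities: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-classifier class probabilities.

    probabilities[i][c] is classifier i's probability for class c.
    weights[i] is the (hypothetical) trust placed in classifier i.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


def predict_label(combined: Sequence[float]) -> int:
    """Pick the class index with the highest combined probability."""
    return max(range(len(combined)), key=lambda c: combined[c])


# Hypothetical outputs for one sentence, ordered [P(non-claim), P(claim)].
svm_p = [0.30, 0.70]   # Support Vector Machine
nb_p = [0.60, 0.40]    # Bernoulli Naive Bayes
dt_p = [0.20, 0.80]    # Decision Tree

# Hypothetical weights, e.g. derived from validation scores (an assumption).
weights = [0.5, 0.2, 0.3]

combined = weighted_soft_vote([svm_p, nb_p, dt_p], weights)
label = predict_label(combined)  # 1 means "claim" in this toy encoding

# Setting all weights equal reduces this to the plain averaging strategy.
averaged = weighted_soft_vote([svm_p, nb_p, dt_p], [1.0, 1.0, 1.0])
```

With equal weights the function reduces to simple averaging, the other ensemble strategy named in the description, so the two can be compared by changing only the `weights` argument.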
