Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection
Abstract The rapid digitization of media, driven by technological advancements, has accelerated the spread of information through online platforms. This has heightened the need for robust fact-checking mechanisms to counter misinformation. The prevalence of misinformation necessitates the developmen...
Main Authors: | Md. Rashadur Rahman, Rezaul Karim, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Gahangir Hossain, Tetsuya Shimamura |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2025-01-01 |
Series: | Discover Applied Sciences |
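Subjects: | Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism |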
Online Access: | https://doi.org/10.1007/s42452-024-06444-6 |
_version_ | 1841544365857570816 |
---|---|
author | Md. Rashadur Rahman; Rezaul Karim; Mohammad Shamsul Arefin; Pranab Kumar Dhar; Gahangir Hossain; Tetsuya Shimamura |
author_facet | Md. Rashadur Rahman; Rezaul Karim; Mohammad Shamsul Arefin; Pranab Kumar Dhar; Gahangir Hossain; Tetsuya Shimamura |
author_sort | Md. Rashadur Rahman |
collection | DOAJ |
description | Abstract The rapid digitization of media, driven by technological advancements, has accelerated the spread of information through online platforms. This has heightened the need for robust fact-checking mechanisms to counter misinformation. The prevalence of misinformation necessitates the development of automated claim detection systems to support efficient automated or semi-automated fact-checking processes. Existing claim detection systems predominantly focus on the English language, with limited resources available for other regional languages like Bangla. This paper proposes a novel ensemble machine learning framework for the effective detection of claims in a low-resource language like Bangla, a critical initial step in the automated fact-checking process. The proposed weighted ensemble technique combines Support Vector Machines, Bernoulli Naive Bayes, and Decision Trees as base classifiers to effectively detect claims. An annotated text dataset comprising 5010 sentences sourced from various online platforms, including several online fact-checking sites, was developed. To determine the optimal model and feature representation for claim detection, various machine learning algorithms were evaluated using BoW, TF-IDF, Word2Vec, and FastText features. The efficacy of ensemble models was examined by investigating both averaging and weighting strategies. Evaluation metrics showcased that the proposed weighted ensemble approach outperformed all baseline models, achieving a maximum F1 score of 0.87. To the best of our knowledge, this study is the first and only approach to claim detection in the Bangla language, with the potential for extension to other resource-constrained languages. Our work aspires to serve as a crucial tool in the fight against misinformation by advancing the accuracy and transparency of information. |
format | Article |
id | doaj-art-766dfbebb1834bfca7e96a2b86ebf741 |
institution | Kabale University |
issn | 3004-9261 |
language | English |
publishDate | 2025-01-01 |
publisher | Springer |
record_format | Article |
series | Discover Applied Sciences |
spelling | doaj-art-766dfbebb1834bfca7e96a2b86ebf741; 2025-01-12T12:35:09Z; eng; Springer; Discover Applied Sciences; 3004-9261; 2025-01-01; vol. 7, no. 1, pp. 1-25; 10.1007/s42452-024-06444-6; Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection; Md. Rashadur Rahman (Department of Computer Science and Engineering, Chittagong University of Engineering and Technology); Rezaul Karim (Department of Computer Science and Engineering, Chittagong University of Engineering and Technology); Mohammad Shamsul Arefin (Department of Computer Science and Engineering, Chittagong University of Engineering and Technology); Pranab Kumar Dhar (Department of Computer Science and Engineering, Chittagong University of Engineering and Technology); Gahangir Hossain (University of North Texas); Tetsuya Shimamura (Department of Information and Computer Sciences, Saitama University); https://doi.org/10.1007/s42452-024-06444-6; Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism |
spellingShingle | Md. Rashadur Rahman; Rezaul Karim; Mohammad Shamsul Arefin; Pranab Kumar Dhar; Gahangir Hossain; Tetsuya Shimamura; Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection; Discover Applied Sciences; Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism |
title | Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection |
title_full | Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection |
title_fullStr | Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection |
title_full_unstemmed | Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection |
title_short | Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection |
title_sort | facilitating automated fact checking a machine learning based weighted ensemble technique for claim detection |
topic | Check-worthiness; Fact-checking; Claim detection; NLP for low resource language; Computational journalism |
url | https://doi.org/10.1007/s42452-024-06444-6 |
work_keys_str_mv | AT mdrashadurrahman facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection AT rezaulkarim facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection AT mohammadshamsularefin facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection AT pranabkumardhar facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection AT gahangirhossain facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection AT tetsuyashimamura facilitatingautomatedfactcheckingamachinelearningbasedweightedensembletechniqueforclaimdetection |
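
The abstract in this record describes combining three base classifiers (Support Vector Machines, Bernoulli Naive Bayes, and Decision Trees) and comparing averaging against weighting strategies. The record does not say how the weights are chosen or how the scores are combined, so the following is only a minimal sketch of one common approach: a weighted average of each classifier's claim probability. The function names, the example probabilities, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(probas: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of per-classifier claim probabilities.

    probas[i][j] is classifier i's probability that sentence j is a claim.
    Equal weights reduce this to plain averaging.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_sentences = len(probas[0])
    return [
        sum(w * p[j] for w, p in zip(weights, probas)) / total
        for j in range(n_sentences)
    ]


def predict(probas: Sequence[Sequence[float]],
            weights: Sequence[float],
            threshold: float = 0.5) -> List[int]:
    """Label a sentence 1 (claim) when its combined score reaches the threshold."""
    return [int(score >= threshold) for score in weighted_ensemble(probas, weights)]


# Hypothetical probabilities for three sentences (illustrative values only).
svm_proba = [0.9, 0.4, 0.2]
naive_bayes_proba = [0.6, 0.7, 0.1]
tree_proba = [1.0, 0.0, 0.0]
all_probas = [svm_proba, naive_bayes_proba, tree_proba]

# Averaging strategy: every classifier counts equally.
averaged = predict(all_probas, [1, 1, 1])

# Weighting strategy: hypothetical weights that favour the second classifier.
weighted = predict(all_probas, [1, 4, 1])

print(averaged, weighted)
```

With these made-up numbers the two strategies disagree on the second sentence, which is the kind of difference an averaging-versus-weighting comparison would surface. In practice the weights might be derived from each base classifier's validation performance, but that choice is an assumption here.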