CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis

Introduction: The unprecedented COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This study aims to enhance pandemic preparedness by leveraging real-time social media analysis to detect and monitor resource needs. Met...

Bibliographic Details
Main Authors: Soofi Shafiya, Mudasir Ahmad Wani, Suraiya Jabin, Mohammad ELAffendi, Jahiruddin
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Artificial Intelligence
Subjects: BERTopic, COVID-19, natural language processing, social media, DistilBERT, SVM
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2025.1623090/full
_version_ 1849233385863512064
author Soofi Shafiya
Mudasir Ahmad Wani
Suraiya Jabin
Mohammad ELAffendi
Jahiruddin
author_facet Soofi Shafiya
Mudasir Ahmad Wani
Suraiya Jabin
Mohammad ELAffendi
Jahiruddin
author_sort Soofi Shafiya
collection DOAJ
description Introduction: The unprecedented COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This study aims to enhance pandemic preparedness by leveraging real-time social media analysis to detect and monitor resource needs. Methods: Over 27.5 million tweets posted between November 2019 and March 2023 were collected with SnScrape using COVID-19-related hashtags. Tweets from April 2021, a peak pandemic period, were selected to create the CoViNAR dataset. BERTopic enabled context-aware filtering, resulting in a novel dataset of 14,000 annotated tweets categorized as “Need”, “Availability”, and “Not-relevant”. The CoViNAR dataset was used to train various machine learning classifiers, with experiments conducted using three context-aware word embedding techniques. Results: The best classifier, trained with DistilBERT embeddings, achieved 96.42% accuracy, 96.44% precision, 96.42% recall, and a 96.43% F1-score on the test set. Temporal analysis of classified tweets from the US, UK, and India between November 2019 and March 2023 revealed a strong correlation between “Need/Availability” tweet counts and COVID-19 case surges. Discussion: The results demonstrate the effectiveness of the proposed approach in capturing real-time indicators of resource shortages and availability. The strong correlation with case surges underscores its potential as a proactive tool for public health authorities, enabling improved resource allocation and early crisis intervention during pandemics.
format Article
id doaj-art-2d6abdb07c8f45b29c9f346a6cfbfd3b
institution Kabale University
issn 2624-8212
language English
publishDate 2025-08-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Artificial Intelligence
spelling doaj-art-2d6abdb07c8f45b29c9f346a6cfbfd3b 2025-08-20T05:32:51Z eng Frontiers Media S.A. Frontiers in Artificial Intelligence 2624-8212 2025-08-01 8 10.3389/frai.2025.1623090 1623090
CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
Soofi Shafiya (0), Mudasir Ahmad Wani (1), Suraiya Jabin (2), Mohammad ELAffendi (3), Jahiruddin (4)
(0) Department of Computer Science, Faculty of Sciences, Jamia Millia Islamia, New Delhi, India
(1) EIAS Data Science & Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
(2) Department of Computer Science, Faculty of Sciences, Jamia Millia Islamia, New Delhi, India
(3) EIAS Data Science & Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
(4) Department of Computer Science, Faculty of Sciences, Jamia Millia Islamia, New Delhi, India
https://www.frontiersin.org/articles/10.3389/frai.2025.1623090/full
BERTopic; COVID-19; natural language processing; social media; DistilBERT; SVM
spellingShingle Soofi Shafiya
Mudasir Ahmad Wani
Suraiya Jabin
Mohammad ELAffendi
Jahiruddin
CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
Frontiers in Artificial Intelligence
BERTopic
COVID-19
natural language processing
social media
DistilBERT
SVM
title CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
title_full CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
title_fullStr CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
title_full_unstemmed CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
title_short CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
title_sort covinar a context aware social media dataset for pandemic severity level prediction and analysis
topic BERTopic
COVID-19
natural language processing
social media
DistilBERT
SVM
url https://www.frontiersin.org/articles/10.3389/frai.2025.1623090/full
work_keys_str_mv AT soofishafiya covinaracontextawaresocialmediadatasetforpandemicseveritylevelpredictionandanalysis
AT mudasirahmadwani covinaracontextawaresocialmediadatasetforpandemicseveritylevelpredictionandanalysis
AT suraiyajabin covinaracontextawaresocialmediadatasetforpandemicseveritylevelpredictionandanalysis
AT mohammadelaffendi covinaracontextawaresocialmediadatasetforpandemicseveritylevelpredictionandanalysis
AT jahiruddin covinaracontextawaresocialmediadatasetforpandemicseveritylevelpredictionandanalysis
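
The description above reports a strong correlation between “Need/Availability” tweet counts and COVID-19 case surges, but the record does not say which correlation measure was used. The following is a minimal sketch, assuming a Pearson coefficient computed over weekly totals. The function name `pearson` and both data series are invented for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each series.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented weekly counts, for illustration only (not data from the study).
need_availability_tweets = [120, 340, 910, 1500, 870, 300]
reported_cases = [1000, 4200, 9800, 16000, 9100, 2500]

r = pearson(need_availability_tweets, reported_cases)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the tweet counts rise and fall together with the case counts. A lagged comparison (shifting one series by a week or more) would be needed to test whether tweets lead case surges rather than merely track them.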