CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis
| Main Authors: | Soofi Shafiya, Mudasir Ahmad Wani, Suraiya Jabin, Mohammad ELAffendi, Jahiruddin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Artificial Intelligence |
| Subjects: | BERTopic, COVID-19, natural language processing, social media, DistilBERT, SVM |
| Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2025.1623090/full |
| Field | Value |
|---|---|
| author | Soofi Shafiya; Mudasir Ahmad Wani; Suraiya Jabin; Mohammad ELAffendi; Jahiruddin |
| collection | DOAJ |
| description | Introduction: The unprecedented COVID-19 pandemic exposed critical weaknesses in global health management, particularly in resource allocation and demand forecasting. This study aims to enhance pandemic preparedness by leveraging real-time social media analysis to detect and monitor resource needs. Methods: Using SnScrape, over 27.5 million tweets posted between November 2019 and March 2023 were collected via COVID-19-related hashtags. Tweets from April 2021, a peak pandemic period, were selected to create the CoViNAR dataset. BERTopic enabled context-aware filtering, resulting in a novel dataset of 14,000 annotated tweets categorized as “Need”, “Availability”, and “Not-relevant”. The CoViNAR dataset was used to train various machine learning classifiers, with experiments conducted using three context-aware word embedding techniques. Results: The best classifier, trained with DistilBERT embeddings, achieved 96.42% accuracy, 96.44% precision, 96.42% recall, and a 96.43% F1-score on the test set. Temporal analysis of classified tweets from the US, UK, and India between November 2019 and March 2023 revealed a strong correlation between “Need/Availability” tweet counts and COVID-19 case surges. Discussion: The results demonstrate the effectiveness of the proposed approach in capturing real-time indicators of resource shortages and availability. The strong correlation with case surges underscores its potential as a proactive tool for public health authorities, enabling improved resource allocation and early crisis intervention during pandemics. |
| format | Article |
| id | doaj-art-2d6abdb07c8f45b29c9f346a6cfbfd3b |
| institution | Kabale University |
| issn | 2624-8212 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Artificial Intelligence |
| affiliations | Soofi Shafiya, Suraiya Jabin, Jahiruddin: Department of Computer Science, Faculty of Sciences, Jamia Millia Islamia, New Delhi, India; Mudasir Ahmad Wani, Mohammad ELAffendi: EIAS Data Science & Blockchain Laboratory, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia |
| volume | 8 |
| doi | 10.3389/frai.2025.1623090 |
| article_number | 1623090 |
| title | CoViNAR: a context-aware social media dataset for pandemic severity level prediction and analysis |
| topic | BERTopic COVID-19 natural language processing social media DistilBERT SVM |
| url | https://www.frontiersin.org/articles/10.3389/frai.2025.1623090/full |
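
The abstract in the description field reports accuracy, precision, recall and F1 for a three-class task ("Need", "Availability", "Not-relevant"). It does not say how the per-class scores were averaged. The sketch below assumes support-weighted averaging, which this record does not confirm. Under that scheme, weighted recall is mathematically identical to accuracy, which is consistent with the identical 96.42% figures reported. The sketch also assumes the Pearson coefficient as the measure behind the "strong correlation" between Need/Availability tweet counts and case counts; the record does not name the statistic. The function names `weighted_scores` and `pearson` and the toy data are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Class labels as named in the CoViNAR description.
LABELS = ("Need", "Availability", "Not-relevant")


def weighted_scores(y_true, y_pred, labels=LABELS):
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = actual / n  # each class is weighted by its share of true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


def pearson(xs, ys):
    """Pearson correlation between two equal-length series, e.g. daily
    Need/Availability tweet counts vs. daily reported cases (hypothetical inputs)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy labels for illustration only; not CoViNAR data.
    y_true = ["Need", "Need", "Availability", "Not-relevant", "Not-relevant", "Availability"]
    y_pred = ["Need", "Availability", "Availability", "Not-relevant", "Not-relevant", "Availability"]
    print(weighted_scores(y_true, y_pred))
    print(pearson([3, 5, 9, 14], [120, 180, 400, 610]))
```

For real experiments, `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, average="weighted")` and `scipy.stats.pearsonr` provide the same quantities. The plain-Python version is kept only to make the averaging step explicit.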