An automated privacy-preserving self-supervised classification of COVID-19 from lung CT scan images minimizing the requirements of large data annotation

Abstract: This study presents a novel privacy-preserving self-supervised learning (SSL) framework for COVID-19 classification from lung CT scans, using federated learning (FL) enhanced with Paillier homomorphic encryption (PHE) to prevent third-party attacks during training. The FL-SSL based framework emp...

Bibliographic Details
Main Authors: Sadia Sultana Chowa, Md Rahad Islam Bhuiyan, Mst. Sazia Tahosin, Asif Karim, Sidratul Montaha, Md. Mehedi Hassan, Mohd Asif Shah, Sami Azam
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
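Subjects: Self-supervised learning; Contrastive learning; VGG-19; Attention-CNN; Federated learning; Privacy-preserving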
Online Access: https://doi.org/10.1038/s41598-024-83972-6
_version_ 1841559653383667712
author Sadia Sultana Chowa
Md Rahad Islam Bhuiyan
Mst. Sazia Tahosin
Asif Karim
Sidratul Montaha
Md. Mehedi Hassan
Mohd Asif Shah
Sami Azam
author_sort Sadia Sultana Chowa
collection DOAJ
description Abstract: This study presents a novel privacy-preserving self-supervised learning (SSL) framework for COVID-19 classification from lung CT scans, using federated learning (FL) enhanced with Paillier homomorphic encryption (PHE) to prevent third-party attacks during training. The FL-SSL framework employs two publicly available lung CT scan datasets, one treated as labeled and the other as unlabeled. The unlabeled dataset is split into three subsets, which are assumed to have been collected from three hospitals. Training uses the Bootstrap Your Own Latent (BYOL) contrastive learning SSL framework with a VGG19 encoder followed by attention CNN blocks (VGG19 + attention CNN). The input datasets are preprocessed by automatically selecting the largest lung portion of each CT scan, and a 64 × 64 input size is used to reduce computational complexity. Healthcare privacy issues are addressed through collaborative training across decentralized datasets and secure aggregation with PHE. The three subsets are used to train local BYOL models, which together optimize the central encoder. The labeled dataset is then used to train the central encoder (updated VGG19 + attention CNN), resulting in an accuracy of 97.19%, a precision of 97.43%, and a recall of 98.18%. The reliability of the framework’s performance is demonstrated through statistical analysis and five-fold cross-validation. The framework’s efficacy is further demonstrated on datasets from three other modalities: skin cancer, breast cancer, and chest X-rays. In conclusion, this study offers a promising solution for accurate COVID-19 diagnosis from lung CT scans, preserving privacy and overcoming the challenges of dataset scarcity and computational complexity.
format Article
id doaj-art-406c4126128f4f169149a25e4134699f
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-406c4126128f4f169149a25e4134699f
2025-01-05T12:18:21Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-01-01
151120
10.1038/s41598-024-83972-6
An automated privacy-preserving self-supervised classification of COVID-19 from lung CT scan images minimizing the requirements of large data annotation
Sadia Sultana Chowa [0]
Md Rahad Islam Bhuiyan [1]
Mst. Sazia Tahosin [2]
Asif Karim [3]
Sidratul Montaha [4]
Md. Mehedi Hassan [5]
Mohd Asif Shah [6]
Sami Azam [7]
[0] Faculty of Science and Technology, Charles Darwin University
[1] Faculty of Science and Technology, Charles Darwin University
[2] Health Informatics Research Laboratory (HIRL), Department of Computer Science and Engineering, Daffodil International University
[3] Faculty of Science and Technology, Charles Darwin University
[4] Department of Computer Science, University of Calgary
[5] Computer Science and Engineering Discipline, Khulna University
[6] Department of Economics, Bakhtar University
[7] Faculty of Science and Technology, Charles Darwin University
https://doi.org/10.1038/s41598-024-83972-6
Self-supervised learning
Contrastive learning
VGG-19
Attention-CNN
Federated learning
Privacy-preserving
title An automated privacy-preserving self-supervised classification of COVID-19 from lung CT scan images minimizing the requirements of large data annotation
title_sort automated privacy preserving self supervised classification of covid 19 from lung ct scan images minimizing the requirements of large data annotation
topic Self-supervised learning
Contrastive learning
VGG-19
Attention-CNN
Federated learning
Privacy-preserving
url https://doi.org/10.1038/s41598-024-83972-6