Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset

Abstract Oral cancer poses a critical global health challenge, with early detection significantly improving patient survival rates and treatment outcomes. This study proposes an advanced deep learning-based diagnostic model, LightSE-MobileViT, specifically designed to classify oral cancer using medi...

Bibliographic Details
Main Authors: Md Firoz Kabir, Md Yousuf Ahmad, Roise Uddin, Martin Cordero, Shashi Kant
Format: Article
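Language: English
Published: Springer 2025-07-01
Series: Discover Artificial Intelligence
Subjects: Oral cancer detection; Lightweight deep learning; MobileViT transformer; Squeeze-and-excitation (SE) module; Medical imaging
Online Access: https://doi.org/10.1007/s44163-025-00442-2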
author Md Firoz Kabir
Md Yousuf Ahmad
Roise Uddin
Martin Cordero
Shashi Kant
collection DOAJ
description Abstract Oral cancer poses a critical global health challenge, with early detection significantly improving patient survival rates and treatment outcomes. This study proposes an advanced deep learning-based diagnostic model, LightSE-MobileViT, specifically designed to classify oral cancer using medical imaging. The Oral Cancer Classification dataset used in this study comprises clinically validated lip and tongue images collected from various ENT hospitals in Ahmedabad. The original dataset consisted of 131 images (87 cancerous and 44 non-cancerous). To address class imbalance and enhance model generalizability, data augmentation techniques were employed, expanding the dataset to 981 images with equal distribution across both classes. Our proposed model, LightSE-MobileViT, integrates a lightweight convolutional neural network (CNN) backbone consisting of sequential convolutional layers enhanced with batch normalization and rectified linear unit activations. To further enrich feature representation and spatial attention, a Squeeze-and-Excitation block is embedded after the third convolutional layer. Subsequently, a MobileViT transformer encoder is employed, effectively capturing global contextual information through efficient multi-headed self-attention mechanisms. Experimental evaluations revealed that LightSE-MobileViT achieved superior diagnostic performance, attaining an accuracy of 98.39%, precision and recall values approaching 1.00 for both cancerous and non-cancerous categories, a macro F1-score of 0.98, and an ROC-AUC of 1.00. Comparative analysis demonstrated notable improvements over benchmark models, including CST-CNN (98% accuracy), MobileNetV2 (97% accuracy), DenseNet121 (97% accuracy), and InceptionV3 (90% accuracy). 
The exceptional performance of LightSE-MobileViT underscores its robust capability and clinical applicability, suggesting significant potential for deployment in automated oral cancer screening, thus facilitating early detection and timely intervention.
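The Squeeze-and-Excitation step mentioned in the description can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function name `squeeze_excite`, the bias-free two-layer gate, and the toy weights are assumptions made for illustration, and the record does not state the reduction ratio or layer sizes used in LightSE-MobileViT. The sketch shows only the generic mechanism: average each channel to one number, pass the result through a small bottleneck, and rescale each channel by the resulting gate.

```python
import math

def squeeze_excite(x, w1, w2):
    """Generic Squeeze-and-Excitation gate (illustrative, bias-free).

    x  : list of C channels, each an H x W list of lists of floats
    w1 : reduction weights, shape (C_reduced, C)   -- hypothetical values
    w2 : expansion weights, shape (C, C_reduced)   -- hypothetical values
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling, one scalar per channel.
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation, step 1: linear reduction followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    # Excitation, step 2: linear expansion followed by a sigmoid gate in (0, 1).
    g = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]
    # Scale: multiply every value in channel c by its gate g[c].
    y = [[[g[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    return y, g

# Toy input: 2 channels of size 2 x 2 (values chosen only for the example).
x = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
w1 = [[1.0, 1.0]]        # reduce 2 channels to 1 hidden unit
w2 = [[0.0], [1.0]]      # expand back to 2 channel gates
y, gates = squeeze_excite(x, w1, w2)
```

In a trained network the two weight matrices are learned, so the gates emphasise whichever channels help the classifier; here the zero weight for the first channel simply yields a neutral gate of 0.5.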
format Article
id doaj-art-52b6f5ab05a34d848b5b09c7f25689f5
institution Kabale University
issn 2731-0809
language English
publishDate 2025-07-01
publisher Springer
record_format Article
series Discover Artificial Intelligence
spelling doaj-art-52b6f5ab05a34d848b5b09c7f25689f5 | 2025-08-20T03:43:22Z | eng | Springer | Discover Artificial Intelligence | 2731-0809 | 2025-07-01 | 51121 | 10.1007/s44163-025-00442-2
Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset
Md Firoz Kabir (0), Md Yousuf Ahmad (1), Roise Uddin (2), Martin Cordero (3), Shashi Kant (4)
University of the Cumberlands; Trine University; Pacific States University; University of the Cumberlands; Bule Hora University
https://doi.org/10.1007/s44163-025-00442-2
Oral cancer detection; Lightweight deep learning; MobileViT transformer; Squeeze-and-excitation (SE) module; Medical imaging
title Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset
topic Oral cancer detection
Lightweight deep learning
MobileViT transformer
Squeeze-and-excitation (SE) module
Medical imaging
url https://doi.org/10.1007/s44163-025-00442-2