Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset

Bibliographic Details
Main Authors:	Md Firoz Kabir, Md Yousuf Ahmad, Roise Uddin, Martin Cordero, Shashi Kant
Format:	Article
Language:	English
Published:	Springer 2025-07-01
Series:	Discover Artificial Intelligence
Subjects:	Oral cancer detection; Lightweight deep learning; MobileViT transformer; Squeeze-and-excitation (SE) module; Medical imaging
Online Access:	https://doi.org/10.1007/s44163-025-00442-2

Description
Summary:	Abstract Oral cancer poses a critical global health challenge, with early detection significantly improving patient survival rates and treatment outcomes. This study proposes an advanced deep learning-based diagnostic model, LightSE-MobileViT, specifically designed to classify oral cancer using medical imaging. The Oral Cancer Classification dataset used in this study comprises clinically validated lip and tongue images collected from various ENT hospitals in Ahmedabad. The original dataset consisted of 131 images (87 cancerous and 44 non-cancerous). To address class imbalance and enhance model generalizability, data augmentation techniques were employed, expanding the dataset to 981 images with equal distribution across both classes. Our proposed model, LightSE-MobileViT, integrates a lightweight convolutional neural network (CNN) backbone consisting of sequential convolutional layers enhanced with batch normalization and rectified linear unit activations. To further enrich feature representation and spatial attention, a Squeeze-and-Excitation block is embedded after the third convolutional layer. Subsequently, a MobileViT transformer encoder is employed, effectively capturing global contextual information through efficient multi-headed self-attention mechanisms. Experimental evaluations revealed that LightSE-MobileViT achieved superior diagnostic performance, attaining an accuracy of 98.39%, precision and recall values approaching 1.00 for both cancerous and non-cancerous categories, a macro F1-score of 0.98, and an ROC-AUC of 1.00. Comparative analysis demonstrated notable improvements over benchmark models, including CST-CNN (98% accuracy), MobileNetV2 (97% accuracy), DenseNet121 (97% accuracy), and InceptionV3 (90% accuracy). 
The exceptional performance of LightSE-MobileViT underscores its robust capability and clinical applicability, suggesting significant potential for deployment in automated oral cancer screening, thus facilitating early detection and timely intervention.
ISSN:	2731-0809
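The summary says a Squeeze-and-Excitation block is inserted after the third convolutional layer but gives no reduction ratio, layer widths, or weights. Below is a minimal pure-Python sketch of the standard SE formulation: global average pooling, a bottleneck fully connected layer with ReLU, a fully connected layer with sigmoid, and per-channel rescaling. It is not the authors' implementation. The function names (`squeeze_excite`, `dense`), the list-of-lists tensor layout, and the toy weights are all hypothetical. In a trained network the weights would be learned.

```python
import math
from typing import List, Tuple

# Hypothetical layout: a feature map is a list of C channels, each an H x W grid.
Channel = List[List[float]]
FeatureMap = List[Channel]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def squeeze_excite(
    fmap: FeatureMap,
    w1: List[List[float]], b1: List[float],  # bottleneck: C -> C/r (hypothetical weights)
    w2: List[List[float]], b2: List[float],  # expansion: C/r -> C (hypothetical weights)
) -> Tuple[FeatureMap, List[float]]:
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid gives one gate per channel.
    hidden = [relu(h) for h in dense(z, w1, b1)]
    gates = [sigmoid(a) for a in dense(hidden, w2, b2)]
    # Scale: multiply every activation in channel c by gate c.
    out = [[[g * x for x in row] for row in ch] for g, ch in zip(gates, fmap)]
    return out, gates


if __name__ == "__main__":
    # Toy input: C = 2 channels of size 2 x 2, reduction ratio r = 2 (hidden size 1).
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    out, gates = squeeze_excite(fmap, w1=[[0.5, 0.5]], b1=[0.0],
                                w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
    print([round(g, 4) for g in gates])  # [0.8808, 0.1192]
```

Each gate is a single scalar per channel, applied uniformly across all spatial positions of that channel. In the standard formulation, SE therefore recalibrates channels rather than individual spatial locations. In the proposed model, the recalibrated feature map would then feed the subsequent layers and the MobileViT encoder.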

Similar Items