Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset
| Main Authors: | Md Firoz Kabir, Md Yousuf Ahmad, Roise Uddin, Martin Cordero, Shashi Kant |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Discover Artificial Intelligence |
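| Subjects: | Oral cancer detection; Lightweight deep learning; MobileViT transformer; Squeeze-and-excitation (SE) module; Medical imaging |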
| Online Access: | https://doi.org/10.1007/s44163-025-00442-2 |
| Field | Value |
|---|---|
| author | Md Firoz Kabir, Md Yousuf Ahmad, Roise Uddin, Martin Cordero, Shashi Kant |
| collection | DOAJ |
| description | Oral cancer poses a critical global health challenge, and early detection significantly improves patient survival and treatment outcomes. This study proposes a deep learning-based diagnostic model, LightSE-MobileViT, designed to classify oral cancer from medical images. The Oral Cancer Classification dataset used in this study comprises clinically validated lip and tongue images collected from several ENT hospitals in Ahmedabad. The original dataset consisted of 131 images (87 cancerous and 44 non-cancerous). To address class imbalance and improve generalizability, data augmentation expanded the dataset to 981 images with a near-equal distribution across the two classes. LightSE-MobileViT uses a lightweight convolutional neural network (CNN) backbone of sequential convolutional layers, each followed by batch normalization and a rectified linear unit (ReLU) activation. To enrich the feature representation through channel-wise attention, a Squeeze-and-Excitation (SE) block is embedded after the third convolutional layer. A MobileViT transformer encoder then captures global contextual information through efficient multi-head self-attention. In experimental evaluations, LightSE-MobileViT attained an accuracy of 98.39%, precision and recall values approaching 1.00 for both the cancerous and non-cancerous classes, a macro F1-score of 0.98, and an ROC-AUC of 1.00. It outperformed the benchmark models CST-CNN (98% accuracy), MobileNetV2 (97%), DenseNet121 (97%), and InceptionV3 (90%). These results suggest that LightSE-MobileViT has potential for deployment in automated oral cancer screening, where it could facilitate early detection and timely intervention. |
| format | Article |
| id | doaj-art-52b6f5ab05a34d848b5b09c7f25689f5 |
| institution | Kabale University |
| issn | 2731-0809 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Springer |
| series | Discover Artificial Intelligence |
| affiliations | Md Firoz Kabir (University of the Cumberlands); Md Yousuf Ahmad (Trine University); Roise Uddin (Pacific States University); Martin Cordero (University of the Cumberlands); Shashi Kant (Bule Hora University) |
| title | Accurate and lightweight oral cancer detection using SE-MobileViT on clinically validated image dataset |
| topic | Oral cancer detection Lightweight deep learning MobileViT transformer Squeeze-and-excitation (SE) module Medical imaging |
| url | https://doi.org/10.1007/s44163-025-00442-2 |
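The description places a Squeeze-and-Excitation (SE) block after the third convolutional layer. The sketch below is a minimal, framework-free illustration of the generic SE mechanism, assuming a feature map stored as C channels of H x W values. It squeezes each channel to its global average, passes that vector through a two-layer bottleneck (ReLU, then sigmoid), and rescales each channel by the resulting weight. The function name `squeeze_excite`, the reduction ratio `r = 2`, and the random weights are illustrative assumptions, not the authors' configuration or trained parameters.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(
    feature_map: List[Matrix],
    w1: Matrix,
    b1: List[float],
    w2: Matrix,
    b2: List[float],
) -> Tuple[List[Matrix], List[float]]:
    """Generic SE block (illustrative sketch, not the paper's implementation).

    feature_map: C channels, each an H x W grid.
    w1: (C // r) x C weights of the reduction layer; b1: its biases.
    w2: C x (C // r) weights of the expansion layer; b2: its biases.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling collapses each channel to one number.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: fully connected reduction to C // r units, then ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected expansion back to C units, then sigmoid.
    scales = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every value in channel c by its weight scales[c].
    out = [[[v * s for v in row] for row in ch] for ch, s in zip(feature_map, scales)]
    return out, scales


# Hypothetical example: 4 channels of 3 x 3 activations, reduction ratio r = 2.
random.seed(0)
C, r, H, W = 4, 2, 3, 3
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.5) for _ in range(C)] for _ in range(C // r)]
b1 = [0.0] * (C // r)
w2 = [[random.gauss(0.0, 0.5) for _ in range(C // r)] for _ in range(C)]
b2 = [0.0] * C

y, weights = squeeze_excite(x, w1, b1, w2, b2)
print("channel weights:", [round(s, 3) for s in weights])
```

Each weight lies strictly between 0 and 1 and depends only on channel averages. The block therefore reweights whole channels rather than individual spatial positions, which is why SE is usually described as a channel-attention mechanism.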
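The description reports accuracy, per-class precision and recall, a macro F1-score, and ROC-AUC. The following minimal pure-Python sketch shows how these quantities are conventionally defined for the two classes. The helper names (`binary_report`, `roc_auc`), the class labels, and the six toy predictions are hypothetical and do not reproduce the study's data or results.

```python
from typing import Dict, Sequence, Tuple


def binary_report(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Tuple[float, Dict[str, Dict[str, float]], float]:
    """Accuracy, per-class precision/recall/F1, and macro F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class: Dict[str, Dict[str, float]] = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro F1: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1


def roc_auc(y_true: Sequence[str], scores: Sequence[float], positive: str) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # Each (positive, negative) pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six images, one cancerous image misclassified.
labels = ["cancerous", "non-cancerous"]
y_true = ["cancerous"] * 3 + ["non-cancerous"] * 3
y_pred = ["cancerous", "cancerous", "non-cancerous"] + ["non-cancerous"] * 3
p_cancer = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]  # model's probability of "cancerous"

acc, per_class, macro_f1 = binary_report(y_true, y_pred, labels)
auc = roc_auc(y_true, p_cancer, positive="cancerous")
print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f} auc={auc:.2f}")
# accuracy=0.833 macro_f1=0.829 auc=1.00
```

In this toy case the ranking is perfect (AUC = 1.00) even though one cancerous image scores below the 0.5 decision threshold. This shows that an ROC-AUC of 1.00 can coexist with accuracy below 100%.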