A recurrent multimodal sparse transformer framework for gastrointestinal disease classification
Abstract Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks often face limitations due to modality imbalance, feature redundancy, and cross-modal inconsistencies, pa...
| Main Authors: | V. Sharmila, S. Geetha |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
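| Subjects: | Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism |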
| Online Access: | https://doi.org/10.1038/s41598-025-08897-0 |
| _version_ | 1849235389022208000 |
|---|---|
| author | V. Sharmila; S. Geetha |
| author_facet | V. Sharmila; S. Geetha |
| author_sort | V. Sharmila |
| collection | DOAJ |
| description | Abstract: Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks are often limited by modality imbalance, feature redundancy, and cross-modal inconsistencies, particularly when dealing with heterogeneous data such as medical text and endoscopic images. To bridge these gaps, this study proposes a novel recurrent multimodal principal gradient K-proximal sparse transformer (RMP-GKPS-transformer) framework for comprehensive GI disease classification. The approach integrates clinical text and wireless capsule endoscopy (WCE) images using a multimodal fusion strategy that incorporates Bio-RoBERTa for textual feature extraction, a graph vision spatial channel attention transformer network for image feature learning, and cross-attention mechanisms for modality alignment. The model further employs principal component analysis (PCA) for dimensionality reduction and gradient boosting machines (GBMs) for semantic conflict resolution. Classification is performed using an ensemble of random forest KNN, proximal policy optimization (PPO), and a sparse radial basis function (RBF) kernel to ensure accuracy and interpretability. Experimental evaluation on publicly available datasets achieved 99.82% accuracy, a Dice coefficient of 98.7%, and a significantly lower execution time than state-of-the-art methods. The results confirm the framework's effectiveness in aligning and leveraging multimodal data for precise classification of six GI diseases, offering a scalable and interpretable solution for enhanced clinical decision-making in gastroenterology. |
| format | Article |
| id | doaj-art-9f1e11eec97442c190e861fc05e423de |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-9f1e11eec97442c190e861fc05e423de; 2025-08-20T04:02:46Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 151120; 10.1038/s41598-025-08897-0; A recurrent multimodal sparse transformer framework for gastrointestinal disease classification; V. Sharmila 0; S. Geetha 1; School of Computer Science and Engineering, Vellore Institute of Technology; School of Computer Science and Engineering, Vellore Institute of Technology; https://doi.org/10.1038/s41598-025-08897-0; Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism |
| spellingShingle | V. Sharmila; S. Geetha; A recurrent multimodal sparse transformer framework for gastrointestinal disease classification; Scientific Reports; Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism |
| title | A recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| title_full | A recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| title_fullStr | A recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| title_full_unstemmed | A recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| title_short | A recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| title_sort | recurrent multimodal sparse transformer framework for gastrointestinal disease classification |
| topic | Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism |
| url | https://doi.org/10.1038/s41598-025-08897-0 |
| work_keys_str_mv | AT vsharmila arecurrentmultimodalsparsetransformerframeworkforgastrointestinaldiseaseclassification AT sgeetha arecurrentmultimodalsparsetransformerframeworkforgastrointestinaldiseaseclassification AT vsharmila recurrentmultimodalsparsetransformerframeworkforgastrointestinaldiseaseclassification AT sgeetha recurrentmultimodalsparsetransformerframeworkforgastrointestinaldiseaseclassification |
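
Note on the description field: the abstract names cross-attention as the mechanism for aligning text and image features. The following is a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. The function names, the toy vectors, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (hypothetical helper).

    queries: rows from one modality (e.g. text token features).
    keys/values: rows from the other modality (e.g. image patch features).
    Returns one fused row per query: a weighted average of the value rows.
    Learned projection matrices are deliberately left out of this sketch.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, computed column by column.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Toy example: one text query attending over two image patches.
text_features = [[1.0, 0.0]]
image_keys = [[0.0, 0.0], [0.0, 0.0]]
image_values = [[2.0, 4.0], [4.0, 8.0]]
fused = cross_attention(text_features, image_keys, image_values)
```

Because both keys in the toy example score identically against the query, the attention weights are equal and the fused row is the plain average of the two value rows. In a real model the keys would differ, so the query would draw more heavily on the image patches it matches best.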