A recurrent multimodal sparse transformer framework for gastrointestinal disease classification

Abstract Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks often face limitations due to modality imbalance, feature redundancy, and cross-modal inconsistencies, pa...

Bibliographic Details
Main Authors: V. Sharmila, S. Geetha
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
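Subjects: Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism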
Online Access: https://doi.org/10.1038/s41598-025-08897-0
author V. Sharmila
S. Geetha
collection DOAJ
description Abstract: Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks are often limited by modality imbalance, feature redundancy, and cross-modal inconsistencies, particularly when dealing with heterogeneous data such as medical text and endoscopic images. To bridge these gaps, this study proposes a novel recurrent multimodal principal gradient K-proximal sparse transformer (RMP-GKPS-transformer) framework for comprehensive GI disease classification. The approach integrates clinical text and wireless capsule endoscopy (WCE) images using a robust multimodal fusion strategy that incorporates Bio-RoBERTa for textual feature extraction, a graph vision spatial channel attention transformer network for image feature learning, and cross-attention mechanisms for modality alignment. Further, the model employs principal component analysis (PCA) for dimensionality reduction and gradient boosting machines (GBMs) for semantic conflict resolution. Classification is performed using an ensemble of random forest KNN, proximal policy optimization (PPO), and a sparse radial basis function (RBF) kernel to ensure accuracy and interpretability. Experimental evaluation on publicly available datasets achieved 99.82% accuracy, a Dice coefficient of 98.7%, and significantly lower execution time than state-of-the-art methods. The results confirm the framework's effectiveness in aligning and leveraging multimodal data for precise classification of six GI diseases, offering a scalable and interpretable solution for enhanced clinical decision-making in gastroenterology.
format Article
id doaj-art-9f1e11eec97442c190e861fc05e423de
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-9f1e11eec97442c190e861fc05e423de
2025-08-20T04:02:46Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-07-01
151120
10.1038/s41598-025-08897-0
A recurrent multimodal sparse transformer framework for gastrointestinal disease classification
V. Sharmila 0
S. Geetha 1
School of Computer Science and Engineering, Vellore Institute of Technology
School of Computer Science and Engineering, Vellore Institute of Technology
https://doi.org/10.1038/s41598-025-08897-0
Gastrointestinal disease classification
Multimodal feature fusion
Bio-RoBERTa
Sparse transformer network
Cross-attention mechanism
title A recurrent multimodal sparse transformer framework for gastrointestinal disease classification
topic Gastrointestinal disease classification
Multimodal feature fusion
Bio-RoBERTa
Sparse transformer network
Cross-attention mechanism
url https://doi.org/10.1038/s41598-025-08897-0