A recurrent multimodal sparse transformer framework for gastrointestinal disease classification

Abstract Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks often face limitations due to modality imbalance, feature redundancy, and cross-modal inconsistencies, pa...

Bibliographic Details
Main Authors:	V. Sharmila, S. Geetha
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Subjects:	Gastrointestinal disease classification, Multimodal feature fusion, Bio-RoBERTa, Sparse transformer network, Cross-attention mechanism
Online Access:	https://doi.org/10.1038/s41598-025-08897-0

author	V. Sharmila, S. Geetha
author_facet	V. Sharmila, S. Geetha
author_sort	V. Sharmila
collection	DOAJ
description	Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks are often limited by modality imbalance, feature redundancy, and cross-modal inconsistencies, particularly when dealing with heterogeneous data such as medical text and endoscopic images. To bridge these gaps, this study proposes a novel recurrent multimodal principal gradient K-proximal sparse transformer (RMP-GKPS-transformer) framework for comprehensive GI disease classification. The approach integrates clinical text and wireless capsule endoscopy (WCE) images using a robust multimodal fusion strategy that incorporates Bio-RoBERTa for textual feature extraction, a graph vision spatial channel attention transformer network for image feature learning, and cross-attention mechanisms for modality alignment. The model further employs principal component analysis (PCA) for dimensionality reduction and gradient boosting machines (GBMs) for semantic conflict resolution. Classification is performed using an ensemble of random forest k-nearest neighbors (KNN), proximal policy optimization (PPO), and a sparse radial basis function (RBF) kernel to ensure accuracy and interpretability. Experimental evaluation on publicly available datasets achieved 99.82% accuracy, a Dice coefficient of 98.7%, and significantly lower execution time than state-of-the-art methods. The results confirm the framework’s effectiveness in aligning and leveraging multimodal data for precise classification of six GI diseases, offering a scalable and interpretable solution for enhanced clinical decision-making in gastroenterology.
format	Article
id	doaj-art-9f1e11eec97442c190e861fc05e423de
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-07-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-9f1e11eec97442c190e861fc05e423de; 2025-08-20T04:02:46Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 15; 1; 1; 20; 10.1038/s41598-025-08897-0; A recurrent multimodal sparse transformer framework for gastrointestinal disease classification; V. Sharmila [0]; S. Geetha [1]; School of Computer Science and Engineering, Vellore Institute of Technology; School of Computer Science and Engineering, Vellore Institute of Technology; https://doi.org/10.1038/s41598-025-08897-0; Gastrointestinal disease classification; Multimodal feature fusion; Bio-RoBERTa; Sparse transformer network; Cross-attention mechanism
title	A recurrent multimodal sparse transformer framework for gastrointestinal disease classification
topic	Gastrointestinal disease classification, Multimodal feature fusion, Bio-RoBERTa, Sparse transformer network, Cross-attention mechanism
url	https://doi.org/10.1038/s41598-025-08897-0

Similar Items