FTNet-HiLa: An adaptive multimodal histopathological image categorization network
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-01-01
Series: Ain Shams Engineering Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2090447924005926
Summary: The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications.
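The summary describes fusing text and image modalities via a Co-attention Fusion Block (CaFB). The abstract does not specify the block's internals, so the following is only a generic sketch of bidirectional cross-attention fusion under assumed dimensions; the function names (`cross_attention`, `co_attention_fuse`) and the pooling/concatenation step are illustrative assumptions, not the paper's actual CaFB design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys,
    # returning a weighted sum of the corresponding values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def co_attention_fuse(img_feats, txt_feats):
    # Bidirectional (co-)attention: image patches attend to text tokens
    # and vice versa; the two attended sets are mean-pooled and
    # concatenated into one fused vector (an assumed fusion choice).
    img_ctx = cross_attention(img_feats, txt_feats, txt_feats)
    txt_ctx = cross_attention(txt_feats, img_feats, img_feats)
    pool = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
    return pool(img_ctx) + pool(txt_ctx)

# Toy inputs: 2 image-patch embeddings and 3 report-token embeddings, dim 4.
img_patches = [[1, 0, 0, 0], [0, 1, 0, 0]]
txt_tokens = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
fused = co_attention_fuse(img_patches, txt_tokens)  # length 4 + 4 = 8
```

The fused vector would then feed a classification head; in the actual network, multi-scale features and learned projection matrices would replace these raw toy embeddings.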
ISSN: 2090-4479