FTNet-HiLa: An adaptive multimodal histopathological image categorization network
The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications.
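The record does not include implementation details of the Co-attention Fusion Block (CaFB). As a rough illustration of the general co-attention idea the abstract alludes to (each modality's tokens attending to the other's), here is a minimal NumPy sketch. The function names, dimensions, and the single-head, projection-free formulation are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: `queries` attend to `keys`/`values`.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def co_attention_fuse(img_tokens, txt_tokens):
    # Symmetric co-attention (hypothetical): image tokens are refined by
    # attending to text tokens and vice versa, with residual connections.
    img_out = img_tokens + cross_attention(img_tokens, txt_tokens, txt_tokens)
    txt_out = txt_tokens + cross_attention(txt_tokens, img_tokens, img_tokens)
    return img_out, txt_out

rng = np.random.default_rng(0)
img = rng.standard_normal((196, 64))  # e.g. 14x14 ViT patch tokens, dim 64
txt = rng.standard_normal((32, 64))   # e.g. 32 report-text tokens, dim 64
fused_img, fused_txt = co_attention_fuse(img, txt)
print(fused_img.shape, fused_txt.shape)
```

Each stream keeps its own token count and dimension, so the fused outputs can feed further transformer stages per modality.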
Main Authors: | Shuo Yin; Dong Zhang; YongKang Zhang; Xing Zhao; XuYing Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-01-01 |
Series: | Ain Shams Engineering Journal |
Subjects: | Image classification; Histopathological; Multimodal; Transformer-based |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2090447924005926 |
_version_ | 1841526276828954624 |
---|---|
author | Shuo Yin; Dong Zhang; YongKang Zhang; Xing Zhao; XuYing Zhao |
author_facet | Shuo Yin; Dong Zhang; YongKang Zhang; Xing Zhao; XuYing Zhao |
author_sort | Shuo Yin |
collection | DOAJ |
description | The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications. |
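The abstract describes HiLa only as pre-training on non-medical data followed by an adaptive fine-tuning process on medical data, without specifying the mechanism. One common way to make fine-tuning "adaptive" is layer-wise learning-rate decay, sketched below purely as an illustration; the function name, decay scheme, and parameter values are assumptions and are not taken from the paper.

```python
def layerwise_lrs(base_lr, num_layers, decay=0.65):
    """Assign smaller learning rates to earlier (more generic) layers,
    so fine-tuning adapts task-specific upper layers faster while
    barely disturbing pre-trained low-level features."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# A 12-layer ViT encoder fine-tuned with a top-layer rate of 1e-4:
lrs = layerwise_lrs(1e-4, 12)
print([f"{lr:.2e}" for lr in lrs])
```

In practice each entry would be attached to the parameter group of the corresponding transformer block in the optimizer configuration.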
format | Article |
id | doaj-art-6c25989bd1874c40b1bb87fc08f5f882 |
institution | Kabale University |
issn | 2090-4479 |
language | English |
publishDate | 2025-01-01 |
publisher | Elsevier |
record_format | Article |
series | Ain Shams Engineering Journal |
spelling | doaj-art-6c25989bd1874c40b1bb87fc08f5f882 2025-01-17T04:49:25Z eng Elsevier Ain Shams Engineering Journal 2090-4479 2025-01-01 16(1) 103211. FTNet-HiLa: An adaptive multimodal histopathological image categorization network. Shuo Yin (School of Mathematical Sciences, Capital Normal University, Beijing, 100048, China); Dong Zhang (School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China); YongKang Zhang (School of Computer Science and Engineering, Beihang University, Beijing, 100191, China); Xing Zhao (School of Mathematical Sciences, Capital Normal University, Beijing, 100048, China; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing, 100048, China; Shenzhen National Applied Mathematics Center, Southern University of Science and Technology, Shenzhen, 518055, China); XuYing Zhao (School of Mathematical Sciences, Capital Normal University, Beijing, 100048, China; corresponding author). The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications. http://www.sciencedirect.com/science/article/pii/S2090447924005926 Image classification; Histopathological; Multimodal; Transformer-based |
spellingShingle | Shuo Yin; Dong Zhang; YongKang Zhang; Xing Zhao; XuYing Zhao. FTNet-HiLa: An adaptive multimodal histopathological image categorization network. Ain Shams Engineering Journal. Image classification; Histopathological; Multimodal; Transformer-based |
title | FTNet-HiLa: An adaptive multimodal histopathological image categorization network |
title_full | FTNet-HiLa: An adaptive multimodal histopathological image categorization network |
title_fullStr | FTNet-HiLa: An adaptive multimodal histopathological image categorization network |
title_full_unstemmed | FTNet-HiLa: An adaptive multimodal histopathological image categorization network |
title_short | FTNet-HiLa: An adaptive multimodal histopathological image categorization network |
title_sort | ftnet hila an adaptive multimodal histopathological image categorization network |
topic | Image classification; Histopathological; Multimodal; Transformer-based |
url | http://www.sciencedirect.com/science/article/pii/S2090447924005926 |
work_keys_str_mv | AT shuoyin ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork AT dongzhang ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork AT yongkangzhang ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork AT xingzhao ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork AT xuyingzhao ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork |