FTNet-HiLa: An adaptive multimodal histopathological image categorization network

Bibliographic Details
Main Authors: Shuo Yin, Dong Zhang, YongKang Zhang, Xing Zhao, XuYing Zhao
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Ain Shams Engineering Journal
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2090447924005926
_version_ 1841526276828954624
author Shuo Yin
Dong Zhang
YongKang Zhang
Xing Zhao
XuYing Zhao
author_facet Shuo Yin
Dong Zhang
YongKang Zhang
Xing Zhao
XuYing Zhao
author_sort Shuo Yin
collection DOAJ
description The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications.
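The abstract above describes a Co-attention Fusion Block (CaFB) that fuses text and image modalities, but the record does not give the block's internal design. As a rough illustration only, the sketch below shows a generic bidirectional cross-attention fusion in NumPy — each modality attends to the other and the attended features are pooled and concatenated. All names and the pooling choice are assumptions, not the paper's actual CaFB.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values, scale):
    # Scaled dot-product attention: each query token attends
    # over all tokens of the other modality.
    attn = softmax(queries @ keys_values.T / scale, axis=-1)
    return attn @ keys_values

def co_attention_fuse(img_tokens, txt_tokens):
    """Generic co-attention fusion (hypothetical stand-in for CaFB):
    image tokens attend to text, text tokens attend to image,
    then each attended sequence is mean-pooled and concatenated."""
    d = img_tokens.shape[-1]
    scale = np.sqrt(d)
    img2txt = cross_attend(img_tokens, txt_tokens, scale)  # (n_img, d)
    txt2img = cross_attend(txt_tokens, img_tokens, scale)  # (n_txt, d)
    return np.concatenate([img2txt.mean(axis=0), txt2img.mean(axis=0)])

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 32))  # 16 image patch tokens, dim 32
txt = rng.normal(size=(8, 32))   # 8 text tokens, dim 32
fused = co_attention_fuse(img, txt)
print(fused.shape)  # (64,)
```

In a real multimodal Transformer the queries, keys, and values would pass through learned projections and multiple heads; this sketch omits those to show only the fusion pattern implied by the term "co-attention."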
format Article
id doaj-art-6c25989bd1874c40b1bb87fc08f5f882
institution Kabale University
issn 2090-4479
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Ain Shams Engineering Journal
spelling doaj-art-6c25989bd1874c40b1bb87fc08f5f8822025-01-17T04:49:25ZengElsevierAin Shams Engineering Journal2090-44792025-01-01161103211FTNet-HiLa: An adaptive multimodal histopathological image categorization networkShuo Yin0Dong Zhang1YongKang Zhang2Xing Zhao3XuYing Zhao4School of Mathematical Sciences, Capital Normal University, Beijing, 100048, ChinaSchool of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, ChinaSchool of Computer Science and Engineering, Beihang University, Beijing, 100191, ChinaSchool of Mathematical Sciences, Capital Normal University, Beijing, 100048, China; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing, 100048, China; Shenzhen National Applied Mathematics Center, Southern University of Science and Technology, Shenzhen, 518055, ChinaSchool of Mathematical Sciences, Capital Normal University, Beijing, 100048, China; Corresponding author.The integration of artificial intelligence in medical imaging has witnessed a surge in neural network applications for pathological image classification, with Vision Transformers (ViTs) emerging as highly accurate models in general visual recognition tasks. Addressing the challenge of limited pathological image data, this study introduces HiLa, a novel training framework for ViTs. The framework leverages pre-training on non-medical data, followed by an adaptive fine-tuning process using medical data to bridge the data gap. Furthermore, to fuse the text and image modalities, we propose the Co-attention Fusion Block (CaFB) module, enabling the development of a Multi-scale full Transformer-based network (FTNet). Extensive experiments on diverse datasets demonstrate the efficacy of the HiLa framework, achieving 99.77% accuracy on the BreakHis dataset for 8-class classification. Combined with FTNet-HiLa, it sets a new benchmark in benign-malignant classification. 
These findings highlight the improved performance of ViT models with HiLa and its scalability to multimodal applications.http://www.sciencedirect.com/science/article/pii/S2090447924005926Image classificationHistopathologicalMultimodalTransformer-based
spellingShingle Shuo Yin
Dong Zhang
YongKang Zhang
Xing Zhao
XuYing Zhao
FTNet-HiLa: An adaptive multimodal histopathological image categorization network
Ain Shams Engineering Journal
Image classification
Histopathological
Multimodal
Transformer-based
title FTNet-HiLa: An adaptive multimodal histopathological image categorization network
title_full FTNet-HiLa: An adaptive multimodal histopathological image categorization network
title_fullStr FTNet-HiLa: An adaptive multimodal histopathological image categorization network
title_full_unstemmed FTNet-HiLa: An adaptive multimodal histopathological image categorization network
title_short FTNet-HiLa: An adaptive multimodal histopathological image categorization network
title_sort ftnet hila an adaptive multimodal histopathological image categorization network
topic Image classification
Histopathological
Multimodal
Transformer-based
url http://www.sciencedirect.com/science/article/pii/S2090447924005926
work_keys_str_mv AT shuoyin ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork
AT dongzhang ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork
AT yongkangzhang ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork
AT xingzhao ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork
AT xuyingzhao ftnethilaanadaptivemultimodalhistopathologicalimagecategorizationnetwork