Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images

Bibliographic Details
Main Authors:	Mouhamed Laid Abimouloud, Khaled Bensid, Mohamed Elleuch, Mohamed Ben Ammar, Monji Kherallah
Format:	Article
Language:	English
Published:	SpringerOpen 2025-01-01
Series:	Visual Computing for Industry, Biomedicine, and Art
Subjects:	Breast cancer; Convolutional vision transformer; Histopathological images; Multi classification; BreakHis
Online Access:	https://doi.org/10.1186/s42492-024-00181-8

collection	DOAJ
description	Abstract The vision transformer (ViT) architecture, whose attention mechanism is built on multi-head attention layers, has been widely adopted in computer-aided diagnosis tasks because it processes medical image information effectively. However, ViTs have a complex architecture that requires high-performance GPUs or CPUs for efficient training and for deployment in real-world medical diagnostic devices, which makes them more complex to train and deploy than convolutional neural networks (CNNs). The difficulty is compounded in histopathology image analysis, where images are both scarce and complex. In response, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. It aims to improve feature extraction and classification accuracy with shorter training time and fewer parameters by minimizing the number of input patches used during training, tokenizing the input patches with convolutional layers, and processing them with transformer encoder layers across all network layers, for fast and accurate classification of breast cancer tumor subtypes. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, enabling the extraction of patches from input images and reducing the number of input patches used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in a transformer encoder network. We evaluated TokenMixer on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. The approach was evaluated on both binary and multi-class classification of breast cancer subtypes across magnification levels (40×, 100×, 200×, 400×), achieving accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of the hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer.
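The two-stage idea in the abstract (convolutional patch embedding, then reduction of the patches to a smaller set of learned tokens before a transformer encoder) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the linked repository. The function names `conv_patch_embed` and `token_learner`, the random weights, the softmax normalization of the attention maps, and all sizes below are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv_patch_embed(image, kernel, patch):
    """ConvMixer-style patch embedding (hypothetical helper).

    A convolution whose kernel size equals its stride is equivalent to
    flattening each non-overlapping patch and applying one linear map.
    image: (H, W, C), kernel: (patch * patch * C, D) -> returns (N, D).
    """
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    grid = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    flat = grid.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    return flat @ kernel


def token_learner(patches, w_attn):
    """TokenLearner-style reduction of N patches to S tokens (hypothetical helper).

    Each of the S attention maps weights all N patches; a token is the
    attention-weighted sum of patch embeddings.
    patches: (N, D), w_attn: (D, S) -> returns (S, D).
    """
    scores = patches @ w_attn        # (N, S): one score per patch per token
    attn = softmax(scores, axis=0)   # assumption: normalize each map over patches
    return attn.T @ patches          # (S, D)


# Illustrative sizes only; they are not taken from the paper.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
kernel = rng.normal(size=(8 * 8 * 3, 16))
w_attn = rng.normal(size=(16, 4))

patches = conv_patch_embed(image, kernel, patch=8)  # 16 patches of width 16
tokens = token_learner(patches, w_attn)             # 4 tokens of width 16
print(patches.shape, tokens.shape)
```

In this sketch the encoder that follows would attend over 4 tokens instead of 16 patches. Since self-attention cost grows quadratically with sequence length, shrinking the sequence in this way is one plausible source of the shorter training times the abstract reports.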
id	doaj-art-785fe16be0b3406189c8b1b0c2b23adf
institution	Kabale University
issn	2524-4442
affiliations	National Engineering School of Sfax, University of Sfax; Laboratory of Electrical Engineering (LAGE), University of KASDI Merbah Ouargla; National School of Computer Science (ENSI), University of Manouba; Department of Information Systems, Faculty of Computing and Information Technology, Northern Border University; Faculty of Sciences

Similar Items