Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images

Abstract The vision transformer (ViT) architecture, with its attention mechanism based on multi-head attention layers, has been widely adopted in various computer-aided diagnosis tasks due to its effectiveness in processing medical image information. ViTs are notably recognized for their complex arc...

Bibliographic Details
Main Authors: Mouhamed Laid Abimouloud, Khaled Bensid, Mohamed Elleuch, Mohamed Ben Ammar, Monji Kherallah
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: Visual Computing for Industry, Biomedicine, and Art
Subjects:
Online Access: https://doi.org/10.1186/s42492-024-00181-8
collection DOAJ
description The vision transformer (ViT) architecture, whose attention mechanism is built on multi-head attention layers, has been widely adopted in computer-aided diagnosis tasks because it processes medical image information effectively. However, ViTs have a complex architecture that requires high-performance GPUs or CPUs for efficient model training and for deployment in real-world medical diagnostic devices, which makes them more complex to train and deploy than convolutional neural networks (CNNs). The problem is particularly acute in histopathology image analysis, where the images are both limited and complex. In response to these challenges, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. The architecture aims to improve feature extraction and classification accuracy with shorter training time and fewer parameters. To do so, it minimizes the number of input patches used during training, tokenizes the input patches with convolutional layers, and processes them with transformer encoder layers across all network layers, for fast and accurate classification of breast cancer tumor subtypes. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, which enables patches to be extracted from input images so that fewer input patches are used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in a transformer encoder network. We evaluated the TokenMixer model on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. The evaluation covered both binary and multi-class classification of breast cancer subtypes across four magnification levels (40×, 100×, 200×, 400×). The model reached accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer.
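
The token-reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a simplified, TokenLearner-style pooling written in plain Python under stated assumptions: each of K hypothetical weight vectors scores every patch, the scores are normalized with a softmax over patches, and each output token is the resulting weighted average of patch features. The names `learn_tokens` and `score_weights`, the softmax normalization, and the random inputs are all illustrative choices, not details taken from the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def learn_tokens(patches, score_weights):
    """Reduce N patch vectors to K tokens (simplified TokenLearner-style pooling).

    patches:       N lists of D floats (patch features).
    score_weights: K lists of D floats (hypothetical learned parameters,
                   one scoring vector per output token).
    """
    dim = len(patches[0])
    tokens = []
    for w in score_weights:
        # One scalar relevance score per patch for this token.
        scores = [sum(wi * pi for wi, pi in zip(w, p)) for p in patches]
        # Attention weights over patches; they sum to 1.
        attn = softmax(scores)
        # The token is the attention-weighted average of the patch features.
        token = [sum(a * p[d] for a, p in zip(attn, patches)) for d in range(dim)]
        tokens.append(token)
    return tokens


random.seed(0)
num_patches, dim, num_tokens = 64, 8, 4  # illustrative sizes only
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
score_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

tokens = learn_tokens(patches, score_weights)
print(len(patches), "patches ->", len(tokens), "tokens of dim", len(tokens[0]))
```

In this sketch a transformer encoder placed after the pooling step would attend over 4 tokens instead of 64 patches, which is the kind of saving the abstract attributes to using fewer input patches.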
id doaj-art-785fe16be0b3406189c8b1b0c2b23adf
institution Kabale University
issn 2524-4442
citation Visual Computing for Industry, Biomedicine, and Art, vol. 8, no. 1, pp. 1-27 (2025-01-01)
affiliation Mouhamed Laid Abimouloud: National Engineering School of Sfax, University of Sfax
Khaled Bensid: Laboratory of Electrical Engineering (LAGE), University of KASDI Merbah Ouargla
Mohamed Elleuch: National School of Computer Science (ENSI), University of Manouba
Mohamed Ben Ammar: Department of Information Systems, Faculty of Computing and Information Technology, Northern Border University
Monji Kherallah: Faculty of Sciences
topic Breast cancer
Convolutional vision transformer
Histopathological images
Multi classification
BreakHis