FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba

Abstract Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs) struggle to capture global features efficiently, whi...

Bibliographic Details
Main Authors:	Xinyu Xie, Yawen Cui, Tao Tan, Xubin Zheng, Zitong Yu
Format:	Article
Language:	English
Published:	Springer 2024-12-01
Series:	Visual Intelligence
Subjects:	Multimodal Image fusion Feature enhancement Mamba
Online Access:	https://doi.org/10.1007/s44267-024-00072-9

_version_	1841559032411717632
author	Xinyu Xie Yawen Cui Tao Tan Xubin Zheng Zitong Yu
author_facet	Xinyu Xie Yawen Cui Tao Tan Xubin Zheng Zitong Yu
author_sort	Xinyu Xie
collection	DOAJ
description	Abstract Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs) struggle to capture global features efficiently, while Transformer-based models are computationally expensive, although they excel at global modeling. Mamba addresses these limitations by leveraging selective structured state space models (S4) to effectively handle long-range dependencies while maintaining linear complexity. In this paper, we propose FusionMamba, a novel dynamic feature enhancement framework that aims to overcome the challenges faced by CNNs and Vision Transformers (ViTs) in computer vision tasks. The framework improves the visual state-space model Mamba by integrating dynamic convolution and channel attention mechanisms, which not only retains its powerful global feature modeling capability, but also greatly reduces redundancy and enhances the expressiveness of local features. In addition, we have developed a new module called the dynamic feature fusion module (DFFM). It combines the dynamic feature enhancement module (DFEM) for texture enhancement and disparity perception with the cross-modal fusion Mamba module (CMFM), which focuses on enhancing the inter-modal correlation while suppressing redundant information. Experiments show that FusionMamba achieves state-of-the-art performance in a variety of multimodal image fusion tasks as well as downstream experiments, demonstrating its broad applicability and superiority.
format	Article
id	doaj-art-de47da3dd51e47cf8bf6e25c0832fdc8
institution	Kabale University
issn	2731-9008
language	English
publishDate	2024-12-01
publisher	Springer
record_format	Article
series	Visual Intelligence
spelling	doaj-art-de47da3dd51e47cf8bf6e25c0832fdc8; 2025-01-05T12:50:16Z; eng; Springer; Visual Intelligence; 2731-9008; 2024-12-01; 21118; 10.1007/s44267-024-00072-9; FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba; Xinyu Xie [0], Yawen Cui [1], Tao Tan [2], Xubin Zheng [3], Zitong Yu [4]; [0] School of Computing and Information Technology, Great Bay University; [1] Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University; [2] Faculty of Applied Sciences, Macao Polytechnic University; [3] School of Computing and Information Technology, Great Bay University; [4] School of Computing and Information Technology, Great Bay University; https://doi.org/10.1007/s44267-024-00072-9; Multimodal; Image fusion; Feature enhancement; Mamba
title	FusionMamba: dynamic feature enhancement for multimodal image fusion with Mamba
topic	Multimodal Image fusion Feature enhancement Mamba
url	https://doi.org/10.1007/s44267-024-00072-9
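
The description above credits state space models with handling long-range dependencies at linear complexity. As a rough illustration of that property only, the following is a minimal sketch of a scalar linear state-space recurrence evaluated in a single pass over the sequence. It is not the FusionMamba architecture or the Mamba implementation; the function name `ssm_scan` and the parameters `a`, `b`, and `c` are hypothetical and chosen purely for illustration. In a selective model these parameters would be computed from the input at each step, whereas here they are fixed constants.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Evaluate a scalar linear state-space recurrence in one pass.

    Illustrative assumption, not the paper's method:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    The loop visits each input once, so the cost grows linearly with
    the sequence length, while h carries information from all earlier
    inputs forward.
    """
    h = 0.0  # hidden state, initialised to zero
    y: List[float] = []
    for x_t in x:
        h = a * h + b * x_t  # update the state with the current input
        y.append(c * h)      # read out the output from the state
    return y


if __name__ == "__main__":
    # An impulse at the first position decays geometrically but still
    # influences every later output.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Each output depends on the whole input history through the single state value, which is what lets this kind of recurrence cover long ranges without comparing every pair of positions.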


Similar Items