Progressive multi-subspace fusion for text-image matching
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Complex & Intelligent Systems |
| Online Access: | https://doi.org/10.1007/s40747-025-01946-1 |
| Summary: | Abstract: Text-image cross-modal matching is a core challenge in multimodal machine learning, aiming to enable efficient retrieval of images and texts across different modalities. The difficulty of this task stems from the inherent gap between text and image representations, which can lead to suboptimal retrieval performance. Traditional approaches attempt to learn a shared representation space in which images and texts can be compared directly. However, they often fail to account for the varying levels of semantic information captured in different layers of the encoders, resulting in inadequate alignment between the modalities. To address these limitations, we propose Progressive Multi-Subspace Fusion (PMSF), a novel approach for text-image matching. Our model reduces the modality gap through a progressive learning process that starts from shallow representations and moves to deeper layers. We use a dual-tower structure to encode multi-level features for both image and text, which are then mapped to corresponding auxiliary subspaces. These subspaces are fused through an adaptive GPO pooling strategy, enabling joint learning of a shared representation space. Experimental results on the benchmark Flickr30K and MSCOCO datasets show that PMSF significantly improves retrieval performance, achieving Rsum scores of 516.9 and 510.7, respectively, and outperforming 23 state-of-the-art methods. |
| ISSN: | 2199-4536, 2198-6053 |
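
The summary above describes mapping multi-level features from a dual-tower encoder into auxiliary subspaces and fusing them with an adaptive GPO pooling strategy. The sketch below is a minimal, hypothetical PyTorch reading of that fusion step, following the Generalized Pooling Operator idea of learning weights over value-sorted features; the module names, dimensions, and GRU-based weight generator are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: GPO-style adaptive pooling used to fuse
# multi-level subspace embeddings into one shared-space vector.
import torch
import torch.nn as nn

class GPOPool(nn.Module):
    """Generalized Pooling Operator: learns per-rank weights over
    value-sorted features instead of fixed mean/max pooling."""
    def __init__(self, hidden_dim: int = 32):
        super().__init__()
        # Small sequence model generates one weight per rank position.
        self.rnn = nn.GRU(1, hidden_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_levels, dim) -- one embedding per auxiliary subspace.
        b, n, _ = feats.shape
        # Sort each dimension's values across the n fused items (descending).
        sorted_feats, _ = feats.sort(dim=1, descending=True)
        # Rank positions 0..n-1 drive the learned weight generator.
        pos = torch.arange(n, dtype=feats.dtype, device=feats.device)
        pos = pos.view(1, n, 1).expand(b, n, 1).contiguous()
        weights = self.proj(self.rnn(pos)[0]).softmax(dim=1)   # (b, n, 1)
        return (weights * sorted_feats).sum(dim=1)              # (b, dim)

class SubspaceFusion(nn.Module):
    """Projects multi-level encoder features into auxiliary subspaces and
    fuses them with GPO pooling into a shared embedding."""
    def __init__(self, level_dims, embed_dim: int = 1024):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(d, embed_dim) for d in level_dims])
        self.pool = GPOPool()

    def forward(self, level_feats):
        # level_feats: list of (batch, dim_i) tensors, shallow -> deep.
        subspaces = torch.stack(
            [head(f) for head, f in zip(self.heads, level_feats)], dim=1)
        fused = self.pool(subspaces)
        return nn.functional.normalize(fused, dim=-1)

# Usage: fuse three hypothetical encoder levels for one modality tower.
fusion = SubspaceFusion(level_dims=[512, 768, 768])
feats = [torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 768)]
print(fusion(feats).shape)  # torch.Size([4, 1024])
```

In this reading, the same fusion head would be applied to both the image and text towers, and the resulting unit-normalized vectors compared with a similarity such as cosine in the shared space; how PMSF schedules the progressive shallow-to-deep learning is not reproduced here.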