INRNet: Neighborhood Re-Ranking-Based Method for Pedestrian Text-Image Retrieval

Bibliographic Details
Main Authors: Kehao Wang, Yuhui Wang, Lian Xue, Qifeng Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10818620/
Description
Summary: The Pedestrian Text-Image Retrieval task aims to retrieve the target pedestrian image based on a textual description. The primary challenge of this task lies in mapping data from two heterogeneous modalities (pedestrian images and textual descriptions) into a unified feature space. Previous approaches have focused on global or local matching. However, global matching methods are prone to weak alignment, while local matching methods may lead to ambiguous matches. To address these issues, we introduce the Implicit Neighborhood Re-Ranking Network (INRNet), which utilizes a bilateral feature extractor to learn global image-text matching knowledge and leverages nearest neighbors as prior knowledge to mine positive samples. Specifically, our approach uses the bilateral feature extractor to extract features from both texts and pedestrian images and employs a Similarity Distribution Matching (SDM) method to establish preliminary global text-image alignment. Subsequently, we establish a Neighborhood Data Construction Mechanism (NDCM) that restructures the data for the re-ranking task. Finally, we feed the restructured data into our Implicit Neighborhood Inference (INI) module, which uses nearest-neighbor intersection to optimize retrieval performance. Extensive experiments demonstrate that our proposed method achieves superior performance across three public datasets.
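
To make the neighborhood re-ranking idea concrete, the minimal Python sketch below shows a generic nearest-neighbor-intersection re-ranking step, assuming precomputed text-to-image and image-to-image similarity matrices. The function names (knn_indices, rerank), the neighborhood size k, and the blending weight alpha are illustrative assumptions only; this is not a reproduction of the paper's NDCM or INI modules.

import numpy as np

def knn_indices(sim, k):
    # Indices of the k most similar items per row, sorted by decreasing similarity.
    return np.argsort(-sim, axis=1)[:, :k]

def rerank(text_img_sim, img_img_sim, k=10, alpha=0.3):
    # text_img_sim: (num_texts, num_images) initial cross-modal similarities
    # img_img_sim:  (num_images, num_images) image-to-image similarities
    img_nbrs = knn_indices(img_img_sim, k)   # image-side neighborhoods
    txt_top = knn_indices(text_img_sim, k)   # per-query candidate lists
    refined = text_img_sim.astype(float).copy()
    for t in range(text_img_sim.shape[0]):
        cand_set = set(txt_top[t])
        for g in txt_top[t]:
            # Fraction of the query's candidates that also appear among
            # this gallery image's own nearest neighbors.
            overlap = len(cand_set & set(img_nbrs[g])) / k
            refined[t, g] = (1 - alpha) * text_img_sim[t, g] + alpha * overlap
    return refined

In this sketch, a gallery image is promoted when its own image-side neighbors overlap heavily with the query's candidate list, which reflects the abstract's intuition of using nearest neighbors as prior knowledge to mine positive samples before producing the final ranking.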
ISSN:2169-3536