INRNet: Neighborhood Re-Ranking-Based Method for Pedestrian Text-Image Retrieval


Bibliographic Details
Main Authors: Kehao Wang, Yuhui Wang, Lian Xue, Qifeng Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Pedestrian text-image retrieval; nearest-neighbor data; re-ranking; multimodal retrieval
Online Access: https://ieeexplore.ieee.org/document/10818620/
author Kehao Wang
Yuhui Wang
Lian Xue
Qifeng Li
collection DOAJ
description The Pedestrian Text-Image Retrieval task aims to retrieve the target pedestrian image based on a textual description. The primary challenge of this task lies in mapping two heterogeneous modalities (visual and textual descriptions) into a unified feature space. Previous approaches have focused on global or local matching. However, global matching is prone to weak alignment, while local matching can produce ambiguous matches. To address these issues, we introduce the Implicit Neighborhood Re-ranking Network (INRNet), which utilizes a bilateral feature extractor to learn global image-text matching knowledge and leverages nearest neighbors as prior knowledge to mine positive samples. Specifically, the bilateral feature extractor extracts features from both texts and pedestrian images, and a Similarity Distribution Matching (SDM) method establishes a preliminary global text-image alignment. Subsequently, we introduce a Neighborhood Data Construction Mechanism (NDCM) that restructures the data for the re-ranking task. Finally, the restructured data is fed into our Implicit Neighborhood Inference (INI) module, which uses nearest-neighbor intersection to optimize retrieval performance. Extensive experiments demonstrate that our method achieves superior performance across three public datasets.
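The re-ranking idea the abstract describes (refining initial text-to-image scores using nearest-neighbor overlap) can be illustrated with a generic sketch. This is not the paper's actual INI module: the function names, the blending weight `alpha`, and the use of an image-image similarity matrix are assumptions made here purely for illustration.

```python
import numpy as np

def knn_indices(sim, k):
    """Indices of the k most similar items for each row of a similarity matrix."""
    return np.argsort(-sim, axis=1)[:, :k]

def rerank_by_neighbor_intersection(sim_t2i, sim_i2i, k=5, alpha=0.5):
    """Re-rank text-to-image retrieval scores using neighbor-set overlap.

    sim_t2i: (num_texts, num_images) initial cross-modal similarities
    sim_i2i: (num_images, num_images) image-image similarities
    For each (text, image) pair, count how many of the text's top-k
    retrieved images also appear among the image's top-k visual
    neighbors, then blend that overlap ratio into the original score.
    """
    text_nn = knn_indices(sim_t2i, k)    # top-k candidate images per text
    image_nn = knn_indices(sim_i2i, k)   # top-k visual neighbors per image
    num_texts, num_images = sim_t2i.shape
    refined = np.array(sim_t2i, dtype=float)
    for t in range(num_texts):
        t_set = set(text_nn[t])
        for i in range(num_images):
            overlap = len(t_set & set(image_nn[i])) / k
            refined[t, i] = (1 - alpha) * sim_t2i[t, i] + alpha * overlap
    return refined
```

With `alpha=0` the sketch reduces to the original ranking; larger `alpha` rewards images whose visual neighborhoods intersect the text's initial retrieval set, which is the general intuition behind nearest-neighbor re-ranking.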
format Article
id doaj-art-70cd2ca0333b463495cbeb2ab5fc6b37
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-70cd2ca0333b463495cbeb2ab5fc6b37
record updated: 2025-01-07T00:01:38Z
language: eng; publisher: IEEE; series: IEEE Access; ISSN: 2169-3536
published: 2025-01-01; volume 13, pages 1470-1480
DOI: 10.1109/ACCESS.2024.3518535; IEEE document: 10818620
title: INRNet: Neighborhood Re-Ranking-Based Method for Pedestrian Text-Image Retrieval
authors: Kehao Wang (https://orcid.org/0000-0001-9843-8104), School of Information Engineering, Wuhan University of Technology, Wuhan, China; Yuhui Wang, School of Information Engineering, Wuhan University of Technology, Wuhan, China; Lian Xue, School of Artificial Intelligence, Wuhan Technology and Business University, Wuhan, China; Qifeng Li, School of Information Engineering, Wuhan University of Technology, Wuhan, China
abstract: (see description above)
online access: https://ieeexplore.ieee.org/document/10818620/
keywords: Pedestrian text-image retrieval; nearest-neighbor data; re-ranking; multimodal retrieval
topic Pedestrian text-image retrieval
nearest-neighbor data
re-ranking
multimodal retrieval