AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification

Abstract To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation with long sequences, omission of crucial information, and token represen...

Bibliographic Details
Main Authors: Shuze Geng, Qiudong Yu, Haowei Wang, Ziyi Song
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Reports
Subjects: Occluded; Re-identification; Adaptive interaction; Hierarchical fusion
Online Access: https://doi.org/10.1038/s41598-024-76781-4
author Shuze Geng
Qiudong Yu
Haowei Wang
Ziyi Song
collection DOAJ
description Abstract: To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation with long sequences, omission of crucial information, and token representation convergence. To address these issues, we introduce AIRHF-Net, an Adaptive Interaction Representation Hierarchical Fusion Network designed to enhance pedestrian identity recognition in occluded scenarios. Our approach begins with an Adaptive Local-Window Interaction Encoder (AL-WIE), which aims to overcome the inherent subjective limitations of traditional sparse attention mechanisms. This encoder combines window attention, adaptive local attention, and interaction attention, so that it automatically localizes and focuses on visible pedestrian regions within images. It extracts contextual information from window-level features while minimizing the impact of occlusion noise. Additionally, because ViTs may lose spatial information in deeper layers, we implement a Local Hierarchical Encoder (LHE). This component segments the input sequence along the spatial dimension and integrates features from different spatial positions to construct hierarchical local representations that substantially enhance feature discriminability. To improve the quality and breadth of the datasets, we adopt an Occlusion Data Augmentation Strategy (ODAS), which strengthens the model’s capacity to extract critical information under occluded conditions. Extensive experiments demonstrate that our method achieves improved performance on the Occluded-DukeMTMC dataset, with a rank-1 accuracy of 69.6% and an mAP of 61.6%.
format Article
id doaj-art-028048a0e2ee4199925daa3f68711cee
institution Kabale University
issn 2045-2322
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-028048a0e2ee4199925daa3f68711cee
2024-11-10T12:26:35Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2024-11-01
14
1
1
21
10.1038/s41598-024-76781-4
AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification
Shuze Geng (0): School of Information Technology and Engineering, Tianjin University of Technology and Education
Qiudong Yu (1): School of Information Technology and Engineering, Tianjin University of Technology and Education
Haowei Wang (2): School of Artificial Intelligence, Hebei University of Technology
Ziyi Song (3): School of Artificial Intelligence, Hebei University of Technology
https://doi.org/10.1038/s41598-024-76781-4
Occluded
Re-identification
Adaptive interaction
Hierarchical fusion
title AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification
topic Occluded
Re-identification
Adaptive interaction
Hierarchical fusion
url https://doi.org/10.1038/s41598-024-76781-4