AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification

Bibliographic Details
Main Authors:	Shuze Geng, Qiudong Yu, Haowei Wang, Ziyi Song
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-11-01
Series:	Scientific Reports
Subjects:	Occluded; Re-identification; Adaptive interaction; Hierarchical fusion
Online Access:	https://doi.org/10.1038/s41598-024-76781-4

collection	DOAJ
description	To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation on long sequences, omission of crucial information, and convergence of token representations. To address these issues, we introduce AIRHF-Net, an Adaptive Interaction Representation Hierarchical Fusion Network designed to improve pedestrian identity recognition in occluded scenarios. First, we develop an Adaptive Local-Window Interaction Encoder (AL-WIE) to overcome the inherent subjective limitations of traditional sparse attention mechanisms. This encoder combines window attention, adaptive local attention, and interaction attention, allowing the model to automatically locate and focus on visible pedestrian regions within images. It extracts contextual information from window-level features while minimizing the impact of occlusion noise. Second, because ViTs may lose spatial information in deeper layers, we introduce a Local Hierarchical Encoder (LHE). This component segments the input sequence along the spatial dimension and integrates features from different spatial positions to construct hierarchical local representations that substantially enhance feature discriminability. Third, to improve the quality and breadth of the training data, we adopt an Occlusion Data Augmentation Strategy (ODAS), which strengthens the model’s ability to extract critical information under occlusion. Extensive experiments show that our method achieves improved performance on the Occluded-DukeMTMC dataset, with a rank-1 accuracy of 69.6% and an mAP of 61.6%.
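The abstract reports its results as rank-1 accuracy and mAP. As a rough illustration of what those two numbers measure, the sketch below computes both from a query-by-gallery distance matrix. It is a simplified, hypothetical helper (the name `rank1_and_map` is not from the paper), and it assumes every gallery image with the query's identity counts as a true match; the usual re-ID evaluation protocols also discard gallery images of the same identity captured by the same camera as the query, which this sketch omits.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (rank-1 accuracy, mAP) for a query-by-gallery distance matrix.

    Hypothetical helper for illustration only. Simplification: every gallery
    item with the query's identity counts as a true match (no same-camera
    filtering, unlike the standard re-ID evaluation protocols).
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits = 0
    average_precisions = []
    for q in range(dist.shape[0]):
        # Rank the gallery from most to least similar (smallest distance first).
        order = np.argsort(dist[q], kind="stable")
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue  # a query with no true match in the gallery is skipped

        # Rank-1: is the single closest gallery item the correct identity?
        rank1_hits += int(matches[0])

        # AP: mean of precision@k taken at each rank k where a true match appears.
        hit_ranks = np.flatnonzero(matches) + 1  # 1-based ranks of true matches
        precision_at_hits = np.arange(1, hit_ranks.size + 1) / hit_ranks
        average_precisions.append(precision_at_hits.mean())

    if not average_precisions:
        raise ValueError("no query has a true match in the gallery")
    return rank1_hits / len(average_precisions), float(np.mean(average_precisions))


# Toy example: 2 queries, 3 gallery images with identities [1, 2, 1].
dist = [[0.1, 0.5, 0.3],   # identity-1 query: both true matches ranked first
        [0.1, 0.2, 0.9]]   # identity-2 query: its only true match is ranked second
print(rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1]))  # → (0.5, 0.75)
```

Rank-1 only checks the top-ranked gallery image, while mAP rewards placing all true matches near the top; this is why the two figures quoted in the abstract (69.6% and 61.6%) differ.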
id	doaj-art-028048a0e2ee4199925daa3f68711cee
institution	Kabale University
issn	2045-2322
spelling	doaj-art-028048a0e2ee4199925daa3f68711cee; indexed 2024-11-10T12:26:35Z; language: eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; published 2024-11-01; vol. 14, no. 1, pp. 1–21; DOI 10.1038/s41598-024-76781-4; AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification; Shuze Geng and Qiudong Yu (School of Information Technology and Engineering, Tianjin University of Technology and Education); Haowei Wang and Ziyi Song (School of Artificial Intelligence, Hebei University of Technology); keywords: Occluded, Re-identification, Adaptive interaction, Hierarchical fusion


Similar Items