AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification
Abstract To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation with long sequences, omission of crucial information, and token represen...
| Main Authors: | Shuze Geng, Qiudong Yu, Haowei Wang, Ziyi Song |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-11-01 |
| Series: | Scientific Reports |
| Subjects: | Occluded; Re-identification; Adaptive interaction; Hierarchical fusion |
| Online Access: | https://doi.org/10.1038/s41598-024-76781-4 |
| _version_ | 1846171860115390464 |
|---|---|
| author | Shuze Geng Qiudong Yu Haowei Wang Ziyi Song |
| collection | DOAJ |
| description | Abstract: To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation with long sequences, omission of crucial information, and token representation convergence. To address these issues, we introduce AIRHF-Net, an Adaptive Interaction Representation Hierarchical Fusion Network designed to enhance pedestrian identity recognition in occluded scenarios. Our approach begins with the development of an Adaptive Local-Window Interaction Encoder (AL-WIE), which aims to overcome the inherent subjective limitations of traditional sparse attention mechanisms. This innovative encoder merges window attention, adaptive local attention, and interaction attention, facilitating automatic localization and focusing on visible pedestrian regions within images. It effectively extracts contextual information from window-level features while minimizing the impact of occlusion noise. Additionally, recognizing that ViTs may lose spatial information in deeper structural layers, we implement a Local Hierarchical Encoder (LHE). This component segments the input sequence in the spatial dimension, integrating features from various spatial positions to construct hierarchical local representations that substantially enhance feature discriminability. To further augment the quality and breadth of datasets, we adopt an Occlusion Data Augmentation Strategy (ODAS), which bolsters the model’s capacity to extract critical information under occluded conditions. Extensive experiments demonstrate that our method achieves improved performance on the Occluded-DukeMTMC dataset, with a rank-1 accuracy of 69.6% and an mAP of 61.6%. |
| format | Article |
| id | doaj-art-028048a0e2ee4199925daa3f68711cee |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-028048a0e2ee4199925daa3f68711cee; 2024-11-10T12:26:35Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-11-01; 14; 1; 1; 21; 10.1038/s41598-024-76781-4; AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification; Shuze Geng [0], Qiudong Yu [1], Haowei Wang [2], Ziyi Song [3]; School of Information Technology and Engineering, Tianjin University of Technology and Education; School of Information Technology and Engineering, Tianjin University of Technology and Education; School of Artificial Intelligence, Hebei University of Technology; School of Artificial Intelligence, Hebei University of Technology; Abstract: To tackle the high resource consumption in occluded person re-identification, sparse attention mechanisms based on Vision Transformers (ViTs) have become popular. However, they often suffer from performance degradation with long sequences, omission of crucial information, and token representation convergence. To address these issues, we introduce AIRHF-Net, an Adaptive Interaction Representation Hierarchical Fusion Network designed to enhance pedestrian identity recognition in occluded scenarios. Our approach begins with the development of an Adaptive Local-Window Interaction Encoder (AL-WIE), which aims to overcome the inherent subjective limitations of traditional sparse attention mechanisms. This innovative encoder merges window attention, adaptive local attention, and interaction attention, facilitating automatic localization and focusing on visible pedestrian regions within images. It effectively extracts contextual information from window-level features while minimizing the impact of occlusion noise. Additionally, recognizing that ViTs may lose spatial information in deeper structural layers, we implement a Local Hierarchical Encoder (LHE). This component segments the input sequence in the spatial dimension, integrating features from various spatial positions to construct hierarchical local representations that substantially enhance feature discriminability. To further augment the quality and breadth of datasets, we adopt an Occlusion Data Augmentation Strategy (ODAS), which bolsters the model’s capacity to extract critical information under occluded conditions. Extensive experiments demonstrate that our method achieves improved performance on the Occluded-DukeMTMC dataset, with a rank-1 accuracy of 69.6% and an mAP of 61.6%.; https://doi.org/10.1038/s41598-024-76781-4; Occluded; Re-identification; Adaptive interaction; Hierarchical fusion |
| title | AIRHF-Net: an adaptive interaction representation hierarchical fusion network for occluded person re-identification |
| topic | Occluded Re-identification Adaptive interaction Hierarchical fusion |
| url | https://doi.org/10.1038/s41598-024-76781-4 |
| work_keys_str_mv | AT shuzegeng airhfnetanadaptiveinteractionrepresentationhierarchicalfusionnetworkforoccludedpersonreidentification AT qiudongyu airhfnetanadaptiveinteractionrepresentationhierarchicalfusionnetworkforoccludedpersonreidentification AT haoweiwang airhfnetanadaptiveinteractionrepresentationhierarchicalfusionnetworkforoccludedpersonreidentification AT ziyisong airhfnetanadaptiveinteractionrepresentationhierarchicalfusionnetworkforoccludedpersonreidentification |
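
The abstract reports a rank-1 accuracy and an mAP on Occluded-DukeMTMC. As a rough illustration of what these two retrieval metrics measure, the sketch below computes them from a query-to-gallery distance matrix. It is not the authors' evaluation code: the function name `rank1_and_map` and the toy data are hypothetical, and the same-camera filtering that re-identification benchmarks usually apply is omitted for brevity.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (rank-1 accuracy, mean average precision).

    dist[q][g] is the distance between query q and gallery item g;
    a smaller distance means the two images are more similar.
    """
    rank1_hits, average_precisions = [], []
    for row, query_id in zip(dist, query_ids):
        # Sort gallery indices from the closest to the farthest item.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == query_id for g in order]
        if not any(matches):
            continue  # a query with no true match cannot be scored
        # Rank-1: is the single closest gallery item the same identity?
        rank1_hits.append(1.0 if matches[0] else 0.0)
        # Average precision: mean of precision@k over the ranks k
        # at which a correct match is retrieved.
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not rank1_hits:
        raise ValueError("no query has a matching gallery identity")
    n = len(rank1_hits)
    return sum(rank1_hits) / n, sum(average_precisions) / n


# Toy example (made-up numbers): 2 queries, 3 gallery images.
dist = [
    [0.1, 0.9, 0.5],  # query with identity 1
    [0.2, 0.8, 0.3],  # query with identity 2
]
rank1, mean_ap = rank1_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(f"rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy data the first query retrieves both of its matches at the top of the ranking, while the second query finds its only match at rank 3, so rank-1 accounts for the first result only whereas mAP also reflects how far down the list the remaining matches fall.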