Low-Resolution Target Detection with High-Frequency Information Preservation
In the absence of high-frequency visual observation, low-resolution (LR) targets (e.g., objects, human body keypoints) are intrinsically difficult to detect in unconstrained images. This challenge is further exacerbated by the typical downsampling operations (e.g., pooling, strided convolution) of existing deep...
Main Authors: | Feng Zhang, Hongyang Bai, Wenlong Yin, Ze Li, Hailong Ma, Lei Chen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Applied Sciences |
Subjects: | low-resolution object; downsampling operation; high-frequency; target detection |
Online Access: | https://www.mdpi.com/2076-3417/15/1/103 |
_version_ | 1841549469316808704 |
---|---|
author | Feng Zhang, Hongyang Bai, Wenlong Yin, Ze Li, Hailong Ma, Lei Chen |
author_facet | Feng Zhang, Hongyang Bai, Wenlong Yin, Ze Li, Hailong Ma, Lei Chen |
author_sort | Feng Zhang |
collection | DOAJ |
description | In the absence of high-frequency visual observation, low-resolution (LR) targets (e.g., objects, human body keypoints) are intrinsically difficult to detect in unconstrained images. This challenge is further exacerbated by the typical downsampling operations (e.g., pooling, strided convolution) of existing deep networks (e.g., CNNs). To tackle it, we introduce a generic High-Frequency Information Preservation (HFIP) block as a replacement for existing downsampling operations. It is composed of two key components: (1) the decoupled high-frequency learning component, which extracts high-frequency information along the vertical and horizontal directions separately, and (2) the dilated frequency-aware channel correlation component, which decomposes the input feature map into multiple smaller ones in a dilated manner, concatenates them by channel, and then correlates the combined channels in the frequency space. The module can be integrated into existing network architectures for target detection (e.g., YOLO, HRNet). Extensive experiments on low-resolution human pose estimation and object detection tasks show that HFIP can significantly boost the performance of state-of-the-art detection models, e.g., improving the object detection accuracy of YOLOv5s by an absolute margin of 3.30% in mAP at a resolution of 640 × 640 on the COCO benchmark. |
format | Article |
id | doaj-art-e8185dbb6f8347cbb330d173bfcc0ac0 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-e8185dbb6f8347cbb330d173bfcc0ac0; 2025-01-10T13:14:27Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-12-01; vol. 15, no. 1, art. 103; DOI 10.3390/app15010103; Low-Resolution Target Detection with High-Frequency Information Preservation; Feng Zhang, Hongyang Bai, Wenlong Yin, Ze Li, Hailong Ma, Lei Chen (all: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China); https://www.mdpi.com/2076-3417/15/1/103; low-resolution object; downsampling operation; high-frequency; target detection |
title | Low-Resolution Target Detection with High-Frequency Information Preservation |
topic | low-resolution object; downsampling operation; high-frequency; target detection |
url | https://www.mdpi.com/2076-3417/15/1/103 |
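
The description above outlines two ideas that a small example may make more concrete: extracting high-frequency information separately along the vertical and horizontal directions, and decomposing a feature map "in a dilated manner" into smaller maps that are concatenated by channel. The following is a minimal dependency-free sketch of those two ideas only, not the authors' HFIP implementation. The function names, the dilation rate of 2, and the use of plain first-order differences as a stand-in for the learned high-frequency component are all assumptions made for illustration; the frequency-space channel correlation step is not shown.

```python
def dilated_decompose(fmap, rate=2):
    """Split a C x H x W feature map (nested lists) into rate*rate sub-maps,
    each sampled every `rate` pixels at a different offset, and concatenate
    them along the channel axis.

    Output shape: (C * rate * rate) x (H // rate) x (W // rate).
    Hypothetical helper; the name and the rate are illustrative assumptions.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % rate or width % rate:
        raise ValueError("H and W must be divisible by rate")
    out = []
    for dy in range(rate):
        for dx in range(rate):
            for channel in fmap:
                out.append([row[dx::rate] for row in channel[dy::rate]])
    return out


def strided_subsample(fmap, rate=2):
    """Plain strided downsampling, for comparison: keeps 1 of rate*rate pixels."""
    return [[row[::rate] for row in channel[::rate]] for channel in fmap]


def directional_high_freq(channel):
    """Return (vertical, horizontal) first-order differences of an H x W map.

    A fixed difference filter is only a stand-in for a learned high-pass
    component; it responds to intensity changes along one axis at a time.
    """
    vertical = [[b - a for a, b in zip(r0, r1)]
                for r0, r1 in zip(channel, channel[1:])]
    horizontal = [[b - a for a, b in zip(row, row[1:])] for row in channel]
    return vertical, horizontal


# Toy 1 x 4 x 4 feature map holding the values 0..15.
fmap = [[[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]]

sub = dilated_decompose(fmap)          # 4 channels of size 2 x 2
strided = strided_subsample(fmap)      # 1 channel of size 2 x 2
vert, horiz = directional_high_freq(fmap[0])

print(len(sub), len(sub[0]), len(sub[0][0]))
print(sub[0], strided[0])
```

In this toy case both operations halve the spatial resolution, but the dilated decomposition keeps all 16 input values (moved into the channel axis), whereas strided subsampling keeps only 4. That is the sense in which such a rearrangement can preserve fine detail that ordinary downsampling discards.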