Low-Resolution Target Detection with High-Frequency Information Preservation

In the absence of high-frequency visual observation, low-resolution (LR) targets (e.g., objects, human body keypoints) are intrinsically difficult to detect in unconstrained images. This challenge can be further exacerbated by typical downsampling operations (e.g., pooling, stride) of existing deep...

Bibliographic Details
Main Authors: Feng Zhang, Hongyang Bai, Wenlong Yin, Ze Li, Hailong Ma, Lei Chen
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Subjects: low-resolution object; downsampling operation; high-frequency; target detection
Online Access: https://www.mdpi.com/2076-3417/15/1/103
_version_ 1841549469316808704
author Feng Zhang
Hongyang Bai
Wenlong Yin
Ze Li
Hailong Ma
Lei Chen
author_sort Feng Zhang
collection DOAJ
description In the absence of high-frequency visual observation, low-resolution (LR) targets (e.g., objects, human body keypoints) are intrinsically difficult to detect in unconstrained images. This challenge can be further exacerbated by the typical downsampling operations (e.g., pooling, stride) of existing deep networks (e.g., CNNs). To tackle this challenge, we introduce a generic High-Frequency Information Preservation (HFIP) block as a replacement for existing downsampling operations. It is composed of two key components: (1) the decoupled high-frequency learning component, which extracts the high-frequency information along the vertical and horizontal directions separately, and (2) the dilated frequency-aware channel correlation component, which decomposes the input feature map into multiple smaller ones in a dilated manner, concatenates them by channel, and then correlates the combined channels in the frequency space. Our module can generally be integrated into existing network architectures for target detection (e.g., YOLO, HRNet). Extensive experiments on low-resolution human pose estimation and object detection tasks show that our HFIP technique can significantly boost the performance of state-of-the-art detection models, e.g., improving the object detection accuracy of YOLOv5s by an absolute margin of 3.30% in mAP on the COCO benchmark at a resolution of 640 × 640.
format Article
id doaj-art-e8185dbb6f8347cbb330d173bfcc0ac0
institution Kabale University
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-e8185dbb6f8347cbb330d173bfcc0ac0
2025-01-10T13:14:27Z
eng
MDPI AG
Applied Sciences
2076-3417
2024-12-01
15
1
103
10.3390/app15010103
Low-Resolution Target Detection with High-Frequency Information Preservation
Feng Zhang
Hongyang Bai
Wenlong Yin
Ze Li
Hailong Ma
Lei Chen
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (all six authors)
https://www.mdpi.com/2076-3417/15/1/103
low-resolution object
downsampling operation
high-frequency
target detection
title Low-Resolution Target Detection with High-Frequency Information Preservation
topic low-resolution object
downsampling operation
high-frequency
target detection
url https://www.mdpi.com/2076-3417/15/1/103
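
The description field names two operations that can be illustrated in a few lines: decomposing a feature map "in a dilated manner" and concatenating the pieces by channel, and extracting high-frequency information along the vertical and horizontal directions separately. The NumPy sketch below is only an illustration under stated assumptions, not the authors' HFIP block. The function names (dilated_decompose, directional_high_freq), the sampling rate of 2, and the use of first-order differences as a stand-in high-pass filter are all hypothetical choices; the record does not specify the actual operators, and the frequency-space channel correlation step is omitted here.

```python
import numpy as np


def dilated_decompose(x, rate=2):
    """Split a (C, H, W) map into rate*rate sub-maps and stack them by channel.

    Each sub-map takes every `rate`-th pixel starting at a different offset,
    so the spatial size shrinks by `rate` without discarding any value.
    (Hypothetical helper; the rate of 2 is an assumption.)
    """
    c, h, w = x.shape
    if h % rate or w % rate:
        raise ValueError("H and W must be divisible by rate")
    subs = [x[:, i::rate, j::rate] for i in range(rate) for j in range(rate)]
    return np.concatenate(subs, axis=0)  # shape (C*rate*rate, H/rate, W/rate)


def directional_high_freq(x):
    """Return vertical and horizontal first-order differences of a (C, H, W) map.

    Differences act as a simple high-pass filter per direction; this is a
    stand-in for the learned component described in the abstract.
    """
    vert = np.diff(x, axis=1, prepend=x[:, :1, :])   # along rows (vertical)
    horiz = np.diff(x, axis=2, prepend=x[:, :, :1])  # along columns (horizontal)
    return vert, horiz


# Toy input: 2 channels, 4 x 4 pixels, values 0..31.
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
y = dilated_decompose(x)
vert, horiz = directional_high_freq(x)
print(y.shape, vert.shape, horiz.shape)
```

Unlike pooling or a strided convolution, the decomposition only rearranges pixels: the output holds exactly the same values as the input at half the spatial resolution and four times the channel count, which is one plausible reading of why it suits a downsampling replacement.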