HawkEye Conv-Driven YOLOv10 with Advanced Feature Pyramid Networks for Small Object Detection in UAV Imagery

Current mainstream computer vision algorithms focus on designing suitable network architectures and loss functions to fit training data. However, the accuracy of small object detection remains lower than for other scales, and the design of convolution operators limits the model’s performance. For UA...

Full description

Saved in:
Bibliographic Details
Main Authors: Yihang Li, Wenzhong Yang, Liejun Wang, Xiaoming Tao, Yabo Yin, Danny Chen
Format: Article
Language:English
Published: MDPI AG 2024-11-01
Series:Drones
Subjects:
Online Access:https://www.mdpi.com/2504-446X/8/12/713
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Current mainstream computer vision algorithms focus on designing suitable network architectures and loss functions to fit training data. However, the accuracy of small object detection remains lower than for other scales, and the design of convolution operators limits the model’s performance. For UAV small object detection, standard convolutions, due to their fixed kernel size, cannot adaptively capture small object spatial information. Many convolutional variants have scattered sampling points, leading to blurred boundaries and reduced accuracy. In response, we propose HawkEye Conv (HEConv), which utilizes stable sampling and dynamic offsets with random selection. By varying the convolution kernel design, HEConv reduces the accuracy gap between small and larger objects while offering multiple versions and plug-and-play capabilities. We also develop HawkEye Spatial Pyramid Pooling and Gradual Dynamic Feature Pyramid Network modules to validate HEConv. Experiments on the RFRB agricultural and VisDrone2019 urban datasets demonstrate that, compared to YOLOv10, our model improves AP<sub>50</sub> by 11.9% and 6.2%, AP<sub>S</sub> by 11.5% and 5%, and F1-score by 5% and 7%. Importantly, it enhances small object detection without sacrificing large object accuracy, thereby reducing the multi-scale performance gap.
ISSN:2504-446X