Density-Aware DETR With Dynamic Query for End-to-End Tiny Object Detection

Bibliographic Details
Main Authors: Xianhang Ye, Chang Xu, Haoran Zhu, Fang Xu, Haijian Zhang, Wen Yang
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11007261/
Description
Summary: End-to-end DEtection TRansformers (DETRs) are leading a new trend in various object detection tasks. However, when it comes to the ubiquitous tiny objects in aerial imagery, the potential of DETRs remains under-explored. In this work, we observe that the expansive field of view of remote sensing images often results in a limited pixel representation of tiny objects, coupled with substantial variance in the number of instances across images. This large variation in the number of tiny objects per image conflicts with DETRs' fixed set of object queries: a large number of queries is necessary to ensure high recall in dense scenes, while sparse scenes benefit from fewer, more distinct queries. To tackle this issue, we propose a Density-aware DETR with Dynamic Query (D3Q). D3Q adaptively determines the optimal number of object queries for each image by explicitly estimating its object density. This dynamic query mechanism enables efficient and accurate tiny object detection under both dense and sparse object distributions. In addition, we introduce a refined box loss designed for tiny object detection that further stabilizes training. Through these strategies, D3Q adapts effectively to both dense and sparse scenes, overcoming the limitation of fixed queries in DETR. Extensive experiments on challenging tiny object detection benchmarks demonstrate the superior performance of D3Q compared with state-of-the-art methods. In particular, when integrated with DINO, D3Q achieves 32.1% mAP on the AI-TOD-v2 dataset, setting a new state of the art.
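
The record does not include the authors' code. The following is a minimal PyTorch sketch of the general idea described in the abstract, i.e., choosing a per-image query budget from an estimated object density and then keeping only that many candidate queries. The module name DensityAwareQuerySelector, the pooling-based density head, the query-count range, and all tensor shapes are assumptions for illustration, not the D3Q implementation.

# Minimal sketch (not the authors' code): select a per-image number of object
# queries from a scalar density estimate, as motivated by the abstract.
import torch
import torch.nn as nn


class DensityAwareQuerySelector(nn.Module):
    """Pick a per-image query budget from an estimated object density (illustrative)."""

    def __init__(self, feat_dim=256, min_queries=100, max_queries=1500):
        super().__init__()
        self.min_queries = min_queries
        self.max_queries = max_queries
        # Toy density head: pools encoder tokens and maps them to a scalar in [0, 1].
        self.density_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, encoder_tokens, query_logits, query_embeds):
        # encoder_tokens: (B, N, C) flattened encoder features
        # query_logits:  (B, N)    objectness scores used to rank candidate queries
        # query_embeds:  (B, N, C) candidate query embeddings
        density = self.density_head(encoder_tokens.mean(dim=1)).squeeze(-1)  # (B,)
        # Map density in [0, 1] to an integer query count in [min_queries, max_queries].
        num_queries = (
            self.min_queries
            + (density * (self.max_queries - self.min_queries)).round().long()
        )
        selected = []
        for b, k in enumerate(num_queries.tolist()):
            topk = query_logits[b].topk(k).indices
            selected.append(query_embeds[b, topk])  # variable-length set per image
        return selected, num_queries


# Usage with random tensors standing in for DETR encoder outputs.
B, N, C = 2, 2000, 256
selector = DensityAwareQuerySelector(feat_dim=C)
queries, counts = selector(torch.randn(B, N, C), torch.randn(B, N), torch.randn(B, N, C))
print(counts)            # per-image query budgets
print(queries[0].shape)  # (counts[0], C)

The point of the sketch is the variable per-image top-k: dense scenes get a large query budget to preserve recall, while sparse scenes keep fewer, more distinct queries, which matches the motivation given in the abstract.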
ISSN: 1939-1404, 2151-1535