Density-Aware DETR With Dynamic Query for End-to-End Tiny Object Detection
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11007261/ |
| Summary: | End-to-end DEtection TRansformers (DETRs) are leading a new trend in various object detection tasks. However, when it comes to the ubiquitous tiny objects in aerial imagery, the potential of DETRs remains under-explored. In this work, we observe that the expansive field of view of remote sensing images often results in a limited pixel representation of tiny objects, coupled with a substantial variance in the number of instances across images. This widely varying number of tiny objects per image conflicts with DETRs' fixed set of object queries: a large number of queries are necessary to ensure high recall in dense scenarios, while sparse scenarios benefit from fewer, more distinct queries. To tackle this issue, we propose a Density-aware DETR with Dynamic Query (D3Q). D3Q adaptively determines the number of object queries for each image by explicitly estimating its object density. This dynamic query mechanism enables efficient and accurate tiny object detection under both dense and sparse object distributions. In addition, we introduce a refined box loss designed for tiny object detection that further stabilizes training. Through these strategies, D3Q adapts to both dense and sparse scenarios, overcoming the limitations of the fixed query set in DETRs. Extensive experiments on challenging tiny object detection benchmarks show that D3Q outperforms state-of-the-art methods. In particular, when integrated with DINO, D3Q achieves 32.1% mAP on the AI-TOD-v2 dataset, setting a new state of the art. |
|---|---|
| ISSN: | 1939-1404 2151-1535 |
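
The dynamic query idea described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how density is estimated or how it maps to a query count, so the thresholds, budgets, and function names below (`COUNT_THRESHOLDS`, `QUERY_BUDGETS`, `select_num_queries`, `select_queries`) are hypothetical. The sketch only assumes that some upstream component produces an estimated object count and a confidence score for each candidate query.

```python
from bisect import bisect_right

# Hypothetical density bins (assumed values, not taken from the paper).
# An estimated count below 10 falls in the first bin, 10 to 99 in the
# second, 100 to 499 in the third, and 500 or more in the last.
COUNT_THRESHOLDS = [10, 100, 500]
QUERY_BUDGETS = [300, 500, 900, 1500]


def select_num_queries(estimated_count: float) -> int:
    """Map an estimated object count to a query budget (denser image, more queries)."""
    level = bisect_right(COUNT_THRESHOLDS, estimated_count)
    return QUERY_BUDGETS[level]


def select_queries(scores: list[float], estimated_count: float) -> list[int]:
    """Return indices of the highest-scoring candidates, capped at the budget."""
    k = min(select_num_queries(estimated_count), len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

With these assumed bins, a sparse image keeps a small set of distinct queries while a dense image gets a larger set for recall, which is the trade-off the summary attributes to a fixed query count.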