Region Boosting for Real-Time Object Detection Using Multi-Dimensional Attention

Bibliographic Details
Main Authors: Jinlong Chen, Kejian Xu, Yi Ning, Zhi Xu
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10745475/
Description
Summary: Real-time object detection remains an important topic in computer vision. Balancing the accuracy and speed of object detectors is a formidable challenge for both academic researchers and industry practitioners. In this paper, considering that the latest models may be somewhat over-optimized for anchor-free pipelines, we use YOLOX as our baseline and introduce a series of enhancements, forming a new high-performance detector named YOLOAX. To further exploit the power of the attention mechanism, we devise multi-dimensional attention-based modules that can activate CNNs, emphasizing regions of interest and boosting the capacity to learn informative representations from feature maps. Moreover, we introduce a new label assignment strategy called STA, along with a novel loss function named GEIOU Loss, to further refine our object detector’s performance. Extensive ablation studies on the COCO and PASCAL VOC 2012 datasets are provided to validate our proposed methods. Our YOLOAX series is trained solely on the COCO dataset from scratch, without any prior knowledge, surpassing the YOLOX series by a margin of 4.0% AP. In particular, YOLOAX-X achieves an impressive 55.2% AP on the COCO 2017 test set while maintaining a real-time speed of 82.4 fps.
ISSN: 2169-3536