Region Boosting for Real-Time Object Detection Using Multi-Dimensional Attention

Real-time object detection remains an important topic in computer vision. Balancing the accuracy and speed of object detectors is a formidable challenge for both academic researchers and industry practitioners. In this paper, considering the latest models may be somewhat over-optimized for anchor-fr...

Bibliographic Details
Main Authors: Jinlong Chen, Kejian Xu, Yi Ning, Zhi Xu
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
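Subjects: Real-time object detection; attention mechanism; region boosting; label assignment strategy; loss function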
Online Access: https://ieeexplore.ieee.org/document/10745475/
author Jinlong Chen
Kejian Xu
Yi Ning
Zhi Xu
collection DOAJ
description Real-time object detection remains an important topic in computer vision. Balancing the accuracy and speed of object detectors is a formidable challenge for both academic researchers and industry practitioners. In this paper, considering that the latest models may be somewhat over-optimized for anchor-free pipelines, we elect to use YOLOX as our baseline and introduce a series of enhancements, resulting in a new high-performance detector named YOLOAX. To further exploit the power of the attention mechanism, we devise multi-dimensional attention-based modules that can activate CNNs, emphasizing regions of interest and boosting the capacity to learn informative representations from feature maps. Moreover, we introduce a new label assignment strategy called STA, along with a novel loss function named GEIOU Loss, to further refine our object detector's performance. Extensive ablation studies on the COCO and PASCAL VOC 2012 datasets are provided to validate our proposed methods. Our YOLOAX series is trained solely on the COCO dataset from scratch, without any prior knowledge, surpassing the YOLOX series by a margin of 4.0% AP. In particular, YOLOAX-X achieves an impressive 55.2% AP on the COCO 2017 test set while maintaining a real-time speed of 82.4 fps.
format Article
id doaj-art-d31265deb2d54474b4a069ca385ee45b
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-d31265deb2d54474b4a069ca385ee45b
2024-11-23T00:01:02Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 171634-171643
DOI: 10.1109/ACCESS.2024.3492493
IEEE document number: 10745475
Region Boosting for Real-Time Object Detection Using Multi-Dimensional Attention
Jinlong Chen (School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China)
Kejian Xu (https://orcid.org/0009-0003-2647-2196; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China)
Yi Ning (School of Continuing Education, Guilin University of Electronic Technology, Guilin, China)
Zhi Xu (https://orcid.org/0000-0001-8665-1020; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China)
https://ieeexplore.ieee.org/document/10745475/
Keywords: Real-time object detection; attention mechanism; region boosting; label assignment strategy; loss function
title Region Boosting for Real-Time Object Detection Using Multi-Dimensional Attention
topic Real-time object detection
attention mechanism
region boosting
label assignment strategy
loss function
url https://ieeexplore.ieee.org/document/10745475/