6D Object Pose Estimation With Compact Generalized Non-Local Operation

Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object’s 3D bounding box vertices onto a 2D image, facilitating the estimation of the object’s 6D pose. The network is built on the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, underscoring the efficacy of the proposed network.
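The abstract describes the key module, the Compact Generalized Non-local Block, as learning correlations between the positions of all elements across channels rather than only across spatial locations. As a rough illustration of that idea, here is a minimal, linear-kernel sketch in PyTorch; the class name, the channel-reduction ratio, and the residual/BatchNorm arrangement are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CompactGeneralizedNonLocal(nn.Module):
    """Simplified sketch of a compact generalized non-local block.

    The feature map is flattened so that correlations span all channels
    *and* all spatial positions, and the linearized ordering
    theta * (phi^T g) keeps the cost linear in C*H*W instead of
    quadratic (the "compact" part).
    """

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.phi = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.g = nn.Conv2d(channels, inner, kernel_size=1, bias=False)
        self.out = nn.Conv2d(inner, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        # Flatten channel and spatial dimensions together so the affinity
        # relates every element of every channel to every other element.
        t = self.theta(x).reshape(b, 1, -1)   # (B, 1, C'*H*W)
        p = self.phi(x).reshape(b, 1, -1)
        g = self.g(x).reshape(b, 1, -1)
        # Linear-kernel "compact" trick: compute phi^T g first (a scalar
        # per sample) instead of the full (C'HW x C'HW) affinity matrix.
        att = torch.bmm(p, g.transpose(1, 2)) / g.shape[-1]   # (B, 1, 1)
        y = torch.bmm(att, t).reshape(b, -1, h, w)            # (B, C', H, W)
        # Residual connection back onto the input feature map.
        return x + self.bn(self.out(y))
```

A block like this would typically be inserted after a convolutional stage of the YOLOv5 backbone or neck; exactly where the authors place it is not stated in this record.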

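The reported 46.1% is on the ADD metric: the average 3D distance between model vertices transformed by the predicted pose and by the ground-truth pose, with a pose conventionally counted correct on LineMod when that distance is below 10% of the object diameter. A minimal NumPy sketch of the metric follows; the function names and the 10% threshold convention are standard practice, not taken from the paper.

```python
import numpy as np


def add_metric(R_pred, t_pred, R_gt, t_gt, model_points):
    """Average 3D distance of model vertices (ADD).

    model_points: (N, 3) vertices of the object model,
    R_*: (3, 3) rotation matrices, t_*: (3,) translation vectors.
    Returns the mean vertex-to-vertex distance between the two poses.
    """
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()


def add_accuracy(distances, diameter, threshold=0.1):
    """Fraction of test poses whose ADD falls below threshold * diameter
    (the usual LineMod correctness criterion)."""
    return float(np.mean(np.asarray(distances) < threshold * diameter))
```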
Bibliographic Details
Main Authors:
Changhong Jiang (ORCID: 0000-0001-9646-6179), School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun, China
Xiaoqiao Mu (ORCID: 0009-0009-3127-1157), School of Mechanical and Electrical Engineering, Changchun University of Technology, Changchun, China
Bingbing Zhang (ORCID: 0000-0002-4734-4164), School of Computer Science and Engineering, Dalian Minzu University, Dalian, China
Chao Liang (ORCID: 0009-0001-6084-6900), College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
Mujun Xie (ORCID: 0000-0002-4984-6504), School of Electrical and Electronic Engineering, Changchun University of Technology, Changchun, China
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access, Vol. 12 (2024), pp. 178080-178088
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3508772
Subjects: Correlations; subtle feature; end-to-end; long-range spatiotemporal; fine-grained details; representational power
Online Access: https://ieeexplore.ieee.org/document/10771728/