6D Object Pose Estimation With Compact Generalized Non-Local Operation

Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object’s...

Full description

Saved in:
Bibliographic Details
Main Authors: Changhong Jiang, Xiaoqiao Mu, Bingbing Zhang, Chao Liang, Mujun Xie
Format: Article
Language:English
Published: IEEE 2024-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10771728/
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Real-time object detection and pose estimation are critical in practical applications such as virtual reality, scene understanding, and robotics. In this paper, we propose a compact generalized non-local pose estimation network capable of directly predicting the projection of an object’s 3D bounding box vertices onto a 2D image, facilitating the estimation of the object’s 6D pose. The network is constructed using the YOLOv5 model, with the integration of an improved non-local module termed the Compact Generalized Non-local Block. This module enhances feature representation by learning the correlations between the positions of all elements across channels, effectively capturing subtle feature cues. The proposed network is end-to-end trainable, producing accurate pose predictions without the need for any post-processing operations. Extensive validation on the LineMod dataset shows that our approach achieves a final accuracy of 46.1% on the average 3D distance of model vertices (ADD) metric, outperforming existing methods by 6.9% and our baseline model by 1.8%, thus underscoring the efficacy of the proposed network.
ISSN:2169-3536