HandFI: Multilevel Interacting Hand Reconstruction Based on Multilevel Feature Fusion in RGB Images

Bibliographic Details
Main Authors: Huimin Pan, Yuting Cai, Jiayi Yang, Shaojia Niu, Quanli Gao, Xihan Wang
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/1/88
Description
Summary: Interacting hand reconstruction presents significant opportunities in various applications, but it still faces challenges such as difficulty in distinguishing the features of the two hands, misalignment of hand meshes with the input image, and modeling the complex spatial relationships between interacting hands. In this paper, we propose a multilevel feature fusion interactive network for hand reconstruction (HandFI). Within this network, the hand feature separation module uses attention mechanisms and positional encoding to distinguish left-hand from right-hand features while preserving their spatial relationships. The hand fusion and attention module promotes alignment of hand vertices with the image by integrating multi-scale hand features and introduces cross-attention to capture the complex spatial relationships between interacting hands, thereby improving the accuracy of two-hand reconstruction. We evaluated our method against existing approaches on the InterHand2.6M, RGB2Hands, and EgoHands datasets. Extensive experimental results demonstrated that our method outperformed other representative methods, achieving an MPJPE of 9.38 mm and an MPVPE of 9.61 mm. Results obtained in real-world scenes further validated the generalization capability of our method.
ISSN: 1424-8220
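
The abstract describes a fusion step in which each hand's features attend to the other hand's features via cross-attention, with positional encoding preserving spatial layout. The following is a minimal sketch of that general idea, assuming PyTorch; the module name HandCrossFusion, the token count, and all dimensions are illustrative assumptions and are not taken from the paper itself.

# Minimal sketch (not the authors' code) of bidirectional cross-attention
# between left- and right-hand feature tokens, assuming PyTorch.
import torch
import torch.nn as nn


class HandCrossFusion(nn.Module):
    """Fuse left- and right-hand feature tokens with cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8, num_tokens: int = 49):
        super().__init__()
        # Learned positional encoding keeps the spatial layout of each hand's tokens.
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        # Queries come from one hand, keys/values from the other (and vice versa).
        self.left_from_right = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.right_from_left = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_l = nn.LayerNorm(dim)
        self.norm_r = nn.LayerNorm(dim)

    def forward(self, feat_left: torch.Tensor, feat_right: torch.Tensor):
        # feat_left, feat_right: (batch, num_tokens, dim) per-hand feature tokens.
        q_l = feat_left + self.pos
        q_r = feat_right + self.pos
        # Each hand attends to the other to capture inter-hand spatial relations.
        l_ctx, _ = self.left_from_right(q_l, q_r, feat_right)
        r_ctx, _ = self.right_from_left(q_r, q_l, feat_left)
        return self.norm_l(feat_left + l_ctx), self.norm_r(feat_right + r_ctx)


if __name__ == "__main__":
    fusion = HandCrossFusion()
    left = torch.randn(2, 49, 256)
    right = torch.randn(2, 49, 256)
    out_l, out_r = fusion(left, right)
    print(out_l.shape, out_r.shape)  # torch.Size([2, 49, 256]) twice

Applying cross-attention in both directions lets each hand condition on the other, which is the property the abstract attributes to the hand fusion and attention module.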