Individual Contribution-Based Spatial-Temporal Attention on Skeleton Sequences for Human Interaction Recognition

Skeleton-based human interaction recognition has gained increasing attention due to its ability to capture complex multi-person dynamics. Significant progress has been made in interaction recognition research, but challenges remain. First, variations in camera positions and viewpoints can cause significant differences in skeletal data for actions of the same type...

Full description

Saved in:
Bibliographic Details
Main Authors: Xing Liu, Bo Gao
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10820344/
author Xing Liu
Bo Gao
collection DOAJ
description Skeleton-based human interaction recognition has gained increasing attention due to its ability to capture complex multi-person dynamics. Significant progress has been made in interaction recognition research, but challenges remain. First, variations in camera positions and viewpoints can cause significant differences in skeletal data for actions of the same type. Second, capturing both spatial information from skeleton structures and temporal information from interaction sequences is crucial for discriminative interaction feature representation. Third, the different contributions of each participant, especially in asymmetric interactions, are often overlooked. To address the above issues, we propose an innovative method by designing the individual contribution-based spatial-temporal attention graph convolutional network. In this work, we first propose a simple but feasible view transformation method to reduce data mismatch from multi-view cameras. Then we design individual contribution weights to measure the importance of each person for interaction feature representation. Next, a novel spatial-temporal attention module based on individual contribution weights is proposed to obtain attention-based skeleton data, which are fed to multiple layers of graph convolution to extract spatial-temporal features. Additionally, we use a two-stream architecture with joint coordinates and joint motion data as inputs for each stream. A weighted fusion strategy is utilized to obtain the final classification score. Experiments conducted on three different datasets demonstrate that the proposed interaction recognition method achieves satisfactory results compared with other works.
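The two-stream fusion described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the fusion weight `w` and the use of softmax-normalized per-class scores are assumptions, since the abstract only states that a weighted fusion of the joint-coordinate and joint-motion streams produces the final classification score.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over per-class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_scores(joint_logits, motion_logits, w=0.6):
    """Weighted fusion of the two streams' per-class scores.
    `w` is a hypothetical fusion weight; the paper's actual
    value and normalization scheme are not given in the abstract."""
    return w * softmax(joint_logits) + (1 - w) * softmax(motion_logits)

# Toy per-class logits for a 3-class example (hypothetical values).
joint = np.array([2.0, 0.5, 0.1])   # joint-coordinate stream
motion = np.array([1.5, 1.0, 0.2])  # joint-motion stream
pred = int(np.argmax(fuse_scores(joint, motion)))
```

Because each softmax sums to one and the weights sum to one, the fused vector remains a valid per-class probability distribution.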
format Article
id doaj-art-419327f79a9745a192e62ff91f8aca5e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-419327f79a9745a192e62ff91f8aca5e (indexed 2025-01-14T00:02:36Z)
IEEE Access, ISSN 2169-3536, IEEE, 2025-01-01, vol. 13, pp. 6463-6474
DOI: 10.1109/ACCESS.2024.3525185 (IEEE article no. 10820344)
Individual Contribution-Based Spatial-Temporal Attention on Skeleton Sequences for Human Interaction Recognition
Xing Liu (https://orcid.org/0009-0006-9217-5781), School of Sino-German Robotics, Shenzhen Institute of Information Technology, Shenzhen, China
Bo Gao, School of Sino-German Robotics, Shenzhen Institute of Information Technology, Shenzhen, China
topic Skeleton-based interaction recognition
spatial-temporal attention
graph convolutional networks
individual contribution weights