Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition

To address the low recognition rates achieved by traditional deep-information-based action recognition algorithms, an action recognition approach was developed that combines skeleton spatial–temporal and dynamic features with a two-stream convolutional neural network (TS-CNN). First, the skeleton’s three-dimensional coordinate system was transformed to obtain the relative positions of the joints. These relative joint coordinates were then encoded as a color texture map to construct the skeleton’s spatial–temporal feature descriptor, and physical structure constraints of the human body were incorporated to enhance inter-class differences. In addition, the speed of each joint was estimated and encoded as a color texture map to obtain the skeleton’s motion (dynamic) feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced with motion saliency and morphology operators to improve their expressive ability. Finally, the enhanced features were deeply fused via the TS-CNN to perform action recognition. Extensive experiments on the publicly available NTU RGB-D, Northwestern-UCLA, and UTD-MHAD datasets show that the developed approach achieves recognition rates of 86.25%, 87.37%, and 93.75%, respectively, indicating that it can effectively improve the accuracy of action recognition in complex environments compared with state-of-the-art algorithms.
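The record does not include the paper's implementation, but the first two steps can be sketched. The following Python/NumPy function (the function name and the choice of reference joint are assumptions, since the abstract does not specify them) converts a skeleton sequence into relative joint coordinates and encodes them as a color texture map:

```python
import numpy as np

def skeleton_to_texture_map(seq, ref_joint=0):
    """Encode a skeleton sequence (T frames x J joints x 3 coords) as a
    T x J color texture map. `ref_joint`, the joint used as the relative
    origin (e.g., the hip center), is an assumed parameter."""
    seq = np.asarray(seq, dtype=np.float32)            # shape (T, J, 3)
    rel = seq - seq[:, ref_joint:ref_joint + 1, :]     # relative joint positions
    lo = rel.min(axis=(0, 1), keepdims=True)           # per-axis minimum
    hi = rel.max(axis=(0, 1), keepdims=True)           # per-axis maximum
    # Map each coordinate axis to [0, 255] so (x, y, z) becomes (R, G, B).
    img = 255.0 * (rel - lo) / np.maximum(hi - lo, 1e-6)
    return img.astype(np.uint8)                        # T x J RGB texture map
```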

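The dynamic descriptor can be sketched in the same style: a minimal, assumed implementation that estimates joint speed by frame-to-frame finite differences and encodes it as a second texture map (reusing the NumPy import above; `dt`, the frame interval, is an assumed parameter):

```python
def skeleton_velocity_map(seq, dt=1.0):
    """Encode per-joint velocity as a color texture map, i.e. the
    skeleton motion (dynamic) feature descriptor from the abstract."""
    seq = np.asarray(seq, dtype=np.float32)            # shape (T, J, 3)
    vel = np.diff(seq, axis=0) / dt                    # (T-1, J, 3) velocities
    lo = vel.min(axis=(0, 1), keepdims=True)
    hi = vel.max(axis=(0, 1), keepdims=True)
    img = 255.0 * (vel - lo) / np.maximum(hi - lo, 1e-6)
    return img.astype(np.uint8)
```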
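For the feature-enhancement step, one plausible (assumed) reading of the abstract's "morphology operators" is a morphological gradient applied to the texture maps, e.g. with OpenCV; the paper's exact operators are not specified in this record:

```python
import cv2
import numpy as np

def enhance_texture_map(img, ksize=3):
    """Emphasize motion edges in a texture map with a morphological
    gradient (dilation minus erosion); an illustrative sketch only."""
    kernel = np.ones((ksize, ksize), np.uint8)
    grad = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
    return cv2.add(img, grad)   # saturating add keeps values in [0, 255]
```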

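Finally, a minimal sketch of a two-stream CNN that deeply fuses the two descriptors; the layer sizes and the concatenation fusion point are illustrative assumptions, not the TS-CNN architecture from the paper:

```python
import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    """One small convolutional stream per descriptor, fused by feature
    concatenation before a linear classifier."""
    def __init__(self, num_classes):
        super().__init__()
        def make_stream():
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial_temporal = make_stream()  # spatial-temporal texture map
        self.dynamic = make_stream()           # velocity (dynamic) texture map
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, st_map, dyn_map):
        # Each input: (batch, 3, H, W) texture map; fuse the two streams.
        feats = torch.cat([self.spatial_temporal(st_map),
                           self.dynamic(dyn_map)], dim=1)
        return self.classifier(feats)
```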
Bibliographic Details
Main Authors: Song Gao (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China), Dingzhuo Zhang (College of Information Engineering, Dalian University, Dalian 116622, China), Zhaoming Tang (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China), Hongyan Wang (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Sensors
ISSN: 1424-8220
DOI: 10.3390/s24237609
Subjects: skeleton spatial–temporal feature; skeleton dynamic feature; feature enhancement; action recognition; two-stream convolutional neural networks
Online Access: https://www.mdpi.com/1424-8220/24/23/7609