Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition

To address the low recognition rates achieved by traditional deep-information-based action recognition algorithms, an action recognition approach was developed that combines skeleton spatial–temporal and dynamic features with a two-stream convolutional neural network (TS-CNN). First, the skeleton’s three-dimensional coordinate system was transformed to obtain the relative positions of the joints. These relative joint coordinates were then encoded as a color texture map to construct the skeleton’s spatial–temporal feature descriptor, and physical structure constraints of the human body were incorporated to enhance inter-class differences. In addition, the speed of each joint was estimated and encoded as a color texture map to obtain the skeleton’s motion (dynamic) feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced with motion saliency and morphology operators to improve their expressive ability. Finally, the enhanced features were deeply fused via the TS-CNN to perform action recognition. Extensive experiments on the publicly available NTU RGB-D, Northwestern-UCLA, and UTD-MHAD datasets show that the developed approach achieves recognition rates of 86.25%, 87.37%, and 93.75%, respectively, indicating that it can effectively improve the accuracy of action recognition in complex environments compared with state-of-the-art algorithms.
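The record does not include the paper's implementation, but the first two steps can be sketched. The following Python/NumPy function (the function name and the choice of reference joint are assumptions, since the abstract does not specify them) converts a skeleton sequence into relative joint coordinates and encodes them as a color texture map:

```python
import numpy as np

def skeleton_to_texture_map(seq, ref_joint=0):
    """Encode a skeleton sequence (T frames x J joints x 3 coords) as a
    T x J color texture map. `ref_joint`, the joint used as the relative
    origin (e.g., the hip center), is an assumed parameter."""
    seq = np.asarray(seq, dtype=np.float32)            # shape (T, J, 3)
    rel = seq - seq[:, ref_joint:ref_joint + 1, :]     # relative joint positions
    lo = rel.min(axis=(0, 1), keepdims=True)           # per-axis minimum
    hi = rel.max(axis=(0, 1), keepdims=True)           # per-axis maximum
    # Map each coordinate axis to [0, 255] so (x, y, z) becomes (R, G, B).
    img = 255.0 * (rel - lo) / np.maximum(hi - lo, 1e-6)
    return img.astype(np.uint8)                        # T x J RGB texture map
```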

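The dynamic descriptor can be sketched in the same style: a minimal, assumed implementation that estimates joint speed by frame-to-frame finite differences and encodes it as a second texture map (reusing the NumPy import above; `dt`, the frame interval, is an assumed parameter):

```python
def skeleton_velocity_map(seq, dt=1.0):
    """Encode per-joint velocity as a color texture map, i.e. the
    skeleton motion (dynamic) feature descriptor from the abstract."""
    seq = np.asarray(seq, dtype=np.float32)            # shape (T, J, 3)
    vel = np.diff(seq, axis=0) / dt                    # (T-1, J, 3) velocities
    lo = vel.min(axis=(0, 1), keepdims=True)
    hi = vel.max(axis=(0, 1), keepdims=True)
    img = 255.0 * (vel - lo) / np.maximum(hi - lo, 1e-6)
    return img.astype(np.uint8)
```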
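For the feature-enhancement step, one plausible (assumed) reading of the abstract's "morphology operators" is a morphological gradient applied to the texture maps, e.g. with OpenCV; the paper's exact operators are not specified in this record:

```python
import cv2
import numpy as np

def enhance_texture_map(img, ksize=3):
    """Emphasize motion edges in a texture map with a morphological
    gradient (dilation minus erosion); an illustrative sketch only."""
    kernel = np.ones((ksize, ksize), np.uint8)
    grad = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
    return cv2.add(img, grad)   # saturating add keeps values in [0, 255]
```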

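Finally, a minimal sketch of a two-stream CNN that deeply fuses the two descriptors; the layer sizes and the concatenation fusion point are illustrative assumptions, not the TS-CNN architecture from the paper:

```python
import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    """One small convolutional stream per descriptor, fused by feature
    concatenation before a linear classifier."""
    def __init__(self, num_classes):
        super().__init__()
        def make_stream():
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.spatial_temporal = make_stream()  # spatial-temporal texture map
        self.dynamic = make_stream()           # velocity (dynamic) texture map
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, st_map, dyn_map):
        # Each input: (batch, 3, H, W) texture map; fuse the two streams.
        feats = torch.cat([self.spatial_temporal(st_map),
                           self.dynamic(dyn_map)], dim=1)
        return self.classifier(feats)
```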
Bibliographic Details
Main Authors: Song Gao (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China), Dingzhuo Zhang (College of Information Engineering, Dalian University, Dalian 116622, China), Zhaoming Tang (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China), Hongyan Wang (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Sensors
ISSN: 1424-8220
DOI: 10.3390/s24237609
Subjects: skeleton spatial–temporal feature; skeleton dynamic feature; feature enhancement; action recognition; two-stream convolutional neural networks
Online Access: https://www.mdpi.com/1424-8220/24/23/7609