Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition
Focusing on the low recognition rates achieved by traditional depth-information-based action recognition algorithms, an action recognition approach was developed based on skeleton spatial–temporal and dynamic features combined with a two-stream convolutional neural network (TS-CNN). Firstly, the skeleton's three-dimensional coordinate system was transformed to obtain coordinate information related to relative joint positions. Subsequently, this relative joint information was encoded as a color texture map to construct the spatial–temporal feature descriptor of the skeleton. Furthermore, physical structure constraints of the human body were considered to enhance inter-class differences. Additionally, the speed of each joint was estimated and encoded as a color texture map to obtain the skeleton dynamic feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced using motion saliency and morphology operators to improve their expressive ability. Finally, the enhanced skeleton spatial–temporal and dynamic features were deeply fused via the TS-CNN to perform action recognition. Extensive experimental results on the publicly available NTU RGB-D, Northwestern-UCLA, and UTD-MHAD datasets demonstrate that the developed approach achieves recognition rates of 86.25%, 87.37%, and 93.75%, respectively, indicating that it can effectively improve the accuracy of action recognition in complex environments compared to state-of-the-art algorithms.
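The descriptor construction in the abstract lends itself to a compact sketch. Below is a minimal, illustrative version of the two texture-map encodings, assuming skeleton sequences shaped (frames, joints, 3); the function names, the reference-joint choice, and the min-max normalization are assumptions for illustration, not the authors' exact encoding.

```python
import numpy as np

def position_texture(seq, ref_joint=0):
    """Encode a skeleton sequence (T frames x J joints x 3 coords) as an
    RGB texture map: rows are joints, columns are frames, and the three
    channels hold (x, y, z) positions relative to a reference joint."""
    rel = seq - seq[:, ref_joint:ref_joint + 1, :]  # relative joint positions
    lo, hi = rel.min(), rel.max()
    norm = (rel - lo) / (hi - lo + 1e-8)            # min-max scale to [0, 1]
    return (norm.transpose(1, 0, 2) * 255).astype(np.uint8)  # J x T x 3

def speed_texture(seq):
    """Encode per-joint velocities (frame differences) the same way,
    giving a dynamic-feature texture map."""
    vel = np.diff(seq, axis=0)                      # (T-1) x J x 3 velocities
    lo, hi = vel.min(), vel.max()
    norm = (vel - lo) / (hi - lo + 1e-8)
    return (norm.transpose(1, 0, 2) * 255).astype(np.uint8)  # J x (T-1) x 3

# Example: a 40-frame sequence with 25 joints (NTU RGB-D uses 25 joints)
seq = np.random.rand(40, 25, 3).astype(np.float32)
st_map, dyn_map = position_texture(seq), speed_texture(seq)
```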
Main Authors: Song Gao, Dingzhuo Zhang, Zhaoming Tang, Hongyan Wang
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Sensors
Subjects: skeleton spatial–temporal feature; skeleton dynamic feature; feature enhancement; action recognition; two stream convolutional neural networks
Online Access: https://www.mdpi.com/1424-8220/24/23/7609
author | Song Gao; Dingzhuo Zhang; Zhaoming Tang; Hongyan Wang
collection | DOAJ |
description | Focusing on the low recognition rates achieved by traditional depth-information-based action recognition algorithms, an action recognition approach was developed based on skeleton spatial–temporal and dynamic features combined with a two-stream convolutional neural network (TS-CNN). Firstly, the skeleton's three-dimensional coordinate system was transformed to obtain coordinate information related to relative joint positions. Subsequently, this relative joint information was encoded as a color texture map to construct the spatial–temporal feature descriptor of the skeleton. Furthermore, physical structure constraints of the human body were considered to enhance inter-class differences. Additionally, the speed of each joint was estimated and encoded as a color texture map to obtain the skeleton dynamic feature descriptor. The resulting spatial–temporal and dynamic features were further enhanced using motion saliency and morphology operators to improve their expressive ability. Finally, the enhanced skeleton spatial–temporal and dynamic features were deeply fused via the TS-CNN to perform action recognition. Extensive experimental results on the publicly available NTU RGB-D, Northwestern-UCLA, and UTD-MHAD datasets demonstrate that the developed approach achieves recognition rates of 86.25%, 87.37%, and 93.75%, respectively, indicating that it can effectively improve the accuracy of action recognition in complex environments compared to state-of-the-art algorithms.
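For the fusion stage, a minimal two-stream sketch in PyTorch is given below; the layer widths, the per-stream architecture, and the concatenation-based fusion are assumptions for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    """Illustrative two-stream CNN: one small conv stream per descriptor,
    fused by concatenating pooled features before the classifier."""
    def __init__(self, num_classes):
        super().__init__()
        def make_stream():
            return nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
            )
        self.spatial_temporal = make_stream()  # position texture map stream
        self.dynamic = make_stream()           # speed texture map stream
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, st_map, dyn_map):
        # Feature-level fusion: concatenate the two stream embeddings
        fused = torch.cat([self.spatial_temporal(st_map),
                           self.dynamic(dyn_map)], dim=1)
        return self.classifier(fused)

# Example forward pass on a batch of 8 descriptor images
model = TwoStreamCNN(num_classes=60)  # NTU RGB-D defines 60 action classes
logits = model(torch.randn(8, 3, 25, 40), torch.randn(8, 3, 25, 39))
```

Adaptive pooling lets the two streams accept descriptor images of different widths (the speed map is one frame shorter), so no resizing is needed before fusion.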
format | Article |
id | doaj-art-a439c30aef9c45f28ad86f8b57d1fce6 |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2024-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | Song Gao (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China); Dingzhuo Zhang (College of Information Engineering, Dalian University, Dalian 116622, China); Zhaoming Tang (Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China); Hongyan Wang (School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China). Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition. Sensors, vol. 24, no. 23, art. 7609, 2024-11-01. DOI: 10.3390/s24237609. https://www.mdpi.com/1424-8220/24/23/7609
title | Deep Fusion of Skeleton Spatial–Temporal and Dynamic Information for Action Recognition |
topic | skeleton spatial–temporal feature skeleton dynamic feature feature enhancement action recognition two stream convolutional neural networks |
url | https://www.mdpi.com/1424-8220/24/23/7609 |