Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition

Skeleton-sequence-based behavior recognition models are characterized by fast processing speeds, low computational requirements, and simple structures. Graph convolutional networks (GCNs) have advantages in processing skeleton sequence data. However, existing miner behavior recognition models based...

Bibliographic Details
Main Authors: WANG Jianfang, DUAN Siyuan, PAN Hongguang, JING Ningbo
Format: Article
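Language: zho
Published: Editorial Department of Industry and Mine Automation 2024-11-01
Series: Gong-kuang zidonghua
Subjects: miner behavior recognition; human keypoint extraction; skeleton sequence; graph convolution; lightweight pose estimation network; feature fusion; multi-dimensional feature fusion attention module
Online Access: http://www.gkzdh.cn/article/doi/10.13272/j.issn.1671-251x.2024090059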
author WANG Jianfang
DUAN Siyuan
PAN Hongguang
JING Ningbo
collection DOAJ
description Skeleton-sequence-based behavior recognition models offer fast processing, low computational requirements, and simple structures, and graph convolutional networks (GCNs) are well suited to processing skeleton sequence data. However, existing GCN-based miner behavior recognition models struggle to balance high accuracy with low computational complexity. To address this issue, this study proposed a miner behavior recognition model based on a lightweight pose estimation network (Lite-HRNet) and a multi-dimensional feature-enhanced spatial-temporal graph convolutional network (MEST-GCN). In the pose estimation stage, human detection was first performed by a target detector, which extracted image features through a convolutional neural network (CNN) and generated anchor boxes via a region proposal network (RPN). The anchor boxes were classified to determine whether they contained a target. The RPN then applied bounding box regression to the anchor boxes identified as containing targets and output the human bounding box, with the optimal detection result selected via non-maximum suppression. The detected human regions were cropped and fed into Lite-HRNet to generate skeleton sequences from human pose keypoints. MEST-GCN improved upon the spatial-temporal graph convolutional network (ST-GCN) by removing redundant layers, which simplified the model structure and reduced the number of parameters, and by introducing a multi-dimensional feature fusion attention module (M2FA). The generated skeleton sequences were batch-normalized by a BN layer, and miner behavior features were extracted by the multi-dimensional feature-enhanced graph convolution module. These features were passed through global average pooling and a Softmax layer to obtain behavior confidence scores, which gave the miner behavior prediction. Experimental results showed that: ① The parameter count of MEST-GCN was reduced to 1.87 Mib. ② On the public NTU60 dataset, under the cross-subject and cross-view evaluation protocols and with Lite-HRNet extracting 2D human keypoint coordinates, the model based on Lite-HRNet and MEST-GCN reached accuracies of 88.0% and 92.6%, respectively. ③ On a custom-built miner behavior dataset, the model achieved an accuracy of 88.5% and a video processing speed of 18.26 frames per second, identifying miner action categories accurately and quickly.
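The description mentions that the optimal human detection is selected via non-maximum suppression. As a rough illustration only, the following is a minimal pure-Python sketch of the standard greedy form of that step. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are assumptions made for this example; the record does not state which detector, threshold, or implementation the authors used.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if its
    overlap with every already-kept box is at or below the threshold.
    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping candidates and one separate candidate.
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(candidates, confidences))
```

In a pipeline like the one described, the kept boxes would then be cropped from the frame and passed to the pose estimation network; a lower threshold suppresses more aggressively, which may merge miners standing close together.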
format Article
id doaj-art-bcb6e980006a43e78d2e0b07ce4a9359
institution Kabale University
issn 1671-251X
language zho
publishDate 2024-11-01
publisher Editorial Department of Industry and Mine Automation
record_format Article
series Gong-kuang zidonghua
spelling doaj-art-bcb6e980006a43e78d2e0b07ce4a9359 | 2024-12-30T02:32:31Z | zho | Editorial Department of Industry and Mine Automation | Gong-kuang zidonghua | 1671-251X | 2024-11-01 | 50 | 11 | 34 | 42 | 10.13272/j.issn.1671-251x.2024090059 | Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition
WANG Jianfang (0), DUAN Siyuan (1), PAN Hongguang (2), JING Ningbo (3)
0: Chenghe Mining Co., Ltd., Shaanxi Coal and Chemical Industry Group Co., Ltd., Weinan 715200, China
1: College of Electric and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
2: College of Electric and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
3: College of Electric and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China
title Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition
topic miner behavior recognition
human keypoint extraction
skeleton sequence
graph convolution
lightweight pose estimation network
feature fusion
multi-dimensional feature fusion attention module
url http://www.gkzdh.cn/article/doi/10.13272/j.issn.1671-251x.2024090059