Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition
| Main Authors: | WANG Jianfang, DUAN Siyuan, PAN Hongguang, JING Ningbo |
|---|---|
| Format: | Article | 
| Language: | Chinese (zho) |
| Published: | Editorial Department of Industry and Mine Automation, 2024-11-01 |
| Series: | Gong-kuang zidonghua | 
| Online Access: | http://www.gkzdh.cn/article/doi/10.13272/j.issn.1671-251x.2024090059 | 
| Collection: | DOAJ |
| Abstract: | Skeleton-sequence-based behavior recognition models offer fast processing, low computational requirements, and simple structures, and graph convolutional networks (GCNs) are well suited to processing skeleton sequence data. However, existing GCN-based miner behavior recognition models struggle to balance high accuracy against low computational complexity. To address this issue, this study proposed a miner behavior recognition model that combined a lightweight pose estimation network (Lite-HRNet) with a multi-dimensional feature-enhanced spatial-temporal graph convolutional network (MEST-GCN). In the pose estimation stage, human detection was first performed with an object detector: a convolutional neural network (CNN) extracted image features, and a region proposal network (RPN) generated anchor boxes, which were classified according to whether they contained a target. The RPN applied bounding box regression to the anchor boxes classified as containing targets and output human bounding boxes, and non-maximum suppression selected the optimal detection result. The detected human regions were cropped and fed into Lite-HRNet, which generated skeleton sequences from human pose keypoints. MEST-GCN improved on the spatial-temporal graph convolutional network (ST-GCN) by removing redundant layers, which simplified the model structure and reduced the number of parameters, and by introducing a multi-dimensional feature fusion attention module (M2FA). The generated skeleton sequences were batch-normalized by a BN layer, miner behavior features were extracted by the multi-dimensional feature-enhanced graph convolution modules, and these features were passed through global average pooling and a Softmax layer to obtain behavior confidence scores, from which the miner behavior prediction was made. Experimental results showed that: ① the parameter count of MEST-GCN was reduced to 1.87 Mib; ② on the public NTU60 dataset, using 2D human keypoint coordinates extracted by Lite-HRNet, the model achieved accuracies of 88.0% and 92.6% under the cross-subject and cross-view evaluation protocols, respectively; ③ on a self-built miner behavior dataset, the model achieved an accuracy of 88.5% and a video processing speed of 18.26 frames per second, identifying miner action categories accurately and quickly. |
| ISSN: | 1671-251X |
| Affiliations: | WANG Jianfang: Chenghe Mining Co., Ltd., Shaanxi Coal and Chemical Industry Group Co., Ltd., Weinan 715200, China; DUAN Siyuan, PAN Hongguang, JING Ningbo: College of Electric and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China |
| Volume/Issue/Pages: | Vol. 50, No. 11, pp. 34-42 |
| DOI: | 10.13272/j.issn.1671-251x.2024090059 |
| Keywords: | miner behavior recognition; human keypoint extraction; skeleton sequence; graph convolution; lightweight pose estimation network; feature fusion; multi-dimensional feature fusion attention module |
 
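The abstract states that the RPN classified anchor boxes as containing a target or not and then applied bounding box regression to the positive ones, but it does not give the box encoding. The following is a minimal NumPy sketch, assuming the common Faster R-CNN parametrization (centre offsets scaled by anchor size, log-space width and height) and a hypothetical objectness threshold of 0.5. `decode_positive_anchors` is an illustrative name, not the authors' code.

```python
import numpy as np


def decode_positive_anchors(anchors, logits, deltas, threshold=0.5):
    """Keep anchors classified as containing a target, then regress their boxes.

    anchors: (K, 4) boxes as (x1, y1, x2, y2)
    logits:  (K,) objectness logits from the RPN classification branch
    deltas:  (K, 4) regression offsets (dx, dy, dw, dh), Faster R-CNN style (assumed)
    Returns (boxes, scores) for anchors whose objectness exceeds `threshold`.
    """
    scores = 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> objectness probability
    keep = scores > threshold                    # anchors judged to contain a target
    a, d = anchors[keep], deltas[keep]

    widths = a[:, 2] - a[:, 0]
    heights = a[:, 3] - a[:, 1]
    ctr_x = a[:, 0] + 0.5 * widths
    ctr_y = a[:, 1] + 0.5 * heights

    # Shift the centre proportionally to anchor size; scale width/height in log space.
    pred_x = ctr_x + d[:, 0] * widths
    pred_y = ctr_y + d[:, 1] * heights
    pred_w = widths * np.exp(d[:, 2])
    pred_h = heights * np.exp(d[:, 3])

    boxes = np.stack(
        [pred_x - 0.5 * pred_w, pred_y - 0.5 * pred_h,
         pred_x + 0.5 * pred_w, pred_y + 0.5 * pred_h],
        axis=1,
    )
    return boxes, scores[keep]


anchors = np.array([[0.0, 0.0, 10.0, 20.0], [50.0, 50.0, 60.0, 60.0]])
logits = np.array([2.0, -3.0])                   # only the first anchor is "positive"
deltas = np.array([[0.1, 0.0, np.log(2.0), 0.0], [0.0, 0.0, 0.0, 0.0]])
boxes, scores = decode_positive_anchors(anchors, logits, deltas)
print(boxes)  # one box: centre shifted right by 1 px, width doubled to 20 px
```

A trained detector would typically also clip boxes to the image and pass them on to non-maximum suppression.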
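The abstract selects the optimal human detection with non-maximum suppression (NMS). Greedy NMS is the usual form: keep the highest-scoring box, discard remaining boxes whose intersection-over-union (IoU) with it exceeds a threshold, and repeat. The sketch below assumes an IoU threshold of 0.5, since the abstract does not state one.

```python
import numpy as np


def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes, (x1, y1, x2, y2)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop remaining boxes that overlap the current best box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

Libraries such as torchvision ship an equivalent operator; the loop above is written out only to make the selection rule explicit.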
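In the top-down pipeline described in the abstract, each detected human region is cropped and passed to Lite-HRNet, so predicted keypoints must be mapped back to frame coordinates before they form a skeleton sequence. A minimal sketch, assuming a 10% padding margin and a plain axis-aligned resize of the crop to the pose network's input size (Lite-HRNet configurations commonly use 256 x 192; many real pipelines use an aspect-preserving affine warp instead). Both function names are hypothetical.

```python
import numpy as np


def crop_person(frame, box, pad=0.1):
    """Crop a padded person box from an (H, W, C) frame.

    Returns the crop and its (x, y) top-left offset in frame coordinates.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    x1 = int(max(0, np.floor(x1 - pad * bw)))
    y1 = int(max(0, np.floor(y1 - pad * bh)))
    x2 = int(min(w, np.ceil(x2 + pad * bw)))
    y2 = int(min(h, np.ceil(y2 + pad * bh)))
    return frame[y1:y2, x1:x2], (x1, y1)


def keypoints_to_frame(kpts, offset, crop_hw, input_hw):
    """Map (V, 2) keypoints predicted on the resized network input back to the frame.

    kpts are (x, y) at the pose network's input resolution `input_hw` = (H_in, W_in);
    the crop had size `crop_hw` = (H_crop, W_crop) before resizing.
    """
    scale = np.array([crop_hw[1] / input_hw[1], crop_hw[0] / input_hw[0]])
    return kpts * scale + np.asarray(offset, dtype=float)


frame = np.zeros((100, 200, 3), dtype=np.uint8)
crop, offset = crop_person(frame, (50, 20, 70, 60))
print(crop.shape, offset)                        # (48, 24, 3) (48, 16)
kpts = np.array([[6.0, 12.0]])                   # one keypoint on a 24 x 12 (H x W) input
print(keypoints_to_frame(kpts, offset, crop.shape[:2], (24, 12)))  # [[60. 40.]]
```

Stacking the mapped (x, y) coordinates of every frame along a time axis yields the 2D skeleton sequence that the graph convolutional network consumes.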
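The abstract says the skeleton sequences first pass through a BN layer. In ST-GCN's reference implementation this input batch normalization treats every joint-coordinate pair as its own channel. Assuming MEST-GCN keeps that design (the abstract does not say), the operation reduces to the NumPy sketch below, shown in training mode without the learnable scale and shift.

```python
import numpy as np


def data_batch_norm(x, eps=1e-5):
    """Training-mode batch normalization of a skeleton batch x with shape (N, C, T, V).

    Each (joint, coordinate) pair is treated as its own channel and normalized with
    statistics over the batch and time axes. Learnable scale/shift are omitted
    (equivalent to gamma = 1, beta = 0).
    """
    n, c, t, v = x.shape
    # (N, C, T, V) -> (N, V * C, T): one channel per joint-coordinate pair
    flat = x.transpose(0, 3, 1, 2).reshape(n, v * c, t)
    mean = flat.mean(axis=(0, 2), keepdims=True)
    var = flat.var(axis=(0, 2), keepdims=True)
    normed = (flat - mean) / np.sqrt(var + eps)
    # Restore the original (N, C, T, V) layout
    return normed.reshape(n, v, c, t).transpose(0, 2, 3, 1)


rng = np.random.default_rng(0)
skeletons = rng.normal(loc=300.0, scale=50.0, size=(8, 2, 30, 17))  # 2D keypoints in pixels
out = data_batch_norm(skeletons)
print(out.shape)  # (8, 2, 30, 17)
```

Normalizing per joint matters because raw pixel coordinates of, say, the ankles and the head occupy very different value ranges.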
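MEST-GCN is described as a slimmed-down ST-GCN, whose basic unit is a spatial graph convolution over the skeleton followed by a temporal convolution along the frame axis. The NumPy sketch below shows one such block under stated simplifications: a single adjacency matrix with symmetric normalization (ST-GCN itself splits neighbours into partitions with separate weights), no batch normalization or residual path, and the COCO-17 joint layout, which is assumed here because Lite-HRNet is commonly trained on COCO, not taken from the paper. The M2FA attention module is not modelled because the abstract does not describe its internals.

```python
import numpy as np

# COCO-17 keypoint skeleton (0-indexed); an assumed joint layout.
COCO_EDGES = [(15, 13), (13, 11), (16, 14), (14, 12), (11, 12), (5, 11), (6, 12),
              (5, 6), (5, 7), (6, 8), (7, 9), (8, 10), (1, 2), (0, 1), (0, 2),
              (1, 3), (2, 4), (3, 5), (4, 6)]
NUM_JOINTS = 17


def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def st_block(x, a_hat, w_spatial, w_temporal):
    """One simplified spatial-temporal graph convolution block.

    x: (N, C_in, T, V); w_spatial: (C_out, C_in); w_temporal: (C_out, C_out, K), K odd.
    """
    # Spatial step: aggregate neighbouring joints, then mix channels (1x1 convolution).
    h = np.einsum("nctv,vw->nctw", x, a_hat)
    h = np.maximum(np.einsum("oc,nctv->notv", w_spatial, h), 0.0)

    # Temporal step: convolve each joint's features along time with "same" zero padding.
    k = w_temporal.shape[2]
    t = h.shape[2]
    padded = np.pad(h, ((0, 0), (0, 0), (k // 2, k // 2), (0, 0)))
    windows = np.stack([padded[:, :, i:i + t, :] for i in range(k)], axis=3)  # (N, C, T, K, V)
    out = np.einsum("ock,nctkv->notv", w_temporal, windows)
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
a_hat = normalized_adjacency(COCO_EDGES, NUM_JOINTS)
x = rng.normal(size=(2, 2, 30, NUM_JOINTS))      # (batch, xy, frames, joints)
w_s = rng.normal(size=(16, 2)) * 0.1
w_t = rng.normal(size=(16, 16, 9)) * 0.1         # temporal kernel size 9, as in ST-GCN
y = st_block(x, a_hat, w_s, w_t)
print(y.shape)  # (2, 16, 30, 17)
```

Stacking fewer of these blocks, or narrowing their channel widths, is the kind of change that reduces the parameter count; the exact layers MEST-GCN removes are not listed in the abstract.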
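Per the abstract, the extracted features pass through global average pooling and a Softmax layer to give behavior confidence scores. The sketch assumes a linear layer (equivalently a 1x1 convolution, as in ST-GCN) maps the pooled feature vector to class scores; the abstract does not mention this layer explicitly, and the weights here are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def behavior_confidence(features, w, b):
    """features: (N, C, T, V) -> per-class confidence (N, num_classes).

    w: (num_classes, C) weights of a hypothetical linear classifier; b: (num_classes,).
    """
    pooled = features.mean(axis=(2, 3))       # global average pooling over frames and joints
    return softmax(pooled @ w.T + b, axis=1)


features = np.ones((1, 2, 30, 17))
w = np.array([[1.0, 0.0], [0.0, 2.0]])        # two hypothetical behavior classes
b = np.zeros(2)
probs = behavior_confidence(features, w, b)
print(probs.argmax(axis=1))                    # [1]
```

The predicted behavior is the class with the highest confidence; its probability can also be thresholded to suppress uncertain predictions.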
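The NTU60 results are reported under the dataset's two standard protocols, cross-subject (X-Sub) and cross-view (X-View). The sketch below shows how samples are typically assigned to training and test sets under each, following the published NTU RGB+D 60 protocol, together with top-1 accuracy, which is what figures such as 88.0% and 92.6% usually denote (the abstract does not define its metric).

```python
import re

# NTU RGB+D 60 protocols: cross-subject trains on these 20 performer IDs;
# cross-view trains on cameras 2 and 3 and tests on camera 1.
XSUB_TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19,
                       25, 27, 28, 31, 34, 35, 38}
XVIEW_TRAIN_CAMERAS = {2, 3}

# Sample names follow SsssCcccPpppRrrrAaaa: setup, camera, performer, replication, action.
NAME_PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


def split_of(sample_name, protocol):
    """Return 'train' or 'test' for an NTU60 sample under 'xsub' or 'xview'."""
    match = NAME_PATTERN.search(sample_name)
    if match is None:
        raise ValueError(f"not an NTU sample name: {sample_name}")
    _setup, camera, subject, _rep, _action = (int(g) for g in match.groups())
    if protocol == "xsub":
        return "train" if subject in XSUB_TRAIN_SUBJECTS else "test"
    if protocol == "xview":
        return "train" if camera in XVIEW_TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol}")


def top1_accuracy(predicted, labels):
    """Fraction of samples whose predicted class equals the ground-truth label."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


print(split_of("S001C001P003R001A001", "xsub"))   # test (performer 3 is held out)
print(split_of("S001C002P003R001A001", "xview"))  # train (camera 2)
```

Because X-Sub holds out entire performers, it is usually the harder protocol, which is consistent with the lower cross-subject figure reported.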
       