Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition
| Main Authors: | Xinxu Lin, Mingxuan Liu, Hong Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2024-11-01 |
| Series: | Frontiers in Computational Neuroscience |
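| Subjects: | spiking neural network; human action recognition; transformer; attention branch; event-based vision |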
| Online Access: | https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full |
| Field | Value |
|---|---|
| author | Xinxu Lin, Mingxuan Liu, Hong Chen |
| collection | DOAJ |
| description | Event-based cameras are well suited to human action recognition (HAR) because they perceive movement with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to the asynchronous, sparse data produced by event cameras owing to their spike-based, event-driven paradigm, and they consume less power than artificial neural networks (ANNs). In this paper, we propose two end-to-end SNNs, Spike-HAR and Spike-HAR++, which introduce spiking transformers into event-based HAR. Spike-HAR includes two novel blocks: a spike attention branch, which enables the model to focus on regions with high spike rates, reducing the impact of noise and improving accuracy; and a parallel spike transformer block with a simplified spiking self-attention mechanism, which increases computational efficiency. To better extract crucial information from high-level features, we modify the architecture of the spike attention branch in Spike-HAR and extend it to a higher dimension, yielding Spike-HAR++, which further improves classification performance. Comprehensive experiments on four HAR datasets, SL-Animals-DVS, N-LSA64, DVS128 Gesture and DailyAction-DVS, demonstrate the superior performance of the proposed models. Additionally, Spike-HAR and Spike-HAR++ require only 0.03 mJ and 0.06 mJ, respectively, to process a sequence of event frames, with model sizes of only 0.7 M and 1.8 M. This efficiency positions them as promising new SNN baselines for the HAR community. Code is available at Spike-HAR++. |
| format | Article |
| id | doaj-art-5829bcc88d7e435ebee2c519c2beed0c |
| institution | Kabale University |
| issn | 1662-5188 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Frontiers Media S.A. |
| series | Frontiers in Computational Neuroscience |
| affiliations | Xinxu Lin: School of Integrated Circuits, Tsinghua University, Beijing, China; State Key Laboratory of Integrated Chips and Systems, Frontier Institute of Chip and System, Fudan University, Shanghai, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China. Mingxuan Liu: School of Biomedical Engineering, Tsinghua University, Beijing, China. Hong Chen: School of Integrated Circuits, Tsinghua University, Beijing, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China |
| volume | 18 |
| doi | 10.3389/fncom.2024.1508297 |
| title | Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition |
| topic | spiking neural network; human action recognition; transformer; attention branch; event-based vision |
| url | https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full |
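The abstract states that the models process "a sequence of event frames". The sketch below shows one common way to bin a raw event stream into per-polarity count frames. The event layout `(timestamp, x, y, polarity)`, the equal-width time binning, and the function name `events_to_frames` are assumptions for illustration only, not the authors' preprocessing pipeline.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp, x, y, polarity), polarity in {0, 1}.
Event = Tuple[int, int, int, int]

def events_to_frames(events: List[Event], num_bins: int, height: int, width: int):
    """Accumulate events into num_bins frames, each shaped [2][height][width].

    Time between the first and last timestamp is split into equal-width bins;
    each frame counts how many events of each polarity hit each pixel.
    """
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    if not events:
        return frames
    t0 = min(e[0] for e in events)
    t1 = max(e[0] for e in events)
    span = max(t1 - t0, 1)  # avoid division by zero when all timestamps match
    for t, x, y, p in events:
        # Integer bin index; the latest event is clamped into the final bin.
        b = min((t - t0) * num_bins // span, num_bins - 1)
        frames[b][p][y][x] += 1
    return frames
```

Equal-width time bins keep the frame rate fixed; binning by a fixed event count is a common alternative when activity levels vary strongly between recordings.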
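SNNs exchange binary spikes generated by stateful neuron models. The abstract does not name the neuron model used in Spike-HAR; the sketch below assumes a leaky integrate-and-fire (LIF) neuron with hard reset, a common choice in spiking transformers. The function name `lif_neuron` and its defaults (`tau=2.0`, `v_threshold=1.0`) are hypothetical.

```python
def lif_neuron(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate a single leaky integrate-and-fire neuron over discrete steps.

    Each step the membrane potential leaks toward v_reset while integrating
    the input: v <- v + (x - (v - v_reset)) / tau. When v reaches
    v_threshold the neuron emits a spike (1) and v is hard-reset to v_reset.
    """
    v = v_reset
    spikes = []
    for x in inputs:
        v = v + (x - (v - v_reset)) / tau  # leaky integration of the input
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes
```

With `tau=2.0`, a constant input of 1.5 yields `[0, 1, 0, 1]` over four steps, while a constant input of 0.5 never reaches the threshold, which is how weak, sporadic activity can be filtered out.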
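The spike attention branch is described as letting the model "focus on regions with high spike rates, reducing the impact of noise". The abstract gives no architectural details, so the following is only a hypothetical illustration of that idea: per-location firing rates are normalized by their maximum and used to reweight a feature map. The function name `spike_rate_attention` and the max-normalization are assumptions, not the published block.

```python
def spike_rate_attention(spikes, features):
    """Reweight a feature map by normalized per-location spike rate.

    spikes:   T x H x W nested lists of 0/1 spikes.
    features: H x W nested lists of floats.
    Each feature is scaled by (firing rate at that location) / (max rate),
    so weights lie in [0, 1] and sparsely firing locations are attenuated.
    """
    T = len(spikes)
    H, W = len(features), len(features[0])
    # Firing rate = fraction of time steps in which the location spiked.
    rate = [[sum(spikes[t][i][j] for t in range(T)) / T for j in range(W)]
            for i in range(H)]
    peak = max(max(row) for row in rate)
    if peak == 0:
        return [[0.0] * W for _ in range(H)]  # no activity anywhere
    return [[features[i][j] * rate[i][j] / peak for j in range(W)]
            for i in range(H)]
```

For a location that fires at every one of four steps next to one that fires once, equal features of 2.0 become 2.0 and 0.5: the isolated, noise-like activity is down-weighted fourfold.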
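The abstract mentions a "simplified spiking self-attention mechanism" without specifying it. For context, a widely used simplification in spiking transformers (Spikformer's spiking self-attention) drops softmax and computes a scaled product of binary spike matrices Q, K and V. Without softmax the product is associative, so (QK^T)V can be evaluated as Q(K^T V), avoiding the N x N score matrix. The sketch below illustrates only that associativity; the names `spiking_self_attention` and `order`, and the `scale=0.125` default, are illustrative assumptions, and the spiking neuron layer that normally follows is omitted.

```python
def matmul(a, b):
    """Nested-list matrix product: (n x m) @ (m x p) -> (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]

def spiking_self_attention(q, k, v, scale=0.125, order="kv_first"):
    """Softmax-free attention over binary spike matrices Q, K, V (each N x d).

    "qk_first" forms the N x N matrix Q K^T first (O(N^2 d) work);
    "kv_first" forms the d x d matrix K^T V first (O(N d^2) work).
    Both give the same result because no softmax sits between the products.
    """
    if order == "qk_first":
        out = matmul(matmul(q, transpose(k)), v)
    else:
        out = matmul(q, matmul(transpose(k), v))
    return [[scale * x for x in row] for row in out]
```

When the number of tokens N exceeds the feature dimension d, computing K^T V first is cheaper, and because Q, K and V are binary, the products reduce to additions.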
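The reported energy figures (0.03 mJ for Spike-HAR, 0.06 mJ for Spike-HAR++) can be put in perspective with simple arithmetic. SNN papers often estimate energy as the number of synaptic operations times a per-operation energy, commonly 0.9 pJ per accumulate and 4.6 pJ per multiply-accumulate in 45 nm CMOS. The abstract does not state which constants were used, so the values below are assumptions, and `energy_mj` and `implied_ops` are hypothetical helper names.

```python
# Assumed per-operation energies (pJ) for 45 nm CMOS, as commonly quoted in
# SNN papers; the abstract does not say which constants the authors used.
E_AC_PJ = 0.9   # one accumulate (spike-driven synaptic operation)
E_MAC_PJ = 4.6  # one multiply-accumulate (ANN operation)

PJ_PER_MJ = 1e9  # 1 mJ = 1e-3 J = 1e9 pJ

def energy_mj(num_ops, energy_per_op_pj):
    """Energy in mJ for num_ops operations costing energy_per_op_pj each."""
    return num_ops * energy_per_op_pj / PJ_PER_MJ

def implied_ops(energy_in_mj, energy_per_op_pj):
    """Inverse: how many operations a reported energy figure corresponds to."""
    return energy_in_mj * PJ_PER_MJ / energy_per_op_pj

for name, mj in [("Spike-HAR", 0.03), ("Spike-HAR++", 0.06)]:
    sops = implied_ops(mj, E_AC_PJ)
    print(f"{name}: {mj} mJ ~ {sops / 1e6:.1f} M accumulates; "
          f"the same count as MACs would cost {energy_mj(sops, E_MAC_PJ):.3f} mJ")
```

Under these assumptions, 0.03 mJ corresponds to roughly 33 million accumulates and 0.06 mJ to roughly 67 million. Executed as multiply-accumulates, the same number of operations would cost about 5.1 times as much (4.6 / 0.9).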