Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition

Event-based cameras are well suited to human action recognition (HAR) because they provide movement perception with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to deal with the asynchronous and sparse data from the ev...

Bibliographic Details
Main Authors: Xinxu Lin, Mingxuan Liu, Hong Chen
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-11-01
Series: Frontiers in Computational Neuroscience
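Subjects: spiking neural network; human action recognition; transformer; attention branch; event-based vision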
Online Access: https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full
author Xinxu Lin
Mingxuan Liu
Hong Chen
collection DOAJ
description Event-based cameras are well suited to human action recognition (HAR) because they provide movement perception with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to the asynchronous and sparse data from event cameras because of their spike-based, event-driven paradigm, and they consume less power than artificial neural networks. In this paper, we propose two end-to-end SNNs, namely Spike-HAR and Spike-HAR++, to introduce the spiking transformer into event-based HAR. Spike-HAR includes two novel blocks: a spike attention branch, which enables the model to focus on regions with high spike rates, reducing the impact of noise to improve accuracy, and a parallel spike transformer block with a simplified spiking self-attention mechanism, which increases computational efficiency. To better extract crucial information from high-level features, we modify the architecture of the spike attention branch in Spike-HAR and extend it to a higher dimension, proposing Spike-HAR++ to further enhance classification performance. Comprehensive experiments were conducted on four HAR datasets (SL-Animals-DVS, N-LSA64, DVS128 Gesture and DailyAction-DVS) to demonstrate the superior performance of the proposed models. Additionally, Spike-HAR and Spike-HAR++ require only 0.03 and 0.06 mJ, respectively, to process a sequence of event frames, with model sizes of only 0.7 and 1.8 M. This efficiency positions them as promising new SNN baselines for the HAR community. Code is available at Spike-HAR++.
format Article
id doaj-art-5829bcc88d7e435ebee2c519c2beed0c
institution Kabale University
issn 1662-5188
language English
publishDate 2024-11-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Computational Neuroscience
spelling doaj-art-5829bcc88d7e435ebee2c519c2beed0c
2024-11-26T04:25:06Z
eng
Frontiers Media S.A.
Frontiers in Computational Neuroscience
1662-5188
2024-11-01
18
10.3389/fncom.2024.1508297
1508297
Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition
Xinxu Lin (School of Integrated Circuits, Tsinghua University, Beijing, China; State Key Laboratory of Integrated Chips and Systems, Frontier Institute of Chip and System, Fudan University, Shanghai, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China)
Mingxuan Liu (School of Biomedical Engineering, Tsinghua University, Beijing, China)
Hong Chen (School of Integrated Circuits, Tsinghua University, Beijing, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China)
https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full
spiking neural network
human action recognition
transformer
attention branch
event-based vision
title Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition
topic spiking neural network
human action recognition
transformer
attention branch
event-based vision
url https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full