Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition

Event-based cameras are well suited to human action recognition (HAR), providing movement perception with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to deal with the asynchronous and sparse data from the ev...

Bibliographic Details
Main Authors:	Xinxu Lin, Mingxuan Liu, Hong Chen
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2024-11-01
Series:	Frontiers in Computational Neuroscience
Subjects:	spiking neural network; human action recognition; transformer; attention branch; event-based vision
Online Access:	https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full

author	Xinxu Lin; Mingxuan Liu; Hong Chen
author_sort	Xinxu Lin
collection	DOAJ
description	Event-based cameras are well suited to human action recognition (HAR), providing movement perception with high dynamic range, high temporal resolution, high power efficiency and low latency. Spiking Neural Networks (SNNs) are naturally suited to deal with the asynchronous and sparse data from event cameras because of their spike-based, event-driven paradigm, and they consume less power than artificial neural networks. In this paper, we propose two end-to-end SNNs, namely Spike-HAR and Spike-HAR++, to introduce the spiking transformer into event-based HAR. Spike-HAR includes two novel blocks: a spike attention branch, which enables the model to focus on regions with high spike rates, reducing the impact of noise and improving accuracy, and a parallel spike transformer block with a simplified spiking self-attention mechanism, which increases computational efficiency. To better extract crucial information from high-level features, we modify the architecture of the spike attention branch used in Spike-HAR and extend it to a higher dimension, proposing Spike-HAR++ to further enhance classification performance. Comprehensive experiments were conducted on four HAR datasets (SL-Animals-DVS, N-LSA64, DVS128 Gesture and DailyAction-DVS) to demonstrate the superior performance of our proposed models. Additionally, the proposed Spike-HAR and Spike-HAR++ require only 0.03 and 0.06 mJ, respectively, to process a sequence of event frames, with model sizes of only 0.7 and 1.8 M. This efficiency positions them as a promising new SNN baseline for the HAR community. Code is available at Spike-HAR++.
format	Article
id	doaj-art-5829bcc88d7e435ebee2c519c2beed0c
institution	Kabale University
issn	1662-5188
language	English
publishDate	2024-11-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Computational Neuroscience
spelling	doaj-art-5829bcc88d7e435ebee2c519c2beed0c | 2024-11-26T04:25:06Z | eng | Frontiers Media S.A. | Frontiers in Computational Neuroscience | 1662-5188 | 2024-11-01 | 18 | 10.3389/fncom.2024.1508297 | 1508297 | Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition | Xinxu Lin (School of Integrated Circuits, Tsinghua University, Beijing, China; State Key Laboratory of Integrated Chips and Systems, Frontier Institute of Chip and System, Fudan University, Shanghai, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China) | Mingxuan Liu (School of Biomedical Engineering, Tsinghua University, Beijing, China) | Hong Chen (School of Integrated Circuits, Tsinghua University, Beijing, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China) | https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full | spiking neural network; human action recognition; transformer; attention branch; event-based vision
title	Spike-HAR++: an energy-efficient and lightweight parallel spiking transformer for event-based human action recognition
topic	spiking neural network; human action recognition; transformer; attention branch; event-based vision
url	https://www.frontiersin.org/articles/10.3389/fncom.2024.1508297/full


Similar Items