The lip reading method based on Adaptive Pooling Attention Transformer
Lip reading technology establishes the mapping relationship between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a significant decline in recognition performance. To address this issue, a lip reading method based on Adaptive Pooling Attention Transformer (APAT-LR) is proposed. This method introduces an Adaptive Pooling Module before the Multi-Head Self-Attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module helps suppress irrelevant information and enhances the representation of key features. Experiments on the CMLR and GRID datasets show that the proposed APAT-LR method can reduce the recognition error rate, thus verifying the effectiveness of the proposed method.
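The abstract's pooling front-end can be illustrated with a minimal sketch. The paper itself is not available here, so the exact module (window size, adaptivity mechanism, placement) is assumed; the sketch below only shows the stated idea: for each time step of a frame-feature sequence, take a local temporal window, apply max pooling and average pooling, and concatenate the two results along the feature axis before the features would enter multi-head self-attention. The function name `adaptive_pool_concat` and the `window` parameter are illustrative, not from the paper.

```python
import numpy as np

def adaptive_pool_concat(x, window=3):
    """Sketch of the concatenated max/avg pooling idea from the abstract.

    x: (T, D) array of per-frame lip features.
    Returns a (T, 2*D) array: for each time step, the max-pooled and
    average-pooled features over a local temporal window, concatenated.
    """
    T, D = x.shape
    pad = window // 2
    # Edge-pad in time so every step has a full window.
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty((T, 2 * D))
    for t in range(T):
        win = xp[t:t + window]           # (window, D) local temporal window
        out[t, :D] = win.max(axis=0)     # max pooling: emphasize salient cues
        out[t, D:] = win.mean(axis=0)    # average pooling: smoothed context
    return out

# Example: 10 frames of 8-dim features -> 10 frames of 16-dim pooled features.
features = np.random.rand(10, 8)
pooled = adaptive_pool_concat(features, window=3)
```

In the described architecture these pooled features would then be projected and fed into standard MHSA; the intent stated in the abstract is that pooling suppresses frame-level noise before attention weights are computed.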
| Main Authors: | YAO Yun, HU Zhenxiao, DENG Tao, WANG Xiao |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | POSTS&TELECOM PRESS Co., LTD, 2025-01-01 |
| Series: | 智能科学与技术学报 |
| Subjects: | attention mechanism, Transformer, Convolutional Pooling, Adaptive |
| Online Access: | http://www.cjist.com.cn/zh/article/99639204/ |
| _version_ | 1849311969868251136 |
|---|---|
| author | YAO Yun HU Zhenxiao DENG Tao WANG Xiao |
| collection | DOAJ |
| description | Lip reading technology establishes the mapping relationship between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a significant decline in recognition performance. To address this issue, a lip reading method based on Adaptive Pooling Attention Transformer (APAT-LR) is proposed. This method introduces an Adaptive Pooling Module before the Multi-Head Self-Attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module helps suppress irrelevant information and enhances the representation of key features. Experiments on the CMLR and GRID datasets show that the proposed APAT-LR method can reduce the recognition error rate, thus verifying the effectiveness of the proposed method. |
| format | Article |
| id | doaj-art-99cec5b9ae8f49c0b4c1714c35d910e9 |
| institution | Kabale University |
| issn | 2096-6652 |
| language | zho |
| publishDate | 2025-01-01 |
| publisher | POSTS&TELECOM PRESS Co., LTD |
| record_format | Article |
| series | 智能科学与技术学报 |
| title | The lip reading method based on Adaptive Pooling Attention Transformer |
| topic | attention mechanism Transformer Convolutional Pooling Adaptive |
| url | http://www.cjist.com.cn/zh/article/99639204/ |