The lip reading method based on Adaptive Pooling Attention Transformer

Lip reading technology establishes a mapping between lip movements and language characters by processing sequences of consecutive lip images, thereby enabling recognition of semantic information. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames.

Full description

Saved in:
Bibliographic Details
Main Authors: YAO Yun, HU Zhenxiao, DENG Tao, WANG Xiao
Format: Article
Language: zho
Published: POSTS&TELECOM PRESS Co., LTD 2025-01-01
Series: 智能科学与技术学报
Subjects:
Online Access: http://www.cjist.com.cn/zh/article/99639204/
_version_ 1849311969868251136
author YAO Yun
HU Zhenxiao
DENG Tao
WANG Xiao
author_facet YAO Yun
HU Zhenxiao
DENG Tao
WANG Xiao
author_sort YAO Yun
collection DOAJ
description Lip reading technology establishes a mapping between lip movements and language characters by processing sequences of consecutive lip images, thereby enabling recognition of semantic information. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, these methods suffer significant information loss, especially when the video is incomplete or noisy. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a marked decline in recognition performance. To address this issue, a lip reading method based on an Adaptive Pooling Attention Transformer (APAT-LR) is proposed. The method introduces an adaptive pooling module before the multi-head self-attention (MHSA) mechanism of the standard Transformer, concatenating max-pooled and average-pooled features. This module suppresses irrelevant information and strengthens the representation of key features. Experiments on the CMLR and GRID datasets show that APAT-LR reduces the recognition error rate, verifying the effectiveness of the proposed method.
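The core idea in the abstract — summarizing frame features with both max pooling and average pooling and concatenating the two before self-attention — can be sketched as follows. This is a minimal illustration only: the window size, temporal windowing scheme, shapes, and function name are assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def adaptive_pool_features(x, window=3):
    """Illustrative pooling step: for each time step, max-pool and
    average-pool the frame features over a local temporal window,
    then concatenate the two summaries along the feature axis.

    x: (T, D) array of per-frame lip features.
    Returns: (T, 2*D) array of pooled features.
    """
    T, D = x.shape
    out = np.empty((T, 2 * D))
    half = window // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        win = x[lo:hi]                    # local temporal neighborhood
        out[t, :D] = win.max(axis=0)      # max pooling: salient cues
        out[t, D:] = win.mean(axis=0)     # average pooling: smooth context
    return out
```

In a full model, the (T, 2*D) output would typically be projected back to the model dimension before being fed to the multi-head self-attention layers.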
format Article
id doaj-art-99cec5b9ae8f49c0b4c1714c35d910e9
institution Kabale University
issn 2096-6652
language zho
publishDate 2025-01-01
publisher POSTS&TELECOM PRESS Co., LTD
record_format Article
series 智能科学与技术学报
spellingShingle YAO Yun
HU Zhenxiao
DENG Tao
WANG Xiao
The lip reading method based on Adaptive Pooling Attention Transformer
智能科学与技术学报
attention mechanism
Transformer
Convolutional Pooling
Adaptive
title The lip reading method based on Adaptive Pooling Attention Transformer
title_full The lip reading method based on Adaptive Pooling Attention Transformer
title_fullStr The lip reading method based on Adaptive Pooling Attention Transformer
title_full_unstemmed The lip reading method based on Adaptive Pooling Attention Transformer
title_short The lip reading method based on Adaptive Pooling Attention Transformer
title_sort lip reading method based on adaptive pooling attention transformer
topic attention mechanism
Transformer
Convolutional Pooling
Adaptive
url http://www.cjist.com.cn/zh/article/99639204/
work_keys_str_mv AT yaoyun thelipreadingmethodbasedonadaptivepoolingattentiontransformer
AT huzhenxiao thelipreadingmethodbasedonadaptivepoolingattentiontransformer
AT dengtao thelipreadingmethodbasedonadaptivepoolingattentiontransformer
AT wangxiao thelipreadingmethodbasedonadaptivepoolingattentiontransformer
AT yaoyun lipreadingmethodbasedonadaptivepoolingattentiontransformer
AT huzhenxiao lipreadingmethodbasedonadaptivepoolingattentiontransformer
AT dengtao lipreadingmethodbasedonadaptivepoolingattentiontransformer
AT wangxiao lipreadingmethodbasedonadaptivepoolingattentiontransformer