The lip reading method based on Adaptive Pooling Attention Transformer
Lip reading technology establishes the mapping relationship between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a significant decline in recognition performance. To address this issue, a lip reading method based on Adaptive Pooling Attention Transformer (APAT-LR) is proposed. This method introduces an Adaptive Pooling Module before the Multi-Head Self-Attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module helps suppress irrelevant information and enhances the representation of key features. Experiments on the CMLR and GRID datasets show that the proposed APAT-LR method can reduce the recognition error rate, thus verifying the effectiveness of the proposed method.
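The abstract's pooling front-end can be illustrated with a minimal sketch. The paper itself is not available here, so the exact module (window size, adaptivity mechanism, placement) is assumed; the sketch below only shows the stated idea: for each time step of a frame-feature sequence, take a local temporal window, apply max pooling and average pooling, and concatenate the two results along the feature axis before the features would enter multi-head self-attention. The function name `adaptive_pool_concat` and the `window` parameter are illustrative, not from the paper.

```python
import numpy as np

def adaptive_pool_concat(x, window=3):
    """Sketch of the concatenated max/avg pooling idea from the abstract.

    x: (T, D) array of per-frame lip features.
    Returns a (T, 2*D) array: for each time step, the max-pooled and
    average-pooled features over a local temporal window, concatenated.
    """
    T, D = x.shape
    pad = window // 2
    # Edge-pad in time so every step has a full window.
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    out = np.empty((T, 2 * D))
    for t in range(T):
        win = xp[t:t + window]           # (window, D) local temporal window
        out[t, :D] = win.max(axis=0)     # max pooling: emphasize salient cues
        out[t, D:] = win.mean(axis=0)    # average pooling: smoothed context
    return out

# Example: 10 frames of 8-dim features -> 10 frames of 16-dim pooled features.
features = np.random.rand(10, 8)
pooled = adaptive_pool_concat(features, window=3)
```

In the described architecture these pooled features would then be projected and fed into standard MHSA; the intent stated in the abstract is that pooling suppresses frame-level noise before attention weights are computed.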
| Main Authors: | YAO Yun, HU Zhenxiao, DENG Tao, WANG Xiao |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | POSTS&TELECOM PRESS Co., LTD, 2025-01-01 |
| Series: | 智能科学与技术学报 |
| Subjects: | attention mechanism, Transformer, Convolutional Pooling, Adaptive |
| Online Access: | http://www.cjist.com.cn/zh/article/99639204/ |
| _version_ | 1849311969868251136 |
|---|---|
| author | YAO Yun HU Zhenxiao DENG Tao WANG Xiao |
| collection | DOAJ |
| description | Lip reading technology establishes the mapping relationship between lip movements and specific language characters by processing a series of consecutive lip images, thereby enabling semantic information recognition. Existing methods mainly use recurrent networks for spatiotemporal modeling of sequential video frames. However, they suffer from significant information loss, especially when the video information is incomplete or contains noise. In such cases, the model often struggles to distinguish between lip movements at different time points, leading to a significant decline in recognition performance. To address this issue, a lip reading method based on Adaptive Pooling Attention Transformer (APAT-LR) is proposed. This method introduces an Adaptive Pooling Module before the Multi-Head Self-Attention (MHSA) mechanism in the standard Transformer, using a concatenation strategy of max pooling and average pooling. This module helps suppress irrelevant information and enhances the representation of key features. Experiments on the CMLR and GRID datasets show that the proposed APAT-LR method can reduce the recognition error rate, thus verifying the effectiveness of the proposed method. |
| format | Article |
| id | doaj-art-99cec5b9ae8f49c0b4c1714c35d910e9 |
| institution | Kabale University |
| issn | 2096-6652 |
| language | zho |
| publishDate | 2025-01-01 |
| publisher | POSTS&TELECOM PRESS Co., LTD |
| record_format | Article |
| series | 智能科学与技术学报 |
| title | The lip reading method based on Adaptive Pooling Attention Transformer |
| topic | attention mechanism Transformer Convolutional Pooling Adaptive |
| url | http://www.cjist.com.cn/zh/article/99639204/ |