ViTAU: Facial paralysis recognition and analysis based on vision transformer and facial action units

Bibliographic Details
Main Authors: Jia GAO, Wenhao CAI, Junli ZHAO, Fuqing DUAN
Format: Article
Language: Chinese (zho)
Published: Science Press 2025-02-01
Series: 工程科学学报 (Chinese Journal of Engineering)
Subjects:
Online Access: http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.05.06.003
collection DOAJ
description Facial nerve paralysis (FNP), commonly known as Bell’s palsy or facial paralysis, significantly affects patients’ daily lives and mental well-being. Timely identification and diagnosis are crucial for early treatment and rehabilitation. With the rapid advancement of deep learning and computer vision technologies, automatic recognition of facial paralysis has become feasible, offering a more accurate and objective diagnostic approach. Current research focuses primarily on broad facial changes and often neglects finer facial details, leading to insufficient analysis of how different facial areas affect recognition results. This study proposes a method that combines the vision transformer (ViT) model with a facial action unit (AU) region detection network to automatically recognize and analyze facial paralysis. First, the ViT model determines whether facial paralysis is present; its self-attention mechanism captures the global contextual information that this recognition requires. Subsequently, the AU data are analyzed to assess facial muscle activity, allowing a deeper evaluation of the affected areas. To localize the specific affected regions, the pixel2style2pixel (pSp) encoder and the StyleGAN2 generator encode and decode images and extract feature maps that represent facial characteristics. These maps are then processed by a pyramid convolutional neural network interpreter to generate heatmaps, and minimizing the mean squared error between the predicted and ground-truth heatmaps identifies the paralyzed areas. The resulting ViT-based recognition network enhances the extraction of local-area features through self-attention, enabling precise recognition of facial paralysis.
Additionally, incorporating facial AU data enables detailed regional analyses for patients identified with facial paralysis. Experimental results demonstrate the efficacy of the approach, achieving a recognition accuracy of 99.4% for facial paralysis and 81.36% for detecting affected regions on the YouTube Facial Palsy (YFP) and extended Cohn-Kanade (CK+) datasets. These results highlight the effectiveness of the automatic recognition method compared with the latest techniques and validate its potential for clinical application. Furthermore, to facilitate observation of the affected regions, a visualization method intuitively displays the impacted areas, helping patients and healthcare professionals understand the condition and improving communication about treatment and rehabilitation strategies. In conclusion, the proposed method combines advanced deep learning techniques with a detailed analysis of facial AUs to improve the automatic recognition of facial paralysis. By addressing the limitations of previous research, it provides a more nuanced understanding of how specific facial areas are affected, leading to improved diagnosis, treatment, and patient care. This approach not only enhances the accuracy of facial paralysis detection but also contributes to facial medical imaging.
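The abstract attributes the recognition network's global context to the ViT self-attention mechanism. As a minimal sketch of that mechanism only (not the authors' network; all shapes, weights, and names here are illustrative assumptions), single-head self-attention over a sequence of flattened patch embeddings can be written as:

```python
# Minimal single-head self-attention over image-patch embeddings,
# the core operation of a ViT encoder block. Weights are random
# placeholders; a real model learns them end to end.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(patches, Wq, Wk, Wv):
    """patches: (num_patches, dim) -> attended features, same shape."""
    Q, K, V = patches @ Wq, patches @ Wk, patches @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise patch affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ V                          # every patch mixes in global context

rng = np.random.default_rng(0)
num_patches, dim = 16, 8                     # e.g. a 4x4 grid of patch embeddings
patches = rng.standard_normal((num_patches, dim))
Wq, Wk, Wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = self_attention(patches, Wq, Wk, Wv)
print(out.shape)  # (16, 8)
```

Because every patch attends to every other patch, even a single such layer lets a local facial region (e.g. one mouth corner) be compared against the rest of the face, which is what makes asymmetry cues visible to the classifier.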
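The region-localization step is described as minimizing the mean squared error between predicted and ground-truth heatmaps. The following is a sketch of that objective alone, under the common assumption (not stated in the abstract) that a target heatmap marks an affected region with a 2-D Gaussian; the sizes, centers, and function names are illustrative:

```python
# Heatmap-regression objective: MSE between a predicted heatmap and a
# Gaussian target centered on the affected facial region.
import numpy as np

def gaussian_heatmap(size, center, sigma=2.0):
    """Square heatmap with a Gaussian peak at center = (row, col)."""
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def heatmap_mse(pred, target):
    """Per-pixel mean squared error, the quantity minimized in training."""
    return float(np.mean((pred - target) ** 2))

target = gaussian_heatmap(64, center=(20, 40))   # ground-truth affected region
perfect = target.copy()                          # exact prediction
shifted = gaussian_heatmap(64, center=(30, 30))  # mislocalized prediction

print(heatmap_mse(perfect, target))  # 0.0
print(heatmap_mse(shifted, target))  # > 0: loss grows as the peak drifts
```

A predicted peak that drifts away from the true region raises the loss, so gradient descent on this MSE pushes the interpreter's heatmaps toward the annotated affected areas.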
id doaj-art-3d5cfc499e964a14b7b56d9ece020fa6
issn 2095-9389
Citation: 工程科学学报 (Chinese Journal of Engineering), 2025-02-01, 47(2): 351–363. DOI: 10.13374/j.issn2095-9389.2024.05.06.003
Author affiliations:
Jia GAO, Wenhao CAI, Junli ZHAO: College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
Fuqing DUAN: School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
topic transformer
action units
multi-resolution feature maps
generator
heatmap regression