ViTAU: Facial paralysis recognition and analysis based on vision transformer and facial action units

Bibliographic Details
Main Authors: Jia GAO, Wenhao CAI, Junli ZHAO, Fuqing DUAN
Format: Article
Language: Chinese (zho)
Published: Science Press 2025-02-01
Series: 工程科学学报 (Chinese Journal of Engineering)
Subjects:
Online Access: http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.05.06.003
collection DOAJ
description Facial nerve paralysis (FNP), commonly known as Bell’s palsy or facial paralysis, significantly affects patients’ daily lives and mental well-being. Timely identification and diagnosis are crucial for early treatment and rehabilitation. With the rapid advancement of deep learning and computer vision technologies, automatic recognition of facial paralysis has become feasible, offering a more accurate and objective diagnostic approach. Current research focuses primarily on broad facial changes and often neglects finer facial details, leading to insufficient analysis of how different facial areas affect recognition results. This study proposes a method that combines the vision transformer (ViT) model with a facial action unit (AU) region detection network to automatically recognize and analyze facial paralysis. First, the ViT model determines whether facial paralysis is present; its self-attention mechanism captures the global contextual information that this recognition requires. Subsequently, the AU data are analyzed to assess facial muscle activity, allowing a deeper evaluation of the affected areas. To localize the specific affected regions, the pixel2style2pixel (pSp) encoder and the StyleGAN2 generator encode and decode images and extract feature maps that represent facial characteristics. These maps are then processed by a pyramid convolutional neural network interpreter to generate heatmaps, and minimizing the mean squared error between the predicted and ground-truth heatmaps identifies the paralyzed areas. The resulting ViT-based recognition network enhances the extraction of local-area features through self-attention, enabling precise recognition of facial paralysis.
Additionally, incorporating facial AU data enables detailed regional analyses for patients identified with facial paralysis. Experimental results demonstrate the efficacy of the approach, achieving a recognition accuracy of 99.4% for facial paralysis and 81.36% for detecting affected regions on the YouTube Facial Palsy (YFP) and extended Cohn-Kanade (CK+) datasets. These results highlight the effectiveness of the automatic recognition method compared with the latest techniques and validate its potential for clinical application. Furthermore, to facilitate observation of the affected regions, a visualization method intuitively displays the impacted areas, helping patients and healthcare professionals understand the condition and improving communication about treatment and rehabilitation strategies. In conclusion, the proposed method combines advanced deep learning techniques with a detailed analysis of facial AUs to improve the automatic recognition of facial paralysis. By addressing the limitations of previous research, it provides a more nuanced understanding of how specific facial areas are affected, leading to improved diagnosis, treatment, and patient care. This approach not only enhances the accuracy of facial paralysis detection but also contributes to facial medical imaging.
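The abstract attributes the recognition network's global context to the ViT self-attention mechanism. As a minimal sketch of that mechanism only (not the authors' network; all shapes, weights, and names here are illustrative assumptions), single-head self-attention over a sequence of flattened patch embeddings can be written as:

```python
# Minimal single-head self-attention over image-patch embeddings,
# the core operation of a ViT encoder block. Weights are random
# placeholders; a real model learns them end to end.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(patches, Wq, Wk, Wv):
    """patches: (num_patches, dim) -> attended features, same shape."""
    Q, K, V = patches @ Wq, patches @ Wk, patches @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise patch affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ V                          # every patch mixes in global context

rng = np.random.default_rng(0)
num_patches, dim = 16, 8                     # e.g. a 4x4 grid of patch embeddings
patches = rng.standard_normal((num_patches, dim))
Wq, Wk, Wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = self_attention(patches, Wq, Wk, Wv)
print(out.shape)  # (16, 8)
```

Because every patch attends to every other patch, even a single such layer lets a local facial region (e.g. one mouth corner) be compared against the rest of the face, which is what makes asymmetry cues visible to the classifier.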
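The region-localization step is described as minimizing the mean squared error between predicted and ground-truth heatmaps. The following is a sketch of that objective alone, under the common assumption (not stated in the abstract) that a target heatmap marks an affected region with a 2-D Gaussian; the sizes, centers, and function names are illustrative:

```python
# Heatmap-regression objective: MSE between a predicted heatmap and a
# Gaussian target centered on the affected facial region.
import numpy as np

def gaussian_heatmap(size, center, sigma=2.0):
    """Square heatmap with a Gaussian peak at center = (row, col)."""
    ys, xs = np.mgrid[0:size, 0:size]
    cy, cx = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def heatmap_mse(pred, target):
    """Per-pixel mean squared error, the quantity minimized in training."""
    return float(np.mean((pred - target) ** 2))

target = gaussian_heatmap(64, center=(20, 40))   # ground-truth affected region
perfect = target.copy()                          # exact prediction
shifted = gaussian_heatmap(64, center=(30, 30))  # mislocalized prediction

print(heatmap_mse(perfect, target))  # 0.0
print(heatmap_mse(shifted, target))  # > 0: loss grows as the peak drifts
```

A predicted peak that drifts away from the true region raises the loss, so gradient descent on this MSE pushes the interpreter's heatmaps toward the annotated affected areas.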
id doaj-art-3d5cfc499e964a14b7b56d9ece020fa6
issn 2095-9389
Citation: 工程科学学报 (Chinese Journal of Engineering), 2025-02-01, 47(2): 351–363. DOI: 10.13374/j.issn2095-9389.2024.05.06.003
Author affiliations:
Jia GAO, Wenhao CAI, Junli ZHAO: College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
Fuqing DUAN: School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
topic transformer
action units
multi-resolution feature maps
generator
heatmap regression