Depression recognition using voice-based pre-training model

Bibliographic Details
Main Authors:	Xiangsheng Huang, Fang Wang, Yuan Gao, Yilong Liao, Wenjing Zhang, Li Zhang, Zhenrong Xu
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-06-01
Series:	Scientific Reports
Subjects:	Depression; Pre-training model; Voice features; Wav2vec 2.0; DAIC-WOZ
Online Access:	https://doi.org/10.1038/s41598-024-63556-0

collection	DOAJ
description	Early screening for depression helps patients obtain a timely diagnosis and better treatment. Although voice data have been shown to be effective for depression detection, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method for identifying depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor that automatically derives high-quality voice features from raw audio, and a small fine-tuning network was used as the classification model that outputs the depression classification result. The proposed model was fine-tuned on the DAIC-WOZ dataset. In binary classification it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set; in multi-class classification it attained an accuracy of 0.9481 and an RMSE of 0.3810. This is the first use of the wav2vec 2.0 model for depression recognition, and the model showed strong generalization ability. The method is simple and practical and can assist doctors in the early screening of depression.
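The record does not specify the architecture of the small fine-tuning network or how RMSE was computed, so the following Python sketch is a hypothetical illustration, not the authors' implementation. It assumes that frame-level wav2vec 2.0 features are mean-pooled into one utterance vector, that the classifier is a single linear layer with softmax (the names `mean_pool`, `linear_softmax` and the other helpers are invented here), and that RMSE is measured between integer class indices. The frame count follows the standard wav2vec 2.0 convolutional feature encoder (total stride 320 samples, about 20 ms at 16 kHz); toy 2-dimensional lists stand in for real features, which the Base model emits as 768-dimensional vectors.

```python
import math

# Default wav2vec 2.0 feature-encoder layout (7 conv layers), reproduced for
# illustration; a real pipeline would use a pre-trained model's own config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples):
    """Number of frame-level feature vectors produced from num_samples of audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def mean_pool(frames):
    """Average frame-level vectors into one utterance-level vector (assumed pooling)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def linear_softmax(x, weights, bias):
    """Hypothetical classification head: one linear layer followed by a softmax."""
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class index."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between integer class indices (assumed definition)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    print(num_frames(16_000))  # 1 s of 16 kHz audio -> 49 frames
    # Toy 2-dimensional "features" stand in for real wav2vec 2.0 outputs.
    pooled = mean_pool([[0.2, -0.1], [0.4, 0.3]])
    probs = linear_softmax(pooled, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
    print(probs.index(max(probs)))  # predicted class 0

    binary_true, binary_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
    acc = accuracy(binary_true, binary_pred)
    print(acc, rmse(binary_true, binary_pred))
    # For 0/1 labels, RMSE equals sqrt(1 - accuracy).
    print(math.isclose(rmse(binary_true, binary_pred), math.sqrt(1 - acc)))

    multi_true, multi_pred = [0, 1, 2, 3], [0, 1, 2, 1]
    print(accuracy(multi_true, multi_pred), rmse(multi_true, multi_pred))  # 0.75 1.0
```

With 0/1 labels every error adds exactly 1 to the squared-error sum, so RMSE reduces to sqrt(1 - accuracy); if RMSE was computed this way, the reported binary accuracy of 0.9649 corresponds to an RMSE of about sqrt(0.0351) ≈ 0.187, close to the reported 0.1875. In the multi-class case RMSE also grows with how far a wrong prediction lands from the true class, which is why it can exceed sqrt(1 - accuracy).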
id	doaj-art-81d495a9cda74eafa45e74f4b8e90b5c
institution	Kabale University
issn	2045-2322
affiliation	School of Biomedical Engineering, South-Central Minzu University (all authors)
volume	14
