Depression recognition using voice-based pre-training model

Bibliographic Details
Main Authors: Xiangsheng Huang, Fang Wang, Yuan Gao, Yilong Liao, Wenjing Zhang, Li Zhang, Zhenrong Xu
Format: Article
Language: English
Published: Nature Portfolio 2024-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-63556-0
Collection: DOAJ

Abstract
The early screening of depression helps patients obtain better diagnosis and treatment. Voice data has been shown to be effective for depression detection, but the small size of the available datasets remains an unresolved problem. We therefore propose an artificial intelligence method to identify depression effectively. The voice-based pre-training model wav2vec 2.0 is used as a feature extractor that automatically extracts high-quality voice features from raw audio, and a small fine-tuning network serves as the classification model that outputs the depression classification result. The proposed model was fine-tuned on the DAIC-WOZ dataset. In binary classification it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set; in multi-class classification it attained an accuracy of 0.9481 and an RMSE of 0.3810. This is the first use of the wav2vec 2.0 model for depression recognition, and the model showed strong generalization ability. The method is simple and practical, and can assist doctors in the early screening of depression.
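The abstract outlines a two-stage design: wav2vec 2.0 turns raw audio into frame-level feature vectors, a small network maps them to a depression class, and results are scored by accuracy and RMSE. The following is a minimal sketch of that flow under assumptions not stated in this record: the wav2vec 2.0 features are taken as already computed (a T x D matrix, with D = 768 for the BASE model), the classification head is a hypothetical mean-pooling plus linear layer rather than the authors' actual fine-tuning network, and RMSE is assumed to be computed on integer class labels. All function names and toy weights are illustrative.

```python
import math
from typing import List, Sequence

# Assumption: frame-level features were already produced by wav2vec 2.0
# (T frames x D dims; D = 768 for the BASE model). Toy sizes are used below.


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average T frame vectors of size D into one utterance vector of size D."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Affine layer: one row of weights and one bias term per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(frames, weights, bias) -> int:
    """Hypothetical head: mean pooling -> linear -> softmax -> argmax class index."""
    probs = softmax(linear(mean_pool(frames), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Assumption: RMSE over integer class labels, so an error of two classes
    # costs more than an error of one class.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    frames = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 2.0]]  # T = 2, D = 4
    weights = [[1.0, 0.0, 0.0, 0.0],  # toy weights for class 0
               [0.0, 0.0, 1.0, 1.0]]  # toy weights for class 1
    bias = [0.0, 0.5]
    print(predict(frames, weights, bias))        # prints 1
    print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # prints 0.75
    print(rmse([0, 1, 2, 1], [0, 1, 1, 1]))      # prints 0.5
```

Mean pooling is only one way to reduce a variable-length recording to a fixed-size vector; attention pooling or a small recurrent layer would be equally plausible heads, and the record does not say which the authors used.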

Record ID: doaj-art-81d495a9cda74eafa45e74f4b8e90b5c
Institution: Kabale University
ISSN: 2045-2322
Affiliation (all authors): School of Biomedical Engineering, South-Central Minzu University
Volume: 14, Issue: 1, Pages: 1–13
Subjects: Depression; Pre-training model; Voice features; Wav2vec 2.0; DAIC-WOZ