Machine learning-driven development of a stratified CES-D screening system: optimizing depression assessment through adaptive item selection

Bibliographic Details
Main Authors: Ruo-Fei Xu, Zhen-Jing Liu, Shunan Ouyang, Qin Dong, Wen-Jing Yan, Dong-Wu Xu
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Psychiatry
Online Access: https://doi.org/10.1186/s12888-025-06693-8
Description
Summary:
Objective: To develop a stratified screening tool for the Center for Epidemiologic Studies Depression Scale (CES-D-20) using machine learning approaches, maintaining diagnostic accuracy while addressing the scale's efficiency limitations in large-scale applications.
Methods: Data were derived from the Chinese Psychological Health Guard Project (primary sample: n = 179,877; ages 9–18) and the China Labor-force Dynamics Survey (validation samples spanning multiple age groups). We employed a two-stage machine learning approach: first, Recursive Feature Elimination with multiple linear regression to identify the core items that best predict the total depression score; second, logistic regression to optimize classification of probable depression (CES-D ≥ 16). Model performance was systematically evaluated for discrimination (ROC analysis), calibration (Brier score), and clinical utility (decision curve analysis), with additional validation using random forest and support vector machine algorithms across independent samples.
Results: The resulting stratified screening system consists of an initial four-item rapid screening layer (covering emotional, cognitive, and interpersonal dimensions) for detecting probable depression (AUC = 0.982, sensitivity = 0.945, specificity = 0.926), followed by an enhanced assessment layer of five additional items. Together, these nine items accurately predict the full CES-D-20 total score (R² = 0.957). The stratified approach showed robust generalizability across age groups (R² > 0.94, accuracy > 0.91) and time points. Calibration and decision curve analyses confirmed optimal clinical utility, particularly in the critical risk-threshold range of 0.3–0.6.
Conclusions: This study contributes to the refinement of the CES-D by developing a machine learning-derived stratified screening version, offering an efficient and reliable approach that reduces assessment burden while maintaining excellent psychometric properties. The stratified design makes it particularly valuable for large-scale mental health screening programs, enabling efficient risk stratification and targeted allocation of assessment resources.
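The two-stage procedure summarized above can be illustrated with a small, self-contained sketch. It is not the authors' code and makes several assumptions: the data are synthetic (a hypothetical one-factor model generating 20 items scored 0–3), Recursive Feature Elimination is hand-rolled with NumPy (refit OLS, drop the item with the smallest absolute coefficient, the same rule scikit-learn's `RFE` applies to a linear model with `step=1`), logistic regression is fit by plain gradient descent, and all metrics are computed in-sample rather than on independent validation samples. Treating the top four RFE items as the rapid-screening layer and the top nine as the full stratified set is also an assumption; because RFE removes one item at a time, the four-item set is always nested inside the nine-item set. Net benefit uses the standard decision-curve definition, TP/n - (FP/n) * pt/(1 - pt).

```python
import numpy as np

# HYPOTHETICAL data generator: a one-factor model producing 20 items scored 0-3.
# Illustrative only; this is NOT the study's data.
def simulate_items(n=2000, n_items=20, seed=0):
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)                     # latent depression severity
    loadings = rng.uniform(0.3, 1.2, size=n_items)  # assumed item loadings
    raw = latent[:, None] * loadings + rng.normal(scale=0.8, size=(n, n_items))
    items = np.clip(np.round(raw + 0.6), 0, 3)      # map onto the 0-3 response scale
    return items, items.sum(axis=1)


def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def r_squared(X, y):
    b0, b = ols_fit(X, y)
    resid = y - (b0 + X @ b)
    return 1.0 - resid.var() / y.var()


def rfe_linear(X, y, n_keep):
    """Recursive feature elimination: refit OLS, drop the item with the
    smallest absolute coefficient, repeat until n_keep items remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        _, coef = ols_fit(X[:, keep], y)
        keep.pop(int(np.argmin(np.abs(coef))))
    return keep


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Unregularized logistic regression by batch gradient descent on mean log-loss."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        w -= lr * A.T @ (sigmoid(A @ w) - y) / len(y)
    return w


def predict_proba(w, X):
    return sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)


def auc(y, score):
    """Rank-based (Mann-Whitney) AUC; tied pairs count as one half."""
    pos, neg = score[y == 1], score[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def net_benefit(y, p, pt):
    """Decision-curve net benefit at threshold probability pt."""
    flagged = p >= pt
    tp = np.sum(flagged & (y == 1))
    fp = np.sum(flagged & (y == 0))
    return tp / len(y) - fp / len(y) * pt / (1 - pt)


items, total = simulate_items()
y = (total >= 16).astype(float)  # CES-D >= 16 cut-off named in the abstract

# Stage 1 (assumed layering): with all 20 items the total is fit exactly
# (every coefficient ~1), so the very first drop is effectively a tie.
extended = rfe_linear(items, total, n_keep=9)  # assumed nine-item set
core = rfe_linear(items, total, n_keep=4)      # assumed four-item rapid layer

# Stage 2: classify probable depression from the four core items.
w = fit_logistic(items[:, core], y)
p = predict_proba(w, items[:, core])

print("core items:", core, "| nine-item set:", sorted(extended))
print("R^2, nine items -> total score:", round(r_squared(items[:, extended], total), 3))
print("AUC:", round(auc(y, p), 3), "| Brier:", round(float(np.mean((p - y) ** 2)), 3))
for pt in (0.3, 0.4, 0.5, 0.6):
    print(f"net benefit at pt={pt}:", round(net_benefit(y, p, pt), 3))
```

Running the script prints the selected item indices, the nine-item R², the four-item AUC and Brier score, and net benefit across the 0.3–0.6 threshold range; the values depend entirely on the simulated data and are not comparable to the figures reported in the study.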
ISSN: 1471-244X