Predicting car accident severity in Northwest Ethiopia: a machine learning approach leveraging driver, environmental, and road conditions

Bibliographic Details
Main Authors: Abraham Keffale Mengistu, Andualem Enyew Gedefaw, Nebebe Demis Baykemagn, Agmasie Damtew Walle, Tirualem Zeleke Yehuala, Meron Asmamaw Alemayehu, Mengistu Abebe Messelu, Bayou Tilahun Assaye
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-08005-2
Description
Summary: Road traffic accidents (RTAs) in Northwest Ethiopia, a region with a fatality rate of 32.2 per 100,000 residents, pose a critical public health challenge exacerbated by infrastructural deficits and environmental hazards. This study uses machine learning (ML) to predict the severity of car accidents in the region, addressing gaps in localized predictive frameworks for low- and middle-income countries (LMICs). Using a dataset of 2,000 accidents (2018–2023) drawn from police reports, we integrated driver demographics, behavioral factors (e.g., alcohol use, seatbelt compliance), and environmental conditions (e.g., unpaved roads, weather). Ten ML models, including Random Forest, XGBoost, and LightGBM, were evaluated after addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE). Hyperparameter tuning was used for model optimization and SHapley Additive exPlanations (SHAP) for interpretability. Random Forest outperformed the other models, achieving 82% accuracy (AUC-ROC: 0.87) after tuning. Driver age (mean: 44 years) and environmental factors (e.g., nighttime on unlit roads, rainy conditions) were critical predictors, increasing fatal accident likelihood by 62%. SMOTE improved the accuracy of the best-performing Random Forest model from 78.6% to 82%. Random Forest exhibited the highest recall (0.82) after optimization, while ensemble methods dominated performance metrics. The study underscores the efficacy of ML in contextualizing accident severity in LMICs, with Random Forest emerging as a robust tool for policymakers. Prioritizing road paving, sobriety checkpoints, and motorcycle safety could mitigate risks, aligning with Sustainable Development Goal 3.6. Future work should address data limitations (underreporting, geospatial gaps) and expand model interpretability.
ISSN: 2045-2322
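
Note: The summary mentions correcting class imbalance with SMOTE before training. As a rough illustration of that idea only, the sketch below creates synthetic minority-class points by interpolating between a sample and one of its nearest minority neighbours. It is a simplified, standard-library version and not the study's implementation; the function name `smote`, its parameters, and the toy (driver age, hour of day) values are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified illustration of the SMOTE idea (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data for the rare "fatal" class: (driver age, hour of day).
fatal = [(44.0, 22.0), (51.0, 23.0), (38.0, 21.0), (47.0, 2.0)]
new_points = smote(fatal, n_new=6, k=2, seed=42)
balanced_fatal = fatal + new_points
```

In practice oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to report accuracy or recall.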