NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class.

Bibliographic Details
Main Authors: Seongil Han, Haemin Jung
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0316454
author Seongil Han
Haemin Jung
collection DOAJ
description Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model's decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. These findings highlight NATE's capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications.
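The description above names two ingredients that a short sketch may make more concrete: balancing the classes by oversampling, and scoring a classifier with F1 and MCC. The record does not specify which oversampler NATE uses, so the code below substitutes plain random oversampling (duplicating minority rows) as a stand-in. The function names `random_oversample`, `confusion`, `f1_score`, and `mcc`, and the toy labels, are hypothetical and chosen only for illustration. In practice oversampling would be applied to the training split only.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (stand-in technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, made-up data: 8 "good" borrowers (0) and 2 "bad" borrowers (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have 8 rows

# Toy predictions, scored with the two metrics named in the description.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
cm = confusion(y_true, y_pred)
print(round(f1_score(*cm), 4), round(mcc(*cm), 4))  # 0.6667 0.4667
```

MCC uses all four confusion-matrix cells, which is one reason it is often reported alongside F1 when classes are imbalanced.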
format Article
id doaj-art-9743bc4542344d66999f603f5fa3d3c6
institution Kabale University
issn 1932-6203
language English
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-9743bc4542344d66999f603f5fa3d3c6 | 2025-01-08T05:32:07Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2024-01-01 | 19 | 12 | e0316454 | 10.1371/journal.pone.0316454 | NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class. | Seongil Han | Haemin Jung | https://doi.org/10.1371/journal.pone.0316454
title NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class.
url https://doi.org/10.1371/journal.pone.0316454