NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class.
Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when appl...
Main Authors: | Seongil Han, Haemin Jung |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2024-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0316454 |
_version_ | 1841555563655200768 |
---|---|
author | Seongil Han; Haemin Jung |
author_facet | Seongil Han; Haemin Jung |
author_sort | Seongil Han |
collection | DOAJ |
description | Credit scoring models play a crucial role for financial institutions in evaluating borrower risk and sustaining profitability. Logistic regression is widely used in credit scoring due to its robustness, interpretability, and computational efficiency; however, its predictive power decreases when applied to complex or non-linear datasets, resulting in reduced accuracy. In contrast, tree-based machine learning models often provide enhanced predictive performance but struggle with interpretability. Furthermore, imbalanced class distributions, which are prevalent in credit scoring, can adversely impact model accuracy and robustness, as the majority class tends to dominate. Despite these challenges, research that comprehensively addresses both the predictive performance and explainability aspects within the credit scoring domain remains limited. This paper introduces the Non-pArameTric oversampling approach for Explainable credit scoring (NATE), a framework designed to address these challenges by combining oversampling techniques with tree-based classifiers to enhance model performance and interpretability. NATE incorporates class balancing methods to mitigate the impact of imbalanced data distributions and integrates interpretability features to elucidate the model's decision-making process. Experimental results show that NATE substantially outperforms traditional logistic regression in credit risk classification, with improvements of 19.33% in AUC, 71.56% in MCC, and 85.33% in F1 Score. Oversampling approaches, particularly when used with gradient boosting, demonstrated superior effectiveness compared to undersampling, achieving optimal metrics of AUC: 0.9649, MCC: 0.8104, and F1 Score: 0.9072. Moreover, NATE enhances interpretability by providing detailed insights into feature contributions, aiding in understanding individual predictions. These findings highlight NATE's capability in managing class imbalance, improving predictive performance, and enhancing model interpretability, demonstrating its potential as a reliable and transparent tool for credit scoring applications. |
format | Article |
id | doaj-art-9743bc4542344d66999f603f5fa3d3c6 |
institution | Kabale University |
issn | 1932-6203 |
language | English |
publishDate | 2024-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj-art-9743bc4542344d66999f603f5fa3d3c6 2025-01-08T05:32:07Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2024-01-01 19 12 e0316454 10.1371/journal.pone.0316454 NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class. Seongil Han Haemin Jung https://doi.org/10.1371/journal.pone.0316454 |
spellingShingle | Seongil Han Haemin Jung NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class. PLoS ONE |
title | NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class. |
title_sort | nate non parametric approach for explainable credit scoring on imbalanced class |
url | https://doi.org/10.1371/journal.pone.0316454 |
work_keys_str_mv | AT seongilhan natenonparametricapproachforexplainablecreditscoringonimbalancedclass AT haeminjung natenonparametricapproachforexplainablecreditscoringonimbalancedclass |
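
The description above reports results for oversampling an imbalanced credit dataset and scoring classifiers with MCC and F1. As a rough illustration only, the following Python sketch shows plain random oversampling with replacement and the two confusion-matrix metrics. The record does not say which oversampling technique NATE uses, so random oversampling here is a stand-in assumption, and every function name and all toy data below are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until
    every class has as many rows as the largest class.
    Stand-in technique: the catalog record does not name NATE's method."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy imbalanced data (hypothetical): 8 non-defaults (0), 2 defaults (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))

# Toy predictions (hypothetical) to exercise the metrics.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
cm = confusion(y_true, y_pred)
print(cm, f1_score(*cm), mcc(*cm))
```

MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside F1 on imbalanced data, where accuracy alone can look high simply because the majority class dominates.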