Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
| Main Authors: | Fabian Obster; Monica I. Ciolacu; Andreas Humpe |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Explainable AI; predictive analysis; interpretability scoring; empirical analysis; bicycle traffic data; bias-variance |
| Online Access: | https://ieeexplore.ieee.org/document/10811902/ |
| _version_ | 1846102112864305152 |
|---|---|
| author | Fabian Obster; Monica I. Ciolacu; Andreas Humpe |
| author_facet | Fabian Obster; Monica I. Ciolacu; Andreas Humpe |
| author_sort | Fabian Obster |
| collection | DOAJ |
| description | This paper investigates the empirical relationship between predictive performance, often called predictive power, and interpretability of various Machine Learning algorithms, focusing on bicycle traffic data from four cities. As Machine Learning algorithms become increasingly embedded in decision-making processes, particularly for traffic management and other high-stakes applications, concerns regarding the transparency and trustworthiness of complex ‘black-box’ models have grown. Theoretical assertions often propose a trade-off between model complexity (predictive performance) and transparency (interpretability); however, empirical evidence supporting this claim is limited and inconsistent. To address this gap, we introduce a novel interpretability scoring system, a Machine Learning Interpretability Rank-based scale, that combines objective measures such as the number of model parameters with subjective interpretability rankings across different model types. This comprehensive methodology includes stratified sampling, model tuning, and a two-step ranking system to operationalize this trade-off. Results reveal a significant negative correlation between interpretability and predictive performance for intrinsically interpretable models, reinforcing the notion of a trade-off. However, this relationship does not hold for black-box models, suggesting that for these algorithms, predictive performance can be prioritized over interpretability. This study contributes to the ongoing discourse on explainable AI, providing practical insights and tools to help researchers and practitioners achieve a balance between model complexity and transparency. We recommend prioritizing more interpretable models when predictive performance is comparable. Our scale provides a transparent and efficient framework for implementing this heuristic and improving parameter optimization. Further research should extend this analysis to unstructured data, explore different interpretability methods, and develop new metrics for evaluating the trade-off across diverse contexts. |
| format | Article |
| id | doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b; 2024-12-28T00:00:47Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 195613-195628; DOI 10.1109/ACCESS.2024.3521242; article 10811902; Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction; Fabian Obster (https://orcid.org/0000-0002-6951-9869), Department of Business Administration, University of the Bundeswehr Munich, Neubiberg, Germany; Monica I. Ciolacu (https://orcid.org/0000-0001-9464-2511), Faculty of Social and Educational Sciences, University of Passau, Passau, Germany; Andreas Humpe (https://orcid.org/0000-0001-8663-3201), Institute for Applications of Machine Learning and Intelligent Systems (IAMLIS), Munich University of Applied Sciences, Munich, Germany; https://ieeexplore.ieee.org/document/10811902/; Explainable AI; predictive analysis; interpretability scoring; empirical analysis; bicycle traffic data; bias-variance |
| spellingShingle | Fabian Obster; Monica I. Ciolacu; Andreas Humpe; Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction; IEEE Access; Explainable AI; predictive analysis; interpretability scoring; empirical analysis; bicycle traffic data; bias-variance |
| title | Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction |
| title_full | Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction |
| title_fullStr | Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction |
| title_full_unstemmed | Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction |
| title_short | Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction |
| title_sort | balancing predictive performance and interpretability in machine learning a scoring system and an empirical study in traffic prediction |
| topic | Explainable AI; predictive analysis; interpretability scoring; empirical analysis; bicycle traffic data; bias-variance |
| url | https://ieeexplore.ieee.org/document/10811902/ |
| work_keys_str_mv | AT fabianobster balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction AT monicaiciolacu balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction AT andreashumpe balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction |