Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction


Bibliographic Details
Main Authors: Fabian Obster, Monica I. Ciolacu, Andreas Humpe
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10811902/
_version_ 1846102112864305152
author Fabian Obster
Monica I. Ciolacu
Andreas Humpe
author_facet Fabian Obster
Monica I. Ciolacu
Andreas Humpe
author_sort Fabian Obster
collection DOAJ
description This paper investigates the empirical relationship between predictive performance, often called predictive power, and interpretability of various Machine Learning algorithms, focusing on bicycle traffic data from four cities. As Machine Learning algorithms become increasingly embedded in decision-making processes, particularly for traffic management and other high-level commitment applications, concerns regarding the transparency and trustworthiness of complex ‘black-box’ models have grown. Theoretical assertions often propose a trade-off between model complexity (predictive performance) and transparency (interpretability); however, empirical evidence supporting this claim is limited and inconsistent. To address this gap, we introduce a novel interpretability scoring system, a Machine Learning Interpretability Rank-based scale, that combines objective measures such as the number of model parameters with subjective interpretability rankings across different model types. This comprehensive methodology includes stratified sampling, model tuning, and a two-step ranking system to operationalize this trade-off. Results reveal a significant negative correlation between interpretability and predictive performance for intrinsically interpretable models, reinforcing the notion of a trade-off. However, this relationship does not hold for black-box models, suggesting that for these algorithms, predictive performance can be prioritized over interpretability. This study contributes to the ongoing discourse on explainable AI, providing practical insights and tools to help researchers and practitioners achieve a balance between model complexity and transparency. We recommend prioritising more interpretable models when predictive performance is comparable. Our scale provides a transparent and efficient framework for implementing this heuristic and improving parameter optimization.
Further research should extend this analysis to unstructured data, explore different interpretability methods, and develop new metrics for evaluating the trade-off across diverse contexts.
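The trade-off analysis described in the abstract can be illustrated with a minimal sketch: assign each model an interpretability score and a predictive score, convert both to ranks, and compute a Spearman rank correlation between them. All model names, scores, and the `rank`/`spearman` helpers below are illustrative assumptions for exposition, not the authors' actual scale or pipeline.

```python
# Hypothetical sketch of a rank-based trade-off check (not the paper's code).
# A negative correlation would mirror the reported trade-off for
# intrinsically interpretable models.

def rank(values, reverse=False):
    """Return 1-based ranks of `values`; ties broken by first occurrence."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation on ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative models: (interpretability score, predictive score, e.g. R^2).
models = {
    "linear_regression": (9, 0.61),
    "decision_tree":     (7, 0.66),
    "gam":               (6, 0.70),
    "random_forest":     (3, 0.78),
    "gradient_boosting": (2, 0.80),
}
interp = [v[0] for v in models.values()]
perf = [v[1] for v in models.values()]
rho = spearman(interp, perf)
print(f"Spearman rho(interpretability, performance) = {rho:.2f}")
# → Spearman rho(interpretability, performance) = -1.00
```

With these made-up scores the correlation is perfectly negative, which is the pattern the study reports for intrinsically interpretable models; real data would of course yield a weaker, noisier relationship.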
format Article
id doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b 2024-12-28T00:00:47Z eng IEEE
IEEE Access, ISSN 2169-3536, 2024-01-01, vol. 12, pp. 195613-195628, doi:10.1109/ACCESS.2024.3521242, document 10811902
Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
Fabian Obster (https://orcid.org/0000-0002-6951-9869), Department of Business Administration, University of the Bundeswehr Munich, Neubiberg, Germany
Monica I. Ciolacu (https://orcid.org/0000-0001-9464-2511), Faculty of Social and Educational Sciences, University of Passau, Passau, Germany
Andreas Humpe (https://orcid.org/0000-0001-8663-3201), Institute for Applications of Machine Learning and Intelligent Systems (IAMLIS), Munich University of Applied Sciences, Munich, Germany
https://ieeexplore.ieee.org/document/10811902/
spellingShingle Fabian Obster
Monica I. Ciolacu
Andreas Humpe
Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
IEEE Access
Explainable AI
predictive analysis
interpretability scoring
empirical analysis
bicycle traffic data
bias-variance
title Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_full Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_fullStr Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_full_unstemmed Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_short Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_sort balancing predictive performance and interpretability in machine learning a scoring system and an empirical study in traffic prediction
topic Explainable AI
predictive analysis
interpretability scoring
empirical analysis
bicycle traffic data
bias-variance
url https://ieeexplore.ieee.org/document/10811902/
work_keys_str_mv AT fabianobster balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction
AT monicaiciolacu balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction
AT andreashumpe balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction