Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction


Bibliographic Details
Main Authors: Fabian Obster, Monica I. Ciolacu, Andreas Humpe
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10811902/
_version_ 1846102112864305152
author Fabian Obster
Monica I. Ciolacu
Andreas Humpe
author_facet Fabian Obster
Monica I. Ciolacu
Andreas Humpe
author_sort Fabian Obster
collection DOAJ
description This paper investigates the empirical relationship between predictive performance, often called predictive power, and interpretability of various Machine Learning algorithms, focusing on bicycle traffic data from four cities. As Machine Learning algorithms become increasingly embedded in decision-making processes, particularly for traffic management and other high-level commitment applications, concerns regarding the transparency and trustworthiness of complex ‘black-box’ models have grown. Theoretical assertions often propose a trade-off between model complexity (predictive performance) and transparency (interpretability); however, empirical evidence supporting this claim is limited and inconsistent. To address this gap, we introduce a novel interpretability scoring system, a Machine Learning Interpretability Rank-based scale, that combines objective measures such as the number of model parameters with subjective interpretability rankings across different model types. This comprehensive methodology includes stratified sampling, model tuning, and a two-step ranking system to operationalize this trade-off. Results reveal a significant negative correlation between interpretability and predictive performance for intrinsically interpretable models, reinforcing the notion of a trade-off. However, this relationship does not hold for black-box models, suggesting that for these algorithms, predictive performance can be prioritized over interpretability. This study contributes to the ongoing discourse on explainable AI, providing practical insights and tools to help researchers and practitioners achieve a balance between model complexity and transparency. We recommend prioritising more interpretable models when predictive performance is comparable. Our scale provides a transparent and efficient framework for implementing this heuristic and improving parameter optimization.
Further research should extend this analysis to unstructured data, explore different interpretability methods, and develop new metrics for evaluating the trade-off across diverse contexts.
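The trade-off analysis described in the abstract can be illustrated with a minimal sketch: assign each model an interpretability score and a predictive score, convert both to ranks, and compute a Spearman rank correlation between them. All model names, scores, and the `rank`/`spearman` helpers below are illustrative assumptions for exposition, not the authors' actual scale or pipeline.

```python
# Hypothetical sketch of a rank-based trade-off check (not the paper's code).
# A negative correlation would mirror the reported trade-off for
# intrinsically interpretable models.

def rank(values, reverse=False):
    """Return 1-based ranks of `values`; ties broken by first occurrence."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation on ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative models: (interpretability score, predictive score, e.g. R^2).
models = {
    "linear_regression": (9, 0.61),
    "decision_tree":     (7, 0.66),
    "gam":               (6, 0.70),
    "random_forest":     (3, 0.78),
    "gradient_boosting": (2, 0.80),
}
interp = [v[0] for v in models.values()]
perf = [v[1] for v in models.values()]
rho = spearman(interp, perf)
print(f"Spearman rho(interpretability, performance) = {rho:.2f}")
# → Spearman rho(interpretability, performance) = -1.00
```

With these made-up scores the correlation is perfectly negative, which is the pattern the study reports for intrinsically interpretable models; real data would of course yield a weaker, noisier relationship.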
format Article
id doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-1e5bc87a599e4d4f9e362ae2f36d651b 2024-12-28T00:00:47Z eng IEEE
IEEE Access, ISSN 2169-3536, 2024-01-01, vol. 12, pp. 195613-195628, doi:10.1109/ACCESS.2024.3521242, document 10811902
Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
Fabian Obster (https://orcid.org/0000-0002-6951-9869), Department of Business Administration, University of the Bundeswehr Munich, Neubiberg, Germany
Monica I. Ciolacu (https://orcid.org/0000-0001-9464-2511), Faculty of Social and Educational Sciences, University of Passau, Passau, Germany
Andreas Humpe (https://orcid.org/0000-0001-8663-3201), Institute for Applications of Machine Learning and Intelligent Systems (IAMLIS), Munich University of Applied Sciences, Munich, Germany
https://ieeexplore.ieee.org/document/10811902/
spellingShingle Fabian Obster
Monica I. Ciolacu
Andreas Humpe
Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
IEEE Access
Explainable AI
predictive analysis
interpretability scoring
empirical analysis
bicycle traffic data
bias-variance
title Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_full Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_fullStr Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_full_unstemmed Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_short Balancing Predictive Performance and Interpretability in Machine Learning: A Scoring System and an Empirical Study in Traffic Prediction
title_sort balancing predictive performance and interpretability in machine learning a scoring system and an empirical study in traffic prediction
topic Explainable AI
predictive analysis
interpretability scoring
empirical analysis
bicycle traffic data
bias-variance
url https://ieeexplore.ieee.org/document/10811902/
work_keys_str_mv AT fabianobster balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction
AT monicaiciolacu balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction
AT andreashumpe balancingpredictiveperformanceandinterpretabilityinmachinelearningascoringsystemandanempiricalstudyintrafficprediction