Improving Accuracy and Calibration of Deep Image Classifiers With Agreement-Driven Dynamic Ensemble

Bibliographic Details
Main Authors: Pedro Conde, Rui L. Lopes, Cristiano Premebida
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of the Computer Society
Online Access: https://ieeexplore.ieee.org/document/10806808/
Description
Summary: One of the biggest challenges when considering the applicability of Deep Learning (DL) systems to real-world problems is the possibility of failure in critical situations. Two complementary strategies address this problem: (i) models need to be highly accurate, which reduces the risk of failure; (ii) since the risk of error cannot be completely eliminated, models should be able to report their level of uncertainty for each prediction. As such, state-of-the-art DL models should be accurate and also calibrated, meaning that each prediction has to encode its confidence/uncertainty in a way that approximates the true likelihood of correctness. Nonetheless, the relevant literature shows that improvements in accuracy and calibration are not usually related. This motivates the development of Agreement-Driven Dynamic Ensemble, a deep ensemble method that, by dynamically combining the advantages of two different ensemble strategies, is capable of achieving the highest possible accuracy values while also obtaining substantial improvements in calibration. The merits of the proposed algorithm are shown through a series of representative experiments, leveraging two different neural network architectures and three different datasets against multiple state-of-the-art baselines.
ISSN: 2644-1268
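
Note on calibration: the Summary describes a calibrated model as one whose confidence approximates the true likelihood of correctness. One widely used way to quantify this is the Expected Calibration Error (ECE), which groups predictions into confidence bins and averages the gap between mean confidence and observed accuracy in each bin. This record does not say which calibration metric the article reports, so the following is only an illustrative sketch. The function name, the equal-width binning, and the default of 10 bins are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted mean |confidence - accuracy| over bins.

    confidences: predicted confidence in [0, 1] for each sample.
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width confidence bins (assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin [k/n_bins, (k+1)/n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(ok)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Confidence 0.75 with 3 of 4 correct: confidence matches accuracy.
print(expected_calibration_error([0.75] * 4, [True, True, True, False]))   # 0.0
# Confidence 0.75 with 1 of 4 correct: the model is overconfident.
print(expected_calibration_error([0.75] * 4, [True, False, False, False]))  # 0.5
```

A lower value means confidence tracks accuracy more closely. The two cases above have the same confidence and different accuracy, which illustrates the Summary's point that accuracy and calibration are distinct properties.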