Parameter uncertainties for imperfect surrogate models in the low-noise regime

Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error, and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows.
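
As a rough illustration of the failure mode described in the abstract (a sketch not taken from the article, using an arbitrary toy problem and hypothetical variable names), the numpy snippet below fits an ordinary Bayesian linear model to nearly noise-free data generated by a quadratic function. The standard posterior covariance, sigma^2 (X^T X + lambda I)^{-1}, shrinks toward zero as the noise level falls and the sample count grows, while the actual errors, which come entirely from model form error, remain large.

    # Illustrative sketch only: ordinary Bayesian linear regression on
    # near-deterministic data from a misspecified model class.
    import numpy as np

    rng = np.random.default_rng(0)
    N, sigma = 200, 1e-4                      # many samples, near-zero noise
    x = rng.uniform(-1.0, 1.0, size=N)
    y = x**2 + sigma * rng.normal(size=N)     # ground truth is quadratic

    X = np.stack([np.ones(N), x], axis=1)     # misspecified features: constant + linear
    lam = 1e-6                                # weak Gaussian prior (ridge) term
    A = X.T @ X + lam * np.eye(2)
    mean = np.linalg.solve(A, X.T @ y)        # posterior mean of the parameters
    cov = sigma**2 * np.linalg.inv(A)         # standard posterior covariance

    pred_std = np.sqrt(np.einsum("ni,ij,nj->n", X, cov, X))  # parameter-driven std
    residual = np.abs(y - X @ mean)           # actual errors from misspecification

    print("max predicted std :", pred_std.max())   # ~1e-5, vanishingly small
    print("max actual error  :", residual.max())   # O(0.1-1), model form error dominates

A misspecification-aware treatment, such as the one proposed in the article, is instead designed so that the reported parameter uncertainties bound errors of this kind rather than only the (here negligible) noise.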

Bibliographic Details
Main Authors: Thomas D Swinburne, Danny Perez
Format: Article
Language: English
Published: IOP Publishing, 2025-01-01
Series: Machine Learning: Science and Technology
Subjects: Bayesian methods; uncertainty quantification; surrogate models; misspecification
Online Access: https://doi.org/10.1088/2632-2153/ad9fce
author Thomas D Swinburne
Danny Perez
collection DOAJ
description Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows.
format Article
id doaj-art-d6a6a95d91ba4d708e3d0818f03922a3
institution Kabale University
issn 2632-2153
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
Thomas D Swinburne (ORCID: 0000-0002-3255-4257), Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
Danny Perez (ORCID: 0000-0003-3028-5249), Theoretical Division T-1, Los Alamos National Laboratory, Los Alamos, NM, United States of America
Parameter uncertainties for imperfect surrogate models in the low-noise regime, Machine Learning: Science and Technology 6(1), 015008 (2025), IOP Publishing, ISSN 2632-2153, https://doi.org/10.1088/2632-2153/ad9fce
title Parameter uncertainties for imperfect surrogate models in the low-noise regime
topic Bayesian methods
uncertainty quantification
surrogate models
misspecification
url https://doi.org/10.1088/2632-2153/ad9fce