Parameter uncertainties for imperfect surrogate models in the low-noise regime

Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error, and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows.
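
As a rough illustration of the failure mode described in the abstract (a sketch not taken from the article, using an arbitrary toy problem and hypothetical variable names), the numpy snippet below fits an ordinary Bayesian linear model to nearly noise-free data generated by a quadratic function. The standard posterior covariance, sigma^2 (X^T X + lambda I)^{-1}, shrinks toward zero as the noise level falls and the sample count grows, while the actual errors, which come entirely from model form error, remain large.

    # Illustrative sketch only: ordinary Bayesian linear regression on
    # near-deterministic data from a misspecified model class.
    import numpy as np

    rng = np.random.default_rng(0)
    N, sigma = 200, 1e-4                      # many samples, near-zero noise
    x = rng.uniform(-1.0, 1.0, size=N)
    y = x**2 + sigma * rng.normal(size=N)     # ground truth is quadratic

    X = np.stack([np.ones(N), x], axis=1)     # misspecified features: constant + linear
    lam = 1e-6                                # weak Gaussian prior (ridge) term
    A = X.T @ X + lam * np.eye(2)
    mean = np.linalg.solve(A, X.T @ y)        # posterior mean of the parameters
    cov = sigma**2 * np.linalg.inv(A)         # standard posterior covariance

    pred_std = np.sqrt(np.einsum("ni,ij,nj->n", X, cov, X))  # parameter-driven std
    residual = np.abs(y - X @ mean)           # actual errors from misspecification

    print("max predicted std :", pred_std.max())   # ~1e-5, vanishingly small
    print("max actual error  :", residual.max())   # O(0.1-1), model form error dominates

A misspecification-aware treatment, such as the one proposed in the article, is instead designed so that the reported parameter uncertainties bound errors of this kind rather than only the (here negligible) noise.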

Bibliographic Details
Main Authors: Thomas D Swinburne, Danny Perez
Format: Article
Language: English
Published: IOP Publishing, 2025-01-01
Series: Machine Learning: Science and Technology
Subjects: Bayesian methods; uncertainty quantification; surrogate models; misspecification
Online Access: https://doi.org/10.1088/2632-2153/ad9fce
author Thomas D Swinburne
Danny Perez
collection DOAJ
description Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows.
format Article
id doaj-art-d6a6a95d91ba4d708e3d0818f03922a3
institution Kabale University
issn 2632-2153
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
Thomas D Swinburne (ORCID: 0000-0002-3255-4257), Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France
Danny Perez (ORCID: 0000-0003-3028-5249), Theoretical Division T-1, Los Alamos National Laboratory, Los Alamos, NM, United States of America
Parameter uncertainties for imperfect surrogate models in the low-noise regime, Machine Learning: Science and Technology 6(1), 015008 (2025), IOP Publishing, ISSN 2632-2153, https://doi.org/10.1088/2632-2153/ad9fce
title Parameter uncertainties for imperfect surrogate models in the low-noise regime
topic Bayesian methods
uncertainty quantification
surrogate models
misspecification
url https://doi.org/10.1088/2632-2153/ad9fce