Parameter uncertainties for imperfect surrogate models in the low-noise regime
Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As...
Main Authors: | Thomas D Swinburne, Danny Perez |
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing, 2025-01-01 |
Series: | Machine Learning: Science and Technology |
Subjects: | Bayesian methods; uncertainty quantification; surrogate models; misspecification |
Online Access: | https://doi.org/10.1088/2632-2153/ad9fce |
_version_ | 1841543725677805568 |
---|---|
author | Thomas D Swinburne; Danny Perez |
author_facet | Thomas D Swinburne; Danny Perez |
author_sort | Thomas D Swinburne |
collection | DOAJ |
description | Bayesian regression determines model parameters by minimizing the expected loss, an upper bound to the true generalization error. However, this loss ignores model form error, or misspecification, meaning parameter uncertainties are significantly underestimated and vanish in the large data limit. As misspecification is the main source of uncertainty for surrogate models of low-noise calculations, such as those arising in atomistic simulation, predictive uncertainties are systematically underestimated. We analyze the true generalization error of misspecified, near-deterministic surrogate models, a regime of broad relevance in science and engineering. We show that posterior parameter distributions must cover every training point to avoid a divergence in the generalization error and design a compatible ansatz which incurs minimal overhead for linear models. The approach is demonstrated on model problems before application to thousand-dimensional datasets in atomistic machine learning. Our efficient misspecification-aware scheme gives accurate prediction and bounding of test errors in terms of parameter uncertainties, allowing this important source of uncertainty to be incorporated in multi-scale computational workflows. |
format | Article |
id | doaj-art-d6a6a95d91ba4d708e3d0818f03922a3 |
institution | Kabale University |
issn | 2632-2153 |
language | English |
publishDate | 2025-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | Machine Learning: Science and Technology |
spelling | doaj-art-d6a6a95d91ba4d708e3d0818f03922a3 · 2025-01-13T06:35:42Z · eng · IOP Publishing · Machine Learning: Science and Technology · 2632-2153 · 2025-01-01 · vol. 6, no. 1, art. 015008 · Parameter uncertainties for imperfect surrogate models in the low-noise regime · Thomas D Swinburne (https://orcid.org/0000-0002-3255-4257), Aix-Marseille Université, CNRS, CINaM UMR 7325, Campus de Luminy, 13288 Marseille, France · Danny Perez (https://orcid.org/0000-0003-3028-5249), Theoretical Division T-1, Los Alamos National Laboratory, Los Alamos, NM, United States of America · https://doi.org/10.1088/2632-2153/ad9fce · Bayesian methods · uncertainty quantification · surrogate models · misspecification |
spellingShingle | Thomas D Swinburne; Danny Perez; Parameter uncertainties for imperfect surrogate models in the low-noise regime; Machine Learning: Science and Technology; Bayesian methods; uncertainty quantification; surrogate models; misspecification |
title | Parameter uncertainties for imperfect surrogate models in the low-noise regime |
title_full | Parameter uncertainties for imperfect surrogate models in the low-noise regime |
title_fullStr | Parameter uncertainties for imperfect surrogate models in the low-noise regime |
title_full_unstemmed | Parameter uncertainties for imperfect surrogate models in the low-noise regime |
title_short | Parameter uncertainties for imperfect surrogate models in the low-noise regime |
title_sort | parameter uncertainties for imperfect surrogate models in the low noise regime |
topic | Bayesian methods; uncertainty quantification; surrogate models; misspecification |
url | https://doi.org/10.1088/2632-2153/ad9fce |
work_keys_str_mv | AT thomasdswinburne parameteruncertaintiesforimperfectsurrogatemodelsinthelownoiseregime AT dannyperez parameteruncertaintiesforimperfectsurrogatemodelsinthelownoiseregime |
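The covering requirement described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' published scheme; it is a minimal, hypothetical variant for Bayesian linear regression, assuming a simple global rescaling of the posterior parameter covariance so that predictive uncertainties cover every training residual in the low-noise, misspecified regime.

```python
import numpy as np

rng = np.random.default_rng(0)

# Misspecified, near-deterministic setup: data from a cubic,
# fit with a quadratic feature basis at very low noise.
x = np.linspace(-1.0, 1.0, 40)
y = x**3 + 1e-4 * rng.standard_normal(x.size)
X = np.vander(x, 3, increasing=True)   # misspecified model: [1, x, x^2]

alpha, beta = 1e-6, 1e8                # weak prior, high noise precision (low noise)
Sigma = np.linalg.inv(alpha * np.eye(X.shape[1]) + beta * X.T @ X)
mean = beta * Sigma @ X.T @ y

resid = y - X @ mean                                # misspecification errors
var_train = np.einsum("ij,jk,ik->i", X, Sigma, X)   # x_i^T Sigma x_i per point

# Standard Bayesian predictive std collapses as beta grows, so it
# fails to cover the training residuals caused by model form error.
std_bayes = np.sqrt(var_train)
print(np.max(np.abs(resid)) > np.max(std_bayes))    # True

# Hypothetical covering ansatz (illustration only): inflate the parameter
# covariance by the smallest factor s^2 such that the predictive std
# covers every training residual.
s2 = np.max(resid**2 / var_train)
std_covered = np.sqrt(s2 * var_train)
print(np.all(std_covered >= np.abs(resid) - 1e-12)) # True by construction
```

In the low-noise limit the standard posterior variance scales as 1/beta and vanishes, while the residuals from the missing cubic term do not; the rescaled covariance is one crude way to make parameter uncertainty reflect misspecification, in the spirit of (but much simpler than) the covering-ansatz scheme the article develops.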