Probing out-of-distribution generalization in machine learning for materials

Abstract: Scientific machine learning (ML) aims to develop generalizable models, yet assessments of generalizability often rely on heuristics. Here, we demonstrate in the materials science setting that heuristic evaluations lead to biased conclusions of ML generalizability and benefits of neural scaling, through evaluations of out-of-distribution (OOD) tasks involving unseen chemistry or structural symmetries. Surprisingly, many tasks demonstrate good performance across models, including boosted trees. However, analysis of the materials representation space shows that most test data reside within regions well-covered by training data, while poorly-performing tasks involve data outside the training domain. For these challenging tasks, increasing training size or time yields limited or adverse effects, contrary to traditional neural scaling trends. Our findings highlight that most OOD tests reflect interpolation, not true extrapolation, leading to overestimations of generalizability and scaling benefits. This emphasizes the need for rigorously challenging OOD benchmarks.
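The abstract describes two diagnostics: building OOD tasks by holding out a chemistry (or symmetry class), and checking how much of the nominally OOD test set still falls inside the region of representation space covered by training data. The sketch below is a minimal illustration of that idea, not the authors' code; `ood_split_by_element`, `coverage_fraction`, the random stand-in descriptors, and the 95th-percentile radius are all assumptions made for the example.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# build a leave-one-element-out OOD split, then measure what fraction of the
# "OOD" test points still sit within nearest-neighbor range of the train set.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ood_split_by_element(elements_per_sample, held_out="Fe"):
    """Send every compound containing `held_out` to the test set."""
    test_idx = np.array([i for i, els in enumerate(elements_per_sample) if held_out in els])
    train_idx = np.array([i for i, els in enumerate(elements_per_sample) if held_out not in els])
    return train_idx, test_idx

def coverage_fraction(X_train, X_test):
    """Fraction of test points whose nearest training neighbor is closer than
    a typical train-train nearest-neighbor distance (95th percentile here,
    an arbitrary illustrative threshold)."""
    d_tt, _ = NearestNeighbors(n_neighbors=2).fit(X_train).kneighbors(X_train)
    radius = np.percentile(d_tt[:, 1], 95)  # column 0 is the self-distance
    d_te, _ = NearestNeighbors(n_neighbors=1).fit(X_train).kneighbors(X_test)
    return float(np.mean(d_te[:, 0] <= radius))

# Toy usage with random placeholder descriptors and element labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))  # stand-in for real material representations
elements = [set(rng.choice(["Fe", "O", "Si", "Al"], size=2, replace=False)) for _ in range(1000)]
train_idx, test_idx = ood_split_by_element(elements, held_out="Fe")
print(f"nominally OOD test points inside the training domain: "
      f"{coverage_fraction(X[train_idx], X[test_idx]):.0%}")
```

With these undifferentiated random descriptors the element-based split leaves essentially all test points inside the training domain, which incidentally mirrors the paper's point: a split that is OOD by chemical label can still be pure interpolation in representation space.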

Bibliographic Details
Main Authors: Kangming Li, Andre Niyongabo Rubungo, Xiangyun Lei, Daniel Persaud, Kamal Choudhary, Brian DeCost, Adji Bousso Dieng, Jason Hattrick-Simpers
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Communications Materials
ISSN: 2662-4443
Collection: DOAJ
Online Access: https://doi.org/10.1038/s43246-024-00731-w

Author affiliations:
Kangming Li: Department of Materials Science and Engineering, University of Toronto
Andre Niyongabo Rubungo: Vertaix, Department of Computer Science, Princeton University
Xiangyun Lei: Toyota Research Institute
Daniel Persaud: Department of Materials Science and Engineering, University of Toronto
Kamal Choudhary: Material Measurement Laboratory, National Institute of Standards and Technology
Brian DeCost: Material Measurement Laboratory, National Institute of Standards and Technology
Adji Bousso Dieng: Vertaix, Department of Computer Science, Princeton University
Jason Hattrick-Simpers: Department of Materials Science and Engineering, University of Toronto