Crystal structure generation with autoregressive large language modeling

Bibliographic Details
Main Authors: Luis M. Antunes, Keith T. Butler, Ricardo Grau-Crespo
Format: Article
Language: English
Published: Nature Portfolio, 2024-12-01
Series: Nature Communications
Online Access:https://doi.org/10.1038/s41467-024-54639-7
Collection: DOAJ
ISSN: 2041-1723
Institution: Kabale University
Record ID: doaj-art-de7409d068ed45c5acd8350760693f9f
Affiliations: Luis M. Antunes, Department of Chemistry, University of Reading; Keith T. Butler, Department of Chemistry, University College London; Ricardo Grau-Crespo, Department of Chemistry, University of Reading

Abstract: The generation of plausible crystal structures is often the first step in predicting the structure and properties of a material from its chemical composition. However, most current methods for crystal structure prediction are computationally expensive, slowing the pace of innovation. Seeding structure prediction algorithms with quality generated candidates can overcome a major bottleneck. Here, we introduce CrystaLLM, a methodology for the versatile generation of crystal structures, based on the autoregressive large language modeling (LLM) of the Crystallographic Information File (CIF) format. Trained on millions of CIF files, CrystaLLM focuses on modeling crystal structures through text. CrystaLLM can produce plausible crystal structures for a wide range of inorganic compounds unseen in training, as demonstrated by ab initio simulations. Our approach challenges conventional representations of crystals, and demonstrates the potential of LLMs for learning effective models of crystal chemistry, which will lead to accelerated discovery and innovation in materials science.
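The abstract describes CrystaLLM as an autoregressive language model over CIF text. As a rough illustration of that idea only, the sketch below serializes a rock-salt NaCl cell (space group Fm-3m, a = 5.64 Å) to a minimal CIF-style string, splits it into tokens, builds the next-token prediction pairs an autoregressive model is trained on, and extends a composition prompt one token at a time. The CIF field subset, the regular-expression tokenizer, and the helper names (`to_cif`, `tokenize`, `next_token_pairs`, `generate`) are hypothetical and are not taken from CrystaLLM; the `next_token` callable stands in for a trained model.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# (label, element, x, y, z) with fractional coordinates
Site = Tuple[str, str, float, float, float]


def to_cif(formula: str, space_group: str, abc: Tuple[float, float, float],
           angles: Tuple[float, float, float], sites: Sequence[Site]) -> str:
    """Serialize a structure as minimal CIF-style text.

    Hypothetical, simplified subset of CIF fields chosen for illustration.
    """
    a, b, c = abc
    alpha, beta, gamma = angles
    lines = [
        f"data_{formula}",
        f"_symmetry_space_group_name_H-M '{space_group}'",
        f"_cell_length_a {a:.4f}",
        f"_cell_length_b {b:.4f}",
        f"_cell_length_c {c:.4f}",
        f"_cell_angle_alpha {alpha:.4f}",
        f"_cell_angle_beta {beta:.4f}",
        f"_cell_angle_gamma {gamma:.4f}",
        "loop_",
        "_atom_site_label",
        "_atom_site_type_symbol",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    lines += [f"{lab} {el} {x:.4f} {y:.4f} {z:.4f}" for lab, el, x, y, z in sites]
    return "\n".join(lines) + "\n"


# Hypothetical tokenizer (not CrystaLLM's): whole CIF tags, alphabetic words,
# single digits, single punctuation characters, spaces and newlines.
TOKEN_RE = re.compile(r"data_|_[A-Za-z_\-]+|[A-Za-z]+|\d|[^\sA-Za-z\d]|\n| ")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def next_token_pairs(tokens: Sequence[str]) -> List[Tuple[Tuple[str, ...], str]]:
    """Autoregressive training targets: predict tokens[t] from tokens[:t]."""
    return [(tuple(tokens[:t]), tokens[t]) for t in range(1, len(tokens))]


def generate(prompt: Sequence[str],
             next_token: Callable[[Sequence[str]], Optional[str]],
             max_tokens: int = 1000) -> List[str]:
    """Extend the prompt one token at a time; stop when next_token returns None."""
    out = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(out)
        if tok is None:
            break
        out.append(tok)
    return out


if __name__ == "__main__":
    cif = to_cif("NaCl", "Fm-3m", (5.64, 5.64, 5.64), (90.0, 90.0, 90.0),
                 [("Na1", "Na", 0.0, 0.0, 0.0), ("Cl1", "Cl", 0.5, 0.5, 0.5)])
    print(cif)
    print(tokenize(cif)[:12])
```

For example, passing a callable that simply replays a stored token list reproduces the full CIF from the prompt `data_NaCl`; a trained model would instead return a token sampled from its predicted distribution over the vocabulary, conditioned on everything generated so far.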