Hessian QM9: A quantum chemistry database of molecular Hessians in implicit solvents

Bibliographic Details
Main Authors: Nicholas J. Williams, Lara Kabalan, Ljiljana Stojanovic, Viktor Zólyomi, Edward O. Pyzer-Knapp
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-04361-2
collection DOAJ
description Abstract: A significant challenge in computational chemistry is developing approximations that accelerate ab initio methods while preserving accuracy. Machine learning interatomic potentials (MLIPs) have emerged as a promising solution for constructing atomistic potentials that can be transferred across different molecular and crystalline systems. Most MLIPs are trained only on energies and forces in vacuum, although including the curvature of the potential energy surface could improve its description. We present Hessian QM9, the first database of equilibrium configurations and numerical Hessian matrices, consisting of 41,645 molecules from the QM9 dataset at the ωB97x/6-31G* level. Molecular Hessians were calculated in vacuum, as well as in water, tetrahydrofuran, and toluene, using an implicit solvation model. To demonstrate the utility of this dataset, we show that incorporating second derivatives of the potential energy surface into the loss function of an MLIP significantly improves the prediction of vibrational frequencies in all solvent environments, making this dataset extremely useful for studying organic molecules in realistic solvent environments for experimental characterization.
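The abstract mentions two ideas that a small sketch can make concrete: a numerical Hessian (second derivatives of the energy obtained by finite differences) and a loss function that penalizes Hessian errors alongside energy and force errors. The Python below is only an illustration under stated assumptions. The toy two-atom harmonic energy, the function names `numerical_hessian` and `combined_loss`, the step size, and the loss weights are all hypothetical and are not taken from the paper, whose actual loss and finite-difference scheme are not given in this record.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_hessian(energy: Callable[[Vector], float], x: Vector, h: float = 1e-3) -> Matrix:
    """Central-difference Hessian of a scalar energy function at point x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(di: float, dj: float) -> float:
                # Evaluate the energy with coordinates i and j displaced.
                y = list(x)
                y[i] += di
                y[j] += dj
                return energy(y)

            value = (shifted(h, h) - shifted(h, -h)
                     - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value  # the Hessian is symmetric
    return hess


def mean_squared_error(pred: Vector, ref: Vector) -> float:
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def combined_loss(pred_e: float, ref_e: float,
                  pred_f: Vector, ref_f: Vector,
                  pred_h: Matrix, ref_h: Matrix,
                  w_e: float = 1.0, w_f: float = 1.0, w_h: float = 1.0) -> float:
    """Weighted sum of energy, force, and Hessian squared errors (assumed form)."""
    flat_pred = [v for row in pred_h for v in row]
    flat_ref = [v for row in ref_h for v in row]
    return (w_e * (pred_e - ref_e) ** 2
            + w_f * mean_squared_error(pred_f, ref_f)
            + w_h * mean_squared_error(flat_pred, flat_ref))


# Hypothetical toy system: two atoms on a line joined by a harmonic spring.
K, R0 = 2.0, 1.0


def toy_energy(x: Vector) -> float:
    return 0.5 * K * (x[1] - x[0] - R0) ** 2


equilibrium = [0.0, 1.0]
reference_hessian = numerical_hessian(toy_energy, equilibrium)

# A "model" whose spring constant is slightly wrong has the correct energy and
# forces at equilibrium, so only the Hessian term can reveal the error.
wrong_hessian = [[2.2, -2.2], [-2.2, 2.2]]
loss_value = combined_loss(0.0, 0.0, [0.0, 0.0], [0.0, 0.0],
                           wrong_hessian, reference_hessian)
```

In this toy case the analytic Hessian is [[K, -K], [-K, K]], so the finite-difference result can be checked directly. The example also shows why curvature data carries information that equilibrium energies and forces do not: the mis-parameterized model above matches both at the minimum and is penalized only through the Hessian term.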
id doaj-art-1fc0b00376e848d99915d9067b254ff7
institution Kabale University
issn 2052-4463
spelling doaj-art-1fc0b00376e848d99915d9067b254ff7, 2025-01-05T12:08:20Z, eng, Nature Portfolio, Scientific Data, 2052-4463, 2025-01-01, vol. 12, no. 1, 10.1038/s41597-024-04361-2
Nicholas J. Williams (IBM Research)
Lara Kabalan (Hartree Centre, Science and Technology Facilities Council, Daresbury Laboratory)
Ljiljana Stojanovic (Hartree Centre, Science and Technology Facilities Council, Daresbury Laboratory)
Viktor Zólyomi (Hartree Centre, Science and Technology Facilities Council, Daresbury Laboratory)
Edward O. Pyzer-Knapp (IBM Research)