mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics

Abstract Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and...

Bibliographic Details
Main Authors: Antonio Mirarchi, Toni Giorgino, Gianni De Fabritiis
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-04140-z
author Antonio Mirarchi
Toni Giorgino
Gianni De Fabritiis
collection DOAJ
description Abstract Recent advancements in protein structure determination are revolutionizing our understanding of proteins. Still, a significant gap remains in the availability of comprehensive datasets that focus on the dynamics of proteins, which are crucial for understanding protein function, folding, and interactions. To address this critical gap, we introduce mdCATH, a dataset generated through an extensive set of all-atom molecular dynamics simulations of a diverse and representative collection of protein domains. This dataset comprises all-atom systems for 5,398 domains, modeled with a state-of-the-art classical force field, and simulated in five replicates each at five temperatures from 320 K to 450 K. The mdCATH dataset records coordinates and forces every 1 ns, for over 62 ms of accumulated simulation time, effectively capturing the dynamics of the various classes of domains and providing a unique resource for proteome-wide statistical analyses of protein unfolding thermodynamics and kinetics. We outline the dataset structure and showcase its potential through four easily reproducible case studies, highlighting its capabilities in advancing protein science.
format Article
id doaj-art-7f9e58c97a4b45afb504babdd4224d02
institution Kabale University
issn 2052-4463
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj-art-7f9e58c97a4b45afb504babdd4224d02
2024-12-01T12:09:02Z
eng
Nature Portfolio
Scientific Data
2052-4463
2024-11-01
111111
10.1038/s41597-024-04140-z
mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics
Antonio Mirarchi (0)
Toni Giorgino (1)
Gianni De Fabritiis (2)
(0) Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB)
(1) Biophysics Institute, National Research Council (CNR-IBF)
(2) Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB)
https://doi.org/10.1038/s41597-024-04140-z
title mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics
url https://doi.org/10.1038/s41597-024-04140-z