Prediction and design of thermostable proteins with a desired melting temperature

Abstract The stability of proteins at higher temperatures, measured by their melting temperature (Tm), is crucial for their functionality. The Tm is the temperature at which 50% of the protein loses its native structure and activity. Existing methods for predicting Tm have two major limitati...

Bibliographic Details
Main Authors: Purva Tijare, Nishant Kumar, Gajendra P. S. Raghava
Format: Article
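Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects: Melting temperature; Prediction; Machine learning; Embeddings; Protein language models; Thermostable proteins
Online Access: https://doi.org/10.1038/s41598-025-98667-9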
author Purva Tijare
Nishant Kumar
Gajendra P. S. Raghava
collection DOAJ
description Abstract The stability of proteins at higher temperatures, measured by their melting temperature (Tm), is crucial for their functionality. The Tm is the temperature at which 50% of the protein loses its native structure and activity. Existing methods for predicting Tm have two major limitations: first, they are often trained on redundant proteins, and second, they do not allow users to design proteins with a desired Tm. To address these limitations, we developed a regression method for predicting the Tm value of proteins using 17,312 non-redundant proteins, where no two proteins are more than 40% similar. We used 80% of the data for training and testing and the remaining 20% for validation. Initially, we developed machine learning models using standard features from protein sequences. Our best model, developed using Shannon entropy for all residues, achieved a Pearson correlation of 0.80 with an R² of 0.63 between the predicted and actual Tm of proteins on the validation dataset. Next, we fine-tuned large language models (e.g., ProtBert, ProtGPT2, ProtT5) on our training dataset and generated embeddings, which were then used to develop machine learning models. Our best model, developed using ProtBert embeddings, achieved a correlation of 0.89 with an R² of 0.80 on the validation dataset. Finally, we developed an ensemble method that combines standard protein features and embeddings. One of the aims of the study is to assist the scientific community in designing proteins with targeted melting temperatures. Our standalone software can be used to screen thermostable proteins at the genome level. We demonstrated the application of PPTstab in identifying thermostable proteins in different organisms. A user-friendly web server and a Python package for predicting and designing thermostable proteins are available at https://webs.iiitd.edu.in/raghava/pptstab and https://github.com/raghavagps/pptstab.
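The abstract names a Shannon entropy feature and two evaluation metrics (Pearson correlation and R²) without defining them. The sketch below shows one common way to compute each in plain Python. It is illustrative only and is not taken from PPTstab: the function names `shannon_entropy`, `pearson_r`, and `r_squared` are hypothetical, the entropy here is computed over the whole sequence (the authors' "Shannon entropy for all residues" feature may be defined differently), and R² is assumed to mean the coefficient of determination.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Sequence-level Shannon entropy in bits: H = -sum(p_i * log2(p_i)).

    p_i is the fraction of positions occupied by residue i.
    Assumption: this is a generic definition, not the authors' exact feature.
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

With these definitions, a sequence that uses four residues equally often has an entropy of 2 bits, and a model whose predictions are perfectly correlated with the measured Tm but systematically offset can still have a low R². That is why the abstract reports both numbers.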
format Article
id doaj-art-fd9ea0417f0f4d70ab3ff6f8924d1a07
institution Kabale University
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-fd9ea0417f0f4d70ab3ff6f8924d1a07 2025-08-20T03:48:06Z eng Nature Portfolio Scientific Reports 2045-2322 2025-05-01 151113 10.1038/s41598-025-98667-9
Purva Tijare (0), Nishant Kumar (1), Gajendra P. S. Raghava (2)
(0, 1, 2) Department of Computational Biology, Indraprastha Institute of Information Technology
title Prediction and design of thermostable proteins with a desired melting temperature
topic Melting temperature
Prediction
Machine learning
Embeddings
Protein language models
Thermostable proteins
url https://doi.org/10.1038/s41598-025-98667-9