Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt

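In-silico prediction of protein biophysical traits is often hindered by the limited availability of experimental data and their heterogeneity. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, leading to unreliable reports of model performance. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of performance. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained on a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression models that use diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt’s potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as downloadable software and as a webserver.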

Bibliographic Details
Main Authors: Aubin Ramon, Mingyang Ni, Olga Predeina, Rebecca Gaffey, Patrick Kunz, Shimobi Onuoha, Pietro Sormanni
Format: Article
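Language: English
Published: Taylor & Francis Group 2025-12-01
Series: mAbs
Subjects: Protein fitness; thermostability; antibody engineering; antibody design; nanobody; machine learning
Online Access: https://www.tandfonline.com/doi/10.1080/19420862.2024.2442750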
author Aubin Ramon
Mingyang Ni
Olga Predeina
Rebecca Gaffey
Patrick Kunz
Shimobi Onuoha
Pietro Sormanni
collection DOAJ
description In-silico prediction of protein biophysical traits is often hindered by the limited availability of experimental data and their heterogeneity. Training on limited data can lead to overfitting and poor generalizability to sequences distant from those in the training set. Additionally, inadequate use of scarce and disparate data can introduce biases during evaluation, leading to unreliable reports of model performance. Here, we present a comprehensive study exploring various approaches for protein fitness prediction from limited data, leveraging pre-trained embeddings, repeated stratified nested cross-validation, and ensemble learning to ensure an unbiased assessment of performance. We applied our framework to introduce NanoMelt, a predictor of nanobody thermostability trained on a dataset of 640 measurements of apparent melting temperature, obtained by integrating data from the literature with 129 new measurements from this study. We find that an ensemble model stacking multiple regression models that use diverse sequence embeddings achieves state-of-the-art accuracy in predicting nanobody thermostability. We further demonstrate NanoMelt’s potential to streamline nanobody development by guiding the selection of highly stable nanobodies. We make the curated dataset of nanobody thermostability freely available and NanoMelt accessible as downloadable software and as a webserver.
format Article
id doaj-art-0c75e6a1f0aa403d84df18683f8a0d2d
institution Kabale University
issn 1942-0862
1942-0870
language English
publishDate 2025-12-01
publisher Taylor & Francis Group
record_format Article
series mAbs
spelling doaj-art-0c75e6a1f0aa403d84df18683f8a0d2d
2025-01-08T12:45:19Z
eng
Taylor & Francis Group
mAbs
1942-0862
1942-0870
2025-12-01
Volume 17, Issue 1
10.1080/19420862.2024.2442750
Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt
Aubin Ramon (Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK)
Mingyang Ni (Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK)
Olga Predeina (Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK)
Rebecca Gaffey (Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK)
Patrick Kunz (Division of Functional Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany)
Shimobi Onuoha (Chimeris UK, The Works, Cambridge, UK)
Pietro Sormanni (Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK)
https://www.tandfonline.com/doi/10.1080/19420862.2024.2442750
Protein fitness
thermostability
antibody engineering
antibody design
nanobody
machine learning
title Prediction of protein biophysical traits from limited data: a case study on nanobody thermostability through NanoMelt
topic Protein fitness
thermostability
antibody engineering
antibody design
nanobody
machine learning
url https://www.tandfonline.com/doi/10.1080/19420862.2024.2442750
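
The abstract in this record names repeated stratified nested cross-validation as part of the evaluation framework. The following is a minimal sketch of how such splits could be generated for a continuous target like melting temperature. It is not taken from NanoMelt: the function names (`stratified_folds`, `repeated_nested_cv`), the fold counts, and the sort-and-deal stratification scheme are all assumptions made for illustration, and the data are synthetic.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def stratified_folds(y: Sequence[float], n_folds: int,
                     rng: random.Random) -> List[List[int]]:
    """Partition sample indices into folds that each span the range of y.

    Assumed scheme: sort samples by target value, then deal each block of
    n_folds consecutive samples out to the folds in a random order.
    """
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for start in range(0, len(order), n_folds):
        block = order[start:start + n_folds]
        fold_ids = list(range(n_folds))
        rng.shuffle(fold_ids)
        for idx, fold in zip(block, fold_ids):
            folds[fold].append(idx)
    return folds


def repeated_nested_cv(y: Sequence[float], n_repeats: int = 3,
                       n_outer: int = 5, n_inner: int = 4, seed: int = 0
                       ) -> Iterator[Tuple[int, List[int], List[Split]]]:
    """Yield (repeat, outer test indices, inner train/validation splits)."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        outer = stratified_folds(y, n_outer, rng)
        for k, test_idx in enumerate(outer):
            # Development set: everything outside the held-out outer fold.
            dev_idx = [i for j, fold in enumerate(outer) if j != k for i in fold]
            dev_y = [y[i] for i in dev_idx]
            inner: List[Split] = []
            for val_local in stratified_folds(dev_y, n_inner, rng):
                val_set = set(val_local)
                val_idx = [dev_idx[i] for i in val_local]
                train_idx = [dev_idx[i] for i in range(len(dev_idx))
                             if i not in val_set]
                inner.append((train_idx, val_idx))
            yield repeat, test_idx, inner


# Synthetic stand-in for melting temperatures (degrees C); not real data.
data_rng = random.Random(1)
tm_values = [data_rng.gauss(65.0, 8.0) for _ in range(60)]
splits = list(repeated_nested_cv(tm_values))
print(len(splits), "outer folds in total")
```

In a setup like this, the inner splits would be used only for model selection and hyperparameter tuning, and each outer test fold only for scoring the model chosen on its development set. Repeating the whole procedure with different shuffles is one way to reduce the dependence of the reported score on a single partition of a small dataset.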