ParaAntiProt provides paratope prediction using antibody and protein language models

Abstract Efficiently predicting the paratope holds immense potential for enhancing antibody design, treating cancers and other serious diseases, and advancing personalized medicine. Although traditional methods are highly accurate, they are often time-consuming, labor-intensive, and reliant on 3D st...

Bibliographic Details
Main Authors: Mahmood Kalemati, Alireza Noroozi, Aref Shahbakhsh, Somayyeh Koohi
Format: Article
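Language: English
Published: Nature Portfolio 2024-11-01
Series: Scientific Reports
Subjects: Paratope prediction; Antibody Language models; Protein Language models; Complementarity determining regions; Deep learning
Online Access: https://doi.org/10.1038/s41598-024-80940-y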
author Mahmood Kalemati
Alireza Noroozi
Aref Shahbakhsh
Somayyeh Koohi
collection DOAJ
description Abstract Efficiently predicting the paratope holds immense potential for enhancing antibody design, treating cancers and other serious diseases, and advancing personalized medicine. Although traditional methods are highly accurate, they are often time-consuming, labor-intensive, and reliant on 3D structures, which restricts their broader use. Machine learning-based methods, besides relying on structural data, also require descriptor computation, consideration of diverse physicochemical properties, and feature engineering. Here, we develop a deep learning-assisted, antigen-agnostic prediction method for paratope identification that relies solely on amino acid sequences. Built on the ProtTrans architecture and utilizing pre-trained protein and antibody language models, the method extracts efficient embeddings for paratope prediction. By incorporating positional encoding for Complementarity Determining Regions (CDRs), our model gains a deeper structural understanding, achieving a ROC AUC of 0.904, an F1-score of 0.701, and an MCC of 0.585 on benchmark datasets. In addition to yielding accurate antibody paratope predictions, our method performs strongly on nanobody paratope prediction, achieving a ROC AUC of 0.912 and a PR AUC of 0.665 on the nanobody dataset. Notably, our approach outperforms structure-based prediction methods, with a PR AUC of 0.731. Ablation studies, which examine the impact of each part of the model on the prediction task, show that the improvement gained by applying CDR positional encoding together with CNNs depends on the specific protein and antibody language models used. These results highlight the potential of our method to advance disease understanding and aid in the discovery of new diagnostics and antibody therapies.
format Article
id doaj-art-a65f9b14e0204a7d81b44e5c17d22e4d
institution Kabale University
issn 2045-2322
language English
publishDate 2024-11-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-a65f9b14e0204a7d81b44e5c17d22e4d; 2024-12-01T12:18:20Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-11-01; 141115; 10.1038/s41598-024-80940-y; ParaAntiProt provides paratope prediction using antibody and protein language models; Mahmood Kalemati (0); Alireza Noroozi (1); Aref Shahbakhsh (2); Somayyeh Koohi (3); (0) Department of Computer Engineering, Sharif University of Technology; (1) Department of Computer Engineering, Sharif University of Technology; (2) Department of Computer Engineering, Sharif University of Technology; (3) Department of Computer Engineering, Sharif University of Technology; https://doi.org/10.1038/s41598-024-80940-y; Paratope prediction; Antibody Language models; Protein Language models; Complementarity determining regions; Deep learning
title ParaAntiProt provides paratope prediction using antibody and protein language models
topic Paratope prediction
Antibody Language models
Protein Language models
Complementarity determining regions
Deep learning
url https://doi.org/10.1038/s41598-024-80940-y