ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance

Abstract Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI...

Bibliographic Details
Main Authors: Qichang Zhao, Haochen Zhao, Linyuan Guo, Kai Zheng, Yajie Li, Qiao Ling, Jing Tang, Yaohang Li, Jianxin Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-61745-7
collection DOAJ
description Abstract Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI data. Here, we introduce ColdstartCPI, a framework inspired by induced-fit theory, which leverages unsupervised pre-training features and a Transformer module to learn both compound and protein characteristics. ColdstartCPI treats proteins and compounds as flexible molecules during inference, aligning with biological insights. It outperforms state-of-the-art sequence-based models, particularly for unseen compounds and proteins, and shows strong generalization capability compared to structure-based methods in virtual screening. ColdstartCPI also excels in sparse and low-similarity data conditions, demonstrating its potential in data-limited settings. Our results are validated through literature search, molecular docking, and binding free energy calculations. Overall, ColdstartCPI offers a perspective on sequence-based drug design, presenting a promising tool for drug discovery.
institution Kabale University
issn 2041-1723
Volume: 16
Issue: 1
Affiliations:
Qichang Zhao, Haochen Zhao, Linyuan Guo, Kai Zheng, Yajie Li, Qiao Ling, Jianxin Wang: School of Computer Science and Engineering, Central South University
Jing Tang: Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki
Yaohang Li: Department of Computer Science, Old Dominion University