ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance
Abstract Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI...
| Main Authors: | Qichang Zhao, Haochen Zhao, Linyuan Guo, Kai Zheng, Yajie Li, Qiao Ling, Jing Tang, Yaohang Li, Jianxin Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-61745-7 |
| _version_ | 1849332182290530304 |
|---|---|
| author | Qichang Zhao Haochen Zhao Linyuan Guo Kai Zheng Yajie Li Qiao Ling Jing Tang Yaohang Li Jianxin Wang |
| author_sort | Qichang Zhao |
| collection | DOAJ |
| description | Abstract Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI data. Here, we introduce ColdstartCPI, a framework inspired by induced-fit theory, which leverages unsupervised pre-training features and a Transformer module to learn both compound and protein characteristics. ColdstartCPI treats proteins and compounds as flexible molecules during inference, aligning with biological insights. It outperforms state-of-the-art sequence-based models, particularly for unseen compounds and proteins, and shows strong generalization capability compared to structure-based methods in virtual screening. ColdstartCPI also excels in sparse and low-similarity data conditions, demonstrating its potential in data-limited settings. Our results are validated through literature search, molecular docking, and binding free energy calculations. Overall, ColdstartCPI offers a perspective on sequence-based drug design, presenting a promising tool for drug discovery. |
| format | Article |
| id | doaj-art-c20d61d33b484ad5b1c35259a4f5f3c6 |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-c20d61d33b484ad5b1c35259a4f5f3c6; 2025-08-20T03:46:17Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-07-01; 161122; 10.1038/s41467-025-61745-7; ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance; Qichang Zhao (School of Computer Science and Engineering, Central South University); Haochen Zhao (School of Computer Science and Engineering, Central South University); Linyuan Guo (School of Computer Science and Engineering, Central South University); Kai Zheng (School of Computer Science and Engineering, Central South University); Yajie Li (School of Computer Science and Engineering, Central South University); Qiao Ling (School of Computer Science and Engineering, Central South University); Jing Tang (Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki); Yaohang Li (Department of Computer Science, Old Dominion University); Jianxin Wang (School of Computer Science and Engineering, Central South University); Abstract Predicting compound-protein interactions (CPIs) plays a crucial role in drug discovery. Traditional methods, based on the key-lock theory and rigid docking, often fail with novel compounds and proteins due to their inability to account for molecular flexibility and the high sparsity of CPI data. Here, we introduce ColdstartCPI, a framework inspired by induced-fit theory, which leverages unsupervised pre-training features and a Transformer module to learn both compound and protein characteristics. ColdstartCPI treats proteins and compounds as flexible molecules during inference, aligning with biological insights. It outperforms state-of-the-art sequence-based models, particularly for unseen compounds and proteins, and shows strong generalization capability compared to structure-based methods in virtual screening. ColdstartCPI also excels in sparse and low-similarity data conditions, demonstrating its potential in data-limited settings. Our results are validated through literature search, molecular docking, and binding free energy calculations. Overall, ColdstartCPI offers a perspective on sequence-based drug design, presenting a promising tool for drug discovery.; https://doi.org/10.1038/s41467-025-61745-7 |
| title | ColdstartCPI: Induced-fit theory-guided DTI predictive model with improved generalization performance |
| url | https://doi.org/10.1038/s41467-025-61745-7 |