Discovering CRISPR-Cas system with self-processing pre-crRNA capability by foundation models

Bibliographic Details
Main Authors: Wenhui Li, Xianyue Jiang, Wuke Wang, Liya Hou, Runze Cai, Yongqian Li, Qiuxi Gu, Qinchang Chen, Peixiang Ma, Jin Tang, Menghao Guo, Guohui Chuai, Xingxu Huang, Jun Zhang, Qi Liu
Format: Article
Language: English
Published: Nature Portfolio 2024-11-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-54365-0
collection DOAJ
description Abstract: The discovery of CRISPR-Cas systems has paved the way for advanced gene editing tools. However, traditional Cas discovery methods that rely on sequence similarity may miss distant homologs and are not suited to functional recognition. As protein large language models (LLMs) evolve, there is potential for modeling Cas systems without extensive training data. Here, we introduce CHOOSER (Cas HOmolog Observing and SElf-processing scReening), an AI framework for alignment-free discovery of CRISPR-Cas systems with self-processing pre-crRNA capability using protein foundation models. Using CHOOSER, we identify 11 Casλ homologs, nearly doubling the known catalog. Notably, one homolog, EphcCasλ, is experimentally validated for self-processing of pre-crRNA, DNA cleavage, and trans-cleavage, showing promise for CRISPR-based pathogen detection. This study highlights an innovative approach for discovering CRISPR-Cas systems with specific functions, emphasizing their potential in gene editing.
id doaj-art-06e347abdbb6496da2cdeead59091ae6
institution Kabale University
issn 2041-1723
spelling Nature Communications, ISSN 2041-1723, volume 15, issue 1, pages 1-14, 2024-11-01, https://doi.org/10.1038/s41467-024-54365-0
Author affiliations:
Wenhui Li, Guohui Chuai, Qi Liu: State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University
Xianyue Jiang, Wuke Wang, Liya Hou, Runze Cai, Yongqian Li, Qinchang Chen, Jin Tang, Menghao Guo, Xingxu Huang: Research Center for Life Sciences Computing, Zhejiang Lab
Qiuxi Gu, Jun Zhang: State Key Laboratory of Reproductive Medicine and Offspring Health, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University
Peixiang Ma: Shanghai Key Laboratory of Orthopedic Implants, Department of Orthopedic Surgery, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine