deepTFBS: Improving within‐ and Cross‐Species Prediction of Transcription Factor Binding Using Deep Multi‐Task and Transfer Learning

Bibliographic Details
Main Authors: Jingjing Zhai, Yuzhou Zhang, Chujun Zhang, Xiaotong Yin, Minggui Song, Chenglong Tang, Pengjun Ding, Zenglin Li, Chuang Ma
Format: Article
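Language: English
Published: Wiley 2025-08-01
Series: Advanced Science
Subjects: bioinformatics; cross‐species prediction; deep learning; machine learning; transcriptional regulatory network
Online Access: https://doi.org/10.1002/advs.202503135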
collection DOAJ
description Abstract The precise prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. This study presents deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar to accurately predict TFBSs within and across plant species. Taking advantage of multi‐task DL and transfer learning, deepTFBS leverages knowledge learned from large‐scale TF binding profiles to improve TFBS prediction in small‐sample training and cross‐species prediction tasks. When tested on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA, and DanQ, improving the area under the precision‐recall curve (PRAUC) by 244.49%, 49.15%, and 23.32%, respectively. Further cross‐species prediction of TFBSs in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also use information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data are limited. A case study of the WUSCHEL (WUS) transcription factor illustrated the potential of deepTFBS for cross‐species applications, here between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
id doaj-art-2a2d8173f45746749a110e956b4a9388
institution Kabale University
issn 2198-3844
spelling doaj-art-2a2d8173f45746749a110e956b4a9388 (2025-08-20T11:56:10Z)
Language: eng
Publisher: Wiley
Series: Advanced Science
ISSN: 2198-3844
Published: 2025-08-01
Volume 12, Issue 30, Pages n/a
DOI: 10.1002/advs.202503135
Title: deepTFBS: Improving within‐ and Cross‐Species Prediction of Transcription Factor Binding Using Deep Multi‐Task and Transfer Learning
Authors and affiliations:
Jingjing Zhai, Yuzhou Zhang, Chujun Zhang, Minggui Song, Pengjun Ding, Zenglin Li, Chuang Ma: State Key Laboratory for Crop Stress Resistance and High‐Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
Xiaotong Yin, Chenglong Tang: College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
Keywords: bioinformatics; cross‐species prediction; deep learning; machine learning; transcriptional regulatory network
Online access: https://doi.org/10.1002/advs.202503135
title deepTFBS: Improving within‐ and Cross‐Species Prediction of Transcription Factor Binding Using Deep Multi‐Task and Transfer Learning
topic bioinformatics
cross‐species prediction
deep learning
machine learning
transcriptional regulatory network
url https://doi.org/10.1002/advs.202503135