The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes

Bibliographic Details
Main Authors: Dinithi V. Wanniarachchi, Sameera Viswakula, Anushka M. Wickramasuriya
Format: Article
Language: English
Published: BMC 2024-12-01
Series: BMC Bioinformatics
Online Access:https://doi.org/10.1186/s12859-024-05995-0
Collection: DOAJ
Abstract

Background: The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. While numerous tools for in silico TFBS prediction have emerged in recent years, the evolving landscape of computational biology necessitates thorough assessments of tool performance to ensure accuracy and reliability. Only a limited number of studies have comprehensively evaluated the performance of TFBS prediction tools. The present study therefore assessed twelve widely used TFBS prediction tools and four de novo motif discovery tools using a benchmark dataset comprising real, generic, Markov, and negative sequences. TFBSs of the Arabidopsis thaliana and Homo sapiens genomes, downloaded from the JASPAR database, were implanted into these sequences, and tool performance was evaluated using several statistical parameters at different percentages of overlap between the lengths of known and predicted binding sites.

Results: Overall, the Multiple Cluster Alignment and Search Tool (MCAST) emerged as the best TFBS prediction tool, followed by Find Individual Motif Occurrences (FIMO) and MOtif Occurrence Detection Suite (MOODS). In addition, MotEvo and the Dinucleotide Weight Tensor Toolbox (DWT-toolbox) demonstrated the highest sensitivity in identifying TFBSs at 90% and 80% overlap. Further, MCAST and DWT-toolbox showed the highest sensitivity across all three data types: real, generic, and Markov. Among the de novo motif discovery tools, Multiple EM for Motif Elicitation (MEME) emerged as the best performer. An analysis of the promoter regions of genes involved in the anthocyanin biosynthesis pathway in plants and the pentose phosphate pathway in humans, using the three best-performing tools, revealed considerable variation among the top 20 motifs identified by these tools.

Conclusion: The findings of this study lay a robust groundwork for selecting optimal TFBS prediction tools for future research. Given the variability observed in tool performance, employing multiple tools to identify TFBSs in a set of sequences is highly recommended. Further studies are also recommended to develop an integrated toolbox that incorporates TFBS prediction or motif discovery tools, with the aim of improving the precision and accuracy of results.
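The abstract describes scoring predictions at different percentages of overlap between known and predicted binding sites. The following is a minimal Python sketch of that kind of scoring, not the authors' evaluation code. It assumes that overlap is the fraction of the known site's length covered by a prediction, that a known site counts as detected if any prediction reaches the threshold, and that positive predictive value (PPV) is the share of predictions matching some known site. The `Site` class and the `overlap_fraction` and `evaluate` functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A binding site on one sequence, stored as a half-open interval [start, end)."""
    seq_id: str
    start: int
    end: int


def overlap_fraction(known: Site, predicted: Site) -> float:
    """Fraction of the known site's length covered by the predicted site (assumed definition)."""
    if known.seq_id != predicted.seq_id:
        return 0.0
    shared = min(known.end, predicted.end) - max(known.start, predicted.start)
    return max(shared, 0) / (known.end - known.start)


def evaluate(known_sites: list[Site], predicted_sites: list[Site],
             min_overlap: float) -> tuple[float, float]:
    """Return (sensitivity, PPV) when a match requires at least `min_overlap` (0 to 1)."""
    # Sensitivity: share of known sites hit by at least one sufficiently overlapping prediction.
    detected = sum(
        any(overlap_fraction(k, p) >= min_overlap for p in predicted_sites) for k in known_sites
    )
    # PPV: share of predictions that sufficiently overlap at least one known site.
    correct = sum(
        any(overlap_fraction(k, p) >= min_overlap for k in known_sites) for p in predicted_sites
    )
    sensitivity = detected / len(known_sites) if known_sites else 0.0
    ppv = correct / len(predicted_sites) if predicted_sites else 0.0
    return sensitivity, ppv


# Hypothetical example: two implanted sites and three predictions.
known = [Site("seq1", 100, 110), Site("seq2", 50, 62)]
predicted = [Site("seq1", 101, 111), Site("seq2", 20, 30), Site("seq3", 5, 15)]
print(evaluate(known, predicted, 0.9))  # → (0.5, 0.3333333333333333)
print(evaluate(known, predicted, 1.0))  # → (0.0, 0.0)
```

Here `seq3` stands in for a negative sequence with no implanted site, so a hit there lowers PPV without affecting sensitivity. Raising `min_overlap` from 0.9 to 1.0 drops both values to zero because the shifted prediction on `seq1` covers only 9 of the 10 known positions.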
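The benchmark also included Markov sequences carrying implanted TFBSs. The sketch below illustrates one common way to build such a sequence, under stated assumptions: a first-order Markov model is estimated from an example DNA string containing only A, C, G, and T, a background of the desired length is sampled from it, and a site is written over a random window so that the sequence length stays unchanged. The model order, the overwrite strategy, the training string, and the `CACGTG` example site are illustrative assumptions rather than the study's protocol, and `train_first_order`, `markov_background`, and `implant` are hypothetical helpers.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"


def train_first_order(seq: str) -> dict[str, list[float]]:
    """Estimate P(next base | current base) from an uppercase ACGT string."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur in BASES:
        total = sum(counts[cur][b] for b in BASES)
        # Fall back to a uniform distribution if a base never precedes another base.
        probs[cur] = [counts[cur][b] / total if total else 0.25 for b in BASES]
    return probs


def markov_background(probs: dict[str, list[float]], length: int, rng: random.Random) -> str:
    """Sample a background sequence; the first base is drawn uniformly (a simplification)."""
    seq = [rng.choice(BASES)]
    while len(seq) < length:
        seq.append(rng.choices(BASES, weights=probs[seq[-1]])[0])
    return "".join(seq)


def implant(background: str, site: str, rng: random.Random) -> tuple[str, tuple[int, int]]:
    """Overwrite a random window with `site`; return the new sequence and [start, end)."""
    start = rng.randrange(len(background) - len(site) + 1)
    end = start + len(site)
    return background[:start] + site + background[end:], (start, end)


# Hypothetical example: a 200-bp background carrying one implanted site.
rng = random.Random(42)
probs = train_first_order("ACGTTGCAACGTAGCTAGCTTACGGATCCA")
background = markov_background(probs, 200, rng)
sequence, (start, end) = implant(background, "CACGTG", rng)
print(sequence[start:end], len(sequence))  # → CACGTG 200
```

Recording the `(start, end)` coordinates at implantation time is what later allows predicted sites to be compared with the known ones.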
Institution: Kabale University
ISSN: 1471-2105
Volume: 25, Issue: 1
Affiliations: Dinithi V. Wanniarachchi and Anushka M. Wickramasuriya, Department of Plant Sciences, Faculty of Science, University of Colombo; Sameera Viswakula, Department of Statistics, Faculty of Science, University of Colombo
Keywords: Transcription factor binding sites; Bioinformatics tools; Performance evaluation