DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes
Main Authors: | Mohamed Elmanzalawi, Takatomo Fujisawa, Hiroshi Mori, Yasukazu Nakamura, Yasuhiro Tanizawa |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-01-01 |
Series: | BMC Bioinformatics |
Subjects: | Taxonomy, Prokaryote, Database, INSDC, ANI |
Online Access: | https://doi.org/10.1186/s12859-024-06030-y |
_version_ | 1841544243665960960 |
---|---|
author | Mohamed Elmanzalawi Takatomo Fujisawa Hiroshi Mori Yasukazu Nakamura Yasuhiro Tanizawa |
collection | DOAJ |
description | Abstract. Background: Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Despite advances in genome analysis techniques, challenges persist in ensuring precise and reliable taxonomic assignments. Existing tools for genome verification often require extensive computational resources or lengthy processing times, which can limit their accessibility and scalability for large-scale projects. There is a need for more efficient, user-friendly solutions that can handle diverse datasets and provide accurate results with minimal computational demands. This work aimed to address these challenges by introducing a novel tool that enhances taxonomic accuracy, offers a user-friendly interface, and supports large-scale analyses. Results: We introduce DFAST_QC, a novel tool for the quality control and taxonomic classification of prokaryotic genomes, which is available as both a command-line tool and a web service. DFAST_QC can quickly identify species based on NCBI and GTDB taxonomies by combining genome-distance calculations using MASH with ANI calculations using Skani. We evaluated DFAST_QC's performance in species identification and found it to be highly consistent with existing taxonomic standards, successfully identifying species across diverse datasets. In several cases, DFAST_QC identified potential mislabeling of species names in public databases and highlighted discrepancies in current classifications, demonstrating its capability to uncover errors and enhance taxonomic accuracy. Additionally, the tool’s efficient design allows it to operate smoothly on local machines with minimal computational requirements, making it a practical choice for large-scale genome projects.
Conclusions: DFAST_QC is a reliable and efficient tool for accurate taxonomic identification and genome quality control, well-suited for large-scale genomic studies. Its compatibility with limited-resource environments, combined with its user-friendly design, ensures seamless integration into existing workflows. DFAST_QC's ability to refine species assignments in public databases highlights its value as a complementary tool for maintaining and enhancing the accuracy of taxonomic data in genomic research. The web version is available at https://dfast.ddbj.nig.ac.jp/dqc/submit/, and the source code for local use can be found at https://github.com/nigyta/dfast_qc. |
format | Article |
id | doaj-art-ab5ff993b3a0486ab6272de82f0fa7ca |
institution | Kabale University |
issn | 1471-2105 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj-art-ab5ff993b3a0486ab6272de82f0fa7ca; 2025-01-12T12:41:51Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-01-01; 26(1): 1-11; 10.1186/s12859-024-06030-y; DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes; Mohamed Elmanzalawi (Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)); Takatomo Fujisawa (Department of Informatics, National Institute of Genetics); Hiroshi Mori (Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)); Yasukazu Nakamura (Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)); Yasuhiro Tanizawa (Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)); https://doi.org/10.1186/s12859-024-06030-y; Taxonomy; Prokaryote; Database; INSDC; ANI |
title | DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes |
topic | Taxonomy Prokaryote Database INSDC ANI |
url | https://doi.org/10.1186/s12859-024-06030-y |