DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes

Abstract Background Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Desp...

Bibliographic Details
Main Authors: Mohamed Elmanzalawi, Takatomo Fujisawa, Hiroshi Mori, Yasukazu Nakamura, Yasuhiro Tanizawa
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Bioinformatics
Subjects: Taxonomy, Prokaryote, Database, INSDC, ANI
Online Access: https://doi.org/10.1186/s12859-024-06030-y
_version_ 1841544243665960960
author Mohamed Elmanzalawi
Takatomo Fujisawa
Hiroshi Mori
Yasukazu Nakamura
Yasuhiro Tanizawa
author_sort Mohamed Elmanzalawi
collection DOAJ
description Abstract
Background: Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Despite advances in genome analysis techniques, challenges persist in ensuring precise and reliable taxonomic assignments. Existing tools for genome verification often require extensive computational resources or lengthy processing times, which can limit their accessibility and scalability for large-scale projects. There is a need for more efficient, user-friendly solutions that can handle diverse datasets and provide accurate results with minimal computational demands. This work aimed to address these challenges by introducing a novel tool that enhances taxonomic accuracy, offers a user-friendly interface, and supports large-scale analyses.
Results: We introduce DFAST_QC, a novel tool for the quality control and taxonomic classification of prokaryotic genomes, which is available as both a command-line tool and a web service. DFAST_QC can quickly identify species based on NCBI and GTDB taxonomies by combining genome-distance calculations using MASH with ANI calculations using Skani. We evaluated DFAST_QC's performance in species identification and found it to be highly consistent with existing taxonomic standards, successfully identifying species across diverse datasets. In several cases, DFAST_QC identified potential mislabeling of species names in public databases and highlighted discrepancies in current classifications, demonstrating its capability to uncover errors and enhance taxonomic accuracy. Additionally, the tool’s efficient design allows it to operate smoothly on local machines with minimal computational requirements, making it a practical choice for large-scale genome projects.
Conclusions: DFAST_QC is a reliable and efficient tool for accurate taxonomic identification and genome quality control, well-suited for large-scale genomic studies. Its compatibility with limited-resource environments, combined with its user-friendly design, ensures seamless integration into existing workflows. DFAST_QC's ability to refine species assignments in public databases highlights its value as a complementary tool for maintaining and enhancing the accuracy of taxonomic data in genomic research. The web version is available at https://dfast.ddbj.nig.ac.jp/dqc/submit/, and the source code for local use can be found at https://github.com/nigyta/dfast_qc.
format Article
id doaj-art-ab5ff993b3a0486ab6272de82f0fa7ca
institution Kabale University
issn 1471-2105
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-ab5ff993b3a0486ab6272de82f0fa7ca
2025-01-12T12:41:51Z
eng
BMC
BMC Bioinformatics
1471-2105
2025-01-01
volume 26, issue 1, pages 1-11
10.1186/s12859-024-06030-y
DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes
Mohamed Elmanzalawi (0): Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)
Takatomo Fujisawa (1): Department of Informatics, National Institute of Genetics
Hiroshi Mori (2): Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)
Yasukazu Nakamura (3): Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)
Yasuhiro Tanizawa (4): Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI)
https://doi.org/10.1186/s12859-024-06030-y
Taxonomy
Prokaryote
Database
INSDC
ANI
title DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic Genomes
title_sort dfast qc quality assessment and taxonomic identification tool for prokaryotic genomes
topic Taxonomy
Prokaryote
Database
INSDC
ANI
url https://doi.org/10.1186/s12859-024-06030-y