Optimizing microbiome reference databases with PacBio full-length 16S rRNA sequencing for enhanced taxonomic classification and biomarker discovery

Bibliographic Details
Main Authors: Hyejung Han, Yoon Hee Choi, Si Yeong Kim, Jung Hwa Park, Jin Chung, Hee Sam Na
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-11-01
Series: Frontiers in Microbiology
Subjects: oral microbiome; gut microbiome; PacBio; Illumina; next generation sequencing; reference database
Online Access:https://www.frontiersin.org/articles/10.3389/fmicb.2024.1485073/full
collection DOAJ
description Background: The study of the human microbiome is crucial for understanding disease mechanisms, identifying biomarkers, and guiding preventive measures. Advances in sequencing platforms, particularly for 16S rRNA sequencing, have revolutionized microbiome research. Despite these benefits, large microbiome reference databases (DBs) pose challenges, including high computational demands and potential classification inaccuracies. This study aimed to determine whether full-length 16S rRNA sequencing data produced by PacBio could be used to optimize reference DBs, and whether those DBs could be applied to Illumina V3-V4 targeted sequencing data.
Methods: Oral and gut microbiome data (BioProject PRJNA1049979) were retrieved from NCBI. DADA2 was applied to the full-length 16S rRNA PacBio data to obtain amplicon sequence variants (ASVs). Taxonomy was assigned to the ASVs using the RDP reference DB, and the annotated ASVs were then used as a reference DB to train the classifier. QIIME2 was used to analyze the V3-V4 targeted Illumina data. BLAST was used to compute alignment statistics, and linear discriminant analysis effect size (LEfSe) was used for discriminant analysis.
Results: ASVs produced by PacBio covered the oral microbiome to a degree similar to the Human Oral Microbiome Database. A phylogenetic tree was trimmed at various thresholds to obtain an optimized reference DB. The same method was then applied to the gut microbiome data, and the optimized gut microbiome reference DB improved both taxonomic classification and the efficiency of biomarker discovery.
Conclusion: Full-length 16S rRNA sequencing data produced by PacBio can be used to construct a microbiome reference DB. Using an optimized reference DB can increase the accuracy of microbiome classification and enhance biomarker discovery.
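
The abstract does not say which criterion was used to trim the phylogenetic tree. As a rough illustration only, one plausible criterion is to cut the tree top-down, collapse every clade whose maximum tip-to-tip (patristic) distance is at or below a threshold, and keep a single sequence per collapsed clade. The `Node` structure, the diameter rule, the toy tree, and the choice of the first tip as the representative are all hypothetical assumptions, not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch length to the parent."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def height_and_diameter(node: Node) -> tuple[float, float]:
    """Return (max node-to-tip distance, max tip-to-tip distance) for a subtree."""
    if node.is_leaf():
        return 0.0, 0.0
    depths: list[float] = []
    best = 0.0
    for child in node.children:
        height, diameter = height_and_diameter(child)
        depths.append(child.length + height)
        best = max(best, diameter)
    depths.sort(reverse=True)
    # The longest path through this node joins its two deepest children.
    through_node = depths[0] + (depths[1] if len(depths) > 1 else 0.0)
    return depths[0], max(best, through_node)


def tip_names(node: Node) -> list[str]:
    """List the tip (ASV) names under a node, left to right."""
    if node.is_leaf():
        return [node.name]
    return [name for child in node.children for name in tip_names(child)]


def trim_tree(node: Node, threshold: float) -> list[list[str]]:
    """Top-down cut: a clade becomes one cluster once its diameter <= threshold."""
    _, diameter = height_and_diameter(node)
    if node.is_leaf() or diameter <= threshold:
        return [tip_names(node)]
    clusters: list[list[str]] = []
    for child in node.children:
        clusters.extend(trim_tree(child, threshold))
    return clusters


def representatives(root: Node, threshold: float) -> list[str]:
    """Keep one tip per cluster (here simply the first one; an assumption)."""
    return [cluster[0] for cluster in trim_tree(root, threshold)]


def toy_tree() -> Node:
    """Toy tree ((A:0.01,B:0.02):0.1,(C:0.05,D:0.3):0.1); tips stand in for ASVs."""
    ab = Node(length=0.1, children=[Node("A", 0.01), Node("B", 0.02)])
    cd = Node(length=0.1, children=[Node("C", 0.05), Node("D", 0.3)])
    return Node(children=[ab, cd])


if __name__ == "__main__":
    tree = toy_tree()
    for t in (0.0, 0.05, 0.4, 1.0):
        print(t, representatives(tree, t))
```

On the toy tree, thresholds 0.0, 0.05, 0.4 and 1.0 keep A, B, C, D; then A, C, D; then A, C; then only A, so raising the threshold shrinks the reference set. The sketch recomputes subtree diameters at every level, which is quadratic in the worst case; a real pipeline would more likely cache these values and pick each clade's representative by abundance or annotation quality rather than position.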
id doaj-art-09a560aaa02c4f9ca80fe6bb679efda4
institution Kabale University
issn 1664-302X
volume 15
article 1485073
doi 10.3389/fmicb.2024.1485073
affiliation Hyejung Han, Si Yeong Kim, Jung Hwa Park, Jin Chung, Hee Sam Na: Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
Yoon Hee Choi: Department of Internal Medicine, Dongnam Institute of Radiological and Medical Sciences, Busan, Republic of Korea
topic oral microbiome
gut microbiome
PacBio
Illumina
next generation sequencing
reference database