PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts

Abstract The intricate nature of microbiota sequencing data—high dimensionality and sparsity—presents a challenge in identifying informative and reproducible microbial features for both research and clinical applications. Addressing this, we introduce PreLect, an innovative feature selection framewo...

Bibliographic Details
Main Authors: Yin-Cheng Chen, Yin-Yuan Su, Tzu-Yu Chu, Ming-Fong Wu, Chieh-Chun Huang, Chen-Ching Lin
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Biofilms and Microbiomes
Online Access: https://doi.org/10.1038/s41522-024-00598-2
_version_ 1841559790464008192
author Yin-Cheng Chen
Yin-Yuan Su
Tzu-Yu Chu
Ming-Fong Wu
Chieh-Chun Huang
Chen-Ching Lin
author_facet Yin-Cheng Chen
Yin-Yuan Su
Tzu-Yu Chu
Ming-Fong Wu
Chieh-Chun Huang
Chen-Ching Lin
author_sort Yin-Cheng Chen
collection DOAJ
description Abstract The intricate nature of microbiota sequencing data—high dimensionality and sparsity—presents a challenge in identifying informative and reproducible microbial features for both research and clinical applications. Addressing this, we introduce PreLect, an innovative feature selection framework that harnesses microbes’ prevalence to facilitate consistent selection in sparse microbiota data. Upon rigorous benchmarking against established feature selection methodologies across 42 microbiome datasets, PreLect demonstrated superior classification capabilities compared to statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A significant strength of PreLect lies in its ability to reliably identify reproducible microbial features across varied cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights crucial pathways, such as lipopolysaccharide and glycerophospholipid biosynthesis, in cancer progression. This case study exemplifies PreLect’s utility in discerning clinically relevant microbial signatures. In summary, PreLect’s accuracy and robustness make it a significant advancement in the analysis of complex microbiota data.
format Article
id doaj-art-1b8f7306779744cbac230ee740461b12
institution Kabale University
issn 2055-5008
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series npj Biofilms and Microbiomes
spelling doaj-art-1b8f7306779744cbac230ee740461b12
2025-01-05T12:10:12Z
eng
Nature Portfolio
npj Biofilms and Microbiomes
2055-5008
2025-01-01
11
1
1
12
10.1038/s41522-024-00598-2
PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
Yin-Cheng Chen 0
Yin-Yuan Su 1
Tzu-Yu Chu 2
Ming-Fong Wu 3
Chieh-Chun Huang 4
Chen-Ching Lin 5
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University
Abstract The intricate nature of microbiota sequencing data—high dimensionality and sparsity—presents a challenge in identifying informative and reproducible microbial features for both research and clinical applications. Addressing this, we introduce PreLect, an innovative feature selection framework that harnesses microbes’ prevalence to facilitate consistent selection in sparse microbiota data. Upon rigorous benchmarking against established feature selection methodologies across 42 microbiome datasets, PreLect demonstrated superior classification capabilities compared to statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A significant strength of PreLect lies in its ability to reliably identify reproducible microbial features across varied cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights crucial pathways, such as lipopolysaccharide and glycerophospholipid biosynthesis, in cancer progression. This case study exemplifies PreLect’s utility in discerning clinically relevant microbial signatures. In summary, PreLect’s accuracy and robustness make it a significant advancement in the analysis of complex microbiota data.
https://doi.org/10.1038/s41522-024-00598-2
spellingShingle Yin-Cheng Chen
Yin-Yuan Su
Tzu-Yu Chu
Ming-Fong Wu
Chieh-Chun Huang
Chen-Ching Lin
PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
npj Biofilms and Microbiomes
title PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
title_full PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
title_fullStr PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
title_full_unstemmed PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
title_short PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
title_sort prelect prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
url https://doi.org/10.1038/s41522-024-00598-2
work_keys_str_mv AT yinchengchen prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts
AT yinyuansu prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts
AT tzuyuchu prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts
AT mingfongwu prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts
AT chiehchunhuang prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts
AT chenchinglin prelectprevalenceleveragedconsistentfeatureselectiondecodesmicrobialsignaturesacrosscohorts