PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts

Bibliographic Details
Main Authors: Yin-Cheng Chen, Yin-Yuan Su, Tzu-Yu Chu, Ming-Fong Wu, Chieh-Chun Huang, Chen-Ching Lin
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Biofilms and Microbiomes
Online Access: https://doi.org/10.1038/s41522-024-00598-2
Description
Summary: The intricate nature of microbiota sequencing data (high dimensionality and sparsity) presents a challenge in identifying informative and reproducible microbial features for both research and clinical applications. Addressing this, we introduce PreLect, an innovative feature selection framework that harnesses microbes’ prevalence to facilitate consistent selection in sparse microbiota data. Upon rigorous benchmarking against established feature selection methodologies across 42 microbiome datasets, PreLect demonstrated superior classification capabilities compared to statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A significant strength of PreLect lies in its ability to reliably identify reproducible microbial features across varied cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights crucial pathways, such as lipopolysaccharide and glycerophospholipid biosynthesis, in cancer progression. This case study exemplifies PreLect’s utility in discerning clinically relevant microbial signatures. In summary, PreLect’s accuracy and robustness make it a significant advancement in the analysis of complex microbiota data.
ISSN: 2055-5008