PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts
Main Authors: | Yin-Cheng Chen, Yin-Yuan Su, Tzu-Yu Chu, Ming-Fong Wu, Chieh-Chun Huang, Chen-Ching Lin |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Biofilms and Microbiomes |
Online Access: | https://doi.org/10.1038/s41522-024-00598-2 |
author | Yin-Cheng Chen, Yin-Yuan Su, Tzu-Yu Chu, Ming-Fong Wu, Chieh-Chun Huang, Chen-Ching Lin |
collection | DOAJ |
description | The intricate nature of microbiota sequencing data—high dimensionality and sparsity—presents a challenge in identifying informative and reproducible microbial features for both research and clinical applications. Addressing this, we introduce PreLect, an innovative feature selection framework that harnesses microbes’ prevalence to facilitate consistent selection in sparse microbiota data. Upon rigorous benchmarking against established feature selection methodologies across 42 microbiome datasets, PreLect demonstrated superior classification capabilities compared to statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A significant strength of PreLect lies in its ability to reliably identify reproducible microbial features across varied cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights crucial pathways, such as lipopolysaccharide and glycerophospholipid biosynthesis, in cancer progression. This case study exemplifies PreLect’s utility in discerning clinically relevant microbial signatures. In summary, PreLect’s accuracy and robustness make it a significant advancement in the analysis of complex microbiota data. |
format | Article |
id | doaj-art-1b8f7306779744cbac230ee740461b12 |
institution | Kabale University |
issn | 2055-5008 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
series | npj Biofilms and Microbiomes |
affiliation | Institute of Biomedical Informatics, National Yang Ming Chiao Tung University (all six authors) |
title | PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts |
url | https://doi.org/10.1038/s41522-024-00598-2 |