Block selection in multiblock partial least squares for modeling genotype-phenotype relations in Saccharomyces.

In data-based modeling, correlations between explanatory variables often lead to the formation of distinct gene blocks. This study focuses on identifying influential gene blocks and key variables within these blocks, with a particular application in mind: genotype-phenotype mapping in Saccharomyces....

Bibliographic Details
Main Authors: Muhammad Tahir, Bu Yude, Tahir Mehmood, Saima Bashir, Zeeshan Ashraf
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0316350
collection DOAJ
description In data-based modeling, correlations between explanatory variables often lead to the formation of distinct gene blocks. This study focuses on identifying influential gene blocks and key variables within these blocks, with a particular application in mind: genotype-phenotype mapping in Saccharomyces. To overcome the challenges of a limited sample size, we use partial least squares (PLS). These gene blocks, which consist of combinations of genes, play a critical role in explaining phenotypic variations. Using multi-block partial least squares (mbPLS), we propose a novel approach, weighted block importance on projection (BwIP-mbPLS), to identify influential gene blocks. Variable importance on projection (VIP) is used to select significant genes within these blocks. Our study models the efficiency and rate phenotypes of Saccharomyces cerevisiae yeast under copper chloride (0.375 mM) and melibiose (2%). Analysis based on the silhouette index and the total within-cluster distance from k-means clustering groups 5629 genes into 18 gene blocks. BwIP-mbPLS identifies 4 gene blocks on average and significantly improves the prediction of efficiency-based phenotypes. In contrast, traditional block importance on projection (BIP-mbPLS) identifies 6 gene blocks on average and shows comparable or better performance for rate-based phenotypes. Notably, most gene blocks contain fewer than 10 influential genes. Both proposed variants consistently outperform conventional approaches such as partial least squares and multi-block partial least squares in predicting phenotypes. These results highlight the potential of our methods for advancing data-based modeling and genotype-phenotype mapping.
id doaj-art-45fd376b7fec4528973c8813272e7236
institution Kabale University
issn 1932-6203