A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data

Abstract Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit,...

Bibliographic Details
Main Authors:	Yige Zhao, Tian Lan, Guojie Zhong, Jake Hagen, Hongbing Pan, Wendy K. Chung, Yufeng Shen
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-05-01
Series:	Nature Communications
Online Access:	https://doi.org/10.1038/s41467-025-59937-2

author	Yige Zhao Tian Lan Guojie Zhong Jake Hagen Hongbing Pan Wendy K. Chung Yufeng Shen
collection	DOAJ
description	Abstract Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level ($d$) and a population level (selection coefficient, $s$), assuming that in the same gene, missense variants with similar $d$ have similar $s$. We train it by maximizing probability of observed allele counts in 236,017 individuals of European ancestry. We show that $s$ is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, $s$ outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts $s$ and yields new insights from genomic data.
format	Article
id	doaj-art-00e8a0f70e6b4cdd8fe7cfa43753c974
institution	Kabale University
issn	2041-1723
language	English
publishDate	2025-05-01
publisher	Nature Portfolio
record_format	Article
series	Nature Communications
spelling	doaj-art-00e8a0f70e6b4cdd8fe7cfa43753c974; 2025-08-20T03:48:15Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-05-01; 16; 1; 1; 12; 10.1038/s41467-025-59937-2; A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data; Yige Zhao (Department of Systems Biology, Columbia University Irving Medical Center); Tian Lan (Department of Systems Biology, Columbia University Irving Medical Center); Guojie Zhong (Department of Systems Biology, Columbia University Irving Medical Center); Jake Hagen (Department of Systems Biology, Columbia University Irving Medical Center); Hongbing Pan (Department of Biomedical Informatics, Columbia University Irving Medical Center); Wendy K. Chung (Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School); Yufeng Shen (Department of Systems Biology, Columbia University Irving Medical Center); https://doi.org/10.1038/s41467-025-59937-2
title	A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data
url	https://doi.org/10.1038/s41467-025-59937-2

Similar Items