A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data

Abstract Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit,...

Bibliographic Details
Main Authors: Yige Zhao, Tian Lan, Guojie Zhong, Jake Hagen, Hongbing Pan, Wendy K. Chung, Yufeng Shen
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-59937-2
collection DOAJ
description Abstract Accurately predicting the effect of missense variants is important in discovering disease risk genes and clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate missense fitness effect using a graphical model. MisFit jointly models the effect at a molecular level ($d$) and a population level (selection coefficient, $s$), assuming that in the same gene, missense variants with similar $d$ have similar $s$. We train it by maximizing probability of observed allele counts in 236,017 individuals of European ancestry. We show that $s$ is informative in predicting allele frequency across ancestries and consistent with the fraction of de novo mutations in sites under strong selection. Further, $s$ outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts $s$ and yields new insights from genomic data.
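The abstract's idea of fitting a selection coefficient by maximizing the probability of observed allele counts can be illustrated with a deliberately simplified sketch. This is not MisFit's graphical model: it assumes (hypothetically) that each variant's allele count is Poisson-distributed around the deterministic mutation-selection balance frequency `mu / s`, and that a group of variants shares one `s`. The function names, the per-site mutation rates, and the example counts are all illustrative assumptions; only the cohort size of 236,017 individuals comes from the abstract.

```python
import math


def poisson_log_likelihood(s, counts, mutation_rates, n_chromosomes):
    """Log-likelihood of allele counts under a toy model.

    ASSUMPTION (not from the paper): each count is Poisson with mean
    n_chromosomes * mu / s, where mu / s is the deterministic
    mutation-selection balance allele frequency.
    """
    total = 0.0
    for k, mu in zip(counts, mutation_rates):
        lam = n_chromosomes * mu / s
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def estimate_shared_s(counts, mutation_rates, n_chromosomes, s_max=1.0):
    """Closed-form maximizer of the toy likelihood for a shared s.

    Setting the derivative to zero gives
    s = n_chromosomes * sum(mu) / sum(counts), capped at s_max.
    With zero observed alleles the likelihood increases in s, so s_max is returned.
    """
    total_count = sum(counts)
    if total_count == 0:
        return s_max
    return min(s_max, n_chromosomes * sum(mutation_rates) / total_count)


# Hypothetical example: three variants in one gene assumed to share s.
n_chromosomes = 2 * 236_017          # diploid cohort size from the abstract
counts = [0, 1, 2]                   # made-up allele counts
mutation_rates = [1e-8, 1e-8, 1e-8]  # made-up per-site mutation rates
s_hat = estimate_shared_s(counts, mutation_rates, n_chromosomes)
```

In this toy setting, rarer alleles relative to their mutation rate imply a larger estimated `s`. Pooling variants is only a crude stand-in for the abstract's assumption that variants with similar $d$ in the same gene have similar $s$.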
id doaj-art-00e8a0f70e6b4cdd8fe7cfa43753c974
institution Kabale University
issn 2041-1723
Author affiliations:
Yige Zhao: Department of Systems Biology, Columbia University Irving Medical Center
Tian Lan: Department of Systems Biology, Columbia University Irving Medical Center
Guojie Zhong: Department of Systems Biology, Columbia University Irving Medical Center
Jake Hagen: Department of Systems Biology, Columbia University Irving Medical Center
Hongbing Pan: Department of Biomedical Informatics, Columbia University Irving Medical Center
Wendy K. Chung: Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School
Yufeng Shen: Department of Systems Biology, Columbia University Irving Medical Center
Citation: Nature Communications, volume 16, issue 1, pages 1-12