A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data
| Main Authors: | Yige Zhao, Tian Lan, Guojie Zhong, Jake Hagen, Hongbing Pan, Wendy K. Chung, Yufeng Shen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-59937-2 |
| author | Yige Zhao, Tian Lan, Guojie Zhong, Jake Hagen, Hongbing Pan, Wendy K. Chung, Yufeng Shen |
|---|---|
| collection | DOAJ |
| description | Accurately predicting the effect of missense variants is important for discovering disease risk genes and for clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We develop a method, MisFit, to estimate the fitness effect of missense variants using a graphical model. MisFit jointly models the effect at a molecular level ($d$) and at a population level (selection coefficient, $s$), assuming that, within the same gene, missense variants with similar $d$ have similar $s$. We train it by maximizing the probability of observed allele counts in 236,017 individuals of European ancestry. We show that $s$ is informative in predicting allele frequency across ancestries and is consistent with the fraction of de novo mutations at sites under strong selection. Further, $s$ outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts $s$ and yields new insights from genomic data. |
| format | Article |
| id | doaj-art-00e8a0f70e6b4cdd8fe7cfa43753c974 |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| series | Nature Communications |
| affiliations | Department of Systems Biology, Columbia University Irving Medical Center (Yige Zhao, Tian Lan, Guojie Zhong, Jake Hagen, Yufeng Shen); Department of Biomedical Informatics, Columbia University Irving Medical Center (Hongbing Pan); Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School (Wendy K. Chung) |
| title | A probabilistic graphical model for estimating selection coefficients of nonsynonymous variants from human population sequence data |
| url | https://doi.org/10.1038/s41467-025-59937-2 |
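The abstract describes training only in outline: within a gene, variants with similar $d$ are assumed to share similar $s$, and the model is fit by maximizing the probability of observed allele counts. This record does not give MisFit's actual likelihood, so the following is a deliberately simplified sketch, not the authors' model. It assumes (1) deterministic mutation-selection balance, so a variant with mutation rate $\mu$ has expected allele frequency of roughly $\mu / s$; (2) Poisson-distributed allele counts among $2N$ chromosomes; and (3) a single shared $s$ for a hypothetical bin of variants with similar $d$. Under these assumptions the maximum-likelihood estimate has the closed form $\hat{s} = 2N \sum_i \mu_i / \sum_i c_i$. The function names, mutation rates, and allele counts are hypothetical; only $N = 236{,}017$ comes from the abstract.

```python
import math
from typing import Sequence


def poisson_nll(s: float, counts: Sequence[int], mu: Sequence[float], n_individuals: int) -> float:
    """Negative log-likelihood of allele counts for variants sharing one s.

    Simplified assumption (not MisFit's model): expected allele frequency is
    mu / s, and counts are Poisson among 2 * n_individuals chromosomes.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    n_chrom = 2 * n_individuals
    nll = 0.0
    for c, m in zip(counts, mu):
        lam = n_chrom * m / s  # expected allele count for this variant
        nll -= c * math.log(lam) - lam - math.lgamma(c + 1)  # Poisson log-pmf
    return nll


def mle_shared_s(counts: Sequence[int], mu: Sequence[float], n_individuals: int) -> float:
    """Closed-form maximum-likelihood s for one hypothetical bin of similar-d variants.

    Setting d(NLL)/ds = 0 gives s_hat = 2N * sum(mu) / sum(counts).
    The estimate is clipped to 1.0, an upper bound assumed in this sketch.
    With zero observed alleles the likelihood keeps rising with s, so 1.0 is returned.
    """
    total_count = sum(counts)
    if total_count == 0:
        return 1.0
    s_hat = 2 * n_individuals * sum(mu) / total_count
    return min(s_hat, 1.0)


if __name__ == "__main__":
    n = 236_017  # cohort size from the abstract
    mu = [1.0e-8, 2.0e-8, 1.5e-8]  # hypothetical per-site mutation rates
    counts = [2, 5, 3]  # hypothetical observed allele counts
    s_hat = mle_shared_s(counts, mu, n)
    print(f"s_hat = {s_hat:.6f}")  # about 0.0021 for these made-up inputs
```

Pooling variants into a bin with one shared $s$ is a crude stand-in for the abstract's assumption that similar $d$ implies similar $s$. The deterministic approximation also ignores genetic drift and demographic history, so it is unreliable for weakly selected variants, where observed counts are dominated by stochastic fluctuation.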