Hypothesis generation for rare and undiagnosed diseases through clustering and classifying time-versioned biological ontologies.


Bibliographic Details
Main Authors: Michael S Bradshaw, Connor Gibbs, Skylar Martin, Taylor Firman, Alisa Gaskell, Bailey Fosdick, Ryan Layer
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0309205
collection DOAJ
description Rare diseases affect 1 in 10 people in the United States, and despite increased genetic testing, up to half of them never receive a diagnosis. Even when advanced genome sequencing platforms are used to discover variants, a patient will remain undiagnosed if the literature offers no connection between the variants found in the patient's genome and their phenotypes. When a direct variant-phenotype connection is not known, placing a patient's information in the larger context of phenotype relationships and protein-protein interactions may provide an opportunity to find an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and the Human Phenotype Ontology (HPO) contains the relations among thousands of phenotypes. By integrating these networks and clustering the entities within them, we can potentially discover latent gene-to-phenotype connections. The historical records of STRING and HPO provide a unique opportunity to create a network time series for evaluating cluster significance. Working with Children's Hospital Colorado, we have provided promising hypotheses about latent gene-to-phenotype connections for 38 patients. We also provide potential answers for 14 patients listed on MyGene2. Clusters that our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions as clusters found to be insignificant. Our tool, BOCC, is available as a web app and a command-line tool.
format Article
id doaj-art-4926eb9cd05843c5ac2fb50d78371a3c
institution Kabale University
issn 1932-6203
language English
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-4926eb9cd05843c5ac2fb50d78371a3c; 2025-01-08T05:32:29Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; 19; 12; e0309205; 10.1371/journal.pone.0309205; https://doi.org/10.1371/journal.pone.0309205
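The abstract describes integrating a protein-protein interaction network with a phenotype ontology, clustering the combined graph, and reading latent gene-to-phenotype connections out of the clusters. The sketch below illustrates that general idea only. It is not BOCC's implementation: connected components stand in for whatever clustering algorithm the tool uses, no significance testing or time-series evaluation is attempted, and every identifier (`GENE_A`, `PHENO_1`, `latent_pairs`, and so on) is a hypothetical placeholder rather than real STRING or HPO data.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into clusters. Connected components are an assumed
    stand-in for a real graph clustering algorithm."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def latent_pairs(ppi, hpo, g2p):
    """Return (gene, phenotype) pairs that share a cluster but have no
    known direct edge; these are the candidate hypotheses."""
    known = {frozenset(edge) for edge in g2p}
    genes = {n for edge in ppi for n in edge} | {g for g, _ in g2p}
    phenos = {n for edge in hpo for n in edge} | {p for _, p in g2p}
    candidates = []
    for cluster in connected_components(ppi + hpo + g2p):
        for gene in sorted(cluster & genes):
            for pheno in sorted(cluster & phenos):
                if frozenset((gene, pheno)) not in known:
                    candidates.append((gene, pheno))
    return candidates


# Hypothetical toy inputs, not drawn from STRING or HPO.
ppi = [("GENE_A", "GENE_B")]                           # protein-protein interactions
hpo = [("PHENO_1", "PHENO_2")]                         # phenotype-phenotype relations
g2p = [("GENE_A", "PHENO_1"), ("GENE_C", "PHENO_3")]   # known gene-to-phenotype edges

candidates = latent_pairs(ppi, hpo, g2p)
for gene, pheno in candidates:
    print(gene, pheno)
```

In this toy graph, `GENE_B` has no recorded phenotype, yet it lands in the same cluster as `PHENO_1` and `PHENO_2` through its interaction with `GENE_A`, so the pair is surfaced as a hypothesis. `GENE_C` and `PHENO_3` are already directly connected and yield nothing new.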