Generating unseen diseases patient data using ontology enhanced generative adversarial networks

Abstract Generating realistic synthetic health data (e.g., electronic health records) holds promise for fundamental research, AI model development, and enhancing data privacy safeguards. Generative Adversarial Networks (GANs) have been employed for this purpose, but their performance is largely con...

Bibliographic Details
Main Authors: Chang Sun, Michel Dumontier
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-024-01421-0
collection DOAJ
description Abstract Generating realistic synthetic health data (e.g., electronic health records) holds promise for fundamental research, AI model development, and enhancing data privacy safeguards. Generative Adversarial Networks (GANs) have been employed for this purpose, but their performance is largely constrained by their reliance on training data, rendering them inadequate for rare or previously unseen diseases. This study proposes Onto-CGAN, a novel generative framework that combines knowledge from disease ontologies with GANs to generate unseen diseases that are not present in the training data. The quality of the generated data is evaluated using variable distributions, correlation coefficients, and machine learning model performance. Our findings demonstrate that Onto-CGAN generates unseen diseases with statistical characteristics comparable to the real data, and significantly improves the training of machine learning models. This innovative approach addresses the scarcity of data for rare diseases, offering valuable applications in data augmentation, hypothesis generation, and preclinical validation of clinical models.
id doaj-art-11ba36e553a441c3a562cd9aed39b388
institution Kabale University
issn 2398-6352
spelling doaj-art-11ba36e553a441c3a562cd9aed39b388; 2025-01-05T12:47:23Z; eng; Nature Portfolio; npj Digital Medicine; ISSN 2398-6352; 2025-01-01; volume 8, issue 1, pages 1-14; 10.1038/s41746-024-01421-0; Generating unseen diseases patient data using ontology enhanced generative adversarial networks; Chang Sun (Institute of Data Science, Faculty of Science and Engineering, Maastricht University); Michel Dumontier (Institute of Data Science, Faculty of Science and Engineering, Maastricht University); https://doi.org/10.1038/s41746-024-01421-0