A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles
Abstract Background Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of different biomarkers, mo...
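Main Authors: | Joost Groen, Bas M. de Haan, Ruben J. Overduin, Andrea B. Haijer-Schreuder, Terry GJ Derks, M. Rebecca Heiner-Fokkema |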
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-01-01 |
Series: | Orphanet Journal of Rare Diseases |
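Subjects: | Rare diseases; Machine learning; Inborn metabolic diseases; Artificial intelligence; Glycogen storage disease; Acylcarnitines |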
Online Access: | https://doi.org/10.1186/s13023-025-03537-2 |
_version_ | 1841544266744070144 |
---|---|
author | Joost Groen; Bas M. de Haan; Ruben J. Overduin; Andrea B. Haijer-Schreuder; Terry GJ Derks; M. Rebecca Heiner-Fokkema |
author_facet | Joost Groen; Bas M. de Haan; Ruben J. Overduin; Andrea B. Haijer-Schreuder; Terry GJ Derks; M. Rebecca Heiner-Fokkema |
author_sort | Joost Groen |
collection | DOAJ |
description | Abstract. Background: Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of biomarkers, mostly routine clinical chemistry markers, followed by genetic confirmation. However, a specific and reliable biomarker is lacking. Because GSD Ia patients show altered lipid metabolism and mitochondrial fatty acid oxidation, we built a machine learning model to identify GSD Ia patients from plasma acylcarnitine profiles. Methods: We collected plasma acylcarnitine profiles from 3958 patients, of whom 31 have GSD Ia. Synthetic samples were generated to address class imbalance in the dataset. We built several machine learning models based on gradient-boosted trees. Our approach included hyperparameter tuning and feature selection, and generalization was checked using both nested cross-validation (CV) and a held-out test set. Results: The binary classifier correctly identified 5/6 GSD Ia patients in a held-out test set without generating a significant number of false-positive results. The best model showed excellent performance, with a mean receiver operating characteristic (ROC) AUC of 0.955 and a precision-recall (PR) AUC of 0.674 in nested CV. Conclusions: This study demonstrates an innovative approach to applying machine learning to ultra-rare diseases by accurately identifying GSD Ia patients from plasma free carnitine and acylcarnitine concentrations, leveraging subtle acylcarnitine abnormalities. Acylcarnitine features that were strong predictors for GSD Ia include C16-carnitine, C14OH-carnitine, total carnitine and acetylcarnitine. The model demonstrated high sensitivity and specificity, with selected parameters that were not only robust but also highly interpretable. Our approach offers a potential route to including GSD Ia in newborn screening. Rare diseases are underrepresented in machine learning studies, and this work highlights the potential of these techniques even in ultra-rare diseases such as GSD Ia. |
format | Article |
id | doaj-art-23fc132d3dcf49329ec12f486424deab |
institution | Kabale University |
issn | 1750-1172 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | Orphanet Journal of Rare Diseases |
spelling | doaj-art-23fc132d3dcf49329ec12f486424deab; 2025-01-12T12:39:31Z; eng; BMC; Orphanet Journal of Rare Diseases; 1750-1172; 2025-01-01; volume 20, issue 1, pages 1-10; 10.1186/s13023-025-03537-2; A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles; Joost Groen (Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen); Bas M. de Haan (Laboratory of Special Chemistry, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen); Ruben J. Overduin (Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen); Andrea B. Haijer-Schreuder (Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen); Terry GJ Derks (Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen); M. Rebecca Heiner-Fokkema (Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen); https://doi.org/10.1186/s13023-025-03537-2; Rare diseases; Machine learning; Inborn metabolic diseases; Artificial intelligence; Glycogen storage disease; Acylcarnitines |
spellingShingle | Joost Groen; Bas M. de Haan; Ruben J. Overduin; Andrea B. Haijer-Schreuder; Terry GJ Derks; M. Rebecca Heiner-Fokkema; A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles; Orphanet Journal of Rare Diseases; Rare diseases; Machine learning; Inborn metabolic diseases; Artificial intelligence; Glycogen storage disease; Acylcarnitines |
title | A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles |
title_sort | machine learning model accurately identifies glycogen storage disease ia patients based on plasma acylcarnitine profiles |
topic | Rare diseases; Machine learning; Inborn metabolic diseases; Artificial intelligence; Glycogen storage disease; Acylcarnitines |
url | https://doi.org/10.1186/s13023-025-03537-2 |
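The description field reports a PR AUC of 0.674 alongside a ROC AUC of 0.955, and the two numbers are easier to compare once class imbalance is taken into account. The short Python sketch below works through that arithmetic using only figures quoted in the abstract (3958 patients, 31 with GSD Ia, 5/6 detected in the held-out set). It assumes the evaluation data kept roughly the original prevalence, which the record does not state; the function and variable names are illustrative and are not taken from the authors' code.

```python
# Figures quoted in the abstract of this record.
N_TOTAL = 3958      # patients with a plasma acylcarnitine profile
N_GSD_IA = 31       # of whom have GSD Ia
TEST_HITS = 5       # GSD Ia patients correctly identified in the held-out set
TEST_CASES = 6      # GSD Ia patients present in the held-out set
PR_AUC = 0.674      # mean precision-recall AUC in nested CV


def summarize():
    """Put the reported metrics in the context of class imbalance.

    Assumption (not stated in the record): evaluation folds have about
    the same prevalence as the full dataset.
    """
    prevalence = N_GSD_IA / N_TOTAL
    return {
        "prevalence": prevalence,
        # A classifier that ranks samples at random has an expected
        # precision equal to the prevalence at every recall level, so
        # its PR AUC is approximately the prevalence.
        "pr_auc_chance_level": prevalence,
        # How many times the reported PR AUC exceeds that chance level.
        "pr_auc_lift": PR_AUC / prevalence,
        # Sensitivity (recall) on the held-out test set.
        "test_sensitivity": TEST_HITS / TEST_CASES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

Under that assumption, a chance-level ROC AUC is 0.5 whereas a chance-level PR AUC is below 0.01, so a PR AUC of 0.674 sits far above its baseline even though it looks modest next to the ROC AUC.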