A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles

Abstract Background Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of different biomarkers, mo...

Bibliographic Details
Main Authors: Joost Groen, Bas M. de Haan, Ruben J. Overduin, Andrea B. Haijer-Schreuder, Terry GJ Derks, M. Rebecca Heiner-Fokkema
Format: Article
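Language: English
Published: BMC 2025-01-01
Series: Orphanet Journal of Rare Diseases
Subjects: Rare diseases; Machine learning; Inborn metabolic diseases; Artificial intelligence; Glycogen storage disease; Acylcarnitines
Online Access: https://doi.org/10.1186/s13023-025-03537-2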
collection DOAJ
description Abstract
Background: Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of biomarkers, mostly routine clinical chemistry markers, followed by genetic confirmation. However, a specific and reliable biomarker is lacking. Because GSD Ia patients show altered lipid metabolism and mitochondrial fatty acid oxidation, we built a machine learning model to identify GSD Ia patients based on plasma acylcarnitine profiles.
Methods: We collected plasma acylcarnitine profiles from 3958 patients, of whom 31 have GSD Ia. Synthetic samples were generated to address the class imbalance in the dataset. We built several machine learning models based on gradient-boosted trees. Our approach included hyperparameter tuning and feature selection, and generalization was checked using both nested cross-validation (CV) and a held-out test set.
Results: The binary classifier correctly identified 5/6 GSD Ia patients in a held-out test set without generating a significant number of false-positive results. The best model showed excellent performance, with a mean receiver operating characteristic (ROC) AUC of 0.955 and a precision-recall (PR) AUC of 0.674 in nested CV.
Conclusions: This study demonstrates an innovative approach to applying machine learning to ultra-rare diseases by accurately identifying GSD Ia patients based on plasma free carnitine and acylcarnitine concentrations, leveraging subtle acylcarnitine abnormalities. Acylcarnitine features that were strong predictors of GSD Ia include C16-carnitine, C14OH-carnitine, total carnitine and acetylcarnitine. The model demonstrated high sensitivity and specificity, with selected parameters that were not only robust but also highly interpretable. Our approach offers a potential route to including GSD Ia in newborn screening. Rare diseases are underrepresented in machine learning studies, and this work highlights the potential of these techniques, even in ultra-rare diseases such as GSD Ia.
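The abstract reports two ranking metrics, ROC AUC and PR AUC, on a heavily imbalanced dataset (31 GSD Ia patients among 3958). The following is a minimal illustrative sketch of how such metrics can be computed from classifier scores and why the PR AUC of an uninformative classifier sits near the class prevalence. It is not the authors' code: the function names `roc_auc` and `average_precision` are hypothetical, the toy labels and scores are made up, and average precision is used here as one common step-wise estimate of PR AUC (assuming no tied scores).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive sample."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    # Rank samples from highest to lowest score (assumes no tied scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / n_pos


# Made-up toy data: 2 positives among 10 samples.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]

# Figures taken from the abstract.
prevalence = 31 / 3958            # expected PR AUC of an uninformative classifier
held_out_sensitivity = 5 / 6      # 5 of 6 GSD Ia patients identified

print(f"toy ROC AUC: {roc_auc(toy_labels, toy_scores):.4f}")
print(f"toy average precision: {average_precision(toy_labels, toy_scores):.4f}")
print(f"prevalence (PR baseline): {prevalence:.4f}")
print(f"held-out sensitivity: {held_out_sensitivity:.3f}")
```

Under these definitions, the reported PR AUC of 0.674 should be read against a baseline of roughly 0.008 rather than against 0.5, which is the corresponding baseline for ROC AUC.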
id doaj-art-23fc132d3dcf49329ec12f486424deab
institution Kabale University
issn 1750-1172
spelling doaj-art-23fc132d3dcf49329ec12f486424deab 2025-01-12T12:39:31Z
Joost Groen, Bas M. de Haan, Ruben J. Overduin, Andrea B. Haijer-Schreuder, Terry GJ Derks, M. Rebecca Heiner-Fokkema. A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles. Orphanet Journal of Rare Diseases (BMC, ISSN 1750-1172), 2025-01-01, volume 20, issue 1, pages 1-10. https://doi.org/10.1186/s13023-025-03537-2
Affiliations:
Joost Groen: Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen
Bas M. de Haan: Laboratory of Special Chemistry, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen
Ruben J. Overduin: Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen
Andrea B. Haijer-Schreuder: Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen
Terry GJ Derks: Division of Metabolic Diseases, Beatrix Children’s Hospital, University Medical Center Groningen, University of Groningen
M. Rebecca Heiner-Fokkema: Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen
topic Rare diseases
Machine learning
Inborn metabolic diseases
Artificial intelligence
Glycogen storage disease
Acylcarnitines