Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes

Bibliographic Details
Main Authors: Erik D. Huckvale, Hunter N. B. Moseley
Format: Article
Language: English
Published: MDPI AG 2024-10-01
Series: Metabolites
Subjects:
Online Access: https://www.mdpi.com/2218-1989/14/11/582
author Erik D. Huckvale
Hunter N. B. Moseley
collection DOAJ
description Background/Objectives: Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to predict involvement in metabolic pathways only. However, many other types of pathways in cells and organisms are of interest to biologists. Methods: Whereas several publications have used the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all compound entries with pathway annotations available in KEGG. From these data, we constructed a dataset in which each entry contained features representing a compound combined with features representing a pathway, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset. Results: The models trained on 6485 KEGG compounds and 502 pathways achieved an overall mean Matthews correlation coefficient (MCC) of 0.847, a median MCC of 0.848, and a standard deviation of 0.0098. Conclusions: This performance on all 502 KEGG pathways represents a roughly 6% improvement over models trained on only the 184 KEGG metabolic pathways, which had a mean MCC of 0.800 and a standard deviation of 0.021. These results demonstrate the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.
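The dataset layout and the metric named in the description can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "combined" means simple concatenation of the two feature vectors and that every compound is paired with every pathway. The identifiers (C1, P1, ...), the feature values, and the function names are hypothetical. The last helper assumes the "roughly 6%" figure is a relative change in mean MCC.

```python
import math
from itertools import product


def build_pair_entries(compound_features, pathway_features, associations):
    """Pair each compound with each pathway (an assumption of this sketch).

    Each entry holds the concatenated features followed by a binary label
    that is 1 when the (compound, pathway) pair is a known association.
    """
    entries = []
    for (cid, cfeat), (pid, pfeat) in product(
        compound_features.items(), pathway_features.items()
    ):
        label = 1 if (cid, pid) in associations else 0
        entries.append((cid, pid, list(cfeat) + list(pfeat), label))
    return entries


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def relative_improvement(new, old):
    """Relative change of `new` over `old`, as a fraction."""
    return (new - old) / old


# Hypothetical toy data, for illustration only.
compounds = {"C1": [1.0, 0.0], "C2": [0.0, 1.0]}
pathways = {"P1": [0.5], "P2": [0.2]}
known = {("C1", "P1"), ("C2", "P2")}

entries = build_pair_entries(compounds, pathways, known)
labels = [label for _, _, _, label in entries]
print(len(entries), labels)
print(relative_improvement(0.847, 0.800))
```

Under the relative-change assumption, the reported mean MCC values of 0.847 and 0.800 give (0.847 - 0.800) / 0.800, which is about 0.059 and is consistent with the "roughly 6%" stated above.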
format Article
id doaj-art-f4b152c4f2b143d19f150fbdc8f10c3b
institution Kabale University
issn 2218-1989
doi 10.3390/metabo14110582
citation Metabolites, vol. 14, no. 11, article 582 (2024-10-01)
author_affiliation Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA (both authors)
timestamp 2024-11-26T18:13:03Z
title Predicting the Pathway Involvement of All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes
topic pathway prediction
Matthews correlation coefficient
machine learning
multi-layer perceptron
transfer learning
KEGG
url https://www.mdpi.com/2218-1989/14/11/582