Identifying optimal substrate classes of membrane transporters.

Bibliographic Details
Main Authors: Andreas Denger, Volkhard Helms
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0315330
collection DOAJ
description Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, together with the low sample counts for more specific functions, makes these annotations unsuitable as classes for supervised learning methods, so an optimal subset of annotations has to be selected. However, selecting this subset requires extensive manual effort as well as knowledge of the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in Gene Ontology (GO) datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms, which could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When the percentage of covered proteins was lowered to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
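The abstract lists four criteria that the selected GO-term classes must meet and reports the median pairwise F1 score between the chosen terms. The following is a minimal, hypothetical sketch of how the first three criteria could be combined in a greedy filter; it is not the authors' pipeline. The function names, the Jaccard-overlap redundancy test, the thresholds (`min_samples=20`, `max_overlap=0.5`, `min_f1=0.7`), and the `pair_f1` callable (a stand-in for the F1 score of a classifier trained to separate two terms) are all assumptions, and relevance to substrate transport is assumed to be handled by pre-filtering the input terms.

```python
from itertools import combinations
from statistics import median, pstdev
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a trained pairwise classifier: returns an F1 score.
PairF1 = Callable[[str, str], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two terms' protein sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_terms(
    annotations: Dict[str, Set[str]],
    pair_f1: PairF1,
    min_samples: int = 20,    # assumed threshold
    max_overlap: float = 0.5,  # assumed threshold
    min_f1: float = 0.7,       # assumed threshold
) -> List[str]:
    """Greedy sketch: visit terms from largest to smallest and keep a term only
    if it passes the sample-size, redundancy, and separability checks."""
    selected: List[str] = []
    # Larger terms first; ties broken by term name for determinism.
    for term in sorted(annotations, key=lambda t: (-len(annotations[t]), t)):
        proteins = annotations[term]
        if len(proteins) < min_samples:
            continue  # criterion 1: too few proteins to train on
        if any(jaccard(proteins, annotations[s]) > max_overlap for s in selected):
            continue  # criterion 2: redundant with an already selected term
        if any(pair_f1(term, s) < min_f1 for s in selected):
            continue  # criterion 3: poorly separable from a selected term
        selected.append(term)
    return selected


def summarize(selected: List[str], pair_f1: PairF1) -> Tuple[float, float]:
    """Median and population standard deviation of pairwise F1 scores."""
    if len(selected) < 2:
        raise ValueError("need at least two terms to form pairs")
    scores = [pair_f1(a, b) for a, b in combinations(selected, 2)]
    return median(scores), pstdev(scores)


# Toy usage with placeholder term names and made-up protein IDs.
annotations = {
    "term_A": {f"p{i}" for i in range(30)},
    "term_B": {f"p{i}" for i in range(25)},  # mostly contained in term_A
    "term_C": {f"q{i}" for i in range(22)},
    "term_D": {f"r{i}" for i in range(21)},
    "term_F": {f"t{i}" for i in range(20)},
    "term_E": {f"s{i}" for i in range(5)},   # too few proteins
}
PAIR_F1 = {
    frozenset({"term_A", "term_C"}): 0.90,
    frozenset({"term_A", "term_D"}): 0.85,
    frozenset({"term_C", "term_D"}): 0.80,
    frozenset({"term_A", "term_F"}): 0.50,
}


def toy_f1(a: str, b: str) -> float:
    # Unknown pairs are treated as inseparable in this toy example.
    return PAIR_F1.get(frozenset({a, b}), 0.0)


chosen = select_terms(annotations, toy_f1)
print(chosen)                       # → ['term_A', 'term_C', 'term_D']
print(summarize(chosen, toy_f1)[0])  # median pairwise F1 of the chosen terms
```

Visiting larger terms first favours classes with more training samples. The `±` here is a population standard deviation, which may not match how the reported values were computed.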
id doaj-art-02d47742dacd4133936512401b27718d
institution Kabale University
issn 1932-6203