Reliability of plastid and mitochondrial localisation prediction declines rapidly with the evolutionary distance to the training set increasing.

Mitochondria and plastids import thousands of proteins. Their experimental localisation remains a frequent task, but can be resource-intensive and sometimes impossible. Hence, hundreds of studies make use of algorithms that predict a localisation based on a protein's sequence. Their reliability...

Bibliographic Details
Main Authors: Sven B Gould, Jonas Magiera, Carolina García García, Parth K Raval
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-11-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1012575
collection DOAJ
description Mitochondria and plastids import thousands of proteins. Their experimental localisation remains a frequent task, but can be resource-intensive and sometimes impossible. Hence, hundreds of studies make use of algorithms that predict a localisation based on a protein's sequence. Their reliability across evolutionarily diverse species is unknown. Here, we evaluate the performance of common algorithms (TargetP, Localizer and WoLFPSORT) for four photosynthetic eukaryotes (Arabidopsis thaliana, Zea mays, Physcomitrium patens, and Chlamydomonas reinhardtii) for which experimental plastid and mitochondrial proteome data are available, and for 171 eukaryotes using orthology inferences. The match between predictions and experimental data ranges from 75% to as low as 2%. Results worsen as the evolutionary distance between training and query species increases, especially for plant mitochondria, for which performance borders on random sampling. Specificity, sensitivity and precision analyses highlight cross-organelle errors and uncover the evolutionary divergence of organelles as the main driver of current performance issues. The results encourage training the next generation of neural networks on an evolutionarily more diverse set of organelle proteins to optimize performance and reliability.
id doaj-art-747313b0b5834aa8a54b82d875ed97ef
institution Kabale University
issn 1553-734X
1553-7358
spelling doaj-art-747313b0b5834aa8a54b82d875ed97ef | 2024-11-27T05:30:46Z | eng | PLoS Computational Biology 20(11): e1012575 (2024-11-01) | 10.1371/journal.pcbi.1012575
url https://doi.org/10.1371/journal.pcbi.1012575
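
The description refers to specificity, sensitivity and precision analyses of predicted versus experimentally determined localisations. The following is a minimal sketch of how such one-vs-rest metrics could be computed for a single organelle. It is not the authors' pipeline: the function name, the label strings and the protein identifiers are hypothetical, and treating a protein without a prediction as "other" is an assumption made only for this illustration.

```python
from collections import Counter


def organelle_metrics(predicted, experimental, organelle):
    """One-vs-rest sensitivity, specificity and precision for one organelle.

    predicted / experimental: dicts mapping protein ID -> localisation label.
    Proteins missing from `predicted` are treated as "other" (an assumption).
    """
    counts = Counter()
    for protein_id, true_label in experimental.items():
        pred_label = predicted.get(protein_id, "other")
        is_true = true_label == organelle
        is_pred = pred_label == organelle
        if is_true and is_pred:
            counts["tp"] += 1
        elif is_true:
            counts["fn"] += 1
        elif is_pred:
            counts["fp"] += 1
        else:
            counts["tn"] += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(counts["tp"], counts["tp"] + counts["fn"]),
        "specificity": ratio(counts["tn"], counts["tn"] + counts["fp"]),
        "precision": ratio(counts["tp"], counts["tp"] + counts["fp"]),
    }


# Hypothetical toy data: p2 and p3 are cross-organelle errors.
experimental = {
    "p1": "plastid", "p2": "plastid",
    "p3": "mitochondrion", "p4": "mitochondrion",
    "p5": "other", "p6": "other",
}
predicted = {
    "p1": "plastid", "p2": "mitochondrion",
    "p3": "plastid", "p4": "mitochondrion",
    "p5": "other",  # p6 has no prediction and counts as "other"
}

mito = organelle_metrics(predicted, experimental, "mitochondrion")
plastid = organelle_metrics(predicted, experimental, "plastid")
print(mito)
print(plastid)
```

In this toy set, a plastid protein predicted as mitochondrial lowers plastid sensitivity and mitochondrial precision at the same time, which is the kind of cross-organelle error the description mentions.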