Revisiting the functional annotation of TriTryp using sequence similarity tools

Bibliographic Details
Main Authors: Poorya Mirzavand Borujeni, Reza Salavati
Format: Article
Language: English
Published: Elsevier 2024-10-01
Series: Heliyon
Subjects: Functional annotation; Sequence similarity; Pseudogenes; Trypanosomatids
Online Access:http://www.sciencedirect.com/science/article/pii/S2405844024152747
Collection: DOAJ

Description:
Trypanosomatids are the causative agents of deadly diseases in humans and livestock. Because trypanosomatids are phylogenetically distant from model organisms, many of their genes remain unannotated. Manual functional annotation is time-consuming, which highlights the importance of automated functional annotation tools. The development of such tools is an active research area, and multiple tools have been developed for the task. PANNZER2 is an automated functional annotation tool that relies solely on the sequence similarity of the query to annotated proteins. We applied PANNZER2 to Trypanosoma brucei, the most studied trypanosomatid, to see whether it could improve our knowledge of gene function. Despite the availability of automated annotation tools such as InterPro2GO in databases such as TriTrypDB, PANNZER2 made surprisingly confident predictions for some hypothetical proteins in T. brucei. In this study, we identify gaps in these annotations that arise because TriTrypDB's automated annotation process does not employ pairwise sequence alignment tools. Our findings demonstrate that even with stringent cutoffs, a significant number of proteins can be annotated. Additionally, we discovered that adjusting the open reading frames of certain genes yields sequences with higher sequence signature coverage (the length of the sequence covered by at least one sequence signature) than the original sequences. This increased coverage suggests that these genomic fragments could be pseudogenes. To facilitate further exploration, we developed a script that helps identify potential pseudogenes within an organism's genome, offering researchers a new tool for genome analysis.
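
Sequence signature coverage, as defined above (residues covered by at least one signature), can be computed by merging overlapping signature match intervals so that no residue is counted twice. The sketch below is not the authors' script. It assumes signature matches are available as 1-based, inclusive (start, end) residue positions, as InterProScan-style tools report them, and the function names `signature_coverage` and `coverage_increased` are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # assumed: 1-based, inclusive (start, end) on the protein


def signature_coverage(matches: Iterable[Interval]) -> int:
    """Return the number of residues covered by at least one signature match."""
    covered = 0
    run_start = run_end = None
    for start, end in sorted(matches):
        if run_end is None or start > run_end + 1:
            # First interval, or a gap after the current run: close the run and open a new one.
            if run_end is not None:
                covered += run_end - run_start + 1
            run_start, run_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start + 1
    return covered


def coverage_increased(original: Iterable[Interval], corrected: Iterable[Interval]) -> bool:
    """Hypothetical flag: True if the frame-corrected translation has more signature coverage."""
    return signature_coverage(corrected) > signature_coverage(original)
```

Overlapping signatures are counted once: matches at (1, 10), (5, 20), and (30, 35) cover 26 residues rather than 32. Comparing the value for an original translation with that of a frame-corrected translation gives the kind of increase the description uses to flag candidate pseudogenes, although the study may define or normalize coverage differently.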
We extended all our analyses to Trypanosoma cruzi and Leishmania major to assess the impact of this approach across species. Our study demonstrates that by using pairwise sequence alignment, even with stringent cutoffs, we can attribute 2986, 3953, and 3798 new GO terms to the genomes of T. brucei, T. cruzi, and L. major, respectively. Additionally, we found that 210, 239, and 29 genes in these three organisms, respectively, exhibit increased sequence signature coverage following frame correction, suggesting the presence of pseudogenes.
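
Annotation transfer by pairwise similarity, as described above, amounts to keeping only alignment hits that pass stringent cutoffs and copying GO terms from the matched annotated proteins to the query genes. A minimal sketch, assuming hits have already been parsed from a pairwise aligner's tabular output; the `Hit` record, the threshold values, and the functions `passes_cutoffs` and `new_go_terms` are hypothetical and are not the cutoffs or code used in the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Set


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment hit (hypothetical record layout)."""
    query: str             # query gene ID
    subject: str           # ID of the annotated protein it aligned to
    identity: float        # percent identity of the alignment
    query_coverage: float  # percent of the query length covered by the alignment
    evalue: float


# Illustrative stringent thresholds; NOT the values used in the study.
MIN_IDENTITY = 50.0
MIN_QUERY_COVERAGE = 80.0
MAX_EVALUE = 1e-10


def passes_cutoffs(hit: Hit) -> bool:
    """True only if the hit meets every cutoff."""
    return (
        hit.identity >= MIN_IDENTITY
        and hit.query_coverage >= MIN_QUERY_COVERAGE
        and hit.evalue <= MAX_EVALUE
    )


def new_go_terms(
    hits: Iterable[Hit],
    subject_go: Mapping[str, Set[str]],
    existing_go: Mapping[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each query gene to GO terms from passing hits that it did not already carry."""
    new: Dict[str, Set[str]] = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        transferred = subject_go.get(hit.subject, set()) - existing_go.get(hit.query, set())
        if transferred:
            new.setdefault(hit.query, set()).update(transferred)
    return new


if __name__ == "__main__":
    # Toy data, made up for illustration.
    hits = [
        Hit("geneA", "P1", 92.0, 95.0, 1e-50),
        Hit("geneA", "P2", 30.0, 90.0, 1e-5),  # fails identity and e-value cutoffs
        Hit("geneB", "P1", 70.0, 85.0, 1e-30),
    ]
    subject_go = {"P1": {"GO:0005524", "GO:0004672"}, "P2": {"GO:0003677"}}
    existing_go = {"geneA": {"GO:0005524"}}
    result = new_go_terms(hits, subject_go, existing_go)
    print(sum(len(terms) for terms in result.values()))  # prints 3
```

In this sketch a GO term counts as new only if the query gene did not already carry it, and summing the set sizes across genes gives one possible per-genome count of newly attributed terms.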

Record ID: doaj-art-8b7f1149e88a4119b5dde7b124e55e28
Institution: Kabale University
ISSN: 2405-8440
Citation: Heliyon, vol. 10, no. 20, article e39243 (2024)
Affiliations: Poorya Mirzavand Borujeni, Institute of Parasitology, McGill University, Canada; Reza Salavati (corresponding author), Institute of Parasitology, McGill University, Canada, and Department of Biochemistry, McGill University, Canada.