SUPERMAGOv2: Protein Function Prediction via Transformer Embeddings and Bitscore-Weighted Features
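| Main Authors: | Gabriel Bianchin de Oliveira, Helio Pedrini, Zanoni Dias |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Image classification; multilayer perceptron; neural network; local alignment; protein function prediction; transformers |
| Online Access: | https://ieeexplore.ieee.org/document/11119523/ |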
| id | doaj-art-b7f9cd21bc7e40f18c5403b51d5707b7 |
|---|---|
| author | Gabriel Bianchin de Oliveira (ORCID: https://orcid.org/0000-0002-1238-4860); Helio Pedrini (ORCID: https://orcid.org/0000-0003-0125-630X); Zanoni Dias (ORCID: https://orcid.org/0000-0003-3333-6822) |
| affiliation | Institute of Computing, University of Campinas, Campinas, Brazil (all three authors) |
| description | Sequencing technologies have advanced considerably in recent years, leading to the sequencing of a vast number of proteins through laboratory methods. However, the functional annotation of these proteins has not kept pace with sequencing efforts, creating a significant gap between sequenced proteins and those with known functions. To address this challenge, computational approaches based solely on amino acid sequence features have been developed to improve functional predictions. In this study, we introduce two novel approaches, one based on machine learning and another using an ensemble of machine learning with local alignment. Our machine learning-based model (SUPERMAGOv2) utilizes transformer-based backbones to extract features from multiple layers, which are then processed by six multilayer perceptrons that incorporate a novel bitscore-weighted input derived from DIAMOND alignments, and by an image classification model that converts the extracted feature vectors into images. Furthermore, we present SUPERMAGOv2+, an ensemble model that combines SUPERMAGOv2 with enhanced DIAMOND-based predictions. In addition, we introduce SUPERMAGOv2+Web, a lightweight web server version of SUPERMAGOv2+. Both proposed methods consistently outperform state-of-the-art approaches across various analyses, establishing themselves as leading methodologies for protein function classification based on amino acid sequences. |
| citation | IEEE Access, vol. 13, pp. 139743-139757 |
| doi | 10.1109/ACCESS.2025.3596851 |
| issn | 2169-3536 |
| collection | DOAJ |
| topic | Image classification; multilayer perceptron; neural network; local alignment; protein function prediction; transformers |
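The abstract mentions a "bitscore-weighted input derived from DIAMOND alignments" and an ensemble (SUPERMAGOv2+) with DIAMOND-based predictions, but this record gives neither formula. The Python sketch below is therefore an assumption, not the authors' code: it scores each label of a query protein DeepGOPlus-style, as the summed bitscore of hits annotated with that label divided by the summed bitscore of all hits, and combines model and DIAMOND scores with a simple convex weight. The helper names (`parse_outfmt6`, `diamond_term_scores`, `scores_to_vector`, `ensemble`), the `alpha` weight, the use of GO term IDs as labels, and the reliance on DIAMOND's default 12-column `--outfmt 6` output (bitscore in column 12) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def parse_outfmt6(lines):
    """Group DIAMOND tabular hits by query (hypothetical helper).

    Assumes the 12 default `--outfmt 6` columns, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bitscore.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        hits[cols[0]].append((cols[1], float(cols[11])))
    return dict(hits)


def diamond_term_scores(hits, annotations):
    """Score each term as (bitscore of hits carrying it) / (bitscore of all hits).

    hits: list of (subject_id, bitscore); annotations: subject_id -> set of terms.
    Hits without annotations still count towards the denominator.
    """
    total = sum(bitscore for _, bitscore in hits)
    if total == 0:
        return {}
    weighted = defaultdict(float)
    for subject, bitscore in hits:
        for term in annotations.get(subject, ()):
            weighted[term] += bitscore
    return {term: w / total for term, w in weighted.items()}


def scores_to_vector(scores, term_order):
    """Fixed-length vector (one slot per term, 0.0 if absent) usable as an extra input."""
    return [scores.get(term, 0.0) for term in term_order]


def ensemble(model_scores, diamond_scores, alpha=0.5):
    """Convex combination of model and DIAMOND scores; alpha is a hypothetical weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    terms = set(model_scores) | set(diamond_scores)
    return {
        term: alpha * model_scores.get(term, 0.0)
        + (1.0 - alpha) * diamond_scores.get(term, 0.0)
        for term in terms
    }
```

In this sketch, `scores_to_vector` shows one way per-term DIAMOND scores could sit alongside transformer embeddings as a multilayer perceptron input; whether SUPERMAGOv2 feeds its bitscore-weighted features in this form, and how SUPERMAGOv2+ actually weights its two components, is not stated in this record.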
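The abstract also states that an image classification model receives the extracted feature vectors after they are converted into images, without describing the conversion. As an assumption-laden sketch only, the Python below min-max scales a 1-D embedding to 8-bit grayscale values and zero-pads it, in row-major order, into the smallest square grid that holds it; a 1,024-dimensional vector, for instance, would become a 32 x 32 image. The function name `vector_to_image` and this particular mapping are hypothetical and are not taken from the paper.

```python
import math


def vector_to_image(features):
    """Turn a 1-D feature vector into a square grayscale grid (hypothetical mapping).

    Values are min-max scaled to integers in 0..255, then laid out row by row;
    the tail is zero-padded so the length becomes a perfect square.
    """
    n = len(features)
    if n == 0:
        raise ValueError("empty feature vector")
    lo, hi = min(features), max(features)
    span = hi - lo
    # A constant vector maps to all-zero pixels to avoid dividing by zero.
    pixels = [0 if span == 0 else round(255 * (x - lo) / span) for x in features]
    side = math.isqrt(n - 1) + 1  # smallest side with side * side >= n
    pixels += [0] * (side * side - n)
    return [pixels[row * side:(row + 1) * side] for row in range(side)]
```

Padding with 0 makes padded cells indistinguishable from the minimum feature value; a real pipeline might prefer a different fill value, a non-square layout, or a fixed per-dimension normalization learned on training data rather than per-vector min-max scaling.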