SUPERMAGOv2: Protein Function Prediction via Transformer Embeddings and Bitscore-Weighted Features

Bibliographic Details
Main Authors: Gabriel Bianchin de Oliveira, Helio Pedrini, Zanoni Dias
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
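Subjects: Image classification; multilayer perceptron; neural network; local alignment; protein function prediction; transformers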
Online Access: https://ieeexplore.ieee.org/document/11119523/
collection DOAJ
description Sequencing technologies have advanced considerably in recent years, leading to the sequencing of a vast number of proteins through laboratory methods. However, the functional annotation of these proteins has not kept pace with sequencing efforts, creating a significant gap between sequenced proteins and those with known functions. To address this challenge, computational approaches based solely on amino acid sequence features have been developed to improve functional predictions. In this study, we introduce two novel approaches, one based on machine learning and another using an ensemble of machine learning with local alignment. Our machine learning-based model (SUPERMAGOv2) utilizes transformer-based backbones to extract features from multiple layers, which are then processed by six multilayer perceptrons that incorporate a novel bitscore-weighted input derived from DIAMOND alignments, and by an image classification model that converts the extracted feature vectors into images. Furthermore, we present SUPERMAGOv2+, an ensemble model that combines SUPERMAGOv2 with enhanced DIAMOND-based predictions. In addition, we introduce SUPERMAGOv2+Web, a lightweight web server version of SUPERMAGOv2+. Both proposed methods consistently outperform state-of-the-art approaches across various analyses, establishing themselves as leading methodologies for protein function classification based on amino acid sequences.
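The abstract mentions a bitscore-weighted input derived from DIAMOND alignments but this record does not say how the weighting is computed. As a rough illustration only, the sketch below shows one common way alignment bitscores can be turned into per-term scores: each hit contributes its share of the total bitscore to every term it is annotated with. The function name, the input shapes, and the protein identifiers `P1` and `P2` are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def bitscore_weighted_scores(hits, annotations):
    """Illustrative bitscore weighting (an assumption, not the paper's method).

    hits: list of (subject_id, bitscore) pairs for one query protein,
          e.g. parsed from tabular alignment output.
    annotations: dict mapping subject_id to a set of function terms.
    Returns a dict mapping each term to a score in [0, 1].
    """
    total = sum(bitscore for _, bitscore in hits)
    if total <= 0:
        # No usable hits: nothing to transfer.
        return {}
    scores = defaultdict(float)
    for subject_id, bitscore in hits:
        # Each hit adds its fraction of the total bitscore to its terms.
        for term in annotations.get(subject_id, ()):
            scores[term] += bitscore / total
    return dict(scores)


# Hypothetical hits and annotations for a single query.
hits = [("P1", 300.0), ("P2", 100.0)]
annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
}
scores = bitscore_weighted_scores(hits, annotations)
```

A term shared by every hit receives a score of 1, while a term carried only by weaker hits receives a proportionally smaller score. Such a vector could be fed to a downstream classifier alongside embedding features, though whether SUPERMAGOv2 does this in the same way is not stated here.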
id doaj-art-b7f9cd21bc7e40f18c5403b51d5707b7
institution DOAJ
issn 2169-3536
doi 10.1109/ACCESS.2025.3596851
volume 13
pages 139743-139757
affiliation Institute of Computing, University of Campinas, Campinas, Brazil (all three authors)
orcid Gabriel Bianchin de Oliveira https://orcid.org/0000-0002-1238-4860
Helio Pedrini https://orcid.org/0000-0003-0125-630X
Zanoni Dias https://orcid.org/0000-0003-3333-6822
topic Image classification
multilayer perceptron
neural network
local alignment
protein function prediction
transformers