SUPERMAGOv2: Protein Function Prediction via Transformer Embeddings and Bitscore-Weighted Features
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11119523/ |
| Summary: | Sequencing technologies have advanced considerably in recent years, leading to the sequencing of a vast number of proteins through laboratory methods. However, the functional annotation of these proteins has not kept pace with sequencing efforts, creating a significant gap between sequenced proteins and those with known functions. To address this challenge, computational approaches based solely on amino acid sequence features have been developed to improve functional predictions. In this study, we introduce two novel approaches, one based on machine learning and another using an ensemble of machine learning with local alignment. Our machine learning-based model (SUPERMAGOv2) utilizes transformer-based backbones to extract features from multiple layers, which are then processed by six multilayer perceptrons that incorporate a novel bitscore-weighted input derived from DIAMOND alignments, and by an image classification model that converts the extracted feature vectors into images. Furthermore, we present SUPERMAGOv2+, an ensemble model that combines SUPERMAGOv2 with enhanced DIAMOND-based predictions. In addition, we introduce SUPERMAGOv2+Web, a lightweight web server version of SUPERMAGOv2+. Both proposed methods consistently outperform state-of-the-art approaches across various analyses, establishing themselves as leading methodologies for protein function classification based on amino acid sequences. |
|---|---|
| ISSN: | 2169-3536 |