Integrated Machine Learning Algorithms-Enhanced Predication for Cervical Cancer from Mass Spectrometry-Based Proteomics Data

Bibliographic Details
Main Authors: Da Zhang, Lihong Zhao, Bo Guo, Aihong Guo, Jiangbo Ding, Dongdong Tong, Bingju Wang, Zhangjian Zhou
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Bioengineering
Online Access: https://www.mdpi.com/2306-5354/12/3/269
Description
Summary: Early diagnosis is critical for improving outcomes in cancer patients; however, the application of diagnostic markers derived from serum proteomic screening remains challenging. Artificial intelligence (AI), encompassing deep learning and machine learning (ML), has gained increasing prominence across various scientific disciplines. In this study, we utilized cervical cancer (CC) as a model to develop an AI-driven pipeline for the identification and validation of serum biomarkers for early cancer diagnosis, leveraging mass spectrometry-based proteomics data. By processing and normalizing serum polypeptide differential peaks from 240 patients, we employed eight distinct ML algorithms to classify and analyze these differential polypeptide peaks, subsequently constructing receiver operating characteristic (ROC) curves and confusion matrices. Key performance metrics, including accuracy, precision, recall, and F1 score, were systematically evaluated. Furthermore, by integrating feature importance values, Shapley values, and local interpretable model-agnostic explanation (LIME) values, we demonstrated that the diagnostic area under the curve (AUC) achieved by our multi-dimensional learning models approached 1, significantly outperforming the diagnostic AUC of single markers derived from the PRIDE database. These findings underscore the potential of proteomics-driven integrated machine learning as a robust strategy to enhance early cancer diagnosis, offering a promising avenue for clinical translation.
ISSN: 2306-5354
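
The evaluation metrics named in the summary (confusion matrix, accuracy, precision, recall, F1 score, and ROC AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the labels and scores are toy values rather than data from the study, and no particular classifier is assumed. The AUC is computed here as the fraction of positive/negative pairs ranked correctly, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive sample outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values, not study data): 1 = cancer, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]             # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed decision threshold of 0.5

print(confusion_counts(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

An AUC close to 1, as reported in the summary, would mean that nearly every positive sample receives a higher score than nearly every negative sample. The threshold-dependent metrics (accuracy, precision, recall, F1) can still vary with the chosen cut-off, which is why both kinds of metric are usually reported together.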