T-cell receptor binding prediction: A machine learning revolution

Bibliographic Details
Main Authors: Anna Weber, Aurélien Pélissier, María Rodríguez Martínez
Format: Article
Language: English
Published: Elsevier 2024-09-01
Series: ImmunoInformatics
Online Access: http://www.sciencedirect.com/science/article/pii/S2667119024000107
Description
Summary: Recent advancements in immune sequencing and experimental techniques are generating extensive T cell receptor (TCR) repertoire data, enabling the development of models to predict TCR binding specificity. Despite the computational challenges posed by the vast diversity of TCRs and epitopes, significant progress has been made. This review explores the evolution of computational models designed for this task, emphasizing machine learning efforts, including early unsupervised clustering approaches, supervised models, and recent applications of Protein Language Models (PLMs), deep learning models pretrained on extensive collections of unlabeled protein sequences that capture crucial biological properties. We survey the most prominent models in each category and offer a critical discussion on recurrent challenges, including the lack of generalization to new epitopes, dataset biases, and shortcomings in model validation designs. Focusing on PLMs, we discuss the transformative impact of Transformer-based protein models in bioinformatics, particularly in TCR specificity analysis. We discuss recent studies that exploit PLMs to deliver notably competitive performances in TCR-related tasks, while also examining current limitations and future directions. Lastly, we address the pressing need for improved interpretability in these often opaque models, and examine current efforts to extract biological insights from large black box models.
ISSN: 2667-1190