Multi-channel learning for integrating structural hierarchies into context-dependent molecular representation

Abstract Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, the data scarcity, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional m...

Full description

Saved in:
Bibliographic Details
Main Authors: Yue Wan, Jialu Wu, Tingjun Hou, Chang-Yu Hsieh, Xiaowei Jia
Format: Article
Language:English
Published: Nature Portfolio 2025-01-01
Series:Nature Communications
Online Access:https://doi.org/10.1038/s41467-024-55082-4
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, the data scarcity, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional molecular featurization schemes, complicates the development of robust molecular machine learning models. Self-supervised learning (SSL) has emerged as a popular solution, utilizing large-scale, unannotated molecular data to learn a foundational representation of chemical space that might be advantageous for downstream tasks. Yet, existing molecular SSL methods largely overlook chemical knowledge, including molecular structure similarity, scaffold composition, and the context-dependent aspects of molecular properties when operating over the chemical space. They also struggle to learn the subtle variations in structure-activity relationship. This paper introduces a multi-channel pre-training framework that learns robust and generalizable chemical knowledge. It leverages the structural hierarchy within the molecule, embeds them through distinct pre-training tasks across channels, and aggregates channel information in a task-specific manner during fine-tuning. Our approach demonstrates competitive performance across various molecular property benchmarks and offers strong advantages in particularly challenging yet ubiquitous scenarios like activity cliffs.
ISSN:2041-1723