Multi-fidelity graph neural networks for predicting toluene/water partition coefficients

Bibliographic Details
Main Authors: Thomas Nevolianis, Jan G. Rittig, Alexander Mitsos, Kai Leonhard
Format: Article
Language: English
Published: BMC 2025-08-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-025-01057-6
Description
Summary: Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging because little experimental data is available. To address this limitation, we apply multi-fidelity learning approaches that leverage a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore transfer learning, feature-augmented learning, and multi-target learning in combination with graph neural networks, validating them on two external datasets: one with molecules similar to the training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that multi-target learning significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 log P units on EXT-Zamora, compared to 0.63 log P units for single-task models. On EXT-SAMPL9, multi-target learning achieves a root-mean-square error of 1.02 log P units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and to address the challenges posed by limited experimental data. We expect the methods to be applicable beyond toluene/water partition coefficients.
ISSN: 1758-2946
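
The multi-target idea named in the summary can be illustrated with a minimal sketch. It assumes, purely for illustration, that a model emits two outputs per molecule (a low-fidelity COSMO-RS value and a high-fidelity experimental value) and that missing experimental labels are skipped when the loss is computed. The record does not describe the authors' loss function or architecture, so the function names `masked_multitarget_mse` and `rmse` and the toy numbers below are hypothetical.

```python
import math

def masked_multitarget_mse(preds, targets):
    """Mean squared error over all labels that are present.

    Each row holds [low_fidelity, high_fidelity] values for one molecule.
    A target of None marks a missing label and is skipped (assumption:
    most molecules carry only the low-fidelity label).
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for p, t in zip(pred_row, target_row):
            if t is None:
                continue  # no label for this fidelity, so no loss term
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no labels available")
    return total / count

def rmse(preds, targets):
    """Root-mean-square error, the metric quoted in the summary."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values (not from the paper): three molecules, one with both labels.
toy_preds = [[1.0, 1.5], [2.0, 2.5], [0.5, 0.0]]
toy_targets = [[1.0, None], [2.5, 2.0], [0.5, None]]
toy_loss = masked_multitarget_mse(toy_preds, toy_targets)
```

In this sketch the abundant low-fidelity labels contribute to every update while the scarce high-fidelity labels contribute only where they exist, which is one common way to train on two datasets of very different size.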