Multi-fidelity graph neural networks for predicting toluene/water partition coefficients
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-08-01 |
| Series: | Journal of Cheminformatics |
| Online Access: | https://doi.org/10.1186/s13321-025-01057-6 |
| Summary: | Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging because little experimental data is available. To address this limitation, we apply multi-fidelity learning approaches that leverage a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore transfer learning, feature-augmented learning, and multi-target learning in combination with graph neural networks, validating them on two external datasets: one with molecules similar to the training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that multi-target learning significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 $\log P$ units on the EXT-Zamora dataset, compared to 0.63 $\log P$ units for single-task models. On the EXT-SAMPL9 dataset, multi-target learning achieves a root-mean-square error of 1.02 $\log P$ units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and to address the challenges posed by limited experimental data. We expect the methods to be applicable beyond toluene/water partition coefficients. |
|---|---|
| ISSN: | 1758-2946 |
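
The summary reports root-mean-square errors and describes multi-target learning over a low-fidelity (COSMO-RS) and a high-fidelity (experimental) label. The sketch below illustrates one common way such a setup can be scored: a masked two-target squared-error loss, where a molecule contributes to a target only if that label exists. This record does not specify the article's actual loss or architecture, so the function names, the `hf_weight` parameter, and the masking scheme are all assumptions for illustration, not the authors' method.

```python
import math
from typing import Optional, Sequence, Tuple

# (low-fidelity label, high-fidelity label); None marks a missing label.
# Hypothetical layout, not taken from the article.
Target = Tuple[Optional[float], Optional[float]]
Prediction = Tuple[float, float]


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("pred and true must be non-empty and equally long")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def masked_multi_target_mse(
    preds: Sequence[Prediction],
    targets: Sequence[Target],
    hf_weight: float = 1.0,  # assumed weighting knob, not from the article
) -> float:
    """Sum of per-target mean squared errors, skipping missing labels."""
    lf_sq, hf_sq = [], []
    for (p_lf, p_hf), (t_lf, t_hf) in zip(preds, targets):
        if t_lf is not None:
            lf_sq.append((p_lf - t_lf) ** 2)
        if t_hf is not None:
            hf_sq.append((p_hf - t_hf) ** 2)
    lf_loss = sum(lf_sq) / len(lf_sq) if lf_sq else 0.0
    hf_loss = sum(hf_sq) / len(hf_sq) if hf_sq else 0.0
    return lf_loss + hf_weight * hf_loss


if __name__ == "__main__":
    # Toy numbers only; they are not data from the article.
    preds = [(1.0, 1.5), (2.0, 2.0)]
    targets = [(1.0, None), (3.0, 2.5)]
    print(masked_multi_target_mse(preds, targets))
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

Masking lets the roughly 9000 low-fidelity entries and the roughly 250 high-fidelity entries share one model even though most molecules carry only one of the two labels.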