Language Models for Predicting Organic Synthesis Procedures

In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide va...

Bibliographic Details
Main Authors:	Mantas Vaškevičius, Jurgita Kapočiūtė-Dzikienė
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	Applied Sciences
Subjects:	deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence
Online Access:	https://www.mdpi.com/2076-3417/14/24/11526

author	Mantas Vaškevičius; Jurgita Kapočiūtė-Dzikienė
collection	DOAJ
description	In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide variety of organic synthesis reactions, aiming to decrease time and resource consumption in laboratory work. We investigated the suitability of different sizes of BART, T5, FLAN-T5, molT5, and classic sequence-to-sequence transformer models for our text-to-text task and utilized a large dataset prepared specifically for the task. Experimental investigations demonstrated that a fine-tuned molT5-large model achieves a BLEU score of 47.75. The results demonstrate the capability of LLMs to predict chemical synthesis procedures involving 24 possible distinct actions, many of which include various parameters like solvents, reaction agents, temperature, duration, solvent ratios, and other specific parameters. Our findings show that only when the core reactants are used as input, the models learn to correctly predict what ancillary components need to be included in the resulting procedure. These results are valuable for AI researchers and chemists, suggesting that curated datasets and large language model fine-tuning techniques can be tailored for specific reaction classes and practical applications. This research contributes to the field by demonstrating how deep-learning-based methods can be customized to meet the specific requirements of chemical synthesis, leading to more intelligent and resource-efficient laboratory processes.
format	Article
id	doaj-art-2225ea0fa2fa4c35b0b9353ec8ce3327
institution	Kabale University
issn	2076-3417
language	English
publishDate	2024-12-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-2225ea0fa2fa4c35b0b9353ec8ce3327; 2024-12-27T14:07:33Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-12-01; vol. 14, no. 24, art. 11526; 10.3390/app142411526; Language Models for Predicting Organic Synthesis Procedures; Mantas Vaškevičius (Department of Applied Informatics, Vytautas Magnus University, Universiteto str. 10–202, LT-44404 Kaunas, Lithuania); Jurgita Kapočiūtė-Dzikienė (Department of Applied Informatics, Vytautas Magnus University, Universiteto str. 10–202, LT-44404 Kaunas, Lithuania); https://www.mdpi.com/2076-3417/14/24/11526; deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence
title	Language Models for Predicting Organic Synthesis Procedures
topic	deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence
url	https://www.mdpi.com/2076-3417/14/24/11526

Similar Items