Language Models for Predicting Organic Synthesis Procedures
In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide va...
| Main Authors: | Mantas Vaškevičius, Jurgita Kapočiūtė-Dzikienė |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Applied Sciences |
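| Subjects: | deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence |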
| Online Access: | https://www.mdpi.com/2076-3417/14/24/11526 |
| _version_ | 1846106115221225472 |
|---|---|
| author | Mantas Vaškevičius; Jurgita Kapočiūtė-Dzikienė |
| author_facet | Mantas Vaškevičius; Jurgita Kapočiūtė-Dzikienė |
| author_sort | Mantas Vaškevičius |
| collection | DOAJ |
| description | In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide variety of organic synthesis reactions, aiming to decrease time and resource consumption in laboratory work. We investigated the suitability of different sizes of BART, T5, FLAN-T5, molT5, and classic sequence-to-sequence transformer models for our text-to-text task and utilized a large dataset prepared specifically for the task. Experimental investigations demonstrated that a fine-tuned molT5-large model achieves a BLEU score of 47.75. The results demonstrate the capability of LLMs to predict chemical synthesis procedures involving 24 possible distinct actions, many of which include various parameters like solvents, reaction agents, temperature, duration, solvent ratios, and other specific parameters. Our findings show that only when the core reactants are used as input, the models learn to correctly predict what ancillary components need to be included in the resulting procedure. These results are valuable for AI researchers and chemists, suggesting that curated datasets and large language model fine-tuning techniques can be tailored for specific reaction classes and practical applications. This research contributes to the field by demonstrating how deep-learning-based methods can be customized to meet the specific requirements of chemical synthesis, leading to more intelligent and resource-efficient laboratory processes. |
| format | Article |
| id | doaj-art-2225ea0fa2fa4c35b0b9353ec8ce3327 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-2225ea0fa2fa4c35b0b9353ec8ce3327; 2024-12-27T14:07:33Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-12-01; 14; 24; 11526; 10.3390/app142411526; Language Models for Predicting Organic Synthesis Procedures; Mantas Vaškevičius (0); Jurgita Kapočiūtė-Dzikienė (1); (0) Department of Applied Informatics, Vytautas Magnus University, Universiteto str. 10–202, LT-44404 Kaunas, Lithuania; (1) Department of Applied Informatics, Vytautas Magnus University, Universiteto str. 10–202, LT-44404 Kaunas, Lithuania; In optimizing organic chemical synthesis, researchers often face challenges in efficiently generating viable synthesis procedures that conserve time and resources in laboratory settings. This paper systematically analyzes multiple approaches to efficiently generate synthesis procedures for a wide variety of organic synthesis reactions, aiming to decrease time and resource consumption in laboratory work. We investigated the suitability of different sizes of BART, T5, FLAN-T5, molT5, and classic sequence-to-sequence transformer models for our text-to-text task and utilized a large dataset prepared specifically for the task. Experimental investigations demonstrated that a fine-tuned molT5-large model achieves a BLEU score of 47.75. The results demonstrate the capability of LLMs to predict chemical synthesis procedures involving 24 possible distinct actions, many of which include various parameters like solvents, reaction agents, temperature, duration, solvent ratios, and other specific parameters. Our findings show that only when the core reactants are used as input, the models learn to correctly predict what ancillary components need to be included in the resulting procedure. These results are valuable for AI researchers and chemists, suggesting that curated datasets and large language model fine-tuning techniques can be tailored for specific reaction classes and practical applications. This research contributes to the field by demonstrating how deep-learning-based methods can be customized to meet the specific requirements of chemical synthesis, leading to more intelligent and resource-efficient laboratory processes.; https://www.mdpi.com/2076-3417/14/24/11526; deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence |
| spellingShingle | Mantas Vaškevičius Jurgita Kapočiūtė-Dzikienė Language Models for Predicting Organic Synthesis Procedures Applied Sciences deep learning large language model organic synthesis synthesis procedure machine learning artificial intelligence |
| title | Language Models for Predicting Organic Synthesis Procedures |
| title_full | Language Models for Predicting Organic Synthesis Procedures |
| title_fullStr | Language Models for Predicting Organic Synthesis Procedures |
| title_full_unstemmed | Language Models for Predicting Organic Synthesis Procedures |
| title_short | Language Models for Predicting Organic Synthesis Procedures |
| title_sort | language models for predicting organic synthesis procedures |
| topic | deep learning; large language model; organic synthesis; synthesis procedure; machine learning; artificial intelligence |
| url | https://www.mdpi.com/2076-3417/14/24/11526 |
| work_keys_str_mv | AT mantasvaskevicius languagemodelsforpredictingorganicsynthesisprocedures AT jurgitakapociutedzikiene languagemodelsforpredictingorganicsynthesisprocedures |