Inferring experimental procedures from text-based representations of chemical reactions
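Abstract: The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or searching similar, already executed, experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulder of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the attained data set to train three different models: a nearest-neighbor model based on recently-introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.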
| Main Authors: | Alain C. Vaucher, Philippe Schwaller, Joppe Geluykens, Vishnu H. Nair, Anna Iuliano, Teodoro Laino |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2021-05-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-021-22951-1 |
| author | Alain C. Vaucher Philippe Schwaller Joppe Geluykens Vishnu H. Nair Anna Iuliano Teodoro Laino |
| collection | DOAJ |
| description | Abstract The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or searching similar, already executed, experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulder of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the attained data set to train three different models: a nearest-neighbor model based on recently-introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases. |
| format | Article |
| id | doaj-art-dcccf88e9c3d4e90a2cd8b02fcfe56f3 |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2021-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-dcccf88e9c3d4e90a2cd8b02fcfe56f3; 2024-11-24T12:35:23Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2021-05-01; 121111; 10.1038/s41467-021-22951-1; Inferring experimental procedures from text-based representations of chemical reactions; Alain C. Vaucher (0), Philippe Schwaller (1), Joppe Geluykens (2), Vishnu H. Nair (3), Anna Iuliano (4), Teodoro Laino (5); affiliations as listed in the record: IBM Research Europe; IBM Research Europe; IBM Research Europe; IBM Research Europe; Dipartimento di Chimica e Chimica Industriale, Università di Pisa; IBM Research Europe; https://doi.org/10.1038/s41467-021-22951-1 |
| title | Inferring experimental procedures from text-based representations of chemical reactions |
| url | https://doi.org/10.1038/s41467-021-22951-1 |