Learning from literature-extracted synthesis actions for organic synthesis

Date
April 13, 2021

With advances in computational models for retrosynthetic analysis in recent years [1-4], it is now possible to routinely propose reliable synthetic routes for target molecules of interest. However, carrying out these syntheses has remained largely manual: both formulating adequate experimental steps and performing the synthesis in the laboratory rely on chemists’ knowledge and experience gathered over decades of practice.

To further accelerate chemical discovery and enable the automated synthesis of any suggested synthetic route (defined in terms of reagents and intermediates), one must be able to determine, in an automated fashion, the sequence of operations necessary to execute any reaction step in the laboratory. Given the reaction knowledge accumulated in the literature over many decades, data-driven strategies provide a natural approach for this task. Nevertheless, such strategies require data in a machine-friendly format, which is not readily available in the literature: experimental procedures are usually reported in prose.

Our recent work addresses these challenges in the following way. First, we design a transformer-based machine learning model that extracts experimental actions from text. This model is pre-trained on a corpus of 2M sentences and associated actions obtained with a rule-based model, and fine-tuned on a set of more than 2000 hand-annotated sentences [5]. Second, we apply the fine-tuned model to experimental procedures from patents to generate a data set of 0.8M chemical equations and associated action sequences. Third, we use this data set to train another machine learning model that predicts experimental operations for arbitrary reactions given in SMILES format [6].
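To make the idea of turning prose into an action sequence concrete, here is a minimal, purely illustrative sketch of a rule-based extractor of the kind that could produce pre-training labels. It is not the model or rule set used in the work: the `Action` class, the `VERB_TO_ACTION` table, and the `extract_actions` function are all hypothetical names, and the clause splitting is deliberately naive.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    name: str    # action type, e.g. "ADD" or "STIR"
    detail: str  # the clause the action was found in, left unparsed


# Hypothetical verb-to-action table (an assumption for illustration only).
VERB_TO_ACTION = {
    "added": "ADD",
    "stirred": "STIR",
    "heated": "HEAT",
    "cooled": "COOL",
    "filtered": "FILTER",
    "concentrated": "CONCENTRATE",
    "extracted": "EXTRACT",
    "washed": "WASH",
    "dried": "DRY",
}


def extract_actions(sentence: str) -> List[Action]:
    """Return one Action per clause that contains a known verb."""
    actions = []
    # Naive clause splitting on commas, semicolons, and the word "and".
    for clause in re.split(r",|;|\band\b", sentence):
        clause = clause.strip().rstrip(".")
        for word in clause.lower().split():
            if word in VERB_TO_ACTION:
                actions.append(Action(VERB_TO_ACTION[word], clause))
                break  # keep only the first matching verb per clause
    return actions


if __name__ == "__main__":
    text = ("The mixture was stirred for 2 h at room temperature, "
            "filtered, and concentrated under reduced pressure.")
    for action in extract_actions(text):
        print(action.name, "|", action.detail)
```

A lookup table like this breaks down quickly on real procedures (nested clauses, implicit actions, quantities and conditions), which is one reason a learned model fine-tuned on hand-annotated sentences is attractive for this task.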

Finally, we present how these machine learning models can be coupled with commercial chemical robots for autonomously synthesizing molecules.

Presenter

Alain Vaucher
Research Scientist, IBM Zurich Research Laboratory

Related Products

Inferring missing molecules in incomplete chemical equations
Deep-learning models applied to chemical reactions have received much attention in recent years:
Human-in-the-loop for a disconnection aware retrosynthesis
In single-step retrosynthesis, a target molecule is broken down by considering the bonds to be changed and/or functional group interconversions. In modern computer-assisted synthesis planning tools, the predictions of these changes are typically carried out automatically…
POCSTagger: Identifying part-of-chemical-speech with transformers
In the quest to build better automatic retrosynthetic tools, the ability to interface artificial intelligence models with a more traditional computational chemistry software becomes of paramount importance…
Low-data regime yield predictions with uncertainty estimation using deep learning approaches
Artificial intelligence is driving one of the most important revolutions in organic chemistry…