3902284

Substrate-yield relationships in the low-data limit

Date
August 16, 2023

Although millions of reactions have been published in the past few decades, modern synthetic chemistry is still dominated by a few dozen versatile and robust reaction types, and many others remain relatively underexplored. A quantitative understanding of substrate compatibility can enable both the assessment of feasibility for hypothetical transformations and the identification of understudied reactions that could become cornerstones of the chemist’s synthetic toolbox. Published substrate scope tables, each of which corresponds to a narrow reaction class and inherently represents a relationship between molecular structure and reaction yield, provide an opportunity to analyze substrate compatibility via reaction yield prediction.

However, most substrate scope tables are very small (fewer than 10-20 reactions) and are thus hard to model. Indeed, yield prediction models have to date learned successfully only in large-scale, high-throughput experimental data regimes, where confounding variables are tightly controlled; in contrast, attempts to similarly model noisy, large-scale literature data have been unsuccessful. We therefore evaluate the potential for building low-data substrate-yield models using over 6,700 scope tables constructed from the CAS Content Collection (literature and patent data), and we present the implementation and results of common single-task machine learning (ML) models trained on individual scopes. We additionally examine the challenges of yield prediction from substrate scopes by (a) analyzing patterns in reported yield data and (b) comparing and critiquing molecular representations commonly used in ML applications for chemistry. Finally, we describe initial efforts to build a multi-task, meta-learning model that learns across multiple reaction classes to predict reaction yields.
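To make the low-data setting concrete, the sketch below evaluates two simple single-task predictors on one scope table using leave-one-out cross-validation. It is an illustration only, not the authors' method: the abstract does not name specific models or representations, so the mean-yield baseline, the Tanimoto nearest-neighbor predictor, the function names, and the four-row toy table are all assumptions made up for this example (the toy yields are not taken from the CAS Content Collection).

```python
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_baseline(train, fp):
    """Predict the mean training yield, ignoring the substrate entirely."""
    return mean(y for _, y in train)


def nearest_neighbor(train, fp):
    """Predict the yield of the most Tanimoto-similar training substrate."""
    return max(train, key=lambda row: tanimoto(row[0], fp))[1]


def loo_mae(scope, predict):
    """Leave-one-out mean absolute error of `predict` over one scope table."""
    errors = []
    for i, (fp, y) in enumerate(scope):
        train = scope[:i] + scope[i + 1:]  # hold out row i
        errors.append(abs(predict(train, fp) - y))
    return mean(errors)


# Hypothetical toy scope table: (set of fingerprint bits, yield in %).
# Invented for illustration; real scope tables are noisier than this.
toy_scope = [
    (frozenset({1, 2, 3}), 90.0),
    (frozenset({1, 2, 4}), 85.0),
    (frozenset({1, 5, 6}), 40.0),
    (frozenset({1, 5, 7}), 35.0),
]

print("mean baseline MAE:   ", loo_mae(toy_scope, mean_baseline))
print("nearest-neighbor MAE:", loo_mae(toy_scope, nearest_neighbor))
```

Comparing any structure-aware model against the mean-yield baseline on the same held-out rows is one way to check whether a small scope table carries learnable substrate-yield signal at all; with only a handful of rows, leave-one-out is a natural choice because a separate test split would leave too little data to train on.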

Speakers

John Bradshaw
Postdoctoral Associate, Massachusetts Institute of Technology
Connor Coley
Massachusetts Institute of Technology
