Integrating generative AI with computational chemistry for catalyst design in biofuel/bioproduct applications

Date

March 18, 2024

Explore related products in the following collection:

ACS Spring 2024 - Sessions

Catalysts play a ubiquitous role in producing renewable fuels and chemicals to achieve worldwide net-zero goals. However, effective catalyst design requires significant effort in literature review and deep knowledge of catalysis fundamentals. With the advent of AI for scientific discovery, particularly large language models (LLMs), new possibilities exist for using learned language representations to augment decision making during the catalyst design process. Yet issues related to uncertainty, such as LLM hallucinations and lack of explainability, can curtail the utility of LLMs.

Our work focuses on integrating computational chemistry approaches, specifically density functional theory (DFT), with a generative LLM to identify suitable catalytic descriptors along with the corresponding catalysts. Identifying these descriptors provides a way to develop models of surface reactivity and helps connect the most important atomistic-level properties of the catalyst to the macroscopic catalytic activity they govern. ChemReasoner, our proposed system, can intelligently search the scientific literature, through the LLM's knowledge representation, for the optimal set of descriptors, using feedback from DFT-simulation-guided ML models.

In this effort, we developed a heuristic tree search to augment catalyst discovery by prompting the LLM with various properties and descriptors. Compared with the baseline single-prompt LLM output, which mostly predicted monometallic catalysts, our approach provides access to various novel catalyst structures (e.g., metal alloys) with favorable adsorption energies to facilitate the target reaction. It also provides more specific and scientifically viable explanations for the catalyst choice (i.e., fewer LLM hallucinations). We enable these improvements by integrating a surrogate graph neural network (GNN), trained on DFT calculations from the Open Catalyst Project, that verifies the quality of LLM-predicted catalysts using calculated adsorption energies, which strengthens the validity of our methodology.
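As a rough illustration of the search idea described above, the following is a minimal sketch and not the ChemReasoner implementation. It assumes a beam-style tree search over prompt modifiers, with a toy function standing in for the LLM and a lookup table standing in for the GNN surrogate. All names (`MODIFIERS`, `propose_catalysts`, `TOY_ENERGY_EV`, `reward`, `tree_search`), the candidate catalysts, and the energy values are hypothetical and chosen only so the example runs. Treating the most negative adsorption energy as "most favorable" is also a simplifying assumption.

```python
# Hypothetical prompt modifiers a search step may add (not from the source).
MODIFIERS = ["low cost", "high selectivity", "bimetallic alloy"]

# Made-up adsorption energies in eV; a stand-in for a GNN surrogate model.
TOY_ENERGY_EV = {"Pt": -0.9, "Pd": -0.8, "Ni": -1.1, "CuZn": -1.3, "NiFe": -1.5}


def propose_catalysts(modifiers):
    """Toy stand-in for the LLM: map a set of prompt modifiers to candidates."""
    candidates = ["Pt", "Pd"]  # bare prompt: monometallic suggestions only
    if "low cost" in modifiers:
        candidates.append("Ni")
    if "bimetallic alloy" in modifiers:
        candidates.append("CuZn")
    if {"low cost", "bimetallic alloy"} <= modifiers:
        candidates.append("NiFe")
    return candidates


def reward(modifiers):
    """Score a prompt by its best candidate (assumption: lower energy is better)."""
    energies = [TOY_ENERGY_EV[c] for c in propose_catalysts(modifiers)]
    return -min(energies)


def tree_search(depth=2, beam_width=2):
    """Expand prompts one modifier at a time, keeping the top-scoring nodes."""
    root = frozenset()
    frontier = [root]
    best_score, best_node = reward(root), root
    for _ in range(depth):
        # Children add one unused modifier to a frontier node (set removes duplicates).
        children = {node | {m} for node in frontier for m in MODIFIERS if m not in node}
        if not children:
            break
        # Rank by reward, breaking ties alphabetically so the result is deterministic.
        ranked = sorted(children, key=lambda n: (-reward(n), sorted(n)))
        frontier = ranked[:beam_width]
        if reward(frontier[0]) > best_score:
            best_score, best_node = reward(frontier[0]), frontier[0]
    return best_score, best_node


if __name__ == "__main__":
    score, prompt = tree_search()
    print(sorted(prompt), propose_catalysts(prompt), score)
```

In this toy setup the bare prompt yields only monometallic candidates, while the search reaches a prompt that also surfaces alloy candidates with lower surrogate energies. A real system would replace both stand-ins with an actual LLM call and a trained surrogate model.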

Speakers

Presenter

Mariefel Olarte

Pacific Northwest National Laboratory

Tracks

[COMP] Division of Computers in Chemistry
