High-throughput approaches for PFAS: Generating fast, reliable data for ML investigations

Date
March 24, 2022

Machine learning (ML) techniques have demonstrated their ability to both interpolate and extrapolate useful information from datasets of known values. Unsurprisingly, this strength has led to growing use of ML techniques to accelerate chemical discovery. However, obtaining large quantities of property-specific data can be a significant challenge. Some experimental data may be costly or difficult to obtain or, in the case of more novel compounds, may simply not exist. Computational methods are commonly employed to fill such gaps. Yet computed structural and property data can be similarly scarce at large scale, and potentially more so once differences in computational methodology are taken into account. This potential lack of availability presents problems for ML techniques, which require comparable data on a large, diverse set of molecules for accurate interpretations and predictions. This presentation will demonstrate our framework for rapidly generating optimized 3D coordinates de novo for a large number of fluorinated compounds, and the subsequent use of those coordinates in building a database of computed physical and chemical properties.

Related Products

Machine learning and data challenges in PFAS property prediction
Poly- and perfluoroalkyl substances (PFAS) represent a long-term contamination and health hazard challenge. These substances have highly advantageous properties that have led to them being used in a myriad of industries, materials, and products…
PFAS classes for intelligent subset selection via stepwise machine learning cluster models to support remediation development
Poly- and perfluoroalkyl substances (PFASs) are a broad category of compounds that include a high number of carbon-fluorine bonds. According to the Organisation for Economic Co-operation and Development, a PFAS may be, generally, any molecule with a -CF2- or -CF3 moiety present…
Using chemical identifiers to predict environmentally relevant properties of poly- and perfluorinated compounds
With the large and ever-increasing number of poly- and perfluorinated compounds present in the environment, predicting their movement, accumulation, and reactivity for the purposes of capture and/or remediation becomes an ever more daunting task…