3742532

Structured extraction of comprehensive chemical data from Wikipedia

Date
August 22, 2022

Wikipedia provides individual data pages for over 20,000 chemicals, making it a ubiquitous resource for open chemical information. Its collaborative community model introduces unique opportunities and challenges for the reuse of that information. Our new effort to comprehensively harvest the chemical space on Wikipedia integrates automated, semi-automated, and human data extraction processes, and has produced a structured dataset for comparison and reuse across sources. We discuss previous such efforts and how our process expands on them, correlate this dataset with other open resources, and analyze the derived data in relation to those resources.

Related Products

Using cheminformatics approaches to develop a structure searchable database of analytical methods
Analytical methods can vary in nature from detailed regulatory methods to more summary in nature…
AMOS: the EPA database of analytical methods and open mass spectral database supporting non-targeted analysis
The field of non-targeted analysis (NTA) is rapidly advancing due to technology developments supporting high-resolution mass spectrometry. The EPA has been developing software tools to support NTA and a core aspect of this work has been the delivery of supporting cheminformatics tools…
Understanding open chemical structure information with InChI
One of the major advancements of the InChI standard is in growing interoperability of chemical structure information across open data resources…
Comparison of lists of per- and polyfluoroalkyl substances (PFAS) based on different definitions
Per- and polyfluoroalkyl substances (PFAS) are a group of fluorinated substances that have generated increased public attention due to their potential health hazards and widespread presence in the environment. An attempt to define PFAS and establish standard categories was made in Buck et al…