Technical Program Archive

The Race and Road Ahead of Machine and Deep Learning in Drug Discovery, Metabolomics, and Toxicology:
04:30pm - 06:25pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Karina Martinez Mayorga, Organizer, Presider, Instituto de Quimica, UNAM; Jose Medina-Franco, Organizer, Presider, Universidad Nacional Autonoma de Mexico
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Division/Committee: [CINF] Division of Chemical Information

Applications and development of artificial intelligence in chemistry and biology will continue to grow. In this symposium, we aim to collect and discuss recent progress with a focus on drug discovery, metabolomics, and toxicology. Contributions on software and algorithm development, web servers, and practical applications in these three areas are welcome. Let's discuss the hype, the facts, and the opportunities in the field.

Tuesday
Introductory Remarks
04:30pm - 04:35pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual

Tuesday
Targeted expansion of the peptide chemical space by enumeration, genetic algorithms, and machine learning
04:35pm - 04:55pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Prof. Jean-Louis Reymond, Presenter, University of Bern
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Artificial intelligence (AI) makes it possible to learn and expand targeted regions of chemical space defined by collections of examples, and is revolutionizing molecular design. AI methods were mostly designed for text data and are therefore particularly well suited for peptide design because peptides can be simply written as character strings, each letter representing a different amino acid. Here we present three recent examples from our laboratory aimed at a targeted expansion of the peptide chemical space, focusing on antimicrobial activities. Two examples involve molecular fingerprint-guided selection in targeted libraries generated either by enumeration or using a genetic algorithm, and a third example is based on recurrent neural networks for both sequence enumeration and compound selection. We will compare and discuss the different approaches in terms of scope, achievable molecular diversity, and results.
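
Because each peptide is a string over the amino-acid alphabet, genetic-algorithm operators reduce to simple string operations. The sketch below is illustrative only: the function names, the fixed seed, and the example sequences are assumptions rather than the presenters' implementation, and the fingerprint-guided selection step is omitted.

```python
import random

# One-letter codes of the 20 proteinogenic amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(peptide, rng):
    """Replace one randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    return peptide[:pos] + rng.choice(choices) + peptide[pos + 1:]

def crossover(parent_a, parent_b, rng):
    """One-point crossover: prefix of parent_a joined to suffix of parent_b."""
    cut = rng.randrange(1, min(len(parent_a), len(parent_b)))
    return parent_a[:cut] + parent_b[cut:]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical 10-residue parent sequences, chosen only for illustration.
child = crossover("KLAKLAKKLA", "GIGKFLHSAK", rng)
mutant = mutate(child, rng)
```

In a full workflow, each generation of children would be scored and filtered before the next round of recombination; that scoring function is deliberately left out here.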

Tuesday
Combining machine learning with chemical knowledge to improve binding affinity predictions
04:55pm - 05:15pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Dr. Norberto Sánchez-Cruz, Presenter, Chemotargets SL; Jose Medina-Franco, Universidad Nacional Autonoma de Mexico; Jordi Mestres; Xavier Barril
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Accurately predicting the binding affinity of a small molecule to a macromolecular target is a challenging problem in drug design. Scoring functions (SFs) from molecular docking play a key role in this task, since they are intended to estimate the binding affinity of putative protein-ligand complexes. It has been demonstrated that classical SFs perform well for the tasks of docking (binding mode identification) and screening (identification of true binders), but the task of scoring (obtaining binding scores in a linear correlation with experimental binding data) still represents a challenge for them.

Advances in artificial intelligence and the increasing availability of structural and binding affinity data have given rise to a new type of SF, the so-called machine learning SFs. These SFs have been found to outperform classical SFs in different tasks, particularly in terms of obtaining binding scores in a linear correlation with experimental data. The breakthroughs of deep learning have led to the study of different neural network architectures for the development of new SFs. However, although the implementation of increasingly complex algorithms has been widely explored, the chemical description of the protein-ligand complexes has not been fully exploited.

In this talk we present a recently proposed set of descriptors to represent protein-ligand complexes: Extended Connectivity Interaction Features (ECIF). ECIF are a set of protein-ligand atom-type pair counts in which each atom type is defined by the chemical neighborhood of the atom, which in turn determines the possible pairs. We show the application of these descriptors for the derivation of machine learning SFs for binding affinity predictions. To demonstrate the descriptive power of ECIF, we show the superior performance of the generated SFs on the Comparative Assessment of Scoring Functions 2016 (CASF-2016) benchmark when compared to different state-of-the-art SFs.
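
The idea of protein-ligand atom-type pair counts can be sketched in a few lines. This is a minimal illustration, not the ECIF implementation: the atom-type labels, the toy coordinates, and the 6.0 angstrom distance cutoff are all assumptions made for the example.

```python
from collections import Counter
from math import dist

def pair_counts(protein_atoms, ligand_atoms, cutoff=6.0):
    """Count (protein type, ligand type) pairs within `cutoff` angstroms.

    Each atom is a (type_label, (x, y, z)) tuple. The labels stand in for
    neighborhood-aware atom types; the cutoff value is an assumption.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            if dist(p_xyz, l_xyz) <= cutoff:
                counts[(p_type, l_type)] += 1
    return counts

# Hypothetical toy complex; labels and coordinates are made up.
protein = [
    ("N;amide", (0.0, 0.0, 0.0)),
    ("O;carbonyl", (3.0, 0.0, 0.0)),
    ("C;aromatic", (20.0, 0.0, 0.0)),  # too far away to contribute
]
ligand = [
    ("O;hydroxyl", (1.0, 1.0, 0.0)),
    ("C;aliphatic", (4.0, 0.0, 0.0)),
]
features = pair_counts(protein, ligand)
```

The resulting counter can be flattened into a fixed-length vector (one entry per possible pair type) and passed to any regression model.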

Tuesday
Combining ensemble docking and machine learning to improve structure-based virtual screening
05:15pm - 05:35pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Ensemble docking is an inexpensive and widely used methodology to account for receptor flexibility in structure-based virtual screening (SBVS). During an ensemble docking campaign, each ligand is docked to a set of rigid conformations of the target receptor. As a result, multiple docking scores are computed per ligand. However, there is still no agreement on how to combine the ensemble docking scores to obtain the final ligand ranking. A common way to aggregate these results is the use of consensus strategies. However, these strategies yield only a slight improvement over the single-conformation approach. Here, we evaluated the use of machine learning methodologies over the ensemble docking results to enhance the predictive power of SBVS. We selected two targets as case studies: the CDK2 and FXa proteins. The conformational ensembles of these proteins were built from their available crystallographic structures. The compound library included compounds from three benchmarking datasets (DUD, DEKOIS2, CSAR-2012) and from cocrystallized molecules. Ensemble docking was performed using smina/Vinardo, and the results were processed through 30x4 cross-validation to train and evaluate two machine learning (ML) classifiers: Logistic Regression and Gradient Boosting Trees. Then, we statistically compared the ML classifiers' performance with that of the traditional consensus strategies. We also explored the effect of protein ensemble sizes and conformational selection criteria on the ML models' performance. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performances achieved by smina/Vinardo with single-structure docking. We provide statistical evidence that supports the effectiveness of ML in improving ensemble docking performance.
Overview of the methodology workflow: ensemble docking and 30x4 cross-validation to implement machine learning models (GBT: Gradient Boosting Trees, LR: Logistic Regression) over ensemble docking scores. The ML models outperformed traditional consensus strategies (csGEO, csAVG, csMIN) and the best result of single-conformation docking (best SCD).
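
For readers unfamiliar with consensus strategies, the sketch below shows one plausible reading of the three named above: the minimum, arithmetic mean, and geometric mean of a ligand's docking scores across receptor conformations. That reading of the abbreviations, the ligand names, and the score values are assumptions; the abstract does not define them.

```python
from statistics import geometric_mean, mean

def consensus_scores(scores):
    """Aggregate one ligand's docking scores over all receptor conformations.

    Scores are assumed to be negative energies (more negative = better).
    """
    return {
        "csMIN": min(scores),   # best (lowest) score over the ensemble
        "csAVG": mean(scores),  # arithmetic mean
        # The geometric mean is undefined for negative numbers, so it is
        # taken on magnitudes and the sign restored (a choice of this sketch).
        "csGEO": -geometric_mean([abs(s) for s in scores]),
    }

# Hypothetical scores: ligand -> one score per receptor conformation.
ensemble = {"lig_A": [-8.0, -2.0], "lig_B": [-6.0, -6.0]}

# Rank ligands from best to worst by the averaged score.
ranking = sorted(ensemble, key=lambda lig: consensus_scores(ensemble[lig])["csAVG"])
```

A machine learning classifier would instead take the full vector of per-conformation scores as input features, rather than collapsing them to a single number first.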


Tuesday
Intermission
05:35pm - 05:45pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual

Tuesday
Development of skin sensitization, skin irritation, and eye irritation models using online data sources and Python-based machine learning
05:45pm - 06:05pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
In 2018, the US EPA released a draft policy to reduce animal testing for skin sensitization. The goal of this study was to assemble experimental data from online data sources and develop QSAR (quantitative structure-activity relationship) models to predict skin sensitization, skin irritation, and eye irritation. Data was extracted from a variety of online data sources including eChemPortal, NICEATM, QSAR Toolbox, and the open literature. Using Java code, the data was converted to a consistent data format and stored in an SQLite database. Each record was mapped to a unique substance ID in EPA’s Distributed Structure-Searchable Toxicology Database. The substance ID allows one to associate each record with a “QSAR-ready” SMILES string, which is then used to generate molecular descriptors. Data set records consist of an ID value, a property value, and the molecular descriptor values. Records that contained the same two-dimensional InChIKey were merged. Discordant records were omitted, and the data sets were randomly split into training and prediction sets. For the skin irritation models, to account for corrosive behavior, two layers of binary classification were employed from intervals of the primary irritation index endpoint: distinguishing active vs. inactive substances, and then within the active set, distinguishing irritant vs. corrosive substances. Models were built using methods including random forest, support vector machines (SVM), XGBoost, deep neural networks (DNN), and k nearest neighbors (kNN). We optimized the hyperparameters for each model by selecting the set which performed best for internal cross-validation of the training set or among many different external validation sets. We optimized the classification error, gamma, and nu parameters for the SVM method and the learning rate, estimator count, and maximum depth for the XGBoost method. Consensus models averaging the results from the approaches listed above were also evaluated.
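
The merge, filter, and split steps described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it treats the first 14 characters of an InChIKey (the connectivity block) as the two-dimensional key, uses made-up keys and binary labels, and picks an 80/20 split with a fixed seed.

```python
import random
from collections import defaultdict

def merge_and_split(records, train_fraction=0.8, seed=0):
    """Merge records by 2D InChIKey block, drop discordant ones, then split.

    `records` is a list of (inchikey, label) pairs.
    """
    by_key = defaultdict(set)
    for inchikey, label in records:
        by_key[inchikey[:14]].add(label)
    # Keep only substances whose merged records agree on a single label.
    merged = [(key, labels.pop()) for key, labels in sorted(by_key.items())
              if len(labels) == 1]
    random.Random(seed).shuffle(merged)
    n_train = int(len(merged) * train_fraction)
    return merged[:n_train], merged[n_train:]

# Hypothetical records; the keys are placeholders, not real InChIKeys.
records = [
    ("A" * 14 + "-UHFFFAOYSA-N", 1),
    ("A" * 14 + "-XXXXXXXXSA-N", 1),  # same 2D block, concordant: merged
    ("B" * 14 + "-UHFFFAOYSA-N", 0),
    ("B" * 14 + "-UHFFFAOYSA-N", 1),  # discordant: substance is dropped
    ("C" * 14 + "-UHFFFAOYSA-N", 0),
    ("D" * 14 + "-UHFFFAOYSA-N", 1),
    ("E" * 14 + "-UHFFFAOYSA-N", 0),
    ("F" * 14 + "-UHFFFAOYSA-N", 1),
]
train, test = merge_and_split(records)
```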

Tuesday
Predictive global models of cruzain inhibitors with large chemical coverage
06:05pm - 06:25pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Jose Guadalupe Rosas, Presenter; Marco Garcia-Revilla; Dr. Abraham Madariaga, Universidad Nacional Autonoma de Mexico; Karina Martinez Mayorga, Instituto de Quimica, UNAM
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual

Chagas disease affects 8-11 million people worldwide. The two available drugs, nifurtimox and benznidazole, are 50 years old and need improvement in their pharmacological and toxicological profiles. Cruzain is a major cysteine protease in Trypanosoma cruzi, the etiological agent of Chagas disease, and it is involved in parasite survival and immune evasion. The use of cruzain inhibitors in animal models can decrease the parasite burden to undetectable levels and prevent heart damage, making cruzain an attractive drug target for Chagas disease. In this work, we compiled and carefully curated a database of 344 diverse cruzain inhibitors previously reported in the scientific literature. This data set was used to build local and global predictive models of pIC50 values. For local models, molecules were classified according to their chemical family; these models show high predictability even with linear regression algorithms. The performance of those models is comparable to that of previously published QSAR models for this endpoint. Global models, built with non-linear algorithms, performed with acceptable predictability and cover the chemical space of currently known cruzain inhibitors. Global models are suitable for screening databases in search of potential inhibitors. For this purpose, a Python script that applies the calculated models was prepared, and it is freely available for use in antichagasic drug research.
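
For context, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L. The helper below is a small sketch of that conversion; the function name and the choice of nanomolar input are assumptions, since the abstract does not state the units used in the curated database.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)
```

On this scale, a tenfold increase in potency raises pIC50 by one unit, which is why regression models are usually fit to pIC50 rather than raw IC50 values.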

The Race and Road Ahead of Machine and Deep Learning in Drug Discovery, Metabolomics, and Toxicology:
07:00pm - 08:30pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Karina Martinez Mayorga, Organizer, Presider, Instituto de Quimica, UNAM; Jose Medina-Franco, Organizer, Presider, Universidad Nacional Autonoma de Mexico
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Division/Committee: [CINF] Division of Chemical Information

Applications and development of artificial intelligence in chemistry and biology will continue to grow. In this symposium, we aim to collect and discuss recent progress with a focus on drug discovery, metabolomics, and toxicology. Contributions on software and algorithm development, web servers, and practical applications in these three areas are welcome. Let's discuss the hype, the facts, and the opportunities in the field.

Tuesday
Introductory Remarks
07:00pm - 07:05pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual

Tuesday
Artificial intelligence in chemistry-related discoveries: Current trends and future opportunities
07:05pm - 07:25pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
For chemists, artificial intelligence (AI) has shown great promise in dramatically reducing the amount of repetitive bench work and time spent predicting bioactivities of new drugs, optimizing reaction conditions, and suggesting synthetic routes to complex target molecules. Even as AI is being applied in various fields of chemistry, the most relevant use cases may not be apparent for many who are focused on their respective research interests. Here, the CAS content collection, covering chemistry from thousands of journals, 64 patent authorities, and many other sources, is analyzed as a corpus of literature to reveal the current trends and future opportunities of AI in chemistry. We contextualize the current research landscape by classifying and quantifying AI-related chemistry publications from 2000 to 2020. AI applications in various research areas of chemistry, their conditions for successful applications, and the connections among different research areas are discussed together with emerging use cases and challenges. Our work may assist researchers working in AI-amenable chemistry to identify growth opportunities, potential collaborations, and emerging work in underdeveloped areas.

Tuesday
Molecular de novo design using context-free grammars
07:25pm - 07:45pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Alejandro Hernandez Cano, Presenter; Jose de Jesus Naveja Romero; Dr. Abraham Madariaga, Universidad Nacional Autonoma de Mexico; Karina Martinez Mayorga, Instituto de Quimica, UNAM
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
Machine-learning (ML) technologies are expanding the boundaries of molecular de novo design. This cooperative effort allows increasing the searchable chemical space, whilst decreasing time and expenses related to chemical synthesis. This work seeks to use ML technologies to design synthetically feasible novel molecules, with suitable pharmacological and toxicological profiles and potential activity towards targets relevant in diabetes mellitus. These criteria will serve as the objective function to maximize in a multi-target optimization problem.

Character-by-character generation of SMILES by deep recurrent networks requires careful hyper-parameter initialization, since catastrophic forgetting might occur for complex optimization targets. To overcome this problem, we introduce a novel molecule optimization framework that represents SMILES strings as tree structures. Since SMILES syntax can be specified as a context-free grammar, the grammar defines what is and is not permitted when generating SMILES strings. With this specification, a deep neural network that learns how to generate valid SMILES is no longer needed. This strategy allowed us to focus solely on maximizing the objective function, using the syntactic rules the grammar provides.

This tree structure will serve as the main component in a genetic algorithm optimization: we select promising molecules, then we recombine and mutate them to produce better fitted individuals. This tree can be parsed using the syntactic specification of the SMILES, which allows for a rich definition of genetic operators, while minimizing the chances of getting invalid structures.
Example of crossover operation between two SMILES using tree representation. In this case, the last character, C, of the first string, COC, was replaced by the prefix [NH3+] of the second string.
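
The crossover in the caption can be reproduced with a flat, token-level stand-in for the tree-based operator. This sketch does not use a context-free grammar and does not check chemical validity; the regular expression, the function names, and the second parent string `[NH3+]CC` are assumptions made for illustration.

```python
import re

# Bracket atoms, two-letter halogens, single-letter atoms, ring-closure
# digits, and bond/branch symbols are each treated as one token.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()/\\.%+-]")

def tokenize(smiles):
    """Split a SMILES string into tokens, keeping bracket atoms whole."""
    return TOKEN.findall(smiles)

def swap_last_with_prefix(first, second):
    """Replace the last token of `first` with the first token of `second`."""
    return "".join(tokenize(first)[:-1] + tokenize(second)[:1])

# Hypothetical second parent; only its prefix [NH3+] is given in the caption.
child = swap_last_with_prefix("COC", "[NH3+]CC")
```

A grammar-based version would swap whole subtrees of the parse tree instead of single tokens, which is what keeps branches and ring closures balanced in the offspring.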


Tuesday
Epigenetic target profiler: A web server for epigenetic target fishing via machine learning
07:45pm - 08:05pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Dr. Norberto Sánchez-Cruz, Presenter, Chemotargets SL; Jose Medina-Franco, Universidad Nacional Autonoma de Mexico
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual
The identification of protein targets of small molecules is an essential task in drug discovery projects. With the increasing amount of chemogenomic data in the public domain, multiple ligand-based models to support this task have emerged. These models usually assign the targets for a small molecule from the known targets of the most similar ligands in their datasets. Chemogenomic data for epigenetic targets represent a small fraction of the data available for other protein families such as ion channels or G protein-coupled receptors. This suggests that, despite their importance in drug discovery research, epigenetic targets are under-represented in the currently available tools.

In this talk, we present Epigenetic Target Profiler (ETP), an easy-to-use web application that, through a combination of machine learning-based binary classification models, can predict the bioactivity profile of small molecules over a panel of 55 epigenetic targets. We discuss the construction, validation, and selection of the models implemented, which involved a comprehensive comparison of 15 machine learning models resulting from the combination of five different machine learning algorithms for binary classification and three molecular fingerprints of different design. ETP is part of D-Tools and is freely available at http://www.epigenetictargetprofiler.com
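
The 15 candidate models arise from crossing five algorithms with three fingerprints. A minimal sketch of enumerating such a grid is shown here; the names are placeholders, since the abstract gives only the counts and not the specific algorithms or fingerprints.

```python
from itertools import product

# Placeholder names: the abstract states only how many of each were compared.
algorithms = ["algorithm_1", "algorithm_2", "algorithm_3", "algorithm_4", "algorithm_5"]
fingerprints = ["fingerprint_1", "fingerprint_2", "fingerprint_3"]

# Every (algorithm, fingerprint) pair is one candidate model to train and validate.
model_grid = list(product(algorithms, fingerprints))
```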

Tuesday
Withdrawn
08:05pm - 08:25pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual

Tuesday
Concluding Remarks
08:25pm - 08:30pm USA / Canada - Eastern - August 24, 2021 | Room: Zoom Room 07
Division: [CINF] Division of Chemical Information
Session Type: Oral - Virtual