MoLMamba: A large state-space-based foundation model for Chemistry
August 20, 2024
Chemical foundation models (FMs) have emerged as potent catalysts for scientific discovery, leveraging extensive pretraining on vast unlabeled datasets. Typically built upon sequence architectures such as Transformers, these models excel at processing a wide array of inputs, ranging from SMILES notations to 3D images.
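To make the sequence view concrete, the sketch below tokenizes a SMILES string with a regular expression; the pattern and the example molecule are illustrative assumptions, not MoLMamba's actual vocabulary or pipeline.

```python
import re

# Illustrative SMILES token pattern (an assumption, not MoLMamba's vocabulary):
# bracket atoms, two-letter elements, ring closures, bonds, and branches.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|Se|se|@@|%\d{2}|[BCNOSPFIbcnosp]|[=#$/\\\-+()@.]|\d)"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: the tokenization must cover every character of the input.
    assert "".join(tokens) == smiles, f"untokenized characters in {smiles!r}"
    return tokens

# Caffeine: a bracket-free SMILES with ring-closure digits and branches.
print(tokenize_smiles("CN1C=NC2=C1C(=O)N(C)C(=O)N2C"))
# ['C', 'N', '1', 'C', '=', 'N', 'C', '2', '=', 'C', '1', 'C', '(', '=', 'O', ...]
```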
In this research, we present MoLMamba, a novel foundation model pretrained on a carefully curated dataset of 91 million SMILES strings (4.3 billion tokens) extracted from PubChem. Diverging from conventional Transformer architectures, MoLMamba adopts a state-space approach, offering advantages such as faster inference and linear scaling with sequence length. Even when confronted with sequences spanning billions of tokens, MoLMamba maintains robust performance.
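The linear-scaling claim follows from the state-space formulation itself: rather than attending over all previous tokens, a state-space layer carries a fixed-size hidden state and updates it once per token. Below is a minimal NumPy sketch of the discretized recurrence h_t = A_bar * h_{t-1} + B_bar * x_t, y_t = C . h_t for a single channel; the diagonal parameterization and all dimensions are illustrative assumptions, not MoLMamba's.

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Linear-time state-space scan over a 1-channel sequence.

    x:     (L,)  input sequence
    A_bar: (N,)  diagonal discretized state matrix
    B_bar: (N,)  discretized input projection
    C:     (N,)  output projection
    """
    h = np.zeros(A_bar.shape[0])
    y = np.empty_like(x)
    for t, x_t in enumerate(x):       # one O(N) update per token -> O(L*N) total
        h = A_bar * h + B_bar * x_t   # fixed-size state, no attention over history
        y[t] = C @ h
    return y

rng = np.random.default_rng(0)
L, N = 4096, 16                       # sequence length, state size (illustrative)
y = ssm_scan(rng.standard_normal(L),
             A_bar=np.exp(-rng.uniform(0.1, 1.0, N)),  # stable per-dimension decay
             B_bar=rng.standard_normal(N) / N,
             C=rng.standard_normal(N))
print(y.shape)  # (4096,) -- cost grew linearly with L, unlike attention's O(L^2)
```

Mamba-style selective SSMs additionally make the input projection and discretization step functions of the input, which this constant-parameter sketch omits; the O(L) cost argument is unchanged.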
Evaluation on the MoleculeNet benchmark underscores MoLMamba's capabilities across diverse tasks and domains. Its performance is compared with current state-of-the-art methods, affirming its efficacy in chemical machine learning applications. MoLMamba's introduction marks a significant advancement in the field, opening promising avenues for further exploration and application in real-world chemical contexts.
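The paper's exact evaluation pipeline is not described here; as an assumed, minimal way to reproduce a MoleculeNet-style run, the sketch below uses DeepChem's loaders on the BBBP task with a scaffold split and a random-forest baseline standing in for MoLMamba embeddings.

```python
import deepchem as dc
from sklearn.ensemble import RandomForestClassifier

# Load MoleculeNet's BBBP task with ECFP features and a scaffold split
# (an assumed baseline setup; the paper would use MoLMamba representations instead).
tasks, (train, valid, test), transformers = dc.molnet.load_bbbp(
    featurizer="ECFP", splitter="scaffold"
)

model = dc.models.SklearnModel(RandomForestClassifier(n_estimators=100))
model.fit(train)

metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print("test ROC-AUC:", model.evaluate(test, [metric]))
```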