A benchmarking platform for molecular generation models, providing datasets, implementations, and evaluation metrics for drug discovery research.
Molecular Sets (MOSES) is an open-source benchmarking platform for evaluating deep generative models in molecular generation for drug discovery. It provides a standardized dataset, pre-implemented models, and a comprehensive set of metrics to assess the quality, diversity, and novelty of generated molecules. The platform aims to facilitate reproducible research and comparison across different molecular generation approaches.
MOSES is aimed at researchers and data scientists working on machine learning for drug discovery, particularly those developing or evaluating deep generative models for molecular design. It is also valuable for academic labs and pharmaceutical companies that want to benchmark their models against established baselines.
MOSES offers a standardized and reproducible framework that eliminates inconsistencies in evaluation, allowing researchers to compare models fairly. Its curated dataset, ready-to-use model implementations, and comprehensive metrics save time and effort, accelerating innovation in computational drug discovery.
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Offers a curated subset of roughly 1.9 million drug-like molecules drawn from the ZINC database, explicitly split into training, test, and scaffold test sets to ensure robust evaluation, as detailed in the dataset section.
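To illustrate why fixed, deterministic splits matter for fair benchmarking, here is a minimal sketch of a hash-based split. This is a hypothetical illustration, not MOSES's actual procedure (MOSES ships precomputed splits and additionally holds out Bemis-Murcko scaffolds for its scaffold test set), but it shows the key property: the assignment depends only on the molecule, so every researcher gets the same split.

```python
import hashlib

def assign_split(smiles: str, test_fraction: float = 0.1) -> str:
    """Deterministically assign a molecule to 'train' or 'test'.

    Hashing the SMILES string (rather than sampling randomly) makes the
    split reproducible across machines and runs -- the property that
    MOSES's published splits guarantee for its ZINC-derived dataset.
    """
    digest = hashlib.sha256(smiles.encode()).digest()
    # Map the first 4 hash bytes to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_fraction else "train"

# The assignment depends only on the string, so it never changes:
assert assign_split("CCO") == assign_split("CCO")
```

In practice SMILES must be canonicalized (e.g. with RDKit) before hashing, since the same molecule can be written as many different strings.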
Includes a wide range of metrics, such as validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), and similarity to nearest neighbor (SNN), providing a holistic assessment of molecular quality and diversity, as illustrated in the detailed metrics table.
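Two of the simpler metrics, uniqueness and novelty, can be sketched in plain Python. This is an illustrative simplification: MOSES canonicalizes SMILES with RDKit before comparing, and validity, FCD, and SNN require RDKit and a pretrained ChemNet, so they are omitted here.

```python
def uniqueness(generated: list[str]) -> float:
    """Fraction of generated molecules that are distinct."""
    return len(set(generated)) / len(generated)

def novelty(generated: list[str], training_set: set[str]) -> float:
    """Fraction of distinct generated molecules absent from the training set."""
    unique = set(generated)
    return len(unique - training_set) / len(unique)

gen = ["CCO", "CCO", "CCN", "c1ccccc1"]
train = {"CCO", "CCC"}
print(uniqueness(gen))      # 3 distinct out of 4 generated -> 0.75
print(novelty(gen, train))  # "CCN" and "c1ccccc1" are unseen -> 2/3
```

Reporting both matters: a model that memorizes the training set can score perfectly on uniqueness while scoring zero on novelty.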
Provides end-to-end scripts for training, sampling, and evaluation, enabling consistent benchmarking and easy reproduction of results, as shown in the 'Reproducing the baselines' section.
Includes a Docker image to simplify installation and ensure environment consistency, though the image is large at 4.1 GB, as mentioned in the installation instructions.
Requires manual installation of RDKit and additional dependencies for models like LatentGAN, which can be error-prone on non-Linux systems, as noted in the installation steps.
Focuses on a fixed set of models (e.g., CharRNN, VAE, AAE) and may not support newer architectures without significant custom integration, limiting its adaptability.
The Docker image is 4.1 GB, which can be cumbersome for quick deployments or environments with limited storage, as highlighted in the Docker setup section.