Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

datamol

Apache-2.0 · Python · 0.12.5

A Python library for molecular processing built on RDKit with a simple API and good defaults.

539 stars · 63 forks · 0 contributors

What is datamol?

Datamol is a Python library that simplifies molecular processing for cheminformatics and drug discovery applications. It is a lightweight wrapper around RDKit with a user-friendly API for tasks such as molecule manipulation, standardization, fingerprint generation, and visualization. Where RDKit's interface can be verbose and complex, Datamol supplies sensible defaults and modern utilities.

Target Audience

Datamol targets cheminformatics researchers, computational chemists, and drug discovery scientists who need to process and analyze molecular data in Python. It also suits bioinformatics developers building pipelines that involve chemical structure handling.

Value Proposition

Developers choose Datamol because it reduces the boilerplate code required for common molecular operations while maintaining full compatibility with RDKit. Its performance optimizations, modern I/O capabilities, and thoughtful defaults make it more productive than using raw RDKit directly for many tasks.

Overview

Molecular Processing Made Easy.

Use Cases

Best For

  • Processing and standardizing large chemical datasets
  • Generating molecular fingerprints and descriptors for machine learning
  • Visualizing 2D and 3D molecular structures in Jupyter notebooks
  • Converting between molecular representations (SMILES, SELFIES, InChI)
  • Working with chemical data from cloud storage (S3, GCS)
  • Building cheminformatics pipelines with parallel processing
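
As a rough illustration of the conversion and fingerprint use cases listed above, the sketch below turns one SMILES string into several representations. The helper name `describe` and the guarded import are assumptions added for this example; the `dm.*` calls are used with their default arguments.

```python
try:
    import datamol as dm
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def describe(smiles):
    """Hypothetical helper: return several representations of one molecule."""
    if dm is None:
        return None
    mol = dm.to_mol(smiles)
    if mol is None:  # unparsable SMILES yield None rather than an exception
        return None
    return {
        "smiles": dm.to_smiles(mol),      # canonical SMILES
        "selfies": dm.to_selfies(mol),    # SELFIES string
        "inchi": dm.to_inchi(mol),        # InChI string
        "inchikey": dm.to_inchikey(mol),  # hashed InChI key
        "fingerprint": dm.to_fp(mol),     # array-like fingerprint for ML models
    }


record = describe("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
```

The resulting dictionary can be collected into a table row per molecule, which is a common starting point for machine-learning features.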

Not Ideal For

  • Projects requiring deep, low-level access to RDKit's native API for custom modifications
  • Environments where RDKit cannot be installed due to system restrictions or licensing issues
  • Teams already heavily invested in alternative cheminformatics suites like Open Babel or ChemAxon
  • Real-time applications where minimal latency is critical, as the wrapper layer adds slight overhead

Pros & Cons

Pros

Intuitive Pythonic API

Simplifies common molecular operations such as conversion and standardization with functions like dm.to_mol() and dm.sanitize_mol(), reducing RDKit's verbosity.
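
A minimal sketch of what that looks like in practice, assuming datamol is installed. The helper name `clean_smiles` and the guarded import are illustrative additions, not part of the library.

```python
try:
    import datamol as dm
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def clean_smiles(smiles):
    """Hypothetical helper: parse, sanitize, and re-emit a canonical SMILES."""
    if dm is None:
        return None
    mol = dm.to_mol(smiles)        # SMILES string -> RDKit molecule (None if unparsable)
    if mol is None:
        return None
    mol = dm.sanitize_mol(mol)     # returns None when sanitization fails
    if mol is None:
        return None
    return dm.to_smiles(mol)       # canonical SMILES


canonical = clean_smiles("OCC")  # ethanol, written in a non-canonical atom order
```

Returning None on failure, instead of raising, makes it straightforward to filter bad records out of a large dataset.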

Built-in Performance Optimizations

Includes efficient parallelization with progress bars for tasks like conformer generation, enabling faster batch processing of large datasets.
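
The following sketch shows one way batch work could be dispatched with `dm.parallelized`. The per-molecule function `heavy_atom_count` and the wrapper `count_all` are hypothetical stand-ins for a heavier task such as conformer generation, and the thread scheduler is chosen here only to keep the example light.

```python
try:
    import datamol as dm
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def heavy_atom_count(smiles):
    """Hypothetical per-molecule task: count non-hydrogen atoms."""
    mol = dm.to_mol(smiles)
    return None if mol is None else mol.GetNumHeavyAtoms()


def count_all(smiles_list, n_jobs=2, progress=False):
    """Apply heavy_atom_count to every input, optionally with a progress bar."""
    if dm is None:
        return None
    return dm.parallelized(
        heavy_atom_count,
        smiles_list,
        n_jobs=n_jobs,
        scheduler="threads",  # assumption: threads suffice for this tiny demo
        progress=progress,    # set to True to display a progress bar
    )


counts = count_all(["CCO", "c1ccccc1"])
```

Results come back in input order, so they can be zipped with the original SMILES list.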

Modern I/O Support

Leverages fsspec to read and write from remote paths (e.g., S3, GCS) for formats like SDF and CSV, facilitating cloud-native workflows.
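
As a hedged sketch of the I/O helpers, the example below writes a few molecules to an SDF file and reads them back. It uses a local temporary path; a remote URL such as the hypothetical `s3://my-bucket/demo.sdf` should work the same way, assuming the matching fsspec backend and credentials are available. The helper name `roundtrip_sdf` is an illustrative addition.

```python
import os
import tempfile

try:
    import datamol as dm
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def roundtrip_sdf(smiles_list, urlpath):
    """Hypothetical helper: write molecules to an SDF path and load them again."""
    if dm is None:
        return None
    mols = [dm.to_mol(s) for s in smiles_list]
    dm.to_sdf(mols, urlpath)     # urlpath may be local or an fsspec-style URL
    return dm.read_sdf(urlpath)  # list of RDKit molecules


with tempfile.TemporaryDirectory() as tmp:
    loaded = roundtrip_sdf(["CCO", "c1ccccc1"], os.path.join(tmp, "demo.sdf"))
```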

Comprehensive Visualization Tools

Provides 2D and 3D molecular visualization directly in Jupyter notebooks via dm.viz.to_image() and dm.viz.conformers(), enhancing exploratory analysis.

Sensible Defaults

Reduces configuration overhead with well-chosen parameters for functions such as sanitization and standardization, in line with the project's stated philosophy.
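
To illustrate relying on those defaults, here is a sketch of a standardization pass that calls each step without extra arguments. The helper name `standardize` and the guarded import are assumptions made for this example; real pipelines may want to tune the individual options.

```python
try:
    import datamol as dm
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def standardize(smiles):
    """Hypothetical helper: fix, sanitize, and standardize using default options."""
    if dm is None:
        return None
    mol = dm.to_mol(smiles)
    if mol is None:
        return None
    mol = dm.fix_mol(mol)          # attempt to repair common structural problems
    mol = dm.sanitize_mol(mol)     # returns None when sanitization fails
    if mol is None:
        return None
    mol = dm.standardize_mol(mol)  # default standardization settings
    return dm.to_smiles(mol)


standardized = standardize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
```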

Cons

RDKit Dependency Complexity

Inherits RDKit's installation challenges and strict versioning requirements, necessitating careful version management, as documented in the project's compatibility table.

Limited Advanced Feature Exposure

As a lightweight wrapper, it may not expose all RDKit functionalities, forcing users to revert to raw RDKit for niche or advanced tasks.
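
Falling back is usually painless because the molecules Datamol returns are ordinary RDKit molecule objects. The sketch below, with the hypothetical helper `formula`, parses with Datamol and then calls an RDKit function directly.

```python
try:
    import datamol as dm
    from rdkit.Chem import rdMolDescriptors
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def formula(smiles):
    """Hypothetical helper: parse with datamol, then use raw RDKit on the result."""
    if dm is None:
        return None
    mol = dm.to_mol(smiles)
    if mol is None:
        return None
    # The object is an RDKit molecule, so any RDKit call accepts it as-is.
    return rdMolDescriptors.CalcMolFormula(mol)


ethanol_formula = formula("CCO")
```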

Ecosystem Lock-in

Tightly coupled to Python and RDKit, making it unsuitable for projects requiring multi-language support or integration with non-Python tools.

Performance Trade-offs

The abstraction layer can add a small amount of latency compared to direct RDKit usage, though it is optimized for common cases.
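
Whether that overhead matters depends on the workload, so it is worth measuring. The sketch below times SMILES parsing through Datamol and through RDKit directly; the helper name `compare_parse_times` is hypothetical, and the numbers will vary by machine and library version.

```python
import timeit

try:
    import datamol as dm
    from rdkit import Chem
except ImportError:  # datamol and RDKit must be installed; otherwise this sketch is a no-op
    dm = None


def compare_parse_times(smiles="CCO", repeats=1000):
    """Hypothetical helper: total seconds spent parsing via datamol vs. raw RDKit."""
    if dm is None:
        return None
    wrapped = timeit.timeit(lambda: dm.to_mol(smiles), number=repeats)
    direct = timeit.timeit(lambda: Chem.MolFromSmiles(smiles), number=repeats)
    return {"datamol_s": wrapped, "rdkit_s": direct}


timings = compare_parse_times()
```

A single-call microbenchmark like this only covers parsing; batch operations may behave differently.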

Quick Stats

Stars: 539
Forks: 63
Contributors: 0
Open issues: 11
Last commit: 1 month ago
Created: 2021

Tags

#scientific-computing #cheminformatics #python-library #molecule #rdkit #python #drug-discovery #medicinal-chemistry #bioinformatics #data-processing #drug-design #molecular-modeling

Built With

  • RDKit
  • fsspec
  • Python
  • nglview

Included in

Cheminformatics · 848

Related Projects

Indigo

Universal cheminformatics toolkit, utilities and database search tools

Stars: 398
Forks: 128
Last commit: 20 hours ago

MolecularGraph.jl

Graph-based molecule modeling toolkit for cheminformatics

Stars: 224
Forks: 26
Last commit: 29 days ago

ChemmineR

Cheminformatics package for analyzing drug-like small molecule data in R

CDK (Chemistry Development Kit)

Algorithms for structural chemo- and bioinformatics, implemented in Java