A Python library for molecular processing built on RDKit with a simple API and good defaults.
Datamol is a Python library that simplifies molecular processing for cheminformatics and drug discovery applications. It provides a lightweight wrapper around RDKit, offering a user-friendly API for tasks like molecule manipulation, standardization, fingerprint generation, and visualization. It addresses RDKit's sometimes verbose and complex interface by providing sensible defaults and modern utilities.
Datamol is aimed at cheminformatics researchers, computational chemists, and drug discovery scientists who need to process and analyze molecular data in Python. It is also suitable for bioinformatics developers building pipelines that involve chemical structure handling.
Developers choose Datamol because it reduces the boilerplate code required for common molecular operations while maintaining full compatibility with RDKit. Its performance optimizations, modern I/O capabilities, and thoughtful defaults make it more productive than raw RDKit for many tasks.
Molecular Processing Made Easy.
Simplifies common molecular operations like conversion and standardization with functions like dm.to_mol() and dm.sanitize_mol(), reducing RDKit's verbosity.
Includes efficient parallelization with progress bars for tasks like conformer generation, enabling faster batch processing of large datasets.
Leverages fsspec to read and write from remote paths (e.g., S3, GCS) for formats like SDF and CSV, facilitating cloud-native workflows.
Provides 2D and 3D molecular visualization directly in Jupyter notebooks via dm.viz.to_image() and dm.viz.conformers(), enhancing exploratory analysis.
Reduces configuration overhead with well-chosen default parameters for functions like sanitization and standardization, as highlighted in the project's stated philosophy.
Inherits RDKit's installation challenges and strict versioning requirements, necessitating careful management as shown in the project's compatibility table.
As a lightweight wrapper, it may not expose all RDKit functionalities, forcing users to revert to raw RDKit for niche or advanced tasks.
Tightly coupled to Python and RDKit, making it unsuitable for projects requiring multi-language support or integration with non-Python tools.
The abstraction layer can add a small overhead compared to calling RDKit directly, though it is optimized for common cases.