A Python library for molecular processing built on RDKit with a simple API and good defaults.
Datamol is a Python library that simplifies molecular processing for cheminformatics and drug discovery. It provides a lightweight, user-friendly layer on top of RDKit, offering a simple API for common tasks like molecule standardization, fingerprint generation, and conformer analysis. It aims to make molecular manipulations more accessible by providing good defaults and efficient performance.
Datamol is aimed at cheminformaticians, computational chemists, and data scientists in drug discovery or molecular modeling who need a streamlined tool for processing and analyzing chemical data.
Developers choose Datamol for its simple, Pythonic API that reduces boilerplate code, its performance optimizations like built-in parallelization, and its focus on providing sensible defaults for common molecular operations while maintaining full compatibility with RDKit.
Molecular Processing Made Easy.
Offers an intuitive interface with functions like dm.to_mol() and dm.viz.to_image(), reducing boilerplate code for common molecular manipulations, as the quick API tour in the project README shows.
Includes built-in efficient parallelization with optional progress bars, enhancing processing speed for large datasets, a key feature highlighted in the README.
Supports reading and writing multiple formats (SDF, Excel, CSV) with remote path handling via fsspec, enabling seamless cloud integration for data pipelines.
All operations manipulate standard rdkit.Chem.Mol objects, ensuring full compatibility with the RDKit ecosystem and ease of integration with existing code.
Heavily reliant on RDKit, so any bugs, performance issues, or feature gaps in RDKit directly affect Datamol, and installation can be complex on some systems.
Tested only against the specific Python and RDKit versions listed in the project's compatibility table, so newer or otherwise unsupported combinations may cause deployment issues.
Focuses on simplification and sensible defaults, so users needing cutting-edge cheminformatics algorithms or extensive customization might find it lacking compared to native RDKit or specialized libraries.