A Python script to filter chemical compounds using structural alerts from ChEMBL and property filters from RDKit.
rd_filters is a Python script that filters chemical compounds using structural alerts from the ChEMBL database and property calculations from RDKit. It helps researchers identify molecules with undesirable functional groups or physicochemical properties, which is critical for early-stage drug discovery and chemical library triaging. The tool processes SMILES files in parallel and outputs filtered compounds along with detailed alert reports.
rd_filters is intended for cheminformaticians, computational chemists, and drug discovery researchers who need to screen compound libraries for problematic substructures and property violations.
Developers choose rd_filters because it consolidates multiple ChEMBL alert sets and RDKit property filters into a single, easy-to-use command-line tool with parallel processing support. Its open-source nature and customizable rules allow for tailored filtering workflows without relying on commercial software.
Uses all CPU cores by default, which keeps filtering of large datasets fast; the --np flag shown in the usage examples sets the number of cores explicitly.
Allows fine-tuning via a JSON configuration file for property thresholds and alert set selection, enabling tailored workflows without code changes.
Includes modified SMARTS patterns to ensure compatibility with RDKit, addressing issues with original ChEMBL alerts as noted in the Notes.txt file.
Provides eight structural alert sets from ChEMBL, including PAINS, allowing immediate application of community-vetted filters without manual setup.
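The JSON-driven configuration mentioned above can be pictured with a minimal sketch. The key names ("alert_sets", "properties"), the thresholds, and the helper functions below are illustrative assumptions, not the schema or code that rd_filters ships; the property values passed in are made-up numbers standing in for what RDKit would calculate.

```python
import json

# Hypothetical configuration: key names and thresholds are assumptions
# for illustration, not the actual rd_filters rules file.
EXAMPLE_RULES = """
{
    "alert_sets": ["PAINS", "Glaxo"],
    "properties": {
        "MW": [0, 500],
        "LogP": [-5, 5],
        "HBD": [0, 5],
        "HBA": [0, 10]
    }
}
"""


def load_rules(text):
    """Parse the JSON text into (selected alert sets, property ranges)."""
    rules = json.loads(text)
    ranges = {
        name: (float(low), float(high))
        for name, (low, high) in rules["properties"].items()
    }
    return rules["alert_sets"], ranges


def property_violations(props, ranges):
    """Return the sorted names of properties outside their inclusive range.

    Properties that are absent from `props` are not checked.
    """
    return sorted(
        name
        for name, (low, high) in ranges.items()
        if name in props and not low <= props[name] <= high
    )


if __name__ == "__main__":
    alert_sets, ranges = load_rules(EXAMPLE_RULES)
    # Made-up property values for a single molecule.
    molecule = {"MW": 712.9, "LogP": 6.3, "HBD": 1, "HBA": 4}
    print(alert_sets, property_violations(molecule, ranges))
```

Keeping thresholds in a data file like this is what lets a workflow be retuned by editing one JSON document rather than the script itself.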
The README admits ChEMBL has little documentation on alert sets, making it hard to understand the rationale behind specific substructure flags without external research.
Requires RDKit installation, which can be challenging for users unfamiliar with cheminformatics toolchains, especially on non-Linux systems.
Users must handle alert_collection.csv and rules.json files, set environment variables, and ensure correct file paths, adding operational complexity.
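One way to reduce path-related mistakes is to check the data files before starting a run. The sketch below assumes the data directory is given by an environment variable named FILTER_RULES_DATA; that name and the helper function are assumptions for illustration, so confirm the actual variable in the project README before relying on it.

```python
import os
from pathlib import Path

# Assumed environment variable name; verify against the project README.
DATA_DIR_ENV = "FILTER_RULES_DATA"
REQUIRED_FILES = ("alert_collection.csv", "rules.json")


def resolve_data_files(env=None):
    """Return a dict mapping each required file name to its path.

    Raises RuntimeError if the variable is unset or empty, and
    FileNotFoundError if any required file is missing from the directory.
    """
    env = os.environ if env is None else env
    data_dir = env.get(DATA_DIR_ENV)
    if not data_dir:
        raise RuntimeError(f"{DATA_DIR_ENV} is not set")
    base = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (base / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {base}: {', '.join(missing)}")
    return {name: base / name for name in REQUIRED_FILES}


if __name__ == "__main__":
    try:
        for name, path in resolve_data_files().items():
            print(f"{name}: {path}")
    except (RuntimeError, FileNotFoundError) as err:
        print(f"configuration problem: {err}")
```

Failing early with a message that names the missing file is usually easier to debug than an error raised partway through a filtering job.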