A fast, robust Python library to detect offensive language in text using a machine learning model.
profanity-check is a Python library that detects profane or offensive language in text strings using a machine learning model. It solves the problem of inaccurate and slow profanity detection by training a linear SVM on a large dataset of labeled samples, providing fast and reliable predictions without relying on explicit blacklists.
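To make the approach concrete, here is a minimal sketch of a bag-of-words linear SVM text classifier of the kind described above. It assumes scikit-learn is installed; the eight training sentences, the `train` helper, and every parameter choice are invented for illustration and are not taken from profanity-check's own training code.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, hand-written samples (1 = offensive, 0 = clean). These are an
# assumption for the sketch; a real model needs a large labeled corpus.
texts = [
    "you are a stupid idiot",
    "shut up you moron",
    "what a dumb idiot",
    "you stupid moron",
    "have a nice day",
    "thanks for your help",
    "see you at lunch",
    "what a lovely day",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def train(texts, labels):
    """Fit word counts -> linear SVM -> probability calibration."""
    svm = LinearSVC(class_weight="balanced")
    # LinearSVC has no predict_proba; the calibration wrapper adds one.
    calibrated = CalibratedClassifierCV(svm, cv=2)
    model = make_pipeline(CountVectorizer(), calibrated)
    model.fit(texts, labels)
    return model


model = train(texts, labels)
# Column 1 holds the estimated probability of the "offensive" class.
probs = model.predict_proba(["you idiot", "nice lunch"])[:, 1]
print(probs)
```

No word is blacklisted anywhere in this sketch: the classifier learns per-word weights from the labeled samples, which is what lets a model of this kind score phrases it has never seen verbatim.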
It is aimed at developers and data scientists building applications that need automated content moderation, such as social platforms, comment sections, or chat systems.
Developers choose profanity-check for its combination of speed and accuracy: its machine learning model outperforms traditional wordlist-based libraries as well as slower, more complex alternatives.
Achieves 95% test accuracy with balanced precision and recall, outperforming similar libraries such as profanity-filter in the README's comparison table.
Runs 300 to 4000 times faster than profanity-filter, with benchmarks showing about 0.2 ms for a single prediction, which suits real-time uses such as chat moderation.
Uses a machine learning model without explicit blacklists, enabling it to detect phrases like 'You cocksucker' that wordlist-based libraries miss, as explained in the No Explicit Blacklist section.
Provides predict_prob() to output the probability a string is offensive, offering nuanced insights for fine-grained moderation decisions, as demonstrated in the usage examples.
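As a rough illustration of how probability output can drive fine-grained moderation, the sketch below buckets texts into three actions. The `triage` helper, the action names, and the two thresholds are hypothetical choices, not part of profanity-check; the scoring function is passed in as a parameter so that the library's `predict_prob` can be supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple


def triage(
    texts: Sequence[str],
    score_fn: Callable[[List[str]], Sequence[float]],
    review_at: float = 0.5,
    block_at: float = 0.9,
) -> List[Tuple[str, str]]:
    """Label each text 'allow', 'review', or 'block' from its probability.

    score_fn takes a list of strings and returns one probability per
    string, which is the calling convention of predict_prob.
    The thresholds are illustrative defaults, not recommended values.
    """
    items = list(texts)
    scores = score_fn(items)
    results = []
    for text, score in zip(items, scores):
        p = float(score)
        if p >= block_at:
            action = "block"
        elif p >= review_at:
            action = "review"
        else:
            action = "allow"
        results.append((text, action))
    return results


# With the library installed, the call would look like this:
#   from profanity_check import predict_prob
#   triage(["hello there", "some comment"], predict_prob)
```

Keeping a middle "review" band is one way to use the probability instead of a plain yes/no answer: borderline texts go to a human rather than being silently allowed or blocked.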
Struggles to detect less common variants like 'f4ck you' or 'you b1tch' because they appear rarely in the training corpus, so such inputs can slip through, as noted in the Caveats section.
Trained exclusively on English datasets from sources like Kaggle, making it ineffective for profanity detection in other languages unless retrained, which isn't supported out-of-the-box.
Offers no built-in retraining or customization options: it relies solely on the pre-trained linear SVM, so adapting to new slang or domain-specific terminology requires manual intervention.
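One possible workaround for the obfuscated-spelling caveat is to normalize common character substitutions before scoring a string. The sketch below is not a feature of profanity-check; the `normalize` helper and its substitution table are assumptions made for illustration.

```python
# Hypothetical substitution table: digits and symbols often used in
# place of letters. Extend or trim it to match the traffic you see.
SUBSTITUTIONS = str.maketrans({
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "5": "s",
    "7": "t",
    "@": "a",
    "$": "s",
})


def normalize(text: str) -> str:
    """Lowercase the text and undo common character substitutions."""
    return text.lower().translate(SUBSTITUTIONS)


print(normalize("you b1tch"))  # you bitch
print(normalize("f4ck you"))   # fack you
```

The second call shows the limit of a fixed table: '4' maps to 'a', so 'f4ck' becomes 'fack' and the intended word is not recovered. The mapping also rewrites legitimate digits (for example, '10' becomes 'io'), so it is safer to score both the original and the normalized string than to replace one with the other.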