How accurate is profanity-check compared to other libraries?

profanity-check reports 95% test accuracy, compared with 91.8% for profanity-filter and 85.6% for profanity. It also scores higher on recall and F1, which means it misses fewer offensive strings (fewer false negatives).
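To see how figures like these would look on your own data, the metrics can be computed from a list of true labels and a list of predicted labels. The sketch below is plain Python and does not call the library; the function name `classification_metrics` and the sample labels are made up for illustration, with 1 meaning "offensive" and 0 meaning "clean".

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = offensive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # offensive, caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # offensive, missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: y_true from human review, y_pred from a classifier.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Low recall shows up as a high count of missed offensive strings, which is why recall and F1 matter alongside raw accuracy when most text is clean.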

Can I use profanity-check for non-English text?

No, profanity-check is trained exclusively on English datasets from sources like Kaggle and hate speech databases. For other languages, you would need to retrain the model, which isn't straightforward with the current library's static setup.

How do I handle false positives in profanity-check?

Use predict_prob() to get a probability score for each string and apply your own threshold, e.g., flag content only if the probability exceeds 0.7. A higher threshold produces fewer false positives at the cost of more missed profanity, so tune it to your risk tolerance. The README warns that predictions are not infallible.
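A minimal sketch of that thresholding, assuming `predict_prob` returns one probability per input string as the README describes. The helper name `flag_offensive` and its `prob_fn` parameter are hypothetical; the parameter exists only so a different scoring function can be substituted, for example in tests.

```python
def flag_offensive(texts, threshold=0.7, prob_fn=None):
    """Return (text, probability) pairs whose score exceeds the threshold."""
    texts = list(texts)
    if prob_fn is None:
        # Imported lazily so the helper can be exercised with a stand-in scorer.
        from profanity_check import predict_prob
        prob_fn = predict_prob

    probabilities = prob_fn(texts)  # one score per input string
    return [(text, float(p))
            for text, p in zip(texts, probabilities)
            if p > threshold]
```

Calling `flag_offensive(comments, threshold=0.9)` would flag only the strings the model is most confident about; lowering the threshold catches more at the price of more false positives.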

profanity-check vs better-profanity: which should I choose?

Choose profanity-check for speed and accuracy in English content moderation: its ML model was benchmarked at 300–4000 times faster than profanity-filter and is more accurate than the wordlist-based libraries it was compared against. better-profanity is the simpler option for basic wordlist filtering if performance isn't critical and you prefer an explicit blacklist.

How do I integrate profanity-check into a Flask app?

Install the package with pip, then import predict or predict_prob in the module that defines your routes. Both functions take a list of strings and return an array with one entry per string, so you can check form input before processing it and reject or flag anything the model marks as offensive.
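One way this could look is sketched below using Flask's application-factory pattern. The `/comments` route, the `text` form field, the `moderate` helper and the 0.7 threshold are all assumptions made for illustration, not part of profanity-check or its README. Flask and profanity-check are imported inside the factory so that `moderate` can be used and tested on its own.

```python
def moderate(text, score_fn, threshold=0.7):
    """Score one string and return (flagged, probability)."""
    probability = float(score_fn([text])[0])  # scorers take a list of strings
    return probability > threshold, probability


def create_app(score_fn=None, threshold=0.7):
    """Build a small Flask app that rejects comments scored above the threshold."""
    from flask import Flask, jsonify, request

    if score_fn is None:
        from profanity_check import predict_prob
        score_fn = predict_prob

    app = Flask(__name__)

    @app.route("/comments", methods=["POST"])  # hypothetical endpoint
    def post_comment():
        text = request.form.get("text", "")    # hypothetical form field
        flagged, probability = moderate(text, score_fn, threshold)
        if flagged:
            return jsonify(error="comment rejected", probability=probability), 400
        return jsonify(status="accepted"), 201

    return app
```

Returning the probability with the rejection is a design choice for debugging; a production app might log it instead of exposing it to the client.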

Does profanity-check work for detecting hate speech?

Partially, as it was trained on datasets including hate speech from sources like the Toxic Comment Challenge. However, it's optimized for general profanity, so for specialized hate speech detection, consider additional validation or custom models.

profanity-check — Offensive Language Detection Library

What is profanity-check?

profanity-check is a Python library that detects profane or offensive language in text strings using a machine learning model. It solves the problem of inaccurate and slow profanity detection by training a linear SVM on a large dataset of labeled samples, providing fast and reliable predictions without relying on explicit blacklists.

Target Audience

Developers and data scientists building applications that require automated content moderation, such as social platforms, comment sections, or chat systems, where filtering offensive language is necessary.

Value Proposition

Developers choose profanity-check for its superior combination of speed and accuracy, leveraging a machine learning model that outperforms traditional wordlist-based libraries and slower, more complex alternatives.

A fast, robust Python library to check for offensive language in strings.

Use Cases

Best For

Moderating user comments on forums or social media platforms
Filtering offensive language in real-time chat applications
Screening text content in educational or collaborative tools
Enhancing content safety in automated customer support systems
Integrating profanity detection into Python-based web applications
Benchmarking or replacing slower libraries like profanity-filter

Not Ideal For

Detecting intentionally obfuscated profanity like 'f4ck' or 'b1tch'
Multilingual content moderation beyond English
High-stakes applications requiring zero-tolerance for moderation errors
Projects needing custom retraining or domain-specific profanity lists

Pros & Cons

Pros

High Test Accuracy

Achieves 95% test accuracy and balanced metrics like precision and recall, outperforming similar libraries such as profanity-filter, as shown in the README's comparison table.

Exceptional Speed

Processes predictions 300–4000 times faster than profanity-filter, with benchmarks showing 0.2 ms per prediction, making it ideal for real-time applications like chat moderation.
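Latency depends on hardware and input length, so it can be worth measuring on your own setup. The sketch below times single-string predictions with `time.perf_counter`; the function name `mean_latency_ms` and the sample strings are hypothetical, and any callable that accepts a list of strings can be passed in.

```python
import time


def mean_latency_ms(predict_fn, texts, repeats=100):
    """Average wall-clock time in milliseconds for one single-string prediction."""
    texts = list(texts)
    if not texts or repeats <= 0:
        raise ValueError("need at least one text and one repeat")

    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            predict_fn([text])  # one prediction per call, as in a chat handler
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(texts))
```

With the library installed, `mean_latency_ms(profanity_check.predict, ["hello there", "some other message"])` would give a figure to compare against the quoted benchmark.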

Contextual Detection

Uses a machine learning model without explicit blacklists, enabling it to detect phrases like 'You cocksucker' that wordlist-based libraries miss, as explained in the No Explicit Blacklist section.

Probability Scoring

Provides predict_prob() to output the probability a string is offensive, offering nuanced insights for fine-grained moderation decisions, as demonstrated in the usage examples.

Cons

Poor with Variants

Struggles to detect less common variants like 'f4ck you' or 'you b1tch' because they are rare in the training corpus, so obfuscated profanity can slip through, as the README acknowledges in its Caveats section.

Limited Language Support

Trained exclusively on English datasets from sources like Kaggle, making it ineffective for profanity detection in other languages unless retrained, which isn't supported out-of-the-box.

Static Model

Offers no built-in retraining or customization options, so it cannot adapt to new slang or domain-specific terminology without manual intervention, relying solely on the pre-trained linear SVM.

