Choosing which filter method to use - explained

When it comes to filtering for toxic language, whether that's profanity or content most people would find offensive, there is no one-size-fits-all solution. Given this, we've taken the approach of building and tuning different types of models for different scenarios.

Here's a breakdown of the filter methods and their use cases:

Blacklist/whitelist

Use case: You only wish to allow certain types of words in your community.

Pros:

  • Users can specify exactly what is allowed or blocked, offering a tailored content moderation approach.
  • It has built-in handling of variation/letter substitution, meaning "smuurf" and "$murf" would be treated as the defined bad word "smurf" (see the sketch below).
  • No limit on the number of words you can send in one message scan/submission.

Considerations:

  • You must define exactly which words are allowed/disallowed in your community
  • No semantic/intent understanding
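
To make the variation handling concrete, here is a minimal sketch of blacklist matching with letter-substitution normalization. The substitution map, the word list, and the repeat-collapsing rule are illustrative assumptions, not the service's actual implementation.

    # A minimal sketch of blacklist matching with letter-substitution
    # handling; the substitution map and blacklist are made-up examples.
    import re

    SUBSTITUTIONS = {"$": "s", "@": "a", "0": "o", "1": "i", "3": "e"}
    BLACKLIST = {"smurf"}  # hypothetical defined bad word

    def normalize(word: str) -> str:
        # Map common symbol substitutions back to letters, drop leftover
        # punctuation, then collapse repeated letters so that "smuurf"
        # and "$murf" both normalize to "smurf".
        word = "".join(SUBSTITUTIONS.get(ch, ch) for ch in word.lower())
        word = re.sub(r"[^a-z]", "", word)
        return re.sub(r"(.)\1+", r"\1", word)

    def is_blocked(message: str) -> bool:
        return any(normalize(w) in BLACKLIST for w in message.split())

    print(is_blocked("that $murf!"))  # True
    print(is_blocked("hello world"))  # False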

Hybrid

Use case:

You want a smarter filter that takes a holistic approach, looking at the probability that individual words are bad and how often they show up in the text. It is quite fast and can often be more sensitive than the InsightGuard filter.

Pros:

  • Comes with a base model trained on generally unacceptable language, lets you define custom phrases to disallow and train as bad, and supports false positive reporting.
  • Ability to quickly scan and identify potential profanity based on word presence and variations, making it suitable for real-time applications where speed is crucial.
  • It can generally handle variation/letter substitutions, meaning "smuurf" and "$murf" would be treated as the defined bad word "smurf".
  • No limit on the number of words you can send in one message scan/submission.

Considerations:

  • It can often be more sensitive than the InsightGuard filter when a common word from its training data shows up in the submitted text. It has no contextual understanding, meaning it does not try to understand the meaning/intent behind the text; this is also what makes it faster at processing text (see the sketch below).
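
As a rough illustration of the word-probability approach described above, the sketch below scores a message by the chance that at least one of its words is bad, so repeated bad words push the score higher. The per-word probabilities and the 0.6 cutoff are invented for the example.

    # A rough sketch of probability-based word scoring; the per-word
    # probabilities and the 0.6 cutoff are invented for illustration.
    from math import prod

    WORD_PROBS = {"smurf": 0.95, "stupid": 0.70}  # hypothetical model output

    def hybrid_score(message: str) -> float:
        # Probability that at least one word in the message is bad;
        # more (and more frequent) bad words raise the score.
        probs = [WORD_PROBS.get(w, 0.0) for w in message.lower().split()]
        return 1 - prod(1 - p for p in probs)

    print(hybrid_score("you stupid smurf") >= 0.6)  # True (score ~0.985)
    print(hybrid_score("have a nice day") >= 0.6)   # False (score 0.0)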

InsightGuard filter

Key features:

  • Trained on generally unacceptable language like profane language, sexualized content, hate speech, racism and more
  • Binary Classification: Its training on "good/bad" language simplifies the model's decision-making process, potentially making it faster for straightforward content moderation tasks.
  • Custom Training: The ability to incorporate user-defined datasets allows for a high degree of customization. This feature is particularly valuable for platforms with unique content moderation needs or evolving language use.
  • It can handle certain variation/letter substitutions, meaning "smuurf" and "$murf" would be treated as the defined bad word "smurf".

Use case:

  • Best suited for environments where the content is complex, and the context of conversations is key to determining the appropriateness of the language. Ideal for adult audiences or platforms where discussions are sophisticated and layered.

Pros:

  • Comes with a base model trained on generally unacceptable language, lets you define custom phrases to disallow and train as bad, and supports false positive reporting. It also understands semantics, nuance, and intent.

Considerations:

  • No Bias Mitigation: Unlike the DeepSense Filter, there are no explicit mechanisms to mitigate the model's own biases, which could be a limitation in scenarios where sensitivity to identity-related content is crucial.
  • You can't customise class thresholds for content classes - choose DeepSense Filter if this is a requirement
  • 255-word limit on what you can send in one message scan/submission (see the sketch below).
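
For context, here is a hedged sketch of enforcing the 255-word limit before submitting a message for a binary good/bad scan. The endpoint URL, payload fields, and response key are assumptions made for illustration, not the documented API.

    # A sketch of checking the 255-word limit before a scan request.
    # SCAN_URL, the JSON fields, and the "blocked" key are hypothetical
    # placeholders, not the documented API.
    import requests

    SCAN_URL = "https://api.example.com/v1/scan"  # hypothetical endpoint
    WORD_LIMIT = 255

    def scan_message(text: str, api_key: str) -> bool:
        if len(text.split()) > WORD_LIMIT:
            raise ValueError(f"Submission exceeds {WORD_LIMIT} words")
        resp = requests.post(
            SCAN_URL,
            json={"text": text, "filter": "insightguard"},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["blocked"]  # binary good/bad verdict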



DeepSense filter

Key features:

  • Toxic/Unbiased Toxic Handling: The capability to differentiate and mitigate the model's own biases against specific words or identities is a significant advancement. It addresses the challenge of false positives in cases where certain identities might trigger unintended toxic classifications. In other words, it reflects a sophisticated training regime aimed at reducing the model's inherent biases, making it more equitable and fair in content moderation.
  • Multilingual: The ability to support multiple languages (6: English, Italian, Spanish, Russian, Portuguese and Turkish) is a significant advantage for global platforms. This feature, combined with its nuanced understanding of context and efforts to mitigate bias, positions the DeepSense Filter as a versatile and sensitive tool for diverse communities. Learn more about Multilingual support
  • Word-level analysis option: Extracts word-level classifications, allowing you to report on how many bad words your users use and spot trends in word usage. It also enables a more selective mode of content filtering if you care about censoring specific bad words instead of sentence-level classification (see the sketch after this list).
  • Sentence classification: Ability to set class thresholds for up to 7 classes (toxic, severe toxic, obscene, identity attack, insult, threat and sexually explicit). You can define your threshold for certain classes to allow certain types/degrees of strictness resulting in more fine-tuning for your community. Additionally, you can turn classes off if you wish to not filter this type of class.
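
To show how the word-level option might be consumed, this sketch masks only the flagged words instead of blocking the whole message. The response structure is an assumption made for the example, not the documented schema.

    # An illustrative sketch of selective, word-level censoring; the
    # response shape below is assumed, not the documented API schema.
    response = {
        "sentence": {"toxic": 0.87, "insult": 0.91},  # class scores
        "words": [{"word": "smurf", "classes": {"obscene": 0.95}}],
    }

    # Mask only the flagged words rather than blocking the message.
    bad_words = {w["word"] for w in response["words"]
                 if max(w["classes"].values()) >= 0.5}
    message = "you absolute smurf"
    print(" ".join("*" * len(w) if w in bad_words else w
                   for w in message.split()))  # "you absolute *****"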

Class thresholds

Class thresholds can be configured under filter options per Discord server/licence. This allows a more fine-tuned approach to content moderation for your community.

The base detection threshold for a class is 0.5; thresholds range up to 2.5, the highest and least strict setting. Setting the "toxicity" class threshold to 0.9 would allow all but the most targeted/severe forms of this class. This would let you see slurs that are casually used in conversation but not necessarily someone swearing at another person. Making the threshold higher makes the filter less sensitive, allowing more and more types of profanity and bad language through. You may customise any of the 7 classes (which may be particularly useful if you wish to allow obscene or generally toxic language that is not too severe within your community). Experiment with the different class thresholds to see what works for you (see the configuration sketch below).
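
As an illustration, a per-class threshold configuration might look like the following. The field names and values are hypothetical; real thresholds are set in the filter options for your Discord server/licence.

    # A hypothetical per-class threshold configuration; field names and
    # values are illustrative, not the actual settings schema.
    class_thresholds = {
        "toxic": 0.9,             # lenient: only severe toxicity flagged
        "severe_toxic": 0.5,      # base detection level
        "obscene": 1.5,           # very lenient
        "identity_attack": 0.5,
        "insult": 0.7,
        "threat": 0.5,
        "sexually_explicit": 0.5,
        # A class can also be turned off entirely if you do not wish
        # to filter that type of content.
    }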

A 1.5 toxicity threshold, for example, would allow even more of this language through.

Use Case:

  • Ideally suited for platforms that require a high degree of sensitivity and accuracy in content moderation, including detecting subtle forms of toxicity that depend heavily on context. It's versatile across different types of communities.
  • Trained on generally unacceptable language like profane language, sexualized content, hate speech, racism and more.
  • Ability to understand the context of words in sentences by looking at the words that come before and after, providing a comprehensive understanding of language.
  • Communities that require a sophisticated level of moderation, detailed analysis and sentence classification.

Considerations:

  • Resource Intensity: Its heavier processing could impact scalability and real-time processing needs.
  • Lack of Custom Training: The absence of user-defined dataset training might be a problem if your users use a lot of new slurs that are not the "norm" - InsightGuard Filter might be better for your use case if you frequently find yourself moderating new types of remarks unfamiliar to the machine learning model. It depends on your content moderation requirements.
  • 555-word limit on what you can send in one message scan/submission (see the sketch below).
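
If you need to scan longer text, one workaround is to split it into chunks that fit under the limit, as in this minimal sketch (the helper is an illustration, not a documented utility).

    # A minimal sketch of splitting long text into chunks that fit the
    # 555-word submission limit; this helper is illustrative only.
    def chunk_words(text: str, limit: int = 555) -> list[str]:
        words = text.split()
        return [" ".join(words[i:i + limit])
                for i in range(0, len(words), limit)]

    sample = "word " * 1200  # 1,200 words -> chunks of 555, 555, and 90
    print([len(c.split()) for c in chunk_words(sample)])  # [555, 555, 90]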


Thinking about InsightGuard filter or DeepSense Filter?

The InsightGuard filter is tailored for platforms seeking straightforward, binary content moderation with the option for high customization through user-defined datasets. It's well-suited for environments where the primary concern is filtering out universally recognized inappropriate content, with the flexibility to adapt to niche language use.
The DeepSense Filter, with its advanced handling of toxic/unbiased toxic content, multi-language support, and nuanced understanding of context, is ideal for platforms requiring sophisticated moderation capabilities. It's especially valuable in diverse, global communities and scenarios where sensitivity to identity-related content and the mitigation of model biases are critical.

The choice would depend on the specific needs of the platform, including the importance of bias mitigation, the need for customization, and the computational resources available for content moderation tasks.

In summary, while both filters provide advanced profanity and toxicity detection capabilities, the DeepSense Filter stands out for its sophisticated approach to understanding context, mitigating bias, and supporting multiple languages. The InsightGuard filter offers significant flexibility and customization potential, making it a strong candidate for platforms with specific moderation needs or those looking to evolve their content standards dynamically.



Next steps

Choosing the right profanity filter for your community is crucial to maintaining a positive and inclusive environment.

Our suite of filters, including Blacklist/Whitelist, Hybrid, InsightGuard Filter, and DeepSense Filter, offers a range of options tailored to different needs.

Work out your audience, your community policies, and the kind of content moderation your users and community standards will require. Every community has a different demographic, age range, and sensitivity to certain topics.

Understand Your Community

  • Demographics: Consider the age, diversity, and interests of your community members.
  • Content Type: Assess the nature of the content shared within your community (text, memes, etc.).
  • Community Standards: Define what is considered acceptable and unacceptable in your community.

Speed vs. Accuracy:

Determine if your priority is fast content moderation or thorough, context-sensitive filtering.

Customization:

Decide if you need the ability to tailor the filter with your own dataset.

Context Sensitivity: 

Evaluate the importance of understanding the context and nuances of conversations in your community.

Bias Mitigation: 

Consider if your community would benefit from a filter that reduces its own bias, especially in sensitive discussions.

Trial Period:

Use a trial period to test the selected filter(s) in your community.

Community Feedback:

Gather feedback from your community members on the effectiveness of the chosen filter and any issues with it.

Adjust Accordingly:

Be prepared to adjust your choice based on feedback and evolving community standards.

Example communities and recommendations

Gaming Communities with a Younger Audience

If you manage a gaming community with members ranging from late teens to early 20s, you're likely familiar with the challenges of moderating conversations. The presence of minors and the competitive nature of gaming can often lead to arguments and the use of typical bad language.

Recommendation:
Hybrid Filter: This is your go-to choice if the predominant issues in your community involve slurs and general profanity. The Hybrid filter is adept at identifying and managing common offensive language, making it a solid choice for communities where such language is the main concern.

The Hybrid filter, with its fast processing and sensitivity to variations of bad words, can quickly moderate such language, maintaining a friendly environment without significantly impacting the real-time nature of gaming interactions.

However, it's important to note that the Hybrid filter might not be the best fit if your community frequently encounters offensive content in memes or messages with more nuanced toxicity. In such cases, exploring other filter options is advisable.

Communities with a More Mature Audience

Adult communities, or those that have been manually moderating content that includes complex sentences, nuanced meanings, and intent.

Recommendation:

InsightGuard Filter. These communities will benefit from the InsightGuard filter, which excels at understanding the context and subtleties of conversations, making it ideal for mature audiences where discussions may involve layered meanings or sophisticated language.

Educational Platforms for Young Learners

These platforms need to ensure a safe learning environment, free from any inappropriate language.

Recommendation:

Blacklist/Whitelist. The Blacklist/Whitelist filter method can effectively block known bad words (blacklisted words) and allow educational and child-friendly terms (whitelist), ensuring that the content is appropriate for young learners. This filter operates on a straightforward principle—blocking or allowing content based on predefined lists of words. It's suitable for communities with very specific moderation needs or where the acceptable language is clearly defined. However, it lacks the nuance to understand context or the subtlety of language.

Online Forums for Mental Health Support

Recommendation:

InsightGuard filter. These forums require a nuanced understanding of language to differentiate between discussions that might include sensitive topics and actual harmful content. The InsightGuard filter, with its ability to understand complex sentences, meanings, and intent, can provide the necessary moderation to support a safe space for vulnerable individuals seeking advice and sharing experiences.

Global Social Platforms

Communities that require a sophisticated level of moderation, detailed analysis, and sentence classification, especially those needing to navigate the complexities of hate speech, sexualization, and other nuanced forms of toxic content (up to 7 class types checked), should choose the DeepSense filter. Its strengths are similar to the InsightGuard filter's, but it also leverages advanced models to understand context deeply, mitigate bias, and support multiple languages. This makes it particularly useful for diverse, global communities or those discussing sensitive topics.







Notes

Choosing the right profanity filter for your community depends on various factors, including the age range of your members, the type of content typically shared, and the specific challenges you face in content moderation. By understanding the strengths and limitations of each filter type—Hybrid, InsightGuard, Blacklist/Whitelist, and DeepSense—you can make an informed decision that best suits your community's needs.
Remember, the goal is to create a safe and welcoming environment for all members, and selecting the appropriate filter is a crucial step in achieving this objective.

The good news is you can experiment with which method you use; the right choice depends on the type of content, the interactions in your community, and how far members go with toxic content. If highly sensitive toxic content around race, transgender identity, or sexual orientation gets discussed in your community, the Hybrid filter may not be the best fit.

Resources

  • Custom Model Filter - explained (ProfanityBlocker Support)
  • Inviting and bot setup (ProfanityBlocker Support)
  • False Positive Reporting - explained (ProfanityBlocker Support)
  • Supported Languages in Models - explained (ProfanityBlocker Support)