False Positive Reporting - explained

As a user of the profanity filtering service, you interact with two main components: your request history, and your custom profanity dataset, which works alongside the selected filter's base model. Refer to the custom model filtering help to learn more about this.

Here's how they work from your perspective:

Request History

Whenever you use the service to check text for profanity, the system records the details of your request.

This includes:

- What you asked to be checked: The actual text you submitted for profanity filtering.
- When you made the request: The date and time of your request.
- The outcome: What the service returned, such as whether it found profanity or not.

This record is part of your "request history". It's like a logbook of all the times you've used the service. You can look back at this history to see what was checked and when, which can be helpful for tracking how the service is being used and for understanding the decisions of the filter model(s).
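To make the three recorded details concrete, here is a minimal sketch of what a single request-history entry could look like. The field names and shape are illustrative assumptions, not the service's actual data model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical shape of one request-history entry; real field names may differ.
@dataclass
class RequestHistoryEntry:
    text: str               # what you asked to be checked
    requested_at: datetime  # when you made the request
    flagged: bool           # the outcome: did the filter find profanity?

entry = RequestHistoryEntry(
    text="you absolute melon",
    requested_at=datetime.now(timezone.utc),
    flagged=False,
)
```

Each time you submit text, one such record would be appended to your history, giving you the logbook described above.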

Custom Profanity Dataset

If you're an advanced user, you might have specific needs for what counts as profanity. For example, certain words might be fine in one context but not in another. Ordinarily you could opt for the blacklist/whitelist system, but you'd lose out on the filter's smarts (it all depends on your use case).

This is where your custom profanity dataset comes in. It's a collection of text entries that you've identified as profane or not, tailored to your specific context. Here's what happens with your custom dataset:

- Training the Service: Your custom entries are used to teach the service about what you consider profane. This helps the filter model make better decisions when filtering text for you.
- Avoiding Redundancy: To keep your dataset effective, the system checks new entries against existing ones to avoid duplicates. This way, your dataset remains high-quality and efficient.
- Updating Your Preferences/Criteria: As you add more examples to your custom dataset, the service adapts and becomes more aligned with your specific needs and preferences.
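The redundancy check mentioned above can be pictured as a similarity threshold against your existing entries. This is a minimal sketch using Python's standard-library `difflib`; the actual similarity measure and threshold the service uses are assumptions here:

```python
from difflib import SequenceMatcher

def is_duplicate(new_entry: str, dataset: list[str], threshold: float = 0.85) -> bool:
    """Return True if new_entry exactly matches, or is too similar to,
    an existing dataset entry. Threshold value is an illustrative guess."""
    normalized = new_entry.strip().lower()
    for existing in dataset:
        ratio = SequenceMatcher(None, normalized, existing.strip().lower()).ratio()
        if ratio >= threshold:
            return True
    return False

dataset = ["you absolute melon"]
print(is_duplicate("You absolute melon!", dataset))      # near-duplicate: True
print(is_duplicate("totally unrelated text", dataset))   # distinct entry: False
```

The idea is simply that near-identical entries add no new signal, so rejecting them keeps the dataset high-quality and efficient.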

In essence, your request history is a record of your interactions with the service, while your custom profanity dataset is a tool for personalizing the service to match your definition of inappropriate language. Together, they help create a profanity filtering experience that's tailored just for you.




Managing your dataset

To manage your custom model dataset:

  • Browse the filter logs or request history (non-Discord users)
  • Find entries of interest in the logs
  • Use the "Train as bad" button to add the entry to your custom dataset
  • Changes won't be reflected in real time; the learning process happens at intervals and will take time to take effect.
    • If you find yourself needing to add a lot of phrases to your custom dataset, you may need help from us to determine your use case. Is it domain-specific (gaming community, hentai, etc.)? We are exploring specialised domains; if this is of interest to you, reach out to us.

Entries you cannot train on are those already determined to be bad/toxic; in that case, you can report the entry as a false positive instead.

Similarly, duplicate entries (either exact or too similar) will say "similar to trained dataset". Learn more.
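Putting the rules above together, pressing "Train as bad" can have one of three outcomes. The sketch below is a hypothetical illustration of that branching (the function name and the duplicate check are assumptions; the outcome strings mirror the behaviour described in this article):

```python
def handle_train_as_bad(entry: str, already_toxic: bool, trained_dataset: list[str]) -> str:
    """Illustrative outcome of pressing 'Train as bad' on a log entry."""
    if already_toxic:
        # The filter already flags this entry, so training is unavailable;
        # if you believe the flag is wrong, report it as a false positive.
        return "report as false positive"
    if any(entry.strip().lower() == e.strip().lower() for e in trained_dataset):
        # Exact (or, in the real service, too-similar) duplicates are rejected.
        return "similar to trained dataset"
    # Otherwise the entry is queued; learning happens at intervals, not in real time.
    trained_dataset.append(entry)
    return "queued for training"
```

For example, trying to train on a phrase the filter already marks as toxic would route you to false positive reporting rather than to the dataset.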




Notes

The custom model dataset feature is designed to let you handle edge cases, or a handful of specific things you do not deem appropriate for your target audience, that the filter does not catch (we are mindful, but focus on generally unacceptable forms of language).


Resources