Dashboard-V2

ProfanityBlocker Release Notes


Hello there,

We are glad to announce that Dashboard-V2, the latest version of ProfanityBlocker, is out. Below is the list of JIRA issues included in this upgrade.

Summary -

Project - ProfanityBlocker
Version - Dashboard-V2
Planned Release Date -

New Feature

Priority Key Summary Description Status Component/s
Low PB-171 Request History csv export (non-discord view)

Allows users to download up to 1,000 entries from the request history against the service licence.

Done Dashboard
Medium PB-89 Overhaul Profanity Training and False Positive Reporting System

Implement false positive reporting for request history.

We no longer store false positives and profanity training data in .csv files in the dashboard filesystem.

Done Dashboard
Low PB-51 DeepSense class thresholds, probability tuning for clients

Interpret probabilities from the class result, and allow users to set sensitivity thresholds that control how confident the DeepSense filter must be that content is bad before acting on it.

 

Done Dashboard, DeepSense, Filter
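To illustrate how per-class sensitivity thresholds might work, here is a minimal sketch. The function name, data shapes, and the default threshold are hypothetical stand-ins, not the actual DeepSense API: a lower threshold means a stricter filter.

```python
def is_blocked(class_probs, thresholds, default=0.5):
    """Return True if any class probability meets the client's
    configured sensitivity threshold for that class.

    class_probs: mapping of class name -> model probability
    thresholds:  mapping of class name -> client threshold
    """
    for cls, prob in class_probs.items():
        if prob >= thresholds.get(cls, default):
            return True
    return False

# A strict client blocks at 0.3 confidence; a lenient one at 0.9.
print(is_blocked({"profanity": 0.55}, {"profanity": 0.3}))  # True
print(is_blocked({"profanity": 0.55}, {"profanity": 0.9}))  # False
```

The same probability can thus produce different outcomes per licence, which is why the request-time thresholds are now recorded alongside the request (see PB-131 below).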

Improvements

Dashboard
Priority Key Summary Description Status Component/s
Highest PB-167 Create placeholder users, licences and transaction for guilds and link them - transfer ownership when owner logs in

This further simplifies onboarding: we create a placeholder user, licence and transaction for the guild the bot is in, so users can use the free plan without additional action (no need to visit the dashboard).

 

When logging in via Discord, we check the user's owned guilds for matching placeholder licences, create the new Discord account, and transfer ownership.

Done Dashboard
Low PB-129 Dashboard filter Options for non-discord users

Add licence filter options to non-Discord dashboard.

Done Dashboard
Medium PB-109 If Discord serverlist is empty we should try the login again

This also fixes a KeyError when the server list is empty.

Done Dashboard
Medium PB-91 Separated OAuth and Django Authentication

1. Refactored User and Password Generation Functions: We updated the MakeDiscordOTPUser and MakeDiscordOTPPass functions to ensure they are used correctly for generating random usernames and passwords when creating new user accounts. These functions are not used for authentication purposes.

    2. Updated User Authentication Logic: We modified the authenticateuser function to handle the Discord OAuth login flow properly. The function now:
    - Checks if a user with the provided Discord email already exists in the database.
    - If the user exists, logs them in without checking the password, as the OAuth process handles authentication.
    - If the user does not exist, creates a new user with a random username and password. The password is not used for login purposes, since Discord OAuth is used for authentication.

    3. Separated OAuth and Django Authentication: We ensured that the Discord OAuth login flow is separate from the Django login flow. The OAuth process does not rely on the user's Django password for authentication, and the user's session is managed independently of the Django password.

    4. Session Management: After a successful OAuth login, the user's session is created or updated in a way that is recognized by the Django application, regardless of the user's password in Django.

    These changes aim to provide a seamless login experience for users who choose to log in via Discord OAuth and to ensure that the application's internal authentication mechanisms (such as password changes through the Django admin) do not interfere with the OAuth login process.

Done Dashboard
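The login flow described for PB-91 can be sketched roughly as follows. This is a simplified stand-in, not the production code: a dict replaces the Django user table, and `make_otp_user`/`make_otp_pass` stand in for MakeDiscordOTPUser/MakeDiscordOTPPass.

```python
import secrets

def make_otp_user():
    # Random username for an OAuth-created account (never typed by the user).
    return "discord_" + secrets.token_hex(8)

def make_otp_pass():
    # Random throwaway password; OAuth handles authentication, not this value.
    return secrets.token_urlsafe(24)

def authenticate_user(discord_email, users):
    """Log in (or create) an account after a completed Discord OAuth flow.

    users: dict mapping email -> account record, standing in for the DB.
    The password is never checked here: Discord has already authenticated
    the user by the time this function runs.
    """
    account = users.get(discord_email)
    if account is None:
        account = {
            "username": make_otp_user(),
            "password": make_otp_pass(),  # stored but unused for login
            "email": discord_email,
        }
        users[discord_email] = account
    return account  # the caller creates the session from this account
```

Because the session is created from the returned account rather than a password check, changing the Django password (e.g. via the admin) cannot break OAuth logins.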
Dashboard, Discord Bot
Priority Key Summary Description Status Component/s
Medium PB-145 Simplify onboarding for discord
  • Simplify invite screen with clearer text on the steps to complete setup.
  • Simplify and explain on the bot side what to do when the bot is directly invited (not via dashboard)
    • Strings in strings for easier maintainability
  • Specially crafted login link that leads to the Discord server Home page (one-click setup)
  • Use embeds to make the onboarding channel message tidier
  • Update dashboard bot command to use the one-click login to server home dashboard
Done Dashboard, Discord Bot
Dashboard, Filter
Priority Key Summary Description Status Component/s
Blocker PB-137 Redis, celery and Kafka - Database optimisations

With the use of the Chrome extension, we have observed a significant increase in the number of requests made to the filter, often reaching a thousand within a 30-minute span.

Improvements:

  • Implemented caching for frequently accessed data, especially for results of expensive queries.
  • Utilized Django's caching framework to reduce database load during high traffic periods.
  • Caching backends such as Redis are used for enhanced performance.
Done Dashboard, Filter
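With Django's caching framework the pattern behind PB-137 is essentially `cache.get_or_set` wrapped around the expensive query. A self-contained sketch of the same idea follows; a plain dict stands in for Redis, and `load_licence_stats` is a hypothetical expensive query, not a real function in the codebase.

```python
import time

class SimpleCache:
    """Tiny stand-in for Django's cache API (get_or_set with a TTL)."""
    def __init__(self):
        self._store = {}

    def get_or_set(self, key, default_fn, timeout=300):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: skip the expensive query
        value = default_fn()
        self._store[key] = (value, now + timeout)
        return value

calls = 0
def load_licence_stats():
    global calls
    calls += 1  # count how often the "database" is actually hit
    return {"requests": 1000}  # imagine an expensive DB aggregate

cache = SimpleCache()
cache.get_or_set("licence:42:stats", load_licence_stats)
cache.get_or_set("licence:42:stats", load_licence_stats)
print(calls)  # 1 — the query ran only once
```

Under a burst of a thousand filter requests in 30 minutes, repeated reads of the same licence data collapse into one query per TTL window.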
DeepSense, Filter
Priority Key Summary Description Status Component/s
Medium PB-131 Store DeepSense class thresholds in request history

Since clients can set thresholds on the class result probabilities, reproducing results from false positives requires knowing the class thresholds in effect at the time the request was made. These are now saved in the request history table (DB). Request history is only kept for 30 days.

 

Done DeepSense, Filter
Discord Bot, Filter
Priority Key Summary Description Status Component/s
Medium PB-90 Better align filter and bot request history recording

Improve synchronization between the filter service and the Discord bot for recording request history.

Previously, both the filter service and the Discord bot were independently creating request history entries, resulting in duplicate entries. Furthermore, the fields in these entries did not match, leading to inconsistencies in the displayed data in the request history views.

To address this issue, we have implemented a solution. The filter service now includes the request ID it generates, and the Discord bot reads this ID to update the corresponding record with additional information, rather than creating a new entry in the database.

This ensures better alignment between the filter service and the Discord bot in recording request history, eliminating duplicate entries and ensuring consistent data in the request history views.

Done Discord Bot, Filter
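The PB-90 flow can be sketched like this. Dicts stand in for the request history table, and the field names are illustrative, not the actual schema: the key point is that the bot updates the filter's record instead of inserting its own.

```python
history = {}  # request_id -> record, standing in for the request history table

def filter_service_record(request_id, request_string, returned_string):
    # The filter service creates the entry and passes the request ID downstream.
    history[request_id] = {
        "request_string": request_string,
        "returned_string": returned_string,
    }
    return request_id

def bot_enrich_record(request_id, channel, author):
    # The bot updates the existing record rather than creating a duplicate.
    history[request_id].update({"channel": channel, "author": author})

rid = filter_service_record("req-1", "bad word", "*** word")
bot_enrich_record(rid, channel="#general", author="user#1234")
print(len(history))  # 1 — a single consistent entry, no duplicate
```

Previously both services called the equivalent of `filter_service_record`, which is exactly how the mismatched duplicate rows arose.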

Tasks

Celery, Dashboard, Discord Bot
Priority Key Summary Description Status Component/s
Medium PB-186 Instantly create licences for new bot users

Improved onboarding by creating a service licence on the spot, bypassing our background job, which can take a few minutes. Our analytics found users adding the bot, immediately sending content into their channel (presumably to test it), and removing the bot when nothing happened (because the background licence setup hadn't finished running yet).

Done Celery, Dashboard, Discord Bot
Dashboard
Priority Key Summary Description Status Component/s
Medium PB-142 Delete request history data after 30 days where there are no linked custom datasets

Reduce clutter by removing request history entries older than 30 days.

Dataset entries created from request history stay intact for tracing/integrity; a migration would be required if we wanted to delete the original request history entries linked to custom datasets, which is out of scope for this task.

Done Dashboard
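In Django terms PB-142 is a periodic queryset delete. A self-contained sketch of the selection rule follows; the records here are plain dicts and the `linked_dataset` flag is illustrative, standing in for the relation to custom datasets.

```python
from datetime import datetime, timedelta, timezone

def purge_old_history(entries, now=None, max_age_days=30):
    """Drop request-history entries older than max_age_days, keeping
    any entry that a custom dataset still links to."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        e for e in entries
        if e["created"] >= cutoff or e.get("linked_dataset")
    ]

now = datetime.now(timezone.utc)
entries = [
    {"id": 1, "created": now - timedelta(days=40)},                          # purged
    {"id": 2, "created": now - timedelta(days=40), "linked_dataset": True},  # kept
    {"id": 3, "created": now - timedelta(days=5)},                           # kept
]
print([e["id"] for e in purge_old_history(entries, now=now)])  # [2, 3]
```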
Dashboard, Discord Premium, Filter, Hybrid, InsightGuard
Priority Key Summary Description Status Component/s
Low PB-81 PoC: custom model criteria

At a high level: Implement custom model training per client.
 
TL;DR: you can add custom data per licence and we train the model to block that type of content. Note that training requires premium, as you need to select data from Discord server logs to add to the client's custom training data.

 

As per PB-89, we do not store the custom training data on the filesystem but in the database, and we do not do versioning, which this issue originally aimed for.

Done Dashboard, Discord Premium, Filter, Hybrid, InsightGuard
 

Subtasks

Dashboard
Priority Key Summary Description Status Component/s
Low PB-93 False Positive Validation

*Enhancements to False Positive Reports Management*

  • *Validation Results Display Enhancement*: We've improved the False Positive Reports interface to show validation results for entries both with and without licenses. This enhancement allows users to clearly distinguish between the two scenarios, improving the analysis capabilities of the dashboard. The display templates have been updated to include these new fields, and a database migration has been added to incorporate these changes into the schema.
  • *Global False Positive Marking for Superusers*: Superusers now have the ability to designate reports as global false positives directly from the dashboard. This new feature, accessible through the FalsePositiveDatasetView, adds a layer of permission-based control and includes updates to the FalsePositive_Reports_Dataset model to reflect the global status. The dashboard's user interface now features a button for this action, streamlining the process for superusers and enhancing report management across all clients.
  • *CSV Export of False Positives*: We've refined the command line output to provide more precise probability information regarding profanity checks. This includes accurate handling of probabilities from both the base and extra models, as well as calculations for non-profane probabilities. Additionally, superusers can now download a CSV file of false positive reports from the dashboard. This new functionality facilitates the easy collection of data associated with various licenses.
  • *Global False Positives Export and Automated Validation*: Superusers can now export a list of global false positives to a CSV file via the new `FalsePositiveDatasetViewExtra`. This allows for external analysis and auditing of global flags. We've also introduced an automated process for validating false positives through a new management command, which helps maintain data integrity and streamlines the validation process. The data model now includes an `is_global` field to differentiate between global and local false positives. The following additional features have been implemented:
    • A URL path for the CSV download has been added to the urlpatterns.
    • The `false_positive_validator` management command now targets unvalidated entries, enhancing system automation.
    • Temporary file usage in `FalsePositiveDatasetViewExtra` ensures no data is left behind post-download, optimizing resource management.
  • *Validation Process Optimization*: The `false_positive_validator` has been refactored to prioritize the population of original validation results before any subsequent validations, and to accurately timestamp each event. Furthermore, to prevent unnecessary re-validation, the validator now waits for a 7-day period before re-validating an entry, reducing computational load and improving efficiency. This applies to both license-inclusive and license-exempted validations.
  • *Improved Handling of False Positive Validation*: A new attribute, 'failed_subsequent_validation', has been introduced to track false positives that fail validation after a fix has been attempted. This helps in quickly identifying entries that have been marked as fixed but fail later validations. The CSV export logic has been updated to include these entries for better reporting, and the display templates have been refactored to use this new flag for improved visibility and handling.
Done Dashboard
Low PB-92 Train for duplicates in Profanity Training dataset
We've made several improvements to the way we handle duplicates in our Profanity Training dataset. Below is a summary of the key enhancements:
  • *Many-to-Many Relationship Tracking*: We've updated our system to track the origins of duplicate datasets more effectively. Now, a single dataset can be linked to multiple original sources, reflecting a many-to-many relationship. This change has been reflected in the user interface and the relevant commands.
  • *Improved Duplicate Detection*: The process for identifying duplicates now takes into account global datasets, enhancing the accuracy of our checks when importing new profanity data.
  • *Batch Processing Enhancements*: The `update_duplicates` command has been refactored to process entries in batches. This reduces the number of database operations and improves overall performance.
  • *Optimized Request History Updates*: We now collect information on similar dataset relationships before marking entries in the request history as checked. This ensures that we only update the request history after all potential duplicates have been identified, minimizing unnecessary database calls.
  • *Management Command Refactoring*: The `update_duplicates` management command has been overhauled for better efficiency. It now prioritizes entries that have detected profanity, uses weighted random sampling to focus on likely duplicates, extends the time frame for considering recent entries to 7 days, and includes better error handling.
  • *Database Migration*: We've added new fields to our database to support these enhanced duplicate checking features.
  • *Adaptive Sampling*: The duplicate detection process now includes adaptive sampling, which dynamically adjusts to focus on entries that are more likely to be duplicates.

Overall, these updates lead to a more reliable and efficient system for detecting duplicates in both client-specific and global datasets, ensuring the integrity of our data import workflow.

Done Dashboard
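The batch processing behind `update_duplicates` (PB-92) follows a common pattern: collect a fixed-size batch of entries, then write them in a single operation. A generic sketch under those assumptions; the real command uses Django's bulk operations, and `save_batch` is a stand-in for one such bulk write.

```python
def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

writes = []
def save_batch(batch):
    # Stand-in for a single bulk_update() call against the database.
    writes.append(list(batch))

entries = list(range(10))  # imagine 10 request-history rows to mark as checked
for batch in chunked(entries, size=4):
    save_batch(batch)

print(len(writes))  # 3 — three database round-trips instead of ten
```

Deferring the request-history updates until after duplicate detection, as described above, is what lets all the writes fit into these few batched calls.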

Bugs

Dashboard, DeepSense, Hybrid, InsightGuard
Priority Key Summary Description Status Component/s
Low PB-170 Dashboard: address request history train and false positive conditions inconsistencies

Problems:

  • Incorrect logic for displaying "Train as bad" and "Report as false positive" options.
  • Improper handling of genre_id restrictions.
  • Inconsistent conditions between the two files.

How they've been addressed:
Corrected logic:

  • "Report as false positive" is now shown when the request string doesn't match the returned string (indicating that profanity was detected and modified).
  • "Train as bad" is now shown when the request string matches the returned string (indicating that no profanity was detected, but the user thinks it should have been).

Genre_id restrictions:

  • Training options are now only available for genre_id 1 and 2.
  • The condition {% elif history_entry.genre_id == 1 or history_entry.genre_id == 2 %} (or word.genre_id for Discord) ensures this restriction.

Consistency:

  • Both files now use the same logic structure for determining when to show training options.
  • The unnecessary check for genre_id != 3 has been removed from both files.

These changes ensure that both the main dashboard and the Discord dashboard handle profanity detection training consistently and correctly, improving the overall functionality and user experience of the system.

Done Dashboard, DeepSense, Hybrid, InsightGuard
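The corrected display rules for PB-170 boil down to two checks. In Python terms (a paraphrase of the template conditions, not the template code itself; the function and return values are illustrative):

```python
TRAINABLE_GENRES = {1, 2}  # training options are limited to genre_id 1 and 2

def history_action(request_string, returned_string, genre_id):
    """Decide which action to offer for a request-history entry."""
    if genre_id not in TRAINABLE_GENRES:
        return None
    if request_string != returned_string:
        # Profanity was detected and modified: the user can dispute it.
        return "report_false_positive"
    # Nothing was detected: the user can flag it as missed profanity.
    return "train_as_bad"

print(history_action("bad word", "*** word", genre_id=1))   # report_false_positive
print(history_action("fine text", "fine text", genre_id=2))  # train_as_bad
print(history_action("fine text", "fine text", genre_id=3))  # None
```

Both the main dashboard and the Discord dashboard templates now implement this same decision table, which is the consistency fix described above.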
Billing, Dashboard, Discord Premium, Filter
Priority Key Summary Description Status Component/s
Highest PB-135 maxrequests not updated upon plan change

maxrequests for a licence is not updated dynamically when a licence plan is updated.

This means that until the application nodes are restarted, request limits remain at their initialisation values.

Done Billing, Dashboard, Discord Premium, Filter
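The PB-135 fix amounts to reading the limit from the source of truth on every check rather than capturing it at process start. A minimal sketch under that assumption; the licence store here is a dict, and the function names are illustrative.

```python
licences = {"lic-1": {"plan": "free", "max_requests": 100}}

def max_requests_cached_at_init(licence_id):
    # Buggy pattern: the limit is captured once at node start-up...
    limit = licences[licence_id]["max_requests"]
    def check(used):
        return used < limit  # ...so later plan changes are invisible
    return check

def max_requests_live(licence_id, used):
    # Fixed pattern: read the current limit on every check.
    return used < licences[licence_id]["max_requests"]

check = max_requests_cached_at_init("lic-1")
licences["lic-1"]["max_requests"] = 1000        # plan upgraded
print(check(500))                               # False — stale limit enforced
print(max_requests_live("lic-1", 500))          # True — upgrade takes effect
```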
Dashboard, Filter
Priority Key Summary Description Status Component/s
Low PB-45 Data too long for request history (request_string, returned_string)

Migrate these columns to a larger type to allow more text in these fields, fixing the error in the filter service.

Done Dashboard, Filter

 

Looking for more nitty-gritty details? Take a look:

Number of issues resolved: 51 (including private issues)

Looking forward to hearing your feedback.

 

Thanks,

Team ProfanityBlocker