Dashboard-V2

ProfanityBlocker Release Notes


Hello there,

We are glad to announce that Dashboard-V2, the latest version of ProfanityBlocker, is out. Below is the list of JIRA issues included in this upgrade.

Summary -

Project - ProfanityBlocker
Version - Dashboard-V2
Planned Release Date -

New Feature

Priority Key Summary Description Status Component/s
Low PB-171 Request History csv export (non-discord view)

Allows users to download up to 1,000 entries from the request history against the service licence.

Done Dashboard
Medium PB-89 Overhaul Profanity Training and False Positive Reporting System

Implement false positive reporting for request history.

We no longer store false positives and profanity training data in .csv files in the dashboard filesystem.

Done Dashboard
Low PB-51 DeepSense class thresholds, probability tuning for clients

Interpret probabilities from the class result, and allow users to set sensitivity thresholds that control how confident the DeepSense filter must be that content is bad before acting on it.

 

Done Dashboard, DeepSense, Filter
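To illustrate how per-class sensitivity thresholds might work, here is a minimal sketch. The function name, data shapes, and the default threshold are hypothetical stand-ins, not the actual DeepSense API: a lower threshold means a stricter filter.

```python
def is_blocked(class_probs, thresholds, default=0.5):
    """Return True if any class probability meets the client's
    configured sensitivity threshold for that class.

    class_probs: mapping of class name -> model probability
    thresholds:  mapping of class name -> client threshold
    """
    for cls, prob in class_probs.items():
        if prob >= thresholds.get(cls, default):
            return True
    return False

# A strict client blocks at 0.3 confidence; a lenient one at 0.9.
print(is_blocked({"profanity": 0.55}, {"profanity": 0.3}))  # True
print(is_blocked({"profanity": 0.55}, {"profanity": 0.9}))  # False
```

The same probability can thus produce different outcomes per licence, which is why the request-time thresholds are now recorded alongside the request (see PB-131 below).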

Improvements

Dashboard
Priority Key Summary Description Status Component/s
Highest PB-167 Create placeholder users, licences and transaction for guilds and link them - transfer ownership when owner logs in

This further simplifies onboarding: we create a placeholder user, licence and transaction for the guild the bot is in, so users can use the free plan without additional action (no need to visit the dashboard).

 

When logging in via Discord, we check the user's owned guilds for matching placeholder licences, create the new Discord account, and transfer ownership.

Done Dashboard
Low PB-129 Dashboard filter Options for non-discord users

Add licence filter options to non-Discord dashboard.

Done Dashboard
Medium PB-109 If Discord serverlist is empty we should try the login again

This also fixes a KeyError when the server list is empty.

Done Dashboard
Medium PB-91 Separated OAuth and Django Authentication

1. Refactored User and Password Generation Functions: We updated the MakeDiscordOTPUser and MakeDiscordOTPPass functions to ensure they are used correctly for generating random usernames and passwords when creating new user accounts. These functions are not used for authentication purposes.

    2. Updated User Authentication Logic: We modified the authenticateuser function to handle the Discord OAuth login flow properly. The function now:
    - Checks if a user with the provided Discord email already exists in the database.
    - If the user exists, logs them in without checking the password, as the OAuth process handles authentication.
    - If the user does not exist, creates a new user with a random username and password. The password is not used for login purposes, since Discord OAuth is used for authentication.

    3. Separated OAuth and Django Authentication: We ensured that the Discord OAuth login flow is separate from the Django login flow. The OAuth process does not rely on the user's Django password for authentication, and the user's session is managed independently of the Django password.

    4. Session Management: After a successful OAuth login, the user's session is created or updated in a way that is recognized by the Django application, regardless of the user's password in Django.

    These changes aim to provide a seamless login experience for users who choose to log in via Discord OAuth and to ensure that the application's internal authentication mechanisms (such as password changes through the Django admin) do not interfere with the OAuth login process.

Done Dashboard
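The login flow described for PB-91 can be sketched roughly as follows. This is a simplified stand-in, not the production code: a dict replaces the Django user table, and `make_otp_user`/`make_otp_pass` stand in for MakeDiscordOTPUser/MakeDiscordOTPPass.

```python
import secrets

def make_otp_user():
    # Random username for an OAuth-created account (never typed by the user).
    return "discord_" + secrets.token_hex(8)

def make_otp_pass():
    # Random throwaway password; OAuth handles authentication, not this value.
    return secrets.token_urlsafe(24)

def authenticate_user(discord_email, users):
    """Log in (or create) an account after a completed Discord OAuth flow.

    users: dict mapping email -> account record, standing in for the DB.
    The password is never checked here: Discord has already authenticated
    the user by the time this function runs.
    """
    account = users.get(discord_email)
    if account is None:
        account = {
            "username": make_otp_user(),
            "password": make_otp_pass(),  # stored but unused for login
            "email": discord_email,
        }
        users[discord_email] = account
    return account  # the caller creates the session from this account
```

Because the session is created from the returned account rather than a password check, changing the Django password (e.g. via the admin) cannot break OAuth logins.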
Dashboard, Discord Bot
Priority Key Summary Description Status Component/s
Medium PB-145 Simplify onboarding for discord
  • Simplify invite screen with clearer text on the steps to complete setup.
  • Simplify and explain on the bot side what to do when the bot is directly invited (not via dashboard)
    • Strings in strings for easier maintainability
  • Specially crafted login link that leads to the Discord server Home page (one-click setup)
  • Use embeds to make the onboarding channel message tidier
  • Update dashboard bot command to use the one-click login to server home dashboard
Done Dashboard, Discord Bot
Dashboard, Filter
Priority Key Summary Description Status Component/s
Blocker PB-137 Redis, celery and Kafka - Database optimisations

With the use of the Chrome extension, we have observed a significant increase in the number of requests made to the filter, often reaching a thousand within a 30-minute span.

Improvements:

  • Implemented caching for frequently accessed data, especially for results of expensive queries.
  • Utilized Django's caching framework to reduce database load during high traffic periods.
  • Caching backends such as Redis are used for enhanced performance.
Done Dashboard, Filter
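With Django's caching framework the pattern behind PB-137 is essentially `cache.get_or_set` wrapped around the expensive query. A self-contained sketch of the same idea follows; a plain dict stands in for Redis, and `load_licence_stats` is a hypothetical expensive query, not a real function in the codebase.

```python
import time

class SimpleCache:
    """Tiny stand-in for Django's cache API (get_or_set with a TTL)."""
    def __init__(self):
        self._store = {}

    def get_or_set(self, key, default_fn, timeout=300):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: skip the expensive query
        value = default_fn()
        self._store[key] = (value, now + timeout)
        return value

calls = 0
def load_licence_stats():
    global calls
    calls += 1  # count how often the "database" is actually hit
    return {"requests": 1000}  # imagine an expensive DB aggregate

cache = SimpleCache()
cache.get_or_set("licence:42:stats", load_licence_stats)
cache.get_or_set("licence:42:stats", load_licence_stats)
print(calls)  # 1 — the query ran only once
```

Under a burst of a thousand filter requests in 30 minutes, repeated reads of the same licence data collapse into one query per TTL window.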
DeepSense, Filter
Priority Key Summary Description Status Component/s
Medium PB-131 Store DeepSense class thresholds in request history

Since clients can set thresholds on the class result probabilities, reproducing results from false positives requires knowing the class thresholds in effect at the time the request was made. These are now saved in the request history table (DB). Request history is only kept for 30 days.

 

Done DeepSense, Filter
Discord Bot, Filter
Priority Key Summary Description Status Component/s
Medium PB-90 Better align filter and bot request history recording

Improve synchronization between the filter service and the Discord bot for recording request history.

Previously, both the filter service and the Discord bot were independently creating request history entries, resulting in duplicate entries. Furthermore, the fields in these entries did not match, leading to inconsistencies in the displayed data in the request history views.

To address this issue, we have implemented a solution. The filter service now includes the request ID it generates, and the Discord bot reads this ID to update the corresponding record with additional information, rather than creating a new entry in the database.

This ensures better alignment between the filter service and the Discord bot in recording request history, eliminating duplicate entries and ensuring consistent data in the request history views.

Done Discord Bot, Filter
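The PB-90 flow can be sketched like this. Dicts stand in for the request history table, and the field names are illustrative, not the actual schema: the key point is that the bot updates the filter's record instead of inserting its own.

```python
history = {}  # request_id -> record, standing in for the request history table

def filter_service_record(request_id, request_string, returned_string):
    # The filter service creates the entry and passes the request ID downstream.
    history[request_id] = {
        "request_string": request_string,
        "returned_string": returned_string,
    }
    return request_id

def bot_enrich_record(request_id, channel, author):
    # The bot updates the existing record rather than creating a duplicate.
    history[request_id].update({"channel": channel, "author": author})

rid = filter_service_record("req-1", "bad word", "*** word")
bot_enrich_record(rid, channel="#general", author="user#1234")
print(len(history))  # 1 — a single consistent entry, no duplicate
```

Previously both services called the equivalent of `filter_service_record`, which is exactly how the mismatched duplicate rows arose.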

Tasks

Celery, Dashboard, Discord Bot
Priority Key Summary Description Status Component/s
Medium PB-186 Instantly create licences for new bot users

Improved onboarding by creating a service licence on the spot, bypassing our background job, which can take a few minutes. Our analytics found users adding the bot, immediately sending content into their channel (presumably to test it), and removing the bot when nothing happened (because the background licence setup hadn't finished running yet).

Done Celery, Dashboard, Discord Bot
Dashboard
Priority Key Summary Description Status Component/s
Medium PB-142 Delete request history data after 30 days where there are no linked custom datasets

Reduce clutter by removing request history entries older than 30 days.

Dataset entries created from request history stay intact for tracing/integrity; a migration would be required if we wanted to delete the original request history entries linked to custom datasets, which is out of scope for this task.

Done Dashboard
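In Django terms PB-142 is a periodic queryset delete. A self-contained sketch of the selection rule follows; the records here are plain dicts and the `linked_dataset` flag is illustrative, standing in for the relation to custom datasets.

```python
from datetime import datetime, timedelta, timezone

def purge_old_history(entries, now=None, max_age_days=30):
    """Drop request-history entries older than max_age_days, keeping
    any entry that a custom dataset still links to."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        e for e in entries
        if e["created"] >= cutoff or e.get("linked_dataset")
    ]

now = datetime.now(timezone.utc)
entries = [
    {"id": 1, "created": now - timedelta(days=40)},                          # purged
    {"id": 2, "created": now - timedelta(days=40), "linked_dataset": True},  # kept
    {"id": 3, "created": now - timedelta(days=5)},                           # kept
]
print([e["id"] for e in purge_old_history(entries, now=now)])  # [2, 3]
```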
Dashboard, Discord Premium, Filter, Hybrid, InsightGuard
Priority Key Summary Description Status Component/s
Low PB-81 PoC: custom model criteria

At a high level: Implement custom model training per client.
 
TL;DR: you can add custom data per licence and we train the model to block that type of content. Note that training requires premium, as you need to select data from Discord server logs to add to the client's custom training data.

 

As per PB-89, we do not store the custom training data on the filesystem but in the database, and we do not do versioning, which this issue originally aimed for.

Done Dashboard, Discord Premium, Filter, Hybrid, InsightGuard
 

Subtasks

Dashboard
Priority Key Summary Description Status Component/s
Low PB-93 False Positive Validation

*Enhancements to False Positive Reports Management*

  • *Validation Results Display Enhancement*: We've improved the False Positive Reports interface to show validation results for entries both with and without licenses. This enhancement allows users to clearly distinguish between the two scenarios, improving the analysis capabilities of the dashboard. The display templates have been updated to include these new fields, and a database migration has been added to incorporate these changes into the schema.
  • *Global False Positive Marking for Superusers*: Superusers now have the ability to designate reports as global false positives directly from the dashboard. This new feature, accessible through the FalsePositiveDatasetView, adds a layer of permission-based control and includes updates to the FalsePositive_Reports_Dataset model to reflect the global status. The dashboard's user interface now features a button for this action, streamlining the process for superusers and enhancing report management across all clients.
  • *CSV Export of False Positives*: We've refined the command line output to provide more precise probability information regarding profanity checks. This includes accurate handling of probabilities from both the base and extra models, as well as calculations for non-profane probabilities. Additionally, superusers can now download a CSV file of false positive reports from the dashboard. This new functionality facilitates the easy collection of data associated with various licenses.
  • *Global False Positives Export and Automated Validation*: Superusers can now export a list of global false positives to a CSV file via the new `FalsePositiveDatasetViewExtra`. This allows for external analysis and auditing of global flags. We've also introduced an automated process for validating false positives through a new management command, which helps maintain data integrity and streamlines the validation process. The data model now includes an `is_global` field to differentiate between global and local false positives. The following additional features have been implemented:
    • A URL path for the CSV download has been added to the urlpatterns.
    • The `false_positive_validator` management command now targets unvalidated entries, enhancing system automation.
    • Temporary file usage in `FalsePositiveDatasetViewExtra` ensures no data is left behind post-download, optimizing resource management.
  • *Validation Process Optimization*: The `false_positive_validator` has been refactored to prioritize the population of original validation results before any subsequent validations, and to accurately timestamp each event. Furthermore, to prevent unnecessary re-validation, the validator now waits for a 7-day period before re-validating an entry, reducing computational load and improving efficiency. This applies to both license-inclusive and license-exempted validations.
  • *Improved Handling of False Positive Validation*: A new attribute, 'failed_subsequent_validation', has been introduced to track false positives that fail validation after a fix has been attempted. This helps in quickly identifying entries that have been marked as fixed but fail later validations. The CSV export logic has been updated to include these entries for better reporting, and the display templates have been refactored to use this new flag for improved visibility and handling.
Done Dashboard
Low PB-92 Train for duplicates in Profanity Training dataset
We've made several improvements to the way we handle duplicates in our Profanity Training dataset. Below is a summary of the key enhancements:
  • *Many-to-Many Relationship Tracking*: We've updated our system to track the origins of duplicate datasets more effectively. Now, a single dataset can be linked to multiple original sources, reflecting a many-to-many relationship. This change has been reflected in the user interface and the relevant commands.
  • *Improved Duplicate Detection*: The process for identifying duplicates now takes into account global datasets, enhancing the accuracy of our checks when importing new profanity data.
  • *Batch Processing Enhancements*: The `update_duplicates` command has been refactored to process entries in batches. This reduces the number of database operations and improves overall performance.
  • *Optimized Request History Updates*: We now collect information on similar dataset relationships before marking entries in the request history as checked. This ensures that we only update the request history after all potential duplicates have been identified, minimizing unnecessary database calls.
  • *Management Command Refactoring*: The `update_duplicates` management command has been overhauled for better efficiency. It now prioritizes entries that have detected profanity, uses weighted random sampling to focus on likely duplicates, extends the time frame for considering recent entries to 7 days, and includes better error handling.
  • *Database Migration*: We've added new fields to our database to support these enhanced duplicate checking features.
  • *Adaptive Sampling*: The duplicate detection process now includes adaptive sampling, which dynamically adjusts to focus on entries that are more likely to be duplicates.

Overall, these updates lead to a more reliable and efficient system for detecting duplicates in both client-specific and global datasets, ensuring the integrity of our data import workflow.

Done Dashboard
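The batch processing behind `update_duplicates` (PB-92) follows a common pattern: collect a fixed-size batch of entries, then write them in a single operation. A generic sketch under those assumptions; the real command uses Django's bulk operations, and `save_batch` is a stand-in for one such bulk write.

```python
def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

writes = []
def save_batch(batch):
    # Stand-in for a single bulk_update() call against the database.
    writes.append(list(batch))

entries = list(range(10))  # imagine 10 request-history rows to mark as checked
for batch in chunked(entries, size=4):
    save_batch(batch)

print(len(writes))  # 3 — three database round-trips instead of ten
```

Deferring the request-history updates until after duplicate detection, as described above, is what lets all the writes fit into these few batched calls.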

Bugs

Dashboard, DeepSense, Hybrid, InsightGuard
Priority Key Summary Description Status Component/s
Low PB-170 Dashboard: address request history train and false positive conditions inconsistencies

Problems:

  • Incorrect logic for displaying "Train as bad" and "Report as false positive" options.
  • Improper handling of genre_id restrictions.
  • Inconsistent conditions between the two files.

How they've been addressed:
Corrected logic:

  • "Report as false positive" is now shown when the request string doesn't match the returned string (indicating that profanity was detected and modified).
  • "Train as bad" is now shown when the request string matches the returned string (indicating that no profanity was detected, but the user thinks it should have been).

Genre_id restrictions:

  • Training options are now only available for genre_id 1 and 2.
  • The condition {% elif history_entry.genre_id == 1 or history_entry.genre_id == 2 %} (or word.genre_id for Discord) ensures this restriction.

Consistency:

  • Both files now use the same logic structure for determining when to show training options.
  • The unnecessary check for genre_id != 3 has been removed from both files.

These changes ensure that both the main dashboard and the Discord dashboard handle profanity detection training consistently and correctly, improving the overall functionality and user experience of the system.

Done Dashboard, DeepSense, Hybrid, InsightGuard
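The corrected display rules for PB-170 boil down to two checks. In Python terms (a paraphrase of the template conditions, not the template code itself; the function and return values are illustrative):

```python
TRAINABLE_GENRES = {1, 2}  # training options are limited to genre_id 1 and 2

def history_action(request_string, returned_string, genre_id):
    """Decide which action to offer for a request-history entry."""
    if genre_id not in TRAINABLE_GENRES:
        return None
    if request_string != returned_string:
        # Profanity was detected and modified: the user can dispute it.
        return "report_false_positive"
    # Nothing was detected: the user can flag it as missed profanity.
    return "train_as_bad"

print(history_action("bad word", "*** word", genre_id=1))   # report_false_positive
print(history_action("fine text", "fine text", genre_id=2))  # train_as_bad
print(history_action("fine text", "fine text", genre_id=3))  # None
```

Both the main dashboard and the Discord dashboard templates now implement this same decision table, which is the consistency fix described above.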
Billing, Dashboard, Discord Premium, Filter
Priority Key Summary Description Status Component/s
Highest PB-135 maxrequests not updated upon plan change

maxrequests for a licence is not updated dynamically when a licence plan is updated.

This means that until the application nodes are restarted, request limits remain at their initialisation values.

Done Billing, Dashboard, Discord Premium, Filter
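The PB-135 fix amounts to reading the limit from the source of truth on every check rather than capturing it at process start. A minimal sketch under that assumption; the licence store here is a dict, and the function names are illustrative.

```python
licences = {"lic-1": {"plan": "free", "max_requests": 100}}

def max_requests_cached_at_init(licence_id):
    # Buggy pattern: the limit is captured once at node start-up...
    limit = licences[licence_id]["max_requests"]
    def check(used):
        return used < limit  # ...so later plan changes are invisible
    return check

def max_requests_live(licence_id, used):
    # Fixed pattern: read the current limit on every check.
    return used < licences[licence_id]["max_requests"]

check = max_requests_cached_at_init("lic-1")
licences["lic-1"]["max_requests"] = 1000        # plan upgraded
print(check(500))                               # False — stale limit enforced
print(max_requests_live("lic-1", 500))          # True — upgrade takes effect
```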
Dashboard, Filter
Priority Key Summary Description Status Component/s
Low PB-45 Data too long for request history (request_string, returned_string)

Migrate these columns to a larger type to allow more text in these fields, fixing the error in the filter service.

Done Dashboard, Filter

 

Looking for more nitty-gritty details? Take a look:

Number of issues resolved: 51 (including private issues)

Looking forward to hearing your feedback.

 

Thanks,

Team ProfanityBlocker