Automated Moderation
AI-powered instant content analysis and decision-making
Automated moderation uses AI to instantly analyze and make decisions about content. It's the fastest way to moderate content at scale, providing immediate feedback on whether submitted content meets your platform's standards.
What You Can Configure
Automated moderation has two main configuration areas that work together to define how content gets analyzed:
Category Selection
Choose which types of harmful content you want the AI to detect. Each category can be independently enabled or disabled for your project.
- 24 different categories available
- Simple enable/disable for each category
- Only enabled categories are analyzed
Global Threshold
Set the sensitivity level for all automated moderation. This single setting controls how strict the AI should be across all categories.
- One threshold for all categories
- Range from 0.0 (very strict) to 1.0 (very lenient)
- Affects the final harmful/not harmful decision
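Taken together, the two areas can be pictured as one configuration object. The TypeScript sketch below is only an illustration of how they combine; the field names enabledCategories and threshold are assumptions for this example, and the actual settings are managed in the Outharm Console rather than in code.

// Illustrative shape of an automated moderation configuration.
// Field names are assumptions for this sketch; the real settings
// are managed in the Outharm Console, not written in code.
interface AutomatedModerationConfig {
  enabledCategories: string[]; // only these categories are analyzed
  threshold: number;           // 0.0 (very strict) to 1.0 (very lenient)
}

const exampleConfig: AutomatedModerationConfig = {
  enabledCategories: ["harassment", "hate", "violence"],
  threshold: 0.5,
};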
Understanding Thresholds
The threshold is the most important configuration setting in automated moderation. It's a single number between 0.0 and 1.0 that determines how confident the AI needs to be before flagging content as harmful.
How Thresholds Work
When the AI analyzes content, it assigns a confidence score between 0.0 and 1.0 for each enabled category. Your threshold setting determines which scores result in "harmful" vs "not harmful" decisions.
if (ai_confidence_score ≥ your_threshold) → harmful
else → not harmful
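Expressed as code, the same rule looks like the sketch below, assuming the overall decision flags content when any enabled category meets or exceeds the threshold. The category names and scores are invented for illustration.

// Minimal sketch of the threshold decision rule, assuming content is
// flagged when any enabled category meets or exceeds the threshold.
// Category names and scores are illustrative, not real API output.
function isHarmful(scores: Record<string, number>, threshold: number): boolean {
  return Object.values(scores).some((score) => score >= threshold);
}

const scores = { harassment: 0.62, hate: 0.18, violence: 0.07 };
console.log(isHarmful(scores, 0.5)); // true:  harassment (0.62) >= 0.5
console.log(isHarmful(scores, 0.7)); // false: no score reaches 0.7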
How Threshold Settings Affect Results
Low Threshold (0.1 - 0.3) - Very Strict
The AI needs very little confidence to flag content as harmful. This catches almost everything potentially problematic but creates many false positives.
Example Results:
- AI confidence: 0.15 → Result: Harmful
- AI confidence: 0.25 → Result: Harmful
- AI confidence: 0.05 → Result: Not Harmful
Best for: Children's platforms, highly regulated industries, or content where safety is paramount
Medium Threshold (0.4 - 0.6) - Balanced
The AI needs moderate confidence to flag content. This provides a good balance between catching harmful content and avoiding false positives.
Example Results:
- AI confidence: 0.35 → Result: Not Harmful
- AI confidence: 0.55 → Result: Harmful
- AI confidence: 0.45 → Result: Not Harmful (threshold 0.5)
Best for: Most general platforms, social media, community forums, or business applications
High Threshold (0.7 - 0.9) - Permissive
The AI needs high confidence to flag content as harmful. This reduces false positives but may miss some genuinely problematic content.
Example Results:
- AI confidence: 0.65 → Result: Not Harmful (threshold 0.7)
- AI confidence: 0.85 → Result: Harmful
- AI confidence: 0.75 → Result: Harmful
Best for: Creative platforms, adult content sites, or environments where free expression is prioritized
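To see the three bands side by side, the short sketch below applies the same decision rule to one hypothetical confidence score at a strict, balanced, and permissive threshold; the score of 0.65 is invented for illustration.

// Compare how one hypothetical confidence score is judged
// under a strict, balanced, and permissive threshold.
const aiConfidence = 0.65;

for (const threshold of [0.2, 0.5, 0.8]) {
  const verdict = aiConfidence >= threshold ? "Harmful" : "Not Harmful";
  console.log(`threshold ${threshold} -> ${verdict}`);
}
// threshold 0.2 -> Harmful      (very strict)
// threshold 0.5 -> Harmful      (balanced)
// threshold 0.8 -> Not Harmful  (permissive)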
Practical Configuration Examples
🎮 Gaming Community Forum
Configuration:
- Categories: Violence, Harassment, Bullying, Hate, Profanity
- Threshold: 0.4 (balanced)
Strategy:
Allow some competitive banter but prevent serious harassment and hate speech. Violence detection helps with graphic content discussions.
👨‍👩‍👧‍👦 Family-Friendly Platform
Configuration:
- Categories: Sexual, Violence, Profanity, Alcohol, Smoking
- Threshold: 0.2 (very strict)
Strategy:
Heavily protect children with very low tolerance for any inappropriate content. Better to have false positives than miss harmful content.
💼 Professional Network
Configuration:
- Categories: Harassment, Hate, Discrimination, Sexual
- Threshold: 0.5 (moderate)
Strategy:
Focus on workplace-appropriate content. Allow professional discussions while preventing harassment and discrimination.
🎨 Creative Arts Platform
Configuration:
- Categories: CSAM, Terrorism, Doxxing, Scams
- Threshold: 0.8 (very permissive)
Strategy:
Prioritize creative freedom while catching only the most serious legal violations. High threshold prevents artistic censorship.
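As a quick illustration, the gaming community example above can be written out as data. The structure and category identifiers are assumptions for this sketch; in practice you select categories and set the threshold in the Outharm Console rather than in code.

// The "Gaming Community Forum" example expressed as data.
// Category identifiers and structure are illustrative assumptions;
// the real configuration is set through the Outharm Console.
const gamingForumConfig = {
  enabledCategories: ["violence", "harassment", "bullying", "hate", "profanity"],
  threshold: 0.4, // balanced: allow competitive banter, catch serious abuse
};

The other examples differ only in which categories are enabled and how strict the threshold is.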
Configuring Automated Moderation
All automated moderation settings are configured through the Outharm Console. Here's how to set up your configuration:
Configuration Steps
1. Navigate to Automated Moderation
   In your project console, go to the "Automated" section in the sidebar.
2. Select Categories
   Enable the categories you want the AI to detect. Each category has a simple enable/disable toggle.
3. Set Global Threshold
   Adjust the threshold slider to control overall sensitivity. A preview shows how strict or lenient your setting is.
4. Test & Save
   Use the test feature to validate your settings, then save the configuration.
Configuration Tips
- Start with a moderate threshold (0.5) and adjust based on results
- Enable categories gradually - you can always add more later
- Use the test feature extensively before going live
- Monitor your first few hundred submissions to tune settings
- Configuration changes apply immediately to new submissions
Using the Automated Moderation API
Once configured, your automated moderation settings apply to all requests made to the POST /moderation/automated endpoint. The API uses your category selection and threshold to analyze content and return instant results.
What Happens During API Calls
1. Your content is analyzed using only your enabled categories
2. AI generates confidence scores for each enabled category
3. Scores are compared against your global threshold
4. The final harmful/not harmful decision is made
5. Detailed results are returned instantly
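A minimal call against the endpoint might look like the sketch below. The base URL, authentication header, request body field, and response shape are assumptions for illustration only; consult the Outharm API reference for the exact contract.

// Hedged sketch of calling POST /moderation/automated.
// The base URL, auth header, body field, and response shape are
// assumptions; check the Outharm API reference for the real contract.
async function moderate(text: string): Promise<void> {
  const response = await fetch("https://api.outharm.example/moderation/automated", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // placeholder credential
    },
    body: JSON.stringify({ content: text }), // hypothetical body field
  });

  // Assumed shape: a harmful/not harmful decision plus per-category
  // confidence scores for the categories you enabled.
  const result = await response.json();
  console.log(result);
}

moderate("example user submission");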
Ready to Get Started?
Configure your automated moderation settings and start protecting your platform with AI-powered content analysis.
Console Setup
Learn how to configure automated moderation
Pricing
Cost per request, volume discounts, and subscription options
Related Documentation
- Categories - What content types can be detected
- Schemas & Components - How to structure content for analysis
- Manual Moderation - Human review workflows
- Quick Start Guide - Get up and running quickly