Automated Moderation
AI-powered instant content analysis and decision-making
Automated moderation uses AI to instantly analyze and make decisions about content. It's the fastest way to moderate content at scale, providing immediate feedback on whether submitted content meets your platform's standards.
What You Can Configure
Automated moderation has two main configuration areas that work together to define how content gets analyzed:
Category Selection
Choose which types of harmful content you want the AI to detect. Each category can be independently enabled or disabled for your project.
- 24 different categories available
- Simple enable/disable for each category
- Only enabled categories are analyzed
Global Threshold
Set the sensitivity level for all automated moderation. This single setting controls how strict the AI should be across all categories.
- One threshold for all categories
- Range from 0.0 (very strict) to 1.0 (very lenient)
- Affects the final harmful/not harmful decision
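Taken together, the two areas can be pictured as one configuration object. The TypeScript sketch below is only an illustration of how they combine; the field names enabledCategories and threshold are assumptions for this example, and the actual settings are managed in the Outharm Console rather than in code.

// Illustrative shape of an automated moderation configuration.
// Field names are assumptions for this sketch; the real settings
// are managed in the Outharm Console, not written in code.
interface AutomatedModerationConfig {
  enabledCategories: string[]; // only these categories are analyzed
  threshold: number;           // 0.0 (very strict) to 1.0 (very lenient)
}

const exampleConfig: AutomatedModerationConfig = {
  enabledCategories: ["harassment", "hate", "violence"],
  threshold: 0.5,
};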
Understanding Thresholds
The threshold is the most important configuration setting in automated moderation. It's a single number between 0.0 and 1.0 that determines how confident the AI needs to be before flagging content as harmful.
How Thresholds Work
When the AI analyzes content, it assigns a confidence score between 0.0 and 1.0 for each enabled category. Your threshold setting determines which scores result in "harmful" vs "not harmful" decisions.
if (ai_confidence_score ≥ your_threshold) → harmful
else → not harmful
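Expressed as code, the same rule looks like the sketch below, assuming the overall decision flags content when any enabled category meets or exceeds the threshold. The category names and scores are invented for illustration.

// Minimal sketch of the threshold decision rule, assuming content is
// flagged when any enabled category meets or exceeds the threshold.
// Category names and scores are illustrative, not real API output.
function isHarmful(scores: Record<string, number>, threshold: number): boolean {
  return Object.values(scores).some((score) => score >= threshold);
}

const scores = { harassment: 0.62, hate: 0.18, violence: 0.07 };
console.log(isHarmful(scores, 0.5)); // true:  harassment (0.62) >= 0.5
console.log(isHarmful(scores, 0.7)); // false: no score reaches 0.7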
How Threshold Settings Affect Results
Low Threshold (0.1 - 0.3) - Very Strict
The AI needs very little confidence to flag content as harmful. This catches almost everything potentially problematic but creates many false positives.
Example Results:
- AI confidence: 0.15 → Result: Harmful
- AI confidence: 0.25 → Result: Harmful
- AI confidence: 0.05 → Result: Not Harmful
Best for: Children's platforms, highly regulated industries, or content where safety is paramount
Medium Threshold (0.4 - 0.6) - Balanced
The AI needs moderate confidence to flag content. This provides a good balance between catching harmful content and avoiding false positives.
Example Results:
- AI confidence: 0.35 → Result: Not Harmful
- AI confidence: 0.55 → Result: Harmful
- AI confidence: 0.45 → Result: Not Harmful (threshold 0.5)
Best for: Most general platforms, social media, community forums, or business applications
High Threshold (0.7 - 0.9) - Permissive
The AI needs high confidence to flag content as harmful. This reduces false positives but may miss some genuinely problematic content.
Example Results:
- AI confidence: 0.65 → Result: Not Harmful (threshold 0.7)
- AI confidence: 0.85 → Result: Harmful
- AI confidence: 0.75 → Result: Harmful
Best for: Creative platforms, adult content sites, or environments where free expression is prioritized
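To see the three bands side by side, the short sketch below applies the same decision rule to one hypothetical confidence score at a strict, balanced, and permissive threshold; the score of 0.65 is invented for illustration.

// Compare how one hypothetical confidence score is judged
// under a strict, balanced, and permissive threshold.
const aiConfidence = 0.65;

for (const threshold of [0.2, 0.5, 0.8]) {
  const verdict = aiConfidence >= threshold ? "Harmful" : "Not Harmful";
  console.log(`threshold ${threshold} -> ${verdict}`);
}
// threshold 0.2 -> Harmful      (very strict)
// threshold 0.5 -> Harmful      (balanced)
// threshold 0.8 -> Not Harmful  (permissive)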
Practical Configuration Examples
🎮 Gaming Community Forum
Configuration:
- Categories: Violence, Harassment, Bullying, Hate, Profanity
- Threshold: 0.4 (balanced)
Strategy:
Allow some competitive banter but prevent serious harassment and hate speech. Violence detection helps with graphic content discussions.
👨‍👩‍👧‍👦 Family-Friendly Platform
Configuration:
- Categories: Sexual, Violence, Profanity, Alcohol, Smoking
- Threshold: 0.2 (very strict)
Strategy:
Heavily protect children with very low tolerance for any inappropriate content. Better to have false positives than miss harmful content.
💼 Professional Network
Configuration:
- Categories: Harassment, Hate, Discrimination, Sexual
- Threshold: 0.5 (moderate)
Strategy:
Focus on workplace-appropriate content. Allow professional discussions while preventing harassment and discrimination.
🎨 Creative Arts Platform
Configuration:
- Categories: CSAM, Terrorism, Doxxing, Scams
- Threshold: 0.8 (very permissive)
Strategy:
Prioritize creative freedom while catching only the most serious legal violations. High threshold prevents artistic censorship.
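As a quick illustration, the gaming community example above can be written out as data. The structure and category identifiers are assumptions for this sketch; in practice you select categories and set the threshold in the Outharm Console rather than in code.

// The "Gaming Community Forum" example expressed as data.
// Category identifiers and structure are illustrative assumptions;
// the real configuration is set through the Outharm Console.
const gamingForumConfig = {
  enabledCategories: ["violence", "harassment", "bullying", "hate", "profanity"],
  threshold: 0.4, // balanced: allow competitive banter, catch serious abuse
};

The other examples differ only in which categories are enabled and how strict the threshold is.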
Configuring Automated Moderation
All automated moderation settings are configured through the Outharm Console. Here's how to set up your configuration:
Configuration Steps
1. Navigate to Automated Moderation
   In your project console, go to the "Automated" section in the sidebar.
2. Select Categories
   Enable the categories you want the AI to detect. Each category has a simple enable/disable toggle.
3. Set Global Threshold
   Adjust the threshold slider to control overall sensitivity. A preview shows how strict or lenient your setting is.
4. Test & Save
   Use the test feature to validate your settings, then save the configuration.
Configuration Tips
- Start with a moderate threshold (0.5) and adjust based on results
- Enable categories gradually - you can always add more later
- Use the test feature extensively before going live
- Monitor your first few hundred submissions to tune settings
- Configuration changes apply immediately to new submissions
Using the Automated Moderation API
Once configured, your automated moderation settings apply to all requests made to the POST /moderation/automated endpoint. The API uses your category selection and threshold to analyze content and return instant results.
What Happens During API Calls
1. Your content is analyzed using only your enabled categories
2. AI generates confidence scores for each enabled category
3. Scores are compared against your global threshold
4. The final harmful/not harmful decision is made
5. Detailed results are returned instantly
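A minimal call against the endpoint might look like the sketch below. The base URL, authentication header, request body field, and response shape are assumptions for illustration only; consult the Outharm API reference for the exact contract.

// Hedged sketch of calling POST /moderation/automated.
// The base URL, auth header, body field, and response shape are
// assumptions; check the Outharm API reference for the real contract.
async function moderate(text: string): Promise<void> {
  const response = await fetch("https://api.outharm.example/moderation/automated", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer YOUR_API_KEY", // placeholder credential
    },
    body: JSON.stringify({ content: text }), // hypothetical body field
  });

  // Assumed shape: a harmful/not harmful decision plus per-category
  // confidence scores for the categories you enabled.
  const result = await response.json();
  console.log(result);
}

moderate("example user submission");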
Ready to Get Started?
Configure your automated moderation settings and start protecting your platform with AI-powered content analysis.
Console Setup
Learn how to configure automated moderation
Pricing
Cost per request, volume discounts, and subscription options
Related Documentation
- Categories - What content types can be detected
- Schemas & Components - How to structure content for analysis
- Manual Moderation - Human review workflows
- Quick Start Guide - Get up and running quickly