Content Moderation with Outharm
This example demonstrates a complete content moderation pipeline: users create posts, others report them, and after 3 reports the content automatically goes to Outharm's AI moderator. If it passes but gets 6 reports total, it escalates to human review.
You'll learn how to integrate both automated and manual moderation, handle webhook callbacks securely, and manage the complete moderation lifecycle from report thresholds to final decisions.
⚠️ Prerequisites
You'll need an Outharm account with an API token and a moderation schema configured, a PostgreSQL database, and basic Node.js/Express knowledge.
Project Setup
The highlighted `package.json` contains the essential dependencies for our moderation system: Express for the API, Prisma for database operations, and dotenv for environment configuration.
The "type": "module"
setting enables ES modules, allowing cleaner async/await patterns in our moderation functions. Prisma provides type-safe database operations including the count()
queries we'll use for report thresholds.
Run `npm install` to get these dependencies, then we'll configure the moderation API credentials.
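If you're following along without the highlighted panel, a minimal `package.json` along these lines should work (version ranges are illustrative, and the `dev` script assumes Node's built-in watch mode):

```json
{
  "name": "outharm-moderation-example",
  "type": "module",
  "scripts": {
    "dev": "node --watch server.js"
  },
  "dependencies": {
    "@prisma/client": "^5.0.0",
    "dotenv": "^16.0.0",
    "express": "^4.18.0"
  },
  "devDependencies": {
    "prisma": "^5.0.0"
  }
}
```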
Moderation API Configuration
The highlighted `.env` file contains the credentials that connect your app to Outharm's moderation API. Each variable serves a specific purpose in the moderation workflow.
`OUTHARM_API_URL` points to the moderation endpoints. `OUTHARM_TOKEN` authenticates your API requests. `OUTHARM_SCHEMA_ID` tells the AI what content fields to analyze (in our case: title and content).
🔒 Critical: Webhook Security
`OUTHARM_WEBHOOK_SECRET` is your defense against fake moderation results. Without proper validation, attackers could delete legitimate content by sending fake "harmful" webhooks or bypass moderation entirely with fake "safe" results.
Get your API token from Outharm Console → Access Tokens. Create a schema defining your content structure (we'll send title and content fields). Configure webhooks in Console → Manual with your server URL.
The database connection string should point to your PostgreSQL instance. We'll use this for storing posts, reports, and tracking moderation status.
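For reference, a `.env` file for this setup could look like the sketch below. Every value is a placeholder; the API URL in particular should come from your Outharm console rather than this example:

```bash
# Outharm moderation API (all values are placeholders)
OUTHARM_API_URL="https://api.outharm.com"
OUTHARM_TOKEN="your-api-token"
OUTHARM_SCHEMA_ID="your-schema-id"
OUTHARM_WEBHOOK_SECRET="your-webhook-secret"

# PostgreSQL connection used by Prisma
DATABASE_URL="postgresql://user:password@localhost:5432/moderation"
```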
Database Schema for Moderation
The highlighted schema contains two models: `Post` and `Report`. The key moderation fields are `submissionId` (links to Outharm) and `manualModerationStatus` (tracks human review state).
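If you don't have the highlighted schema in front of you, a minimal sketch of those two models might look like this (field names beyond the ones discussed in this guide, such as `id` and `createdAt`, are assumptions):

```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

model Post {
  id                     Int      @id @default(autoincrement())
  title                  String
  content                String
  authorId               String
  submissionId           String?  // set once the post is sent to Outharm
  manualModerationStatus String?  // "moderating" | "harmful" | "not-harmful" | null
  createdAt              DateTime @default(now())
  reports                Report[]
}

model Report {
  id         Int      @id @default(autoincrement())
  postId     Int
  reportedBy String
  createdAt  DateTime @default(now())
  post       Post     @relation(fields: [postId], references: [id], onDelete: Cascade)

  @@unique([postId, reportedBy]) // one report per user per post
}
```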
Post Model - Moderation Fields
`submissionId` stores the ID returned by Outharm's API. This is crucial for two things: correlating webhook callbacks back to the right post, and escalating content to manual review.
`manualModerationStatus` tracks human review state:
- `"moderating"` - Currently under human review
- `"harmful"` - Confirmed harmful by human
- `"not-harmful"` - Approved by human
- `null` - No manual review needed
Report Model - Threshold Logic
The `@@unique([postId, reportedBy])` constraint prevents duplicate reports from the same user. This ensures accurate threshold counts for triggering moderation.
We use `prisma.report.count()` with a where clause to check thresholds: 3 reports trigger AI moderation, 6 reports escalate to humans.
💡 Why Count Queries?
We count reports dynamically instead of storing a counter. This prevents race conditions when multiple users report the same post simultaneously and keeps individual reports for audit purposes.
After creating this schema, run `npx prisma generate` and `npx prisma db push` to set up your database tables.
Server Setup
The highlighted code shows the server initialization with the essential imports: Express for the web server, Prisma for database operations, and dotenv for loading environment variables.
The `PrismaClient` instance handles all database operations. We'll use it for counting reports with `prisma.report.count()` to check if posts hit our moderation thresholds (3 for AI, 6 for human review).
The `express.json()` middleware parses both API requests and webhook payloads from Outharm.
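As a rough sketch, the initialization described above boils down to something like this:

```javascript
// server.js — initialization sketch
import 'dotenv/config';
import express from 'express';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const app = express();

// Parses both regular API requests and Outharm webhook payloads
app.use(express.json());
```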
Post Creation
The highlighted endpoint creates posts with `title`, `content`, and `authorId`. These are the fields we'll later send to Outharm for moderation analysis.
Notice that `submissionId` and `manualModerationStatus` start as `null`. They get populated when the post enters moderation (after receiving reports).
Posts aren't moderated when created - only when they receive their third report. This keeps posting fast while still catching harmful content through community reporting.
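A minimal version of the endpoint, assuming the schema sketched earlier, could look like this:

```javascript
// Create a post; moderation fields stay null until the post is reported
app.post('/posts', async (req, res) => {
  const { title, content, authorId } = req.body;

  const post = await prisma.post.create({
    data: { title, content, authorId },
    // submissionId and manualModerationStatus default to null
  });

  res.status(201).json(post);
});
```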
Report System & Moderation Triggers
The highlighted code shows the core moderation logic. When someone reports a post, we create a report record and count the total reports for that post using `prisma.report.count()`.
Threshold Logic
3 reports: calls `sendToModeration(post)` to send the content to Outharm's AI moderator.
6 reports: if the post has a `submissionId` (meaning it passed AI moderation), escalates to human review with `sendToManualModeration()`.
Duplicate Prevention
The unique constraint `@@unique([postId, reportedBy])` prevents the same user from reporting a post multiple times. When violated, Prisma throws error `P2002`, which we catch so we can return a user-friendly message.
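Putting the thresholds and duplicate handling together, the report endpoint might look roughly like this sketch (the route path and response shape are assumptions):

```javascript
// Report a post; 3 reports trigger AI moderation, 6 escalate to human review
app.post('/posts/:id/report', async (req, res) => {
  const postId = Number(req.params.id);
  const { reportedBy } = req.body;

  const post = await prisma.post.findUnique({ where: { id: postId } });
  if (!post) return res.status(404).json({ error: 'Post not found' });

  try {
    await prisma.report.create({ data: { postId, reportedBy } });
  } catch (err) {
    // P2002: unique constraint violation — this user already reported this post
    if (err.code === 'P2002') {
      return res.status(409).json({ error: 'You have already reported this post' });
    }
    throw err;
  }

  // Count reports dynamically instead of keeping a counter column
  const reportCount = await prisma.report.count({ where: { postId } });

  if (reportCount === 3) {
    await sendToModeration(post); // AI moderation (next section)
  } else if (reportCount === 6 && post.submissionId) {
    await sendToManualModeration(post); // human review escalation
  }

  res.json({ reports: reportCount });
});
```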
💡 Why This Flow Works
Content gets progressively more scrutiny as reports accumulate. AI catches obvious violations quickly, while edge cases that generate more complaints get human attention.
Automated Moderation with Outharm
The highlighted `sendToModeration` function handles AI moderation. Let's break down each part:
Schema Validation
First, we check if `OUTHARM_SCHEMA_ID` exists. Without it, we can't tell Outharm what content fields to analyze. The function logs a warning and returns early if it's missing.
API Request Structure
The `moderationPayload` maps our post data to Outharm's expected format:
- `schema_id` - Which moderation schema to use
- `content.title` - Array containing the post title
- `content.content` - Array containing the post body
Arrays allow multiple content pieces per field, useful for comment threads or multi-part content.
Critical: Submission ID Storage
We store `result.submission_id` in our database. This ID is essential for two things: escalating to manual review later and correlating webhook callbacks back to the right post.
Immediate Action on Harmful Content
If `result.is_harmful` is true, we delete the post immediately. No waiting for webhooks - harmful content gets removed fast to limit exposure.
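Here's a sketch of the function. The `/submissions` path, bearer-token auth, and any response fields beyond `submission_id` and `is_harmful` are assumptions; check Outharm's API reference for the exact contract.

```javascript
// Sketch of the AI moderation call
async function sendToModeration(post) {
  if (!process.env.OUTHARM_SCHEMA_ID) {
    console.warn('OUTHARM_SCHEMA_ID is not set; skipping moderation');
    return;
  }

  // Map the post onto the schema's fields; each field takes an array of values
  const moderationPayload = {
    schema_id: process.env.OUTHARM_SCHEMA_ID,
    content: {
      title: [post.title],
      content: [post.content],
    },
  };

  const response = await fetch(`${process.env.OUTHARM_API_URL}/submissions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OUTHARM_TOKEN}`,
    },
    body: JSON.stringify(moderationPayload),
  });
  const result = await response.json();

  // Keep the submission ID so webhooks and manual escalation can find this post
  await prisma.post.update({
    where: { id: post.id },
    data: { submissionId: result.submission_id },
  });

  // Harmful content is removed right away; cascade delete clears its reports
  if (result.is_harmful) {
    await prisma.post.delete({ where: { id: post.id } });
  }
}
```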
⚡ Speed is Critical
Automated moderation gives instant results. Harmful content is deleted within seconds of the API call, minimizing the time it's visible to users.
Manual Escalation
The highlighted `sendToManualModeration` function escalates content to human reviewers. This happens when content passes AI moderation but gets 6 reports total.
Using Existing Submission
Instead of creating a new submission, we escalate the existing one using `POST /submissions/[submissionId]/manual`. This is more efficient because human reviewers can see the AI's decision and context.
Status Tracking
We update `manualModerationStatus` to `'moderating'` to indicate the post is under human review. This helps your UI show the right status to users and admins.
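A corresponding sketch for escalation, reusing the stored submission ID (the bearer-token auth header is an assumption):

```javascript
// Escalate an existing submission to human review
async function sendToManualModeration(post) {
  await fetch(
    `${process.env.OUTHARM_API_URL}/submissions/${post.submissionId}/manual`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${process.env.OUTHARM_TOKEN}` },
    }
  );

  // Mark the post as under human review so the UI can reflect its status
  await prisma.post.update({
    where: { id: post.id },
    data: { manualModerationStatus: 'moderating' },
  });
}
```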
Human review results come back via webhooks, usually within 24-48 hours. The webhook handler processes these results and takes final action.
Webhook Security
The highlighted webhook endpoint receives moderation results from Outharm. Security validation happens before any database operations to prevent fake moderation results.
Signature Verification
We check `req.headers['outharm-webhook-secret']` against `OUTHARM_WEBHOOK_SECRET`. If they don't match, we return `401 Unauthorized` and ignore the request.
🚨 Why Security Matters
Without validation, attackers could send fake webhooks to delete legitimate posts with forged "harmful" results or bypass moderation by marking harmful content as "safe". This simple check prevents most attacks.
Event Filtering
We only process `moderation.manual.completed` events. This prevents errors from unexpected event types and ensures our handler only runs when human review is finished.
The `data` object contains the moderation result, including `submission_id` and `is_harmful`, which we'll use to take action on the right post.
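A sketch of the webhook receiver follows; the route path and the top-level `event` field name are assumptions, while the header name, event name, and `data` contents follow the description above:

```javascript
// Webhook receiver for Outharm moderation results
app.post('/webhooks/outharm', async (req, res) => {
  // Reject requests that don't carry the shared webhook secret
  if (req.headers['outharm-webhook-secret'] !== process.env.OUTHARM_WEBHOOK_SECRET) {
    return res.status(401).send('Unauthorized');
  }

  const { event, data } = req.body;

  // Only act on completed human reviews
  if (event === 'moderation.manual.completed') {
    await handleManualModerationResult(data);
  }

  res.sendStatus(200);
});
```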
Processing Moderation Results
The highlighted `handleManualModerationResult` function processes human review results. It uses the submission ID to find the right post and takes action based on the verdict.
Finding the Post
`prisma.post.findFirst()` with a where clause finds the post that matches the webhook's submission ID. This links the moderation result back to the right content.
Handling Harmful Content
If `data.is_harmful` is true, we delete the post with `prisma.post.delete()`. The cascade delete in our schema automatically removes associated reports too.
Handling Safe Content
For safe content, we update `manualModerationStatus` to `'not-harmful'`. This marks the post as human-approved and can help inform future moderation decisions.
💡 Why This Works
Storing submission IDs during initial moderation creates a reliable link between webhook events and your posts. No guesswork or complex matching needed.
Complete API & Testing
The highlighted code completes our API with a posts listing endpoint and server startup. The `GET /posts` endpoint includes report counts and moderation status for full pipeline visibility.
Posts Listing with Moderation Data
The query uses `include` to fetch both the individual reports and the total report count, showing the complete moderation state for each post.
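For reference, the listing endpoint and server startup might look like this sketch, using Prisma's `include` with `_count` for the report totals:

```javascript
// List posts with their reports, report counts, and moderation status
app.get('/posts', async (req, res) => {
  const posts = await prisma.post.findMany({
    include: {
      reports: true,
      _count: { select: { reports: true } },
    },
  });
  res.json(posts);
});

app.listen(3000, () => console.log('Server listening on port 3000'));
```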
Testing Your Moderation System
Start your server with `npm run dev` and test the flow:
- Create posts via `POST /posts`
- Report the same post 3 times to trigger AI moderation
- Check the Outharm console for the submission
- Report 6 times total to trigger human review
- Use ngrok for webhook testing: `ngrok http 3000`
🎉 You're Done!
Your moderation system handles reports intelligently, uses AI for fast decisions, escalates edge cases to humans, and prevents race conditions. It's ready for production scaling.
🚀 Ready to Get Started?
Congratulations! You've built a complete content moderation system. Here are some recommended next steps to get the most out of Outharm:
Related Documentation
- • Schemas & Components - Structure content for better analysis
- • Categories - Configure what content types to detect
- • API Authentication - Secure your moderation endpoints
- • Console Guide - Manage projects and review decisions