Content Moderation with Outharm

This example demonstrates a complete content moderation pipeline: users create posts, others report them, and after 3 reports the content automatically goes to Outharm's AI moderator. If it passes but gets 6 reports total, it escalates to human review.

You'll learn how to integrate both automated and manual moderation, handle webhook callbacks securely, and manage the complete moderation lifecycle from report thresholds to final decisions.

⚠️ Prerequisites

You'll need an Outharm account with an API token and a moderation schema configured, a running PostgreSQL database, and basic Node.js/Express knowledge.

Project Setup

The highlighted package.json contains the essential dependencies for our moderation system: Express for the API, Prisma for database operations, and dotenv for environment configuration.

{
  "name": "content-moderation-example",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "dev": "nodemon app.js",
    "start": "node app.js",
    "db:push": "prisma db push",
    "db:generate": "prisma generate"
  },
  "dependencies": {
    "express": "^4.18.2",
    "@prisma/client": "^5.7.1",
    "dotenv": "^16.3.1"
  },
  "devDependencies": {
    "prisma": "^5.7.1",
    "nodemon": "^3.0.2"
  }
}

The "type": "module" setting enables ES modules, allowing cleaner async/await patterns in our moderation functions. Prisma provides type-safe database operations including the count() queries we'll use for report thresholds.

Run npm install to get these dependencies, then we'll configure the moderation API credentials.

Moderation API Configuration

The highlighted .env file contains the credentials that connect your app to Outharm's moderation API. Each variable serves a specific purpose in the moderation workflow.

  • OUTHARM_API_URL - points to the moderation endpoints.
  • OUTHARM_TOKEN - authenticates your API requests.
  • OUTHARM_SCHEMA_ID - tells the AI which content fields to analyze (in our case: title and content).

🔒 Critical: Webhook Security

OUTHARM_WEBHOOK_SECRET is your defense against fake moderation results. Without proper validation, attackers could delete legitimate content by sending fake "harmful" webhooks, or bypass moderation entirely with fake "safe" results.

Get your API token from Outharm Console → Access Tokens. Create a schema defining your content structure (we'll send title and content fields). Configure webhooks in Console → Manual with your server URL.

The database connection string should point to your PostgreSQL instance. We'll use this for storing posts, reports, and tracking moderation status.
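
Putting it together, a .env file for this example might look like the sketch below. The variable names come from this section; every value is a placeholder, and DATABASE_URL (the standard Prisma connection variable) is an assumption about how the connection string is named.

# Outharm moderation API - placeholder values, copy the real ones from your Console
OUTHARM_API_URL="<outharm-api-base-url>"
OUTHARM_TOKEN="<your-access-token>"
OUTHARM_SCHEMA_ID="<your-schema-id>"
OUTHARM_WEBHOOK_SECRET="<your-webhook-secret>"

# PostgreSQL connection string used by Prisma
DATABASE_URL="postgresql://user:password@localhost:5432/moderation"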

Database Schema for Moderation

The highlighted schema contains two models: Post and Report. The key moderation fields are submissionId (links to Outharm) and manualModerationStatus (tracks human review state).

Post Model - Moderation Fields

submissionId stores the ID returned by Outharm's API. This is crucial for two things: correlating webhook callbacks back to the right post, and escalating content to manual review.

manualModerationStatus tracks human review state:

  • "moderating" - Currently under human review
  • "harmful" - Confirmed harmful by human
  • "not-harmful" - Approved by human
  • null - No manual review needed

Report Model - Threshold Logic

The @@unique([postId, reportedBy]) constraint prevents duplicate reports from the same user. This ensures accurate threshold counts for triggering moderation.
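
Putting both models together, the schema might look roughly like this sketch (the field names follow this section; types, timestamps, and any extra columns are assumptions):

// schema.prisma - minimal sketch of the two models
model Post {
  id                     Int      @id @default(autoincrement())
  title                  String
  content                String
  authorId               String
  submissionId           String?  // set once the post is sent to Outharm
  manualModerationStatus String?  // "moderating" | "harmful" | "not-harmful" | null
  reports                Report[]
}

model Report {
  id         Int    @id @default(autoincrement())
  postId     Int
  reportedBy String
  post       Post   @relation(fields: [postId], references: [id], onDelete: Cascade)

  @@unique([postId, reportedBy]) // one report per user per post
}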

We use prisma.report.count() with a where clause to check thresholds: 3 reports triggers AI moderation, 6 reports escalates to humans.

💡 Why Count Queries?

We count reports dynamically instead of storing a counter. This prevents race conditions when multiple users report the same post simultaneously and keeps individual reports for audit purposes.

After creating this schema, run npx prisma generate and npx prisma db push to set up your database tables.

Server Setup

The highlighted code shows the server initialization with the essential imports: Express for the web server, Prisma for database operations, and dotenv for loading environment variables.

The PrismaClient instance handles all database operations. We'll use it for counting reports with prisma.report.count() to check if posts hit our moderation thresholds (3 for AI, 6 for human review).

The express.json() middleware parses JSON bodies for both regular API requests and incoming webhook payloads from Outharm.
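
Assuming the entry point is app.js (as in the package.json scripts) and Node 18+ for the built-in fetch used later, the setup might look like this sketch:

// app.js - minimal server setup sketch
import 'dotenv/config';                     // load environment variables from .env
import express from 'express';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();          // shared client for all database operations
const app = express();

app.use(express.json());                    // parse JSON bodies for API requests and webhooks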

Post Creation

The highlighted endpoint creates posts with title, content, and authorId. These are the fields we'll later send to Outharm for moderation analysis.

Notice that submissionId and manualModerationStatus start as null. They get populated when the post enters moderation (after receiving reports).

Posts aren't moderated when created - only when they receive their third report. This keeps posting fast while still catching harmful content through community reporting.
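
A minimal sketch of the endpoint, continuing from the server setup above (the request validation and response shape are assumptions):

// Create a post; submissionId and manualModerationStatus are left at their null defaults
app.post('/posts', async (req, res) => {
  const { title, content, authorId } = req.body;
  if (!title || !content || !authorId) {
    return res.status(400).json({ error: 'title, content, and authorId are required' });
  }

  const post = await prisma.post.create({ data: { title, content, authorId } });
  res.status(201).json(post);
});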

Report System & Moderation Triggers

The highlighted code shows the core moderation logic. When someone reports a post, we create a report record and count total reports for that post using prisma.report.count().

Threshold Logic

3 reports: Calls sendToModeration(post) to send content to Outharm's AI moderator.

6 reports: If the post has a submissionId (meaning it passed AI moderation), escalates to human review with sendToManualModeration().

Duplicate Prevention

The unique constraint @@unique([postId, reportedBy]) prevents the same user from reporting a post multiple times. When violated, Prisma throws error P2002, which we catch and return a user-friendly message.
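
A sketch of the report endpoint, continuing from the setup above. The route path and response shape are assumptions; the thresholds and the P2002 handling follow this section:

// Report a post; thresholds: 3 reports -> AI moderation, 6 reports -> human review
app.post('/posts/:id/reports', async (req, res) => {
  const postId = Number(req.params.id);
  const { reportedBy } = req.body;

  const post = await prisma.post.findUnique({ where: { id: postId } });
  if (!post) {
    return res.status(404).json({ error: 'Post not found' });
  }

  try {
    await prisma.report.create({ data: { postId, reportedBy } });
  } catch (err) {
    if (err.code === 'P2002') {
      // unique constraint on [postId, reportedBy]: this user already reported this post
      return res.status(409).json({ error: 'You have already reported this post' });
    }
    throw err;
  }

  // Count reports dynamically instead of keeping a counter column
  const reportCount = await prisma.report.count({ where: { postId } });

  if (reportCount === 3) {
    await sendToModeration(post);           // third report: send to Outharm's AI moderator
  } else if (reportCount === 6 && post.submissionId) {
    await sendToManualModeration(post);     // sixth report on AI-approved content: escalate to humans
  }

  res.json({ reportCount });
});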

💡 Why This Flow Works

Content gets progressively more scrutiny as reports accumulate. AI catches obvious violations quickly, while edge cases that generate more complaints get human attention.

Automated Moderation with Outharm

The highlighted sendToModeration function handles AI moderation. Let's break down each part:

Schema Validation

First, we check if OUTHARM_SCHEMA_ID exists. Without it, we can't tell Outharm what content fields to analyze. The function logs a warning and returns early if missing.

API Request Structure

The moderationPayload maps our post data to Outharm's expected format:

  • schema_id - Which moderation schema to use
  • content.title - Array containing the post title
  • content.content - Array containing the post body

Arrays allow multiple content pieces per field, useful for comment threads or multi-part content.

Critical: Submission ID Storage

We store result.submission_id in our database. This ID is essential for two things: escalating to manual review later and correlating webhook callbacks back to the right post.

Immediate Action on Harmful Content

If result.is_harmful is true, we delete the post immediately. No waiting for webhooks - harmful content gets removed fast to limit exposure.
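
Under those constraints, sendToModeration might look like the sketch below. The payload and response field names follow this section; the endpoint path (POST /submissions), the bearer-token header, and the use of the built-in fetch are assumptions - check Outharm's API reference for the exact request format.

// Send a post to Outharm's automated moderation
async function sendToModeration(post) {
  if (!process.env.OUTHARM_SCHEMA_ID) {
    console.warn('OUTHARM_SCHEMA_ID is not set - skipping automated moderation');
    return;
  }

  const moderationPayload = {
    schema_id: process.env.OUTHARM_SCHEMA_ID,
    content: {
      title: [post.title],       // arrays allow multiple content pieces per field
      content: [post.content],
    },
  };

  const response = await fetch(`${process.env.OUTHARM_API_URL}/submissions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OUTHARM_TOKEN}`,
    },
    body: JSON.stringify(moderationPayload),
  });
  const result = await response.json();

  // Store the submission ID so webhooks and manual escalation can find this post later
  await prisma.post.update({
    where: { id: post.id },
    data: { submissionId: result.submission_id },
  });

  // Harmful content is removed immediately; its reports are cascade-deleted with it
  if (result.is_harmful) {
    await prisma.post.delete({ where: { id: post.id } });
  }
}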

⚡ Speed is Critical

Automated moderation gives instant results. Harmful content is deleted within seconds of the API call, minimizing the time it's visible to users.

Manual Escalation

The highlighted sendToManualModeration function escalates content to human reviewers. This happens when content passes AI moderation but gets 6 reports total.

Using Existing Submission

Instead of creating a new submission, we escalate the existing one using POST /submissions/[submissionId]/manual. This is more efficient because human reviewers can see the AI's decision and context.

Status Tracking

We update manualModerationStatus to 'moderating' to indicate the post is under human review. This helps your UI show the right status to users and admins.
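
A sketch of the escalation call, reusing the stored submission ID (the path follows the endpoint mentioned above; the bearer-token header is an assumption):

// Escalate an existing submission to human review
async function sendToManualModeration(post) {
  await fetch(`${process.env.OUTHARM_API_URL}/submissions/${post.submissionId}/manual`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.OUTHARM_TOKEN}` },
  });

  // Mark the post as under human review so the UI can show the right status
  await prisma.post.update({
    where: { id: post.id },
    data: { manualModerationStatus: 'moderating' },
  });
}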

Human review results come back via webhooks, usually within 24-48 hours. The webhook handler processes these results and takes final action.

Webhook Security

The highlighted webhook endpoint receives moderation results from Outharm. Security validation happens before any database operations to prevent fake moderation results.

Signature Verification

We check req.headers['outharm-webhook-secret'] against OUTHARM_WEBHOOK_SECRET. If they don't match, we return 401 Unauthorized and ignore the request.

🚨 Why Security Matters

Without validation, attackers could send fake webhooks to delete legitimate posts with forged "harmful" results or bypass moderation by marking harmful content as "safe". This simple check prevents most attacks.

Event Filtering

We only process moderation.manual.completed events. This prevents errors from unexpected event types and ensures our handler only runs when human review is finished.

The data object contains the moderation result, including submission_id and is_harmful, which we'll use to take action on the right post.
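
A sketch of the webhook handler, continuing from the setup above. The header name and event type follow this section; the route path and the { event, data } payload shape are assumptions:

// Receive moderation results from Outharm
app.post('/webhooks/outharm', async (req, res) => {
  // Validate the shared secret before touching the database
  if (req.headers['outharm-webhook-secret'] !== process.env.OUTHARM_WEBHOOK_SECRET) {
    return res.status(401).json({ error: 'Invalid webhook secret' });
  }

  const { event, data } = req.body;

  // Only act on completed human reviews; ignore any other event types
  if (event === 'moderation.manual.completed') {
    await handleManualModerationResult(data);
  }

  res.json({ received: true });
});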

Processing Moderation Results

The highlighted handleManualModerationResult function processes human review results. It uses the submission ID to find the right post and takes action based on the verdict.

Finding the Post

prisma.post.findFirst() with a where clause finds the post that matches the webhook's submission ID. This links the moderation result back to the right content.

Handling Harmful Content

If data.is_harmful is true, we delete the post with prisma.post.delete(). The cascade delete in our schema automatically removes associated reports too.

Handling Safe Content

For safe content, we update manualModerationStatus to 'not-harmful'. This marks the post as human-approved and can help inform future moderation decisions.
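
A sketch of the handler, assuming the data object described above:

// Apply a human reviewer's verdict to the matching post
async function handleManualModerationResult(data) {
  const post = await prisma.post.findFirst({
    where: { submissionId: data.submission_id },
  });
  if (!post) return; // the post may already have been removed

  if (data.is_harmful) {
    // Confirmed harmful: delete the post (reports are cascade-deleted)
    await prisma.post.delete({ where: { id: post.id } });
  } else {
    // Approved by a human: record the verdict
    await prisma.post.update({
      where: { id: post.id },
      data: { manualModerationStatus: 'not-harmful' },
    });
  }
}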

💡 Why This Works

Storing submission IDs during initial moderation creates a reliable link between webhook events and your posts. No guesswork or complex matching needed.

Complete API & Testing

The highlighted code completes our API with a posts listing endpoint and server startup. The GET /posts endpoint includes report counts and moderation status for full pipeline visibility.

Posts Listing with Moderation Data

The query includes reports and report counts to fetch both individual reports and the total count. This shows the complete moderation state for each post.
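
A sketch of the listing endpoint and server startup, continuing from the setup above (the port matches the ngrok command in the testing steps; the response shape is an assumption):

// List posts with their reports and report counts for moderation visibility
app.get('/posts', async (req, res) => {
  const posts = await prisma.post.findMany({
    include: {
      reports: true,                          // individual reports, kept for audit purposes
      _count: { select: { reports: true } },  // total report count per post
    },
  });
  res.json(posts);
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Content moderation example listening on port ${PORT}`);
});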

Testing Your Moderation System

Start your server with npm run dev and test the flow:

  1. Create posts via POST /posts
  2. Report the same post 3 times to trigger AI moderation
  3. Check Outharm console for submission
  4. Report 6 times total to trigger human review
  5. Use ngrok for webhook testing: ngrok http 3000

🎉 You're Done!

Your moderation system handles reports intelligently, uses AI for fast decisions, escalates edge cases to humans, and prevents race conditions. It's ready for production scaling.

🚀 Ready to Get Started?

Congratulations! You've built a complete content moderation system. To get the most out of Outharm, continue with the related documentation for recommended next steps.
