Categories
Define what types of harmful content to detect in your moderation
Categories are the types of harmful content Outharm can detect. You choose which categories matter to your platform - only selected categories will be analyzed. This applies to both automated AI moderation and manual human review.
How Categories Work
Categories act as filters for what content gets flagged as harmful. Outharm attempts to detect only the categories you've enabled; disabled categories are ignored entirely during moderation.
This selective approach means you can customize moderation to your platform's needs. A family-friendly app might enable all categories, while a mature gaming community might only flag extreme content like violence or harassment.
Category Selection
Enable/Disable Model
Categories use a simple on/off model. Each category can be either enabled (will be detected) or disabled (will be ignored). There are no thresholds or sensitivity levels - you choose which categories to care about for your platform.
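The on/off model can be pictured as a plain set of enabled category names. The sketch below is illustrative only: the `CategorySelection` class and its methods are hypothetical and not part of any Outharm SDK. The actual selection is made in the Console.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CategorySelection:
    """Hypothetical local model of a project's category selection."""

    enabled: Set[str] = field(default_factory=set)

    def enable(self, name: str) -> None:
        self.enabled.add(name)

    def disable(self, name: str) -> None:
        # discard() does nothing if the category was never enabled.
        self.enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        # No thresholds or sensitivity levels: a category is either on or off.
        return name in self.enabled


selection = CategorySelection()
selection.enable("violence")
selection.enable("harassment")
selection.disable("violence")
```

After these calls only `harassment` remains enabled, so under this model `violence` would be ignored entirely.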
Universal Application
Your category selection applies to both automated moderation (AI-powered instant decisions) and manual moderation (human review queue). This ensures consistent standards across your entire content moderation workflow.
Available Categories
Outharm provides a comprehensive set of categories covering both image and text content. Categories are organized by content type:
Mixed Categories (Text + Image)
Sexual
sexual
Content containing nudity, pornography, or sexually explicit material.
Fascism
fascism
Fascist symbols, Nazi propaganda, or authoritarian extremist political content.
Alcohol
alcohol
Content promoting alcohol consumption, drinking culture, or alcoholic beverages.
Smoking
smoking
Tobacco products, smoking, vaping, or substance use promotional content.
Image-Only Categories
Blood
blood
Graphic content showing blood, injuries, gore, or disturbing medical imagery.
Knife
knife
Images of knives, blades, sharp weapons, or cutting implements.
Gun
gun
Firearms, weapons, ammunition, or gun-related promotional content.
Insulting Gesture
insulting_gesture
Hand gestures, body language, or visual symbols intended to insult or offend.
Text-Only Categories
Self-Harm
self-harm
Content promoting, depicting, or encouraging self-injury, suicide, or self-destructive behavior.
Violence
violence
Violent content, fighting, aggressive behavior, or depictions of physical harm.
Hate
hate
Hate speech, discrimination, slurs, or content targeting specific groups based on identity.
Threatening
threatening
Direct threats, intimidation, menacing language, or content intended to frighten.
Harassment
harassment
Targeted abuse, personal attacks, stalking, or persistent unwanted communication.
Bullying
bullying
Systematic intimidation, abuse, exclusion, or repeated harmful behavior toward individuals.
Terrorism
terrorism
Content promoting, supporting, or glorifying terrorist activities, organizations, or attacks.
Extremism
extremism
Radical ideologies, violent extremist content, or radicalization materials beyond fascism.
CSAM
csam
Any reference to child sexual abuse material or exploitation of minors (zero tolerance policy).
Sexual Solicitation
sexual_solicitation
Requests, offers, or advertisements for sexual acts, services, or inappropriate contact.
Medical Misinformation
medical_misinformation
False, misleading, or dangerous health claims, fake cures, or harmful medical advice.
Political Misinformation
political_misinformation
Deceptive content about elections, voting, political processes, or democratic institutions.
Scams
scams
Fraudulent schemes, fake offers, phishing attempts, or deceptive money-making promises.
Fraud
fraud
Financial deception, identity theft attempts, fake services, or fraudulent business practices.
Doxxing
doxxing
Sharing private personal information without consent, including addresses, phone numbers, or documents.
Profanity
profanity
Vulgar language, obscene words, offensive expressions, or inappropriate verbal content.
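The grouping above can be captured as a small lookup, which may help when deciding which categories are relevant to a text-only or image-only platform. This is a sketch, not an Outharm API: the constant names and the `categories_for` helper are hypothetical. The internal names are copied exactly as listed above, including `self-harm`, which is hyphenated while other multi-word names use underscores.

```python
# Internal category names, grouped as in the list above.
MIXED = {"sexual", "fascism", "alcohol", "smoking"}
IMAGE_ONLY = {"blood", "knife", "gun", "insulting_gesture"}
TEXT_ONLY = {
    "self-harm", "violence", "hate", "threatening", "harassment",
    "bullying", "terrorism", "extremism", "csam", "sexual_solicitation",
    "medical_misinformation", "political_misinformation", "scams",
    "fraud", "doxxing", "profanity",
}


def categories_for(content_type: str) -> set:
    """Return the categories that can apply to 'text' or 'image' content.

    Hypothetical helper: mixed categories apply to both content types.
    """
    if content_type == "text":
        return MIXED | TEXT_ONLY
    if content_type == "image":
        return MIXED | IMAGE_ONLY
    raise ValueError(f"unknown content type: {content_type!r}")


text_categories = categories_for("text")
image_categories = categories_for("image")
```

A text-only platform would then consider only the names in `text_categories` when choosing what to enable.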
Configuring Categories
Category configuration is done through the Console interface:
Navigate to Categories
Open your project in Console and go to Categories
Filter by Type
Use All, Text, or Image filters to focus on specific content types
Toggle Categories
Click on category cards to enable/disable them. Use bulk actions to select or deselect all
Save Changes
Click Save to apply your category selection to all future moderation
API Response Format
When content matches your enabled categories, the API returns the detected categories using their internal names (the lowercase identifiers listed under each display name above). For example, if "Sexual" content is detected, it will appear as "sexual" in the response.
Category Names in Responses
API responses always use the internal category names (like sexual, violence, medical_misinformation) rather than display names. This ensures consistent parsing and avoids issues with spacing or special characters.
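Because responses carry internal names, a client that shows results to people will usually map them back to display names. This page does not show the response schema, so the sketch below assumes the detected internal names have already been extracted into a list. `DISPLAY_NAMES` and `to_display_names` are hypothetical helpers, and the mapping covers only a few categories for brevity.

```python
# Partial mapping from internal names to display names (extend as needed).
DISPLAY_NAMES = {
    "sexual": "Sexual",
    "violence": "Violence",
    "self-harm": "Self-Harm",
    "insulting_gesture": "Insulting Gesture",
    "medical_misinformation": "Medical Misinformation",
}


def to_display_names(detected):
    """Map internal category names to display names.

    Names missing from the mapping are passed through unchanged, so an
    unrecognized category does not break the caller.
    """
    return [DISPLAY_NAMES.get(name, name) for name in detected]


# Assumed input: internal names already pulled out of an API response.
labels = to_display_names(["sexual", "medical_misinformation", "not_in_mapping"])
```

Keying on internal names and falling back to the raw name keeps the client working even if its local mapping is incomplete.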
Best Practices
Choose Relevant Categories
Only enable categories that matter to your platform. A professional network might focus on harassment and misinformation, while a gaming platform might prioritize violence and hate speech.
Start Broad, Then Narrow
Begin with more categories enabled and disable ones that don't fit your content. It's easier to reduce false positives than to catch missed harmful content.
Consider Content Types
If your platform is text-only, you can safely ignore image categories. If it's image-focused, prioritize visual categories over text-based ones.
Review Regularly
Periodically review your category selection as your platform evolves. New features or user behaviors might require enabling additional categories.
Ready to Get Started?
Configure your content categories and start building your moderation setup with our comprehensive detection capabilities.
Console Walkthrough
Learn how to enable categories in the console
Automated Moderation
Configure thresholds for AI-powered detection
Related Documentation
- Quick Start Guide: Get up and running quickly
- Schemas & Components: Structure your content for moderation
- Manual Moderation: Human review workflows
- Console Walkthrough: Learn to use the management interface