Why I chose AWS for content moderation.
A first-person look at the three-layer content moderation pipeline behind Circus — hash tables, AWS Comprehend, and AWS Rekognition — and why I chose managed ML over building my own.
When I started designing Circus’s safety architecture, I faced the same question every platform builder eventually hits: do you build your own ML models, or do you use managed services?
For most early-stage companies, this is a false choice. Building accurate, production-grade ML for content classification takes years of training data, a dedicated ML team, and continuous iteration. I don’t have any of that. What I do have is a well-designed pipeline that routes content through the right tools at the right stage — and AWS gives me the accuracy I need for the volume I’m expecting at launch.
Layer 1 — Hash-table filtering
The first layer is the fastest and most deterministic: hash-table matching. Before any content touches a machine learning model, it goes through one of two lookups. Text is checked against a term table, and images and video are checked against a hash set.
The first is a profanity and slur list — a configurable table of known harmful terms with severity levels. Exact matches are blocked or flagged instantly, with no inference cost.
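As a rough illustration of that lookup, here is a minimal Python sketch. The table contents, the `Severity` levels, and the `check_terms` helper are all placeholders I made up for this post, not the production list or schema.

```python
import re
from enum import Enum
from typing import Optional


class Severity(Enum):
    FLAG = 1   # let the content continue, but mark it for review
    BLOCK = 2  # reject immediately


# Hypothetical table; real entries would be loaded from configuration.
TERM_TABLE = {
    "badword": Severity.FLAG,
    "worseword": Severity.BLOCK,
}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def check_terms(text: str, table: dict = TERM_TABLE) -> Optional[Severity]:
    """Return the highest severity among exact token matches, or None."""
    worst = None
    for token in TOKEN_RE.findall(text.lower()):
        severity = table.get(token)  # O(1) dictionary lookup per token
        if severity is not None and (worst is None or severity.value > worst.value):
            worst = severity
    return worst
```

Exact token matching like this is cheap, but it misses deliberate misspellings, which is one reason the later layers exist.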
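The second is a CSAM hash set. Every image and video uploaded to Circus is hashed and checked against a database of known child sexual abuse material. Legal obligations around this material, such as mandatory reporting once it is found, apply in many jurisdictions, and it’s the most critical check in the entire pipeline. An exact hash lookup is deterministic: a file either matches a known entry or it doesn’t, with no confidence threshold to tune and a negligible false-positive rate. The trade-off is coverage. Exact hashes only catch byte-identical copies, while perceptual hashes, which tolerate resizing and re-encoding, bring back a small matching threshold. Either way, this check only catches material that is already in the database.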
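The mechanics of a hash-set check can be sketched in a few lines of Python. This is an illustration only: it uses SHA-256 from the standard library as a stand-in, whereas real deployments typically rely on perceptual hashes and vetted hash lists obtained through dedicated programs. The function names and the `known_hashes` set are assumptions for the example.

```python
import hashlib
import io
from typing import BinaryIO, FrozenSet


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash an upload in chunks so large videos never sit fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_known_hash(stream: BinaryIO, known_hashes: FrozenSet[str]) -> bool:
    """Set membership: the file either matches a known entry or it does not."""
    return sha256_of_stream(stream) in known_hashes
```

Changing a single byte of the file produces a different SHA-256 digest, which is exactly why an exact-match set on its own will not catch re-encoded copies.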
Layer 1 is cheap to run, near-instant, and catches the clearest-cut violations before anything else happens. Everything it misses flows to layer 2.
Layer 2 — AWS Comprehend for text
AWS Comprehend is a managed natural language processing service that classifies text across a range of categories. For Circus, I’m using it for three things.
The first is PII detection — personally identifiable information like phone numbers, email addresses, credit card numbers, and government IDs. Content containing exposed PII is flagged for review before it can be published publicly.
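Here is a hedged sketch of what that PII check might look like in Python. It assumes `client` is a boto3 Comprehend client (for example `boto3.client("comprehend")`) and relies on the `detect_pii_entities` call and its `Entities` response list. The `PII_MIN_SCORE` cut-off and the helper names are hypothetical.

```python
from typing import Any

PII_MIN_SCORE = 0.8  # hypothetical cut-off; would need tuning on real data


def find_pii(client: Any, text: str, language_code: str = "en") -> list:
    """Return PII entities Comprehend reports at or above PII_MIN_SCORE.

    `client` is assumed to be boto3.client("comprehend"); it is passed in
    so the function can be exercised with a stub in tests.
    """
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return [
        {
            "type": entity["Type"],
            "span": (entity["BeginOffset"], entity["EndOffset"]),
            "score": entity["Score"],
        }
        for entity in response["Entities"]
        if entity["Score"] >= PII_MIN_SCORE
    ]


def needs_pii_review(client: Any, text: str) -> bool:
    """Flag content for review if any confident PII entity is present."""
    return bool(find_pii(client, text))
```

Returning the character spans alongside the entity type means a reviewer can see exactly which part of the post triggered the flag.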
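The second is harmful language classification: hate speech, threats, harassment, and abusive content. Comprehend’s models are pre-trained on a broad corpus, so I don’t have to build and maintain my own classifiers. One caveat for a global platform: language support differs by Comprehend feature (sentiment analysis covers many languages, while toxicity and PII detection cover far fewer), so I have to check coverage for each market rather than assume it.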
The third is sentiment and toxicity scoring. High toxicity scores don’t automatically block content — they add weight to a queue priority score that determines how quickly a piece of content reaches human review. A post that scores high on toxicity but low on hate-speech classification might be heated community debate rather than harassment. That distinction matters, and it’s one the human review step is better positioned to make.
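To make the "weight, not block" idea concrete, here is a small Python sketch. It assumes a response shaped like the one I understand Comprehend’s toxicity detection to return: a `ResultList` of segments, each with an overall `Toxicity` score and per-category `Labels`. The `hate_weight` multiplier and the function names are invented for illustration.

```python
def toxicity_signals(response: dict) -> dict:
    """Collapse a toxicity response into max overall and max per-label scores."""
    overall = 0.0
    per_label = {}
    for segment in response.get("ResultList", []):
        overall = max(overall, segment.get("Toxicity", 0.0))
        for label in segment.get("Labels", []):
            name = label["Name"]
            per_label[name] = max(per_label.get(name, 0.0), label["Score"])
    return {"toxicity": overall, "labels": per_label}


def priority_weight(signals: dict, hate_weight: float = 2.0) -> float:
    """Toxicity raises review priority; it never blocks content by itself.

    hate_weight is a hypothetical multiplier that pushes likely hate speech
    above heated-but-legitimate debate with the same overall toxicity.
    """
    hate = signals["labels"].get("HATE_SPEECH", 0.0)
    return signals["toxicity"] + hate_weight * hate
```

With these placeholder numbers, two posts with identical overall toxicity end up at different queue positions when only one of them scores high on the hate-speech label.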
Layer 3 — AWS Rekognition for images and video
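AWS Rekognition handles image and video scanning. Every image goes through Rekognition’s content moderation detection, which covers both explicit content and violence in a single pass. Video goes through the equivalent asynchronous video analysis, which reports moderation labels along with their timestamps.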
Rekognition returns a confidence score and a category label for each detection: “Explicit Nudity”, “Suggestive”, “Violence”, “Visually Disturbing”, and so on (the exact label names depend on the moderation model version). Rather than applying a single binary threshold, I map different confidence ranges to different actions. Content that scores above the high threshold (say, 95%+ confidence for explicit nudity) is automatically blocked. Content that scores in the mid range goes to the human review queue. Content below the low threshold is passed through.
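The three-band mapping could look something like the Python sketch below. It assumes labels shaped like the `ModerationLabels` entries Rekognition returns (`Name`, `ParentName`, `Confidence` on a 0 to 100 scale). Every threshold value here is a placeholder, not a calibrated number.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    REVIEW = "review"
    BLOCK = "block"


# Hypothetical (low, high) thresholds per category, on Rekognition's
# 0-100 confidence scale. Real values would be calibrated after launch.
THRESHOLDS = {
    "Explicit Nudity": (50.0, 95.0),
    "Violence": (60.0, 95.0),
}
DEFAULT_THRESHOLDS = (60.0, 97.0)


def action_for_labels(labels: list) -> Action:
    """Map moderation labels to pass / review / block.

    Each label is looked up by its parent category when it has one,
    otherwise by its own name.
    """
    decision = Action.PASS
    for label in labels:
        category = label.get("ParentName") or label["Name"]
        low, high = THRESHOLDS.get(category, DEFAULT_THRESHOLDS)
        confidence = label["Confidence"]
        if confidence >= high:
            return Action.BLOCK  # one high-confidence hit is enough
        if confidence >= low:
            decision = Action.REVIEW
    return decision
```

Keeping the thresholds in a per-category table means they can be tightened or relaxed one category at a time without touching the routing logic.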
The exact thresholds are something I’ll calibrate against real data after launch. Getting them right is an iterative process — too aggressive and you kill legitimate creative content, too lenient and harmful content gets through. I’d rather start tight and ease off than start loose and spend the first months in crisis mode.
How confidence scores flow into the review queue
One of the most important design decisions in the pipeline is how signals from all three layers aggregate into a single review priority.
A piece of content that hits a blocking hash-table match, or scores above the high Rekognition threshold, is blocked automatically and never enters the queue. Everything else gets a composite score based on four signals: the toxicity output from Comprehend, the moderation confidence from Rekognition, the report count from users, and the creator’s account history. Higher composite scores go to the top of the human review queue.
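One way to sketch that aggregation in Python is a weighted sum feeding a priority queue. The weights, the saturation curves for reports and account history, and the class names are all assumptions for illustration. The real formula is something I expect to tune against review outcomes.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    toxicity: float          # 0..1, from the text layer
    image_confidence: float  # 0..1, moderation confidence rescaled from 0..100
    report_count: int        # user reports so far
    prior_violations: int    # confirmed violations in the creator's history


# Hypothetical weights; they sum to 1 so the score stays in 0..1.
W_TOXICITY, W_IMAGE, W_REPORTS, W_HISTORY = 0.35, 0.35, 0.20, 0.10


def composite_score(s: Signals) -> float:
    # Saturating curves: the tenth report matters less than the first.
    reports = 1.0 - math.exp(-s.report_count / 3.0)
    history = 1.0 - math.exp(-s.prior_violations / 2.0)
    return (W_TOXICITY * s.toxicity + W_IMAGE * s.image_confidence
            + W_REPORTS * reports + W_HISTORY * history)


class ReviewQueue:
    """Max-priority queue: highest composite score is reviewed first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = 0  # tie-breaker keeps insertion order for equal scores

    def push(self, content_id: str, signals: Signals) -> None:
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-composite_score(signals), self._counter, content_id))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```

The saturating curves are a deliberate choice in this sketch: without them, a coordinated wave of reports could push a harmless post above content the models are far more worried about.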
The human review team works the queue top-down. They see the ML signals, the content, and the context. They can confirm, override, or escalate. Their decisions feed back into the calibration data for the system over time.
Platform-trained models on top
AWS Comprehend and Rekognition are general-purpose models. They’re accurate across a wide range of content, but they don’t know what Circus looks like specifically: what’s normal for a sports fan community versus what’s harassment, what’s a creator’s artistic work versus explicit content, what’s satire versus targeted abuse.
Over time, the decisions made by the human review team will train platform-specific models that sit on top of the AWS base layer. These won’t replace Comprehend or Rekognition — they’ll supplement them with Circus-specific signal. The foundation has to be solid before the refinement is worth anything.
What this means in practice
For creators and fans, the goal of this system is invisible safety — harmful content that never reaches a feed, not harmful content that gets removed after someone sees it.
The pipeline runs before publication, not after. A post submitted to Circus is scanned before it goes live. If it clears all three layers quickly, it publishes in seconds. If it scores above a review threshold, it waits until a human has assessed it. That’s a deliberate trade-off: we accept a small latency cost for content in the gray zone in exchange for a platform where communities don’t have to see the worst of the internet first.
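The pre-publication gate itself can be sketched as a short Python function that runs each layer in order. The `Outcome` names and the idea of modelling each layer as a callable are my own simplification for this post, not the actual service layout.

```python
from enum import Enum
from typing import Callable, Iterable


class Outcome(Enum):
    PUBLISH = "publish"
    HOLD = "hold_for_review"
    BLOCK = "block"


# Each layer is modelled as a function from content to an Outcome.
Check = Callable[[str], Outcome]


def gate(content: str, checks: Iterable[Check]) -> Outcome:
    """Run layers in order before anything goes live.

    A BLOCK from any layer stops the pipeline immediately. A HOLD is
    remembered, but later layers still run in case one of them blocks.
    """
    held = False
    for check in checks:
        outcome = check(content)
        if outcome is Outcome.BLOCK:
            return Outcome.BLOCK
        if outcome is Outcome.HOLD:
            held = True
    return Outcome.HOLD if held else Outcome.PUBLISH
```

Ordering the checks from cheapest to most expensive means a hash-table block never pays for an ML inference call.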
What’s next
The work that’s ahead of me — and the part I’m most interested in — is reducing false positives. A moderation system that blocks legitimate content erodes creator trust faster than anything else. Getting the confidence thresholds right, training the platform-specific layer, and building a clean appeals process for creators who think a decision was wrong are all on the roadmap.
I’ll keep writing about this as we go. If you’re building something similar, or you’ve worked through these same decisions at a different scale, I’d genuinely like to hear what you found. Reach me at rowan@circus.app.