I designed and built a 7-agent content production system with routing logic, scoring frameworks, feedback loops, and data-driven iteration. It runs in production daily, processing real content decisions.
Who: Short-form video creators on Chinese social platforms (Douyin, Xiaohongshu).
What they struggle with: Content creation involves 6–8 sequential decisions — what topic to pursue, whether it's worth making, what structure to use, how to open, what title to write, whether the writing sounds authentic, and what to learn from the results. Each decision requires judgment. Most creators rely on intuition, which doesn't scale and doesn't improve systematically.
The core product question: Can you design an AI agent system where each decision point has explicit criteria, where agents hand off to each other with structured data, and where the system improves its own decision quality over time through a feedback loop?
7 specialised agents, connected through a routing layer. Each agent has its own prompt logic, scoring criteria, and input/output contracts.
Running in parallel: Quality Check (22 writing fingerprints, callable from any stage).
Pure dispatcher. Identifies user intent in one sentence, routes to the correct agent.
Design decision: The routing layer prevents the system from trying to do everything at once. If multiple needs exist, it forces sequential resolution — a deliberate constraint that mirrors how product teams triage requests.
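The sequential-resolution constraint can be sketched as follows. This is a minimal illustration, not the production router: the intent labels and agent names below are assumptions, since the source doesn't list them.

```python
# Hypothetical routing table -- the real intent labels and agent names are
# not specified in this document; these are illustrative stand-ins.
ROUTES = {
    "topic": "topic_filter",
    "script": "script_writer",
    "opening": "opening_generator",
    "title": "title_matcher",
    "review": "data_review",
}

def dispatch(intents: list[str]) -> tuple[str, list[str]]:
    """Pure dispatcher: route the first recognised intent, defer the rest.

    Multiple needs are resolved one at a time rather than fanned out in
    parallel -- the deliberate triage constraint described above.
    """
    for intent in intents:
        if intent in ROUTES:
            deferred = [i for i in intents if i != intent]
            return ROUTES[intent], deferred
    raise ValueError(f"no recognised intent in {intents!r}")
```

For example, `dispatch(["title", "script"])` routes to the title agent first and returns the script request as deferred work.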
Binary go/no-go decision. Core belief: 80% of bad content comes from bad topics, not bad execution.
Why this matters: In any AI product, the most expensive mistake is building the wrong thing. This agent prevents the system from investing effort in content that will fail.
| Filter | What it checks | Pass | Kill |
|---|---|---|---|
| 1. Cognitive Gap | Does this version have a reason to exist? | First mover, clearer framework, or first-hand experience | Same as existing content |
| 2. Material Check | What raw material exists? (data, stories, quotes, failures) | 2+ material types | 0 materials = hard stop |
| 3. Three-Layer Test | Info → Framework replacement → Identity | All 3 layers answered | Pure information with no framework |
| 4. Methodology Validation | Matches proven formula? Hits banned type? | Verified formula with historical data | Banned type = stop with data citation |
Key design decision: Filters run sequentially with user check-ins between each — not batch processed. The agent is a filter, not an advocate: it will never help the user rationalise a failing topic into a pass.
Critical dependency: Filter 4 reads from the methodology file, which is updated by the Data Review agent. This is the feedback loop — the gate gets smarter over time.
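A sketch of the four-filter gate and its read from the methodology file. The field names and file schema are assumptions (the source doesn't define them), and the real filters involve LLM judgment rather than boolean checks; the point here is the control flow: strictly sequential, check-in between filters, hard stop on failure.

```python
import json

def load_methodology(path: str = "methodology.json") -> dict:
    """Filter 4 reads the file that the Data Review agent writes back to."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"banned_types": []}

def run_filters(topic: dict, methodology: dict, confirm=lambda name: True):
    """Run the four filters strictly in order, with a user check-in after each.

    Returns ("go", None), ("kill", failed_filter), or ("paused", filter_name).
    A failure is a hard stop -- the gate never argues a topic past a filter.
    """
    filters = [
        ("cognitive_gap",  lambda t: t.get("has_reason_to_exist", False)),
        ("material_check", lambda t: len(t.get("materials", [])) >= 2),
        ("three_layer",    lambda t: all(t.get("layers", {}).get(k)
                                         for k in ("info", "framework", "identity"))),
        ("methodology",    lambda t: t.get("content_type")
                                     not in methodology.get("banned_types", [])),
    ]
    for name, check in filters:
        if not check(topic):
            return ("kill", name)
        if not confirm(name):   # check-in: user can pause before the next filter
            return ("paused", name)
    return ("go", None)
```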
Generates a complete ~2.5-minute script once the topic passes the filter.
Most AI writing tools generate and forget. This agent treats every user edit as a signal. After 10+ revision cycles, the output converges toward the user's voice. This is the difference between a tool and a product.
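One way to sketch "every user edit is a signal". The real feature extraction is not described in the source, so a raw line diff via the standard library stands in for it here; the profile keys are also assumptions.

```python
import difflib

def record_edit(draft: str, edited: str, profile: dict) -> dict:
    """Diff the generated draft against the user's edit and log the deltas
    into a persistent style profile that later generations condition on."""
    for line in difflib.unified_diff(draft.splitlines(), edited.splitlines(),
                                     lineterm=""):
        if line.startswith("+") and not line.startswith("+++"):
            profile.setdefault("user_prefers", []).append(line[1:])
        elif line.startswith("-") and not line.startswith("---"):
            profile.setdefault("user_removes", []).append(line[1:])
    return profile
```

Accumulated over 10+ revision cycles, a profile like this gives later generations concrete examples of what the user keeps and what they cut.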
Diagnoses content quality first, then generates 10–15 opening options. 90% of bad openings come from bad content, not bad copywriting.
Any factor = 0 means the opening has no force:
| Factor | What it measures | Example |
|---|---|---|
| Prediction disruption | Does the opening break the viewer's default expectation? | During the first few seconds, the viewer can't predict what you're saying next |
| Reward or loss signal | Can the viewer state what they'll get (or miss) within 5 seconds? | "Watch this and you'll get X" / "Scroll past and you'll miss X" |
| Naming | Does the opening label a feeling the viewer has but couldn't articulate? | A new name for a vague feeling — the moment it's named, trust is built |
Why multiplicative, not additive: If prediction disruption is zero (the opening is predictable), it doesn't matter how strong the reward signal is — viewers have already scrolled past. All three must be non-zero.
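The multiplicative scoring reduces to a one-line formula. The 0–5 per-factor scale below is an assumption; the source specifies only that the factors multiply and that any zero kills the score.

```python
def hook_score(prediction_disruption: float, reward_signal: float,
               naming: float) -> float:
    """Multiplicative hook score: one zero factor zeroes everything.

    Assumed scale: each factor rated 0-5 (not specified in the source).
    """
    return prediction_disruption * reward_signal * naming
```

So `hook_score(0, 5, 5)` is 0 (a predictable opening is dead regardless of the reward signal), while `hook_score(3, 4, 2)` is 24.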
Formula-driven title matching against 75 validated viral formulas. Every generated title carries a formula number, so its mechanism is traceable.
| Category | Mechanism |
|---|---|
| Cognitive Conflict (1–6) | Break existing belief |
| Curiosity Gap (7–12) | Information asymmetry |
| Fear / Loss (13–20) | "Not clicking = losing out" |
| Identity Injection (21–25) | "This is about me" |
| Number Anchoring (26–32) | Reduce cognitive load |
| Result Promise (33–40) | Concrete outcome + timeframe |
| + 6 more categories (Controversy, Scene/Condition, Action Call, Authority, Social Proof, Interaction) | |
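The traceability requirement can be sketched as a lookup from formula number to category. The numbered ranges come from the table above; the exact ranges for the remaining six categories (formulas 41–75) are not given in the source, so they fall through to a placeholder here.

```python
# Category ranges taken from the table above; formulas 41-75 cover the six
# remaining categories, whose exact boundaries the source does not specify.
FORMULA_CATEGORIES = [
    (range(1, 7),   "Cognitive Conflict"),
    (range(7, 13),  "Curiosity Gap"),
    (range(13, 21), "Fear / Loss"),
    (range(21, 26), "Identity Injection"),
    (range(26, 33), "Number Anchoring"),
    (range(33, 41), "Result Promise"),
]

def tag_title(title: str, formula_no: int) -> dict:
    """Attach the formula number so every title is traceable to its mechanism."""
    category = next((name for rng, name in FORMULA_CATEGORIES
                     if formula_no in rng), "Other (41-75)")
    return {"title": title, "formula": formula_no, "category": category}
```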
Readiness gate: Checks if the production file has all three components (script + opening + title). All present → moves to filming queue. Any missing → blocks and reports.
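The gate itself is a small check; a sketch (component keys are assumptions):

```python
REQUIRED = ("script", "opening", "title")

def readiness(production_file: dict) -> tuple[str, list[str]]:
    """All three components present -> filming queue; otherwise block and
    report exactly which pieces are missing."""
    missing = [k for k in REQUIRED if not production_file.get(k)]
    return ("queue", []) if not missing else ("blocked", missing)
```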
Detects AI writing fingerprints: 22 patterns across 3 severity levels (e.g. exhaustive counter-arguments, uniform parallel rhythms, zero hesitation, Chinese translation syntax), each with genre-specific false-positive warnings. Goal: "find your own voice."
Rewrite mode: Does not rewrite directly. Asks one targeted question per fingerprint: "Which of these parallel phrases is the one you most wanted to say?" The questions probe intent — so the user develops their voice rather than replacing AI patterns with different AI patterns.
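The detect-then-question flow can be sketched like this. The two fingerprints below are illustrative stand-ins, not the system's actual definitions; the real set has 22 patterns, each with genre-specific false-positive warnings.

```python
import re

# Illustrative stand-ins for two of the 22 fingerprints; patterns, severities,
# and questions here are examples, not the system's actual definitions.
FINGERPRINTS = [
    ("uniform_parallels", re.compile(r"not only.*but also", re.I), "medium",
     "Which of these parallel phrases is the one you most wanted to say?"),
    ("zero_hesitation", re.compile(r"\b(undoubtedly|certainly)\b", re.I), "low",
     "Would you actually claim this with zero hedging out loud?"),
]

def scan(text: str) -> list[tuple[str, str, str]]:
    """Return (fingerprint, severity, targeted question) per hit -- the agent
    asks a question, it never rewrites."""
    return [(name, sev, q) for name, rx, sev, q in FINGERPRINTS
            if rx.search(text)]
```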
This is where the system learns. Record data, run meta-review, extract rules, write back to methodology.
Meta-review runs only on notably above- or below-average results, following a fixed chain: phenomenon → content type → hypothesised cause → conclusion → next verification direction. A rule without a test is worthless.
This closes the loop: the methodology file is what the Topic Filter reads. Every published piece updates the criteria that gate the next piece. The system gets more precise over time.
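A minimal sketch of the write-back that closes the loop, under the same assumed JSON schema as the filter sketch above (the source doesn't specify the file format). The temp-file-plus-rename pattern is a design choice added here so the Topic Filter never reads a half-written file.

```python
import json
import os

def write_back(path: str, rule: dict) -> None:
    """Append a validated rule to the methodology file the Topic Filter reads.

    Writes via a temp file + atomic rename so a concurrent reader never
    sees a partially written file.
    """
    try:
        with open(path, encoding="utf-8") as f:
            methodology = json.load(f)
    except FileNotFoundError:
        methodology = {"banned_types": [], "rules": []}
    methodology.setdefault("rules", []).append(rule)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(methodology, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)  # atomic on POSIX and Windows
```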
This system runs in production daily.
Methodology file updated after every publish cycle. The system is still running and improving.
| AI Product Skill | Where it shows up |
|---|---|
| Agent architecture design | 7 agents with routing, handoff rules, and input/output contracts |
| Prompt engineering | Each agent has specialised prompt logic (scoring formulas, filters, templates) |
| Evaluation framework design | 3-factor multiplicative hook scoring, 4-layer topic filter, 75-formula title matching |
| A/B testing & experimentation | Structural A/B tests on content format, with controlled variables and metric-based conclusions |
| Feedback loop / iteration | Data Review → methodology write-back → Filter reads updated file |
| Data-driven decision making | 3 content types banned based on metrics, not intuition |
| User research thinking | Three-layer content test (information → framework replacement → identity) |
| Style learning / personalisation | Script agent diffs user edits and updates a persistent style profile |