AI Agent System Design

Designing a Multi-Agent AI System
That Learns From Its Own Output

I designed and built a 7-agent content production system with routing logic, scoring frameworks, feedback loops, and data-driven iteration. It runs in production daily, processing real content decisions.

Part 1

The Problem

Who: Short-form video creators on Chinese social platforms (Douyin, Xiaohongshu).

What they struggle with: Content creation involves 6–8 sequential decisions — what topic to pursue, whether it's worth making, what structure to use, how to open, what title to write, whether the writing sounds authentic, and what to learn from the results. Each decision requires judgment. Most creators rely on intuition, which doesn't scale and doesn't improve systematically.

The core product question: Can you design an AI agent system where each decision point has explicit criteria, where agents hand off to each other with structured data, and where the system improves its own decision quality over time through a feedback loop?


Part 2

System Architecture

Overview

7 specialised agents, connected through a routing layer. Each agent has its own handoff rules and an explicit input/output contract.

The Pipeline

Record → Filter → Script → Hook → Title → Publish → Review → back to Filter

Parallel: Quality Check (22 writing fingerprints, callable from any stage)

Agent 1
Router

Pure dispatcher. Identifies user intent in one sentence, routes to the correct agent.

Design decision: The routing layer prevents the system from trying to do everything at once. If multiple needs exist, it forces sequential resolution — a deliberate constraint that mirrors how product teams triage requests.
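
The dispatch behaviour can be sketched as a thin routing table. The agent names follow the pipeline above; the keyword matcher is an illustrative stand-in for the real intent classification, which this sketch does not reproduce.

```python
# Hypothetical sketch of the routing layer: match intents to agents, return
# them in priority order, and let the caller resolve them one at a time.
AGENTS = {
    "topic": "topic_filter",
    "script": "script_generator",
    "hook": "hook_optimiser",
    "opening": "hook_optimiser",
    "title": "title_generator",
    "voice": "quality_check",
    "review": "data_review",
}

def route(request: str) -> list[str]:
    """Return matched agents in priority order; multiple needs are
    resolved sequentially, never all at once."""
    matched = []
    for keyword, agent in AGENTS.items():
        if keyword in request.lower() and agent not in matched:
            matched.append(agent)
    return matched
```

A request that touches two needs ("Check this topic, then draft the script") yields an ordered queue rather than a single merged task, which is the triage constraint described above.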

Agent 2 — The Gate
Topic Filter

Binary go/no-go decision. Core belief: 80% of bad content comes from bad topics, not bad execution.

Why this matters: In any AI product, the most expensive mistake is building the wrong thing. This agent prevents the system from investing effort in content that will fail.

| Filter | What it checks | Pass | Kill |
|---|---|---|---|
| 1. Cognitive Gap | Does this version have a reason to exist? | First mover, clearer framework, or first-hand experience | Same as existing content |
| 2. Material Check | What raw material exists? (data, stories, quotes, failures) | 2+ material types | 0 materials = hard stop |
| 3. Three-Layer Test | Info → Framework replacement → Identity | All 3 layers answered | Pure information with no framework |
| 4. Methodology Validation | Matches proven formula? Hits banned type? | Verified formula with historical data | Banned type = stop with data citation |

Key design decision: Filters run sequentially with user check-ins between each — not batch processed. The agent is a filter, not an advocate. It will never help the user rationalise a passing grade.

Critical dependency: Filter 4 reads from the methodology file, which is updated by the Data Review agent. This is the feedback loop — the gate gets smarter over time.
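
A minimal sketch of the four filters as a sequential gate, assuming each filter is a small function returning a pass flag and a reason. The function names and topic fields are hypothetical; the hard-stop ordering is the point.

```python
def cognitive_gap(topic: dict):      # Filter 1: does this version have a reason to exist?
    return bool(topic.get("differentiator")), "same as existing content"

def material_check(topic: dict):     # Filter 2: 2+ material types, 0 = hard stop
    return len(topic.get("materials", [])) >= 2, "insufficient raw material"

def three_layer_test(topic: dict):   # Filter 3: info -> framework -> identity
    return all(topic.get(k) for k in ("info", "framework", "identity")), "missing layer"

def methodology_check(topic: dict, banned: set):  # Filter 4: reads the methodology file
    return topic.get("content_type") not in banned, "banned content type"

def run_gate(topic: dict, banned_types: set):
    """Sequential, first failure kills. The real agent pauses for a user
    check-in between filters rather than batch-processing them."""
    for check in (cognitive_gap, material_check, three_layer_test):
        passed, reason = check(topic)
        if not passed:
            return ("KILL", reason)
    passed, reason = methodology_check(topic, banned_types)
    return ("PASS", None) if passed else ("KILL", reason)
```

Note that `banned_types` is an input, not a constant: it comes from the methodology file the Data Review agent maintains, which is how the gate tightens over time.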

Agent 3
Script Generator

Generate a complete ~2.5 min script after the topic passes the filter.

5 Phases

  1. Re-confirm gate — Checks cognitive gap, material count, framework layer again. Hard stop if any fail.
  2. Content library retrieval — Searches concept library, quote library, and proven scripts before generating anything new.
  3. Script generation — Applies one of two templates:
    • Template A (Information Gap → Framework Replacement → Identity Close): Optimised for reach
    • Template B (Problem → AI Workflow → Visible Result): Optimised for saves
  4. Save to pipeline — Structured file with frontmatter (content type, cognitive gap, materials used).
  5. Style learning — Diffs user revisions across 4 dimensions (hook style, sentence feel, structure, CTA), updates a persistent style profile. Each revision is training data.

Most AI writing tools generate and forget. This agent treats every user edit as a signal. After 10+ revision cycles, the output converges toward the user's voice. This is the difference between a tool and a product.
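
Phase 5 can be sketched as a diff-and-update step. Everything concrete here is an assumption for illustration: the JSON profile format, the file path, and the two crude features standing in for the four real dimensions (hook style, sentence feel, structure, CTA).

```python
import json
import pathlib

def extract_features(text: str) -> dict:
    """Two illustrative style features; the real agent diffs four dimensions."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    words = text.split()
    return {
        "hook_is_question": text.strip().split("\n")[0].rstrip().endswith("?"),
        "avg_sentence_words": len(words) / max(len(sentences), 1),
    }

def update_profile(draft: str, revision: str, path: pathlib.Path) -> dict:
    """Treat the user's revision as a training signal: diff it against the
    generated draft and fold the deltas into a persistent style profile."""
    profile = json.loads(path.read_text()) if path.exists() else {"revisions": 0}
    before, after = extract_features(draft), extract_features(revision)
    profile["hook_is_question"] = after["hook_is_question"]
    profile["sentence_length_delta"] = round(
        after["avg_sentence_words"] - before["avg_sentence_words"], 2)
    profile["revisions"] += 1
    path.write_text(json.dumps(profile))
    return profile
```

Because the profile persists on disk, each revision cycle starts from the accumulated signal of the previous ones, which is the convergence-toward-the-user's-voice behaviour described above.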

Post-generation audit (7 checks)

  1. Opening 3-factor strength (multiplicative — any zero = rewrite)
  2. AI writing fingerprint detection
  3. Expression efficiency
  4. Layperson readability
  5. Unexplained jargon
  6. Causal chain between paragraphs
  7. Information gap (missing prerequisite knowledge)

Agent 4
Hook Optimiser

Diagnose content quality first, then generate 10–15 opening options. 90% of bad openings come from bad content, not bad copywriting.

Scoring Framework — 3 Factors, Multiplicative

Any factor = 0 means the opening has no force:

| Factor | What it measures | Example |
|---|---|---|
| Prediction disruption | Does the opening break the viewer's default expectation? | During the first few seconds, the viewer can't predict what you're saying next |
| Reward or loss signal | Can the viewer state what they'll get (or miss) within 5 seconds? | "Watch this and you'll get X" / "Scroll past and you'll miss X" |
| Naming | Does the opening label a feeling the viewer has but couldn't articulate? | A new name for a vague feeling — the moment it's named, trust is built |

Why multiplicative, not additive: If prediction disruption is zero (the opening is predictable), it doesn't matter how strong the reward signal is — viewers have already scrolled past. All three must be non-zero.
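
The multiplicative rule is one line of arithmetic. Assuming each factor is scored 0–5 by a reviewer or an LLM judge (the scale is an assumption, not stated above):

```python
def hook_score(prediction_disruption: int, reward_signal: int, naming: int) -> int:
    """Multiplicative, not additive: any zero factor zeroes the opening."""
    return prediction_disruption * reward_signal * naming

def rank_options(options: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Order generated openings by score, strongest first."""
    return sorted(options, key=lambda factors: hook_score(*factors), reverse=True)

# A strong reward signal cannot rescue a predictable opening:
assert hook_score(0, 5, 5) == 0
assert hook_score(3, 4, 2) == 24
```

Under an additive scheme, (0, 5, 5) would outrank (2, 2, 2); under multiplication it scores zero, which matches the scrolled-past-already argument.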

Generation Process

  1. Three-factor audit on existing opening
  2. Generate 10–15 options using 3 methods: material extraction, material supplementation, suspense creation
  3. Each option labelled with which factors it hits

Agent 5
Title Generator

Formula-driven title matching from 75 validated viral formulas. Every title is tagged with its formula number, so its provenance is traceable.

75 Formulas in 12 Categories

| Category | Mechanism |
|---|---|
| Cognitive Conflict (1–6) | Break existing belief |
| Curiosity Gap (7–12) | Information asymmetry |
| Fear / Loss (13–20) | "Not clicking = losing out" |
| Identity Injection (21–25) | "This is about me" |
| Number Anchoring (26–32) | Reduce cognitive load |
| Result Promise (33–40) | Concrete outcome + timeframe |

Plus 6 more categories: Controversy, Scene/Condition, Action Call, Authority, Social Proof, Interaction.

Readiness gate: Checks if the production file has all three components (script + opening + title). All present → moves to filming queue. Any missing → blocks and reports.
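
The readiness gate reduces to a presence check. Assuming the production file's frontmatter is parsed into a dict (the field names are illustrative):

```python
REQUIRED = ("script", "opening", "title")

def readiness_gate(production_file: dict) -> tuple[str, list[str]]:
    """All three components present -> filming queue; otherwise block
    and report exactly which components are missing."""
    missing = [key for key in REQUIRED if not production_file.get(key)]
    return ("filming_queue", []) if not missing else ("blocked", missing)
```

Reporting the missing components by name (rather than a bare pass/fail) is what lets the router send the file back to the right agent.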

Agent 6
Quality Check

Detect AI writing fingerprints. 22 patterns, 3 severity levels. Goal: "find your own voice."

Detection: 22 fingerprints (exhaustive counter-arguments, uniform parallel rhythms, zero hesitation, Chinese translation syntax). Each has genre-specific false-positive warnings.

Rewrite mode: Does not rewrite directly. Asks one targeted question per fingerprint: "Which of these parallel phrases is the one you most wanted to say?" The questions probe intent — so the user develops their voice rather than replacing AI patterns with different AI patterns.
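
A detector in this shape can be sketched as a pattern table, where each entry carries a severity and the probing question the agent asks instead of rewriting. The two patterns below are invented stand-ins for the real 22-item checklist.

```python
import re

# (name, severity, pattern, probing question) -- illustrative stand-ins only
FINGERPRINTS = [
    ("uniform_parallel_rhythm", "high",
     r"not only [^.]+ but also [^.]+",
     "Which of these parallel phrases is the one you most wanted to say?"),
    ("exhaustive_counter_argument", "medium",
     r"\bon the other hand\b",
     "Do you actually hold this counter-view, or is it reflex balance?"),
]

def detect(text: str) -> list[dict]:
    """Flag fingerprints; the caller surfaces the question, never a rewrite."""
    hits = []
    for name, severity, pattern, question in FINGERPRINTS:
        if re.search(pattern, text, re.IGNORECASE):
            hits.append({"fingerprint": name, "severity": severity, "question": question})
    return hits
```

Returning questions instead of replacement text is the design choice described above: the user answers with intent, and the pattern is broken by their own wording.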

Agent 7 — The Learning Engine
Data Review

This is where the system learns. Record data, run meta-review, extract rules, write back to methodology.

3 Mandatory Meta-Questions

  1. Was the cognitive gap judgment correct? Maps actual data against the claimed differentiation
  2. Was the content type right? E.g., if info-gap content got high bookmarks → probably should have been framework content
  3. Did the material choice produce the expected hook effect?

Rule Extraction

Rules are extracted only from notably above- or below-average results, following the chain: phenomenon → content type → hypothesised cause → conclusion → next verification direction. Rules without a test are worthless.

Methodology Write-back

  • Reads methodology file
  • Appends validation evidence or counter-examples
  • Unexplainable result → new formula or "unvalidated"
  • Failure → considers adding to banned types

This closes the loop: the methodology file is what the Topic Filter reads. Every published piece updates the criteria that gate the next piece. The system gets more precise over time.
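
The write-back step can be sketched as a read-modify-write on a shared file. The JSON schema, field names, and review payload are all assumptions; the point is that the banned-types list this function appends to is the same list Filter 4 of the Topic Filter reads on its next run.

```python
import json
import pathlib

def write_back(path: pathlib.Path, review: dict) -> dict:
    """Append validation evidence (or a counter-example) to the methodology
    file, and escalate failures into the banned-types list the gate reads."""
    if path.exists():
        method = json.loads(path.read_text())
    else:
        method = {"formulas": {}, "banned_types": []}
    entry = method["formulas"].setdefault(review["formula"], {"evidence": []})
    entry["evidence"].append(review["result"])
    if review.get("failed") and review["content_type"] not in method["banned_types"]:
        method["banned_types"].append(review["content_type"])
    path.write_text(json.dumps(method, indent=2))
    return method
```

Because the file is the only shared state between Data Review and the Topic Filter, every published piece tightens the gate for the next one without any direct agent-to-agent coupling.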


Part 3

Results

This system runs in production daily.

  • 110K+ Douyin views
  • 3,607 Douyin likes
  • 7 content types tested
  • 3 content types removed by the system

Methodology file updated after every publish cycle. The system is still running and improving.


Part 4

What This Demonstrates

| AI Product Skill | Where it shows up |
|---|---|
| Agent architecture design | 7 agents with routing, handoff rules, and input/output contracts |
| Prompt engineering | Each agent has specialised prompt logic (scoring formulas, filters, templates) |
| Evaluation framework design | 3-factor multiplicative hook scoring, 4-layer topic filter, 75-formula title matching |
| A/B testing & experimentation | Structural A/B tests on content format, with controlled variables and metric-based conclusions |
| Feedback loop / iteration | Data Review → methodology write-back → Filter reads updated file |
| Data-driven decision making | 3 content types banned based on metrics, not intuition |
| User research thinking | Three-layer content test (information → framework replacement → identity) |
| Style learning / personalisation | Script agent diffs user edits and updates a persistent style profile |