AI Agent System Design

Part 2

System Architecture

Overview

The system consists of 7 specialised agents connected through a routing layer. Each agent has:

A defined trigger (when it activates)
A specific input/output contract (what it receives, what it produces)
Diagnostic logic (how it evaluates quality)
Handoff rules (when it passes work to the next agent, or rejects it back)
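
The four properties above can be read as a small interface. The sketch below is one hypothetical way to express it; the class names, field names, and the example agent are assumptions for illustration, not the system's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Handoff:
    """Result of one agent run: work is passed forward or rejected back."""
    accepted: bool
    payload: dict
    next_agent: Optional[str] = None   # where accepted work goes
    reason: str = ""                   # filled in when work is rejected


@dataclass
class Agent:
    name: str
    trigger: Callable[[dict], bool]    # when it activates
    run: Callable[[dict], dict]        # input/output contract
    diagnose: Callable[[dict], bool]   # how it evaluates quality
    next_agent: Optional[str] = None   # handoff target on success

    def handle(self, request: dict) -> Optional[Handoff]:
        if not self.trigger(request):
            return None                # not this agent's job
        output = self.run(request)
        if self.diagnose(output):
            return Handoff(True, output, self.next_agent)
        return Handoff(False, output, reason=f"{self.name} rejected the output")


# Hypothetical example agent: activates only on "script" requests.
script_agent = Agent(
    name="script_generator",
    trigger=lambda req: req.get("intent") == "script",
    run=lambda req: {"script": f"Draft about {req['topic']}"},
    diagnose=lambda out: len(out["script"]) > 0,
    next_agent="hook_optimiser",
)
```

Keeping the trigger, the contract, the diagnostic and the handoff as separate fields makes it possible to test each one on its own.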

The Pipeline — Agent Handoff Flow

7 agents, 3 connection types. Solid arrows = sequential handoff; gold/sage arrows = feedback loop through methodology.md; dashed slate lines = Quality Check called in parallel from any generator. The feedback loop, not better prompts, is the reason the filter gets stricter over time.

Methodology Evolution — What the Feedback Loop Actually Changes

The claim "the system learns" is only meaningful if the structure of decisions changes over time. Below is the methodology.md file before and after three publish-and-review cycles — same structural shape, but the Banned Set (what the Topic Filter rejects) is grown by the Data Review agent, not by me editing the prompt.

Before — v1 baseline

# methodology.md
 
## Banned Set
(empty — no data yet)
 
## Validated formats
(empty — no data yet)

Filter approves anything that passes the structural sieves. No data to override intuition with.

After — v3 (post 12 publishes)

# methodology.md
 
## Banned Set
+ Pure information transfer
   (save>like = utility mode)
+ Tool install tutorials
   (no pain, no audience)
+ Lifestyle w/o AI frame
   (dilutes positioning)
 
## Validated formats
+ Time-sensitive × test × frame
+ Real pain × workflow × result
+ Counter-intuitive answer
+ Carousel · pain × research × fix

Filter now rejects 3 entire content categories before generation begins. The list grew only from Data Review writebacks — the prompt in the Topic Filter agent is unchanged from v1.

The prompt didn't get better. The file the prompt reads got more precise. This is the argument for structured state being the leverage point in multi-agent systems — more than prompt engineering, more than model upgrades.
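
To make the point concrete, here is a minimal sketch of a check whose code never changes while its behaviour does, because it reads the file. The parsing rules (entries start with "+ " under a "## " heading) are inferred from the snippets above; the function names and the abbreviated file contents are assumptions.

```python
def parse_methodology(text: str) -> dict[str, list[str]]:
    """Collect the '+ ' entries listed under each '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("+ ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


# Abbreviated copies of the two file versions shown above.
V1 = """# methodology.md

## Banned Set
(empty, no data yet)

## Validated formats
(empty, no data yet)
"""

V3 = """# methodology.md

## Banned Set
+ Pure information transfer
   (save>like = utility mode)
+ Tool install tutorials
   (no pain, no audience)
+ Lifestyle w/o AI frame
   (dilutes positioning)

## Validated formats
+ Counter-intuitive answer
"""


def is_banned(content_type: str, methodology_text: str) -> bool:
    """The same check for every version; only the file contents differ."""
    banned = parse_methodology(methodology_text).get("Banned Set", [])
    return content_type in banned
```

With V1 the function lets "Tool install tutorials" through; with V3 it rejects it, and no line of the function has been edited in between.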

Agent 1

Router

Pure dispatcher. Identifies user intent in one sentence, routes to the correct agent.

Design decision: The routing layer prevents the system from trying to do everything at once. If multiple needs exist, it forces sequential resolution — a deliberate constraint that mirrors how product teams triage requests.
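
A rough sketch of that constraint: the router returns a queue, and agents are taken from it one at a time. The keyword table and the "order of first mention" rule are assumptions made for the example; how the real router detects intent is not described here.

```python
from collections import deque

# Hypothetical keyword table, for illustration only.
ROUTES = {
    "topic": "topic_filter",
    "script": "script_generator",
    "hook": "hook_optimiser",
    "title": "title_generator",
    "check": "quality_check",
    "review": "data_review",
}


def route(message: str) -> deque[str]:
    """Return the agents to run, one at a time, in order of first mention."""
    text = message.lower()
    hits = sorted(
        (text.find(keyword), agent)
        for keyword, agent in ROUTES.items()
        if keyword in text
    )
    return deque(agent for _, agent in hits)


queue = route("Write a script, then give me a title")
first = queue.popleft()   # only this agent runs now; the rest wait their turn
```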

Agent 2 — The Gate

Topic Filter

Binary go/no-go decision. Core belief: 80% of bad content comes from bad topics, not bad execution.

Why this matters: In any AI product, the most expensive mistake is building the wrong thing. This agent prevents the system from investing effort in content that will fail.

Filter	What it checks	Pass	Kill
1. Cognitive Gap	Does this version have a reason to exist?	First mover, clearer framework, or first-hand experience	Same as existing content
2. Material Check	What raw material exists? (data, stories, quotes, failures)	2+ material types	0 materials = hard stop
3. Three-Layer Test	Info → Framework replacement → Identity	All 3 layers answered	Pure information with no framework
4. Methodology Validation	Matches proven formula? Hits banned type?	Verified formula with historical data	Banned type = stop with data citation

Key design decision: Filters run sequentially with user check-ins between each — not batch processed. The agent is a filter, not an advocate. It will never help the user rationalise a passing grade.

Critical dependency: Filter 4 reads from the methodology file, which is updated by the Data Review agent. This is the feedback loop — the gate gets smarter over time.
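
The sequential behaviour could be sketched as below. Everything named here is an assumption: the dictionary keys on a topic, the choice to treat fewer than 2 material types as a kill, the reduction of Filter 4 to a banned-type lookup, and the `confirm` callback that stands in for the user check-in between filters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    note: str


Filter = Callable[[dict], Verdict]


def cognitive_gap(topic: dict) -> Verdict:
    ok = bool(topic.get("differentiator"))
    return Verdict(ok, "has a reason to exist" if ok else "same as existing content")


def material_check(topic: dict) -> Verdict:
    count = len(set(topic.get("materials", [])))
    return Verdict(count >= 2, f"{count} material type(s)")


def three_layer(topic: dict) -> Verdict:
    missing = [k for k in ("info", "framework", "identity") if not topic.get(k)]
    return Verdict(not missing, f"missing: {missing}" if missing else "all layers answered")


def make_methodology_filter(banned: set[str]) -> Filter:
    """Filter 4, reduced here to a lookup in the banned set read from the file."""
    def check(topic: dict) -> Verdict:
        hit = topic.get("content_type") in banned
        return Verdict(not hit, "banned type" if hit else "not banned")
    return check


def run_gate(
    topic: dict,
    filters: list[tuple[str, Filter]],
    confirm: Callable[[str, Verdict], bool] = lambda name, verdict: True,
) -> tuple[bool, list[str]]:
    """Run filters in order; stop at the first kill or when the user declines."""
    log = []
    for name, check in filters:
        verdict = check(topic)
        log.append(f"{name}: {'PASS' if verdict.passed else 'KILL'} ({verdict.note})")
        if not verdict.passed or not confirm(name, verdict):
            return False, log
    return True, log


FILTERS = [
    ("cognitive_gap", cognitive_gap),
    ("material_check", material_check),
    ("three_layer", three_layer),
    ("methodology", make_methodology_filter({"Tool install tutorials"})),
]
```

Because the loop returns on the first kill, a topic with no material never reaches the later filters, which matches the hard-stop behaviour in the table.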

Agent 3

Script Generator

Generate a complete ~2.5 min script after the topic passes the filter.

5 Phases

Re-confirm gate — Checks cognitive gap, material count, framework layer again. Hard stop if any fail.
Content library retrieval (RAG pattern) — Retrieves from a structured corpus (concept library, quote library, proven scripts) before generating. Retrieval-before-generation is what prevents the model from hallucinating examples it hasn’t seen.
Script generation — Applies one of two templates:
- Template A (Information Gap → Framework Replacement → Identity Close): Optimised for reach
- Template B (Problem → AI Workflow → Visible Result): Optimised for saves
Save to pipeline — Structured file with frontmatter (content type, cognitive gap, materials used).
Style learning — Diffs user revisions across 4 dimensions (hook style, sentence feel, structure, CTA), updates a persistent style profile. Each revision is training data.
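
For the "Save to pipeline" phase, a file with frontmatter might be produced along these lines. The three frontmatter fields come from the description above; the exact key names, the YAML-like layout, and the function name are assumptions.

```python
def render_pipeline_file(
    script: str,
    content_type: str,
    cognitive_gap: str,
    materials: list[str],
) -> str:
    """Build a text file: a frontmatter header followed by the script body."""
    lines = [
        "---",
        f"content_type: {content_type}",
        f"cognitive_gap: {cognitive_gap}",
        "materials_used:",
        *[f"  - {material}" for material in materials],
        "---",
        "",
        script.strip(),
        "",
    ]
    return "\n".join(lines)


doc = render_pipeline_file(
    "Hook line.\nBody.",
    content_type="Template B",
    cognitive_gap="first-hand experience",
    materials=["data", "story"],
)
```

This sketch does no escaping, so values containing colons or line breaks would need quoting before a YAML parser could read them back.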

Most AI writing tools generate and forget. This agent treats every user edit as a signal. After 10+ revision cycles, the output converges toward the user's voice. This is the difference between a tool and a product.
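
As an illustration of treating an edit as a signal, the sketch below compares a draft with the user's revision along four crude proxies: first paragraph for hook style, average sentence length for sentence feel, paragraph count for structure, last paragraph for CTA. These proxies and all names are assumptions; the agent's real diffing logic is not shown in this document.

```python
import re
from statistics import mean


def profile(script: str) -> dict:
    """Reduce a script to four simple, comparable features."""
    paragraphs = [p.strip() for p in script.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    return {
        "hook": paragraphs[0] if paragraphs else "",
        "avg_sentence_words": mean(len(s.split()) for s in sentences) if sentences else 0.0,
        "paragraph_count": len(paragraphs),
        "cta": paragraphs[-1] if paragraphs else "",
    }


def diff_revision(draft: str, revision: str) -> dict:
    """Return only the dimensions the user changed, as (before, after) pairs."""
    before, after = profile(draft), profile(revision)
    return {key: (before[key], after[key]) for key in before if before[key] != after[key]}


draft = "Here is a thing you should know about AI tools.\n\nBody text here.\n\nFollow for more."
revision = "Stop installing AI tools.\n\nBody text here.\n\nFollow for more."
changes = diff_revision(draft, revision)
```

In this example the user rewrote only the opening, so the diff reports a changed hook and a shorter average sentence, while structure and CTA are left out of the result.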

Post-generation audit (7 checks)

Opening 3-factor strength (multiplicative — any zero = rewrite)
AI writing fingerprint detection
Expression efficiency
Layperson readability
Unexplained jargon
Causal chain between paragraphs
Information gap (missing prerequisite knowledge)

Agent 4

Hook Optimiser

Diagnose content quality first, then generate 10–15 opening options. 90% of bad openings come from bad content, not bad copywriting.

Scoring Framework — 3 Factors, Multiplicative

Any factor = 0 means the opening has no force:

Factor	What it measures	Example
Prediction disruption	Does the opening break the viewer's default expectation?	During the first few seconds, the viewer can't predict what you're saying next
Reward or loss signal	Can the viewer state what they'll get (or miss) within 5 seconds?	"Watch this and you'll get X" / "Scroll past and you'll miss X"
Naming	Does the opening label a feeling the viewer has but couldn't articulate?	A new name for a vague feeling — the moment it's named, trust is built

Why multiplicative, not additive: If prediction disruption is zero (the opening is predictable), it doesn't matter how strong the reward signal is — viewers have already scrolled past. All three must be non-zero.
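
The difference is easy to see in a few lines. The 0-to-1 scale for each factor is an assumption; the document only specifies that the factors are multiplied and that any zero is fatal.

```python
def hook_score(prediction_disruption: float, reward_signal: float, naming: float) -> float:
    """Multiply the three factors, each assumed to be scored from 0 to 1."""
    for value in (prediction_disruption, reward_signal, naming):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each factor must be between 0 and 1")
    return prediction_disruption * reward_signal * naming


def additive_score(prediction_disruption: float, reward_signal: float, naming: float) -> float:
    """Plain average of the factors, shown only for contrast."""
    return (prediction_disruption + reward_signal + naming) / 3


# A predictable opening with a perfect reward signal and strong naming.
predictable = (0.0, 1.0, 0.8)
multiplicative = hook_score(*predictable)   # 0.0: the opening fails outright
averaged = additive_score(*predictable)     # about 0.6: looks acceptable, wrongly
```

An averaged score would let two strong factors hide a dead one; the product cannot.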

Generation Process

Three-factor audit on existing opening
Generate 10–15 options using 3 methods: material extraction, material supplementation, suspense creation
Each option labelled with which factors it hits

Agent 5

Title Generator

Formula-driven title matching from 75 validated viral formulas. Every title carries its formula number, so it can be traced back to the formula that produced it.

75 Formulas in 12 Categories

Category	Mechanism
Cognitive Conflict (1–6)	Break existing belief
Curiosity Gap (7–12)	Information asymmetry
Fear / Loss (13–20)	"Not clicking = losing out"
Identity Injection (21–25)	"This is about me"
Number Anchoring (26–32)	Reduce cognitive load
Result Promise (33–40)	Concrete outcome + timeframe
+ 6 more categories (Controversy, Scene/Condition, Action Call, Authority, Social Proof, Interaction)
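
Traceability by formula number can be sketched as a range lookup. Only the six ranges listed in the table are encoded; the ranges for formulas 41 to 75 are not given here, so the sketch deliberately leaves them unresolved. The names are hypothetical.

```python
from dataclasses import dataclass

# Ranges copied from the table above; range(1, 7) covers formulas 1 to 6.
CATEGORY_RANGES = [
    (range(1, 7), "Cognitive Conflict"),
    (range(7, 13), "Curiosity Gap"),
    (range(13, 21), "Fear / Loss"),
    (range(21, 26), "Identity Injection"),
    (range(26, 33), "Number Anchoring"),
    (range(33, 41), "Result Promise"),
]


def category_for(formula_number: int) -> str:
    if not 1 <= formula_number <= 75:
        raise ValueError("formula numbers run from 1 to 75")
    for numbers, category in CATEGORY_RANGES:
        if formula_number in numbers:
            return category
    return "not listed in this excerpt"   # formulas 41 to 75


@dataclass(frozen=True)
class Title:
    text: str
    formula_number: int   # every generated title keeps the formula it came from

    @property
    def category(self) -> str:
        return category_for(self.formula_number)
```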

Readiness gate: Checks if the production file has all three components (script + opening + title). All present → moves to filming queue. Any missing → blocks and reports.
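
A minimal sketch of the readiness gate, assuming the production file is available as a dictionary with hypothetical keys `script`, `opening`, and `title`:

```python
REQUIRED_PARTS = ("script", "opening", "title")


def readiness(production_file: dict) -> tuple[bool, list[str]]:
    """Return (ready, missing parts). Empty or absent parts count as missing."""
    missing = [part for part in REQUIRED_PARTS if not production_file.get(part)]
    return (not missing, missing)


ready, missing = readiness({"script": "full text", "opening": "", "title": "Some title"})
# ready is False and missing names the blocking component: ["opening"]
```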

Agent 6

Quality Check

Detect AI writing fingerprints. 22 patterns, 3 severity levels. Goal: "find your own voice."

Detection: 22 fingerprints (e.g. exhaustive counter-arguments, uniform parallel rhythms, zero hesitation, syntax that reads as translated from Chinese). Each has genre-specific false-positive warnings.

Rewrite mode: Does not rewrite directly. Asks one targeted question per fingerprint: "Which of these parallel phrases is the one you most wanted to say?" The questions probe intent — so the user develops their voice rather than replacing AI patterns with different AI patterns.
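
The question-instead-of-rewrite behaviour might be modelled as a lookup from fingerprint to question. Only the first question below is quoted from the text; the second question, the numeric severities, and all identifiers are hypothetical, and detection itself is out of scope for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    name: str
    severity: int   # 1 to 3; placeholder values, the real levels are not listed here
    question: str   # asked of the user in place of a rewrite


# Two of the 22 patterns, for illustration.
FINGERPRINTS = {
    "uniform_parallel_rhythm": Fingerprint(
        "uniform parallel rhythms",
        2,
        "Which of these parallel phrases is the one you most wanted to say?",
    ),
    "exhaustive_counter_arguments": Fingerprint(
        "exhaustive counter-arguments",
        3,
        "Which objection do you actually expect from your audience?",  # hypothetical wording
    ),
}


def questions_for(detected: list[str]) -> list[str]:
    """One question per detected fingerprint, most severe first; never a rewrite."""
    found = [FINGERPRINTS[key] for key in detected if key in FINGERPRINTS]
    return [fp.question for fp in sorted(found, key=lambda fp: -fp.severity)]
```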

Agent 7 — The Learning Engine

Data Review

This is where the system learns. Record data, run meta-review, extract rules, write back to methodology.

3 Mandatory Meta-Questions

Was the cognitive gap judgment correct? Compares the actual performance data against the differentiation that was claimed at the filter stage.
Was the content type right? For example, if info-gap content earned unusually many bookmarks, it probably should have been framed as framework content.
Did the material choice produce the expected hook effect?

Rule Extraction

Only for notably above/below average results: phenomenon → content type → hypothesised cause → conclusion → next verification direction. Rules without a test are worthless.
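
The five-step chain maps naturally onto a record with five fields. In the sketch below, "notably above/below average" is approximated as more than one standard deviation from the historical mean; that threshold, like the names, is an assumption and not the system's stated rule.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass
class Rule:
    phenomenon: str
    content_type: str
    hypothesised_cause: str
    conclusion: str
    next_verification: str


def is_notable(value: float, history: list[float], threshold: float = 1.0) -> bool:
    """True when value lies more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False            # too little data to call anything notable
    spread = pstdev(history)
    if spread == 0:
        return value != history[0]
    return abs(value - mean(history)) / spread > threshold


def extract_rule(value: float, history: list[float], draft: Rule) -> Optional[Rule]:
    """Keep a rule only for notable results, and only if it names a next test."""
    if not is_notable(value, history):
        return None
    if not draft.next_verification.strip():
        raise ValueError("a rule needs a next verification direction")
    return draft
```

Rejecting a rule that has no next verification step encodes the last sentence above directly: a rule that cannot be tested is not stored.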

Methodology Write-back

Reads methodology file
Appends validation evidence or counter-examples
Unexplainable result → recorded as a candidate new formula or marked "unvalidated"
Failure → considers adding to banned types
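
Once a failure has been judged worth banning, the write-back itself could be as small as the function below. It assumes the file layout shown in the methodology.md snippets earlier ("## " headings, "+ " entries with an indented reason line); the function name and the placeholder handling are assumptions.

```python
def add_banned_type(methodology_text: str, content_type: str, evidence: str) -> str:
    """Return the file text with a new entry appended under '## Banned Set'."""
    lines = methodology_text.splitlines()
    try:
        start = lines.index("## Banned Set")
    except ValueError:
        raise ValueError("no '## Banned Set' section found") from None
    # The section runs until the next '## ' heading or the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    section = lines[start + 1:end]
    if f"+ {content_type}" in section:
        return methodology_text          # already banned, nothing to write
    # Drop blank lines and the "(empty ...)" placeholder, keep existing entries.
    kept = [ln for ln in section if ln.strip() and not ln.strip().startswith("(empty")]
    kept += [f"+ {content_type}", f"   ({evidence})", ""]
    return "\n".join(lines[:start + 1] + kept + lines[end:]).rstrip("\n") + "\n"


v1 = (
    "# methodology.md\n\n## Banned Set\n(empty, no data yet)\n\n"
    "## Validated formats\n(empty, no data yet)\n"
)
v2 = add_banned_type(v1, "Tool install tutorials", "no pain, no audience")
```

The function is idempotent: calling it again with the same content type returns the text unchanged, so repeated reviews of the same failure do not duplicate entries.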

This closes the loop: the methodology file is what the Topic Filter reads. Every published piece updates the criteria that gate the next piece. The system gets more precise over time.

What This Demonstrates

AI Product Skill	Where it shows up
Agent architecture design	7 agents with routing, handoff rules, and input/output contracts
Prompt engineering	Each agent has specialised prompt logic (scoring formulas, filters, templates)
Evaluation framework design	3-factor multiplicative hook scoring, 4-filter topic gate, 75-formula title matching
A/B testing & experimentation	Structural A/B tests on content format, with controlled variables and metric-based conclusions
Feedback loop / iteration	Data Review → methodology write-back → Filter reads updated file
Data-driven decision making	3 content types banned based on metrics, not intuition
User research thinking	Three-layer content test (information → framework replacement → identity)
Style learning / personalisation	Script agent diffs user edits and updates a persistent style profile
