Architecture

The committed design of the agentic conversion system. method.md stays direction-focused; this file holds the decisions.

Topology

One orchestrator delegates to four persona subagents. The orchestrator owns the question, delegates analysis, synthesizes findings, and produces the diagnosis artifact. Subagents own narrow domains.

| Agent | Role | Model |
|---|---|---|
| Orchestrator | Owns the question; delegates; synthesizes; writes the run artifact | Opus |
| Ads / Creative analyst | Why traffic comes; what creative does on-site | Sonnet |
| Funnel / Journey analyst | Where users drop, who drops, before-purchase patterns | Sonnet |
| Page / UX analyst | What is wrong on the page itself; fans out to replay-analyzer for cohort replay coverage | Sonnet |
| Code analyst | What the code actually does, what shipped to production, when it shipped, where the spec and the implementation diverge; read-only across the frontend and analytics-MCP repos | Sonnet |
| Replay analyzer | One PostHog session replay, one observation; called in parallel by Page/UX across many sessions | Haiku |

PoC topology is flat at the persona level: the orchestrator delegates to each persona subagent directly. The Page/UX persona may fan out to replay-analyzer subagents (Haiku) for per-session replay analysis whenever a cohort needs more than a handful of sessions reviewed. This is the only nested case, justified by replay volume; other personas should not nest by default.
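As a minimal illustration of that fan-out in a future Bun/Mastra-style runtime (runReplayAnalyzer is a hypothetical helper standing in for one Haiku replay-analyzer invocation, not a committed API):

```typescript
// Sketch of the only nested case: Page/UX fanning out one replay-analyzer per session, in parallel.
// runReplayAnalyzer is a hypothetical stand-in for a Haiku subagent invocation.
async function analyzeCohortReplays(
  sessionIds: string[],
  runReplayAnalyzer: (sessionId: string) => Promise<string>,
): Promise<string[]> {
  // One session replay, one observation; all sessions reviewed concurrently.
  return Promise.all(sessionIds.map((id) => runReplayAnalyzer(id)));
}
```

In the PoC this fan-out happens inside the page-ux-analyst subagent itself rather than in application code.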

Sources per persona

Each persona has a primary lens (defines the kind of question it owns) and supporting sources.

| Persona | Primary | Supporting |
|---|---|---|
| Ads / Creative | Meta Ads + Meta creative library | PostHog (ad-name attribution), GA4, Plausible |
| Funnel / Journey | PostHog | GA4, Plausible, Cloudflare, Data Master |
| Page / UX | PostHog replays + live browser visit | Plausible (path-level), Clarity (when ingestion lands) |
| Code | AlaskanFishermanFrontend and workflow git repos (origin/main + git log + grep) | semantic.md (the spec the code is checked against); deploy-history skill |

The Page/UX agent uses Cloudflare Browser Run for live multi-device browser navigation and the analytics MCP’s PostHog replay tools for replay analysis. The analytics MCP is for analytics data sources only — operational tooling (browsers, screenshots, error logs) lives outside it.

Cloudflare Browser Run was chosen because the same primitive serves both runtimes: in Claude Code, point chrome-devtools-mcp at the CDP endpoint; in Bun/Mastra later, call chromium.connectOverCDP() against the same URL. No library swap, full Playwright device-emulation API, ~$0.09/browser-hour with the first 10 hours/month free. Fallback if setup hits friction: Steel.dev (100 browser-hours/month free, self-hostable).
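A minimal sketch of the later Bun/Mastra-side connection, assuming Playwright is installed and the Browser Run CDP endpoint is supplied via an environment variable; the variable name, target URL, and device choice below are illustrative, not committed names:

```typescript
// Sketch: connect to the remote browser over CDP and emulate a mobile device.
// BROWSER_CDP_URL, the target URL, and "iPhone 13" are placeholders.
import { chromium, devices } from "playwright";

async function inspectPage(url: string): Promise<void> {
  const browser = await chromium.connectOverCDP(process.env.BROWSER_CDP_URL!);

  // Device emulation works here because the full Playwright API is available over the CDP connection.
  const context = await browser.newContext({ ...devices["iPhone 13"] });
  const page = await context.newPage();

  await page.goto(url, { waitUntil: "networkidle" });
  await page.screenshot({ path: "page-mobile.png", fullPage: true });

  await browser.close();
}

inspectPage("https://example.com/choose-your-fish").catch(console.error);
```

In Claude Code the same endpoint is reached through chrome-devtools-mcp instead, so nothing in this sketch is needed for the PoC.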

Subagents call MCP *_query_capabilities tools to discover the current authoritative tool list rather than hard-coding tool names.
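As an illustration of the discover-then-dispatch pattern outside the Claude Code runtime, a capabilities call via the MCP TypeScript SDK might look like the sketch below; the server command and the posthog_query_capabilities tool name are hypothetical instances of the *_query_capabilities convention:

```typescript
// Sketch: ask the MCP server what it currently exposes rather than hard-coding tool names.
// The server command and tool name below are assumptions for illustration.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function discoverCapabilities(): Promise<unknown> {
  const transport = new StdioClientTransport({ command: "bun", args: ["run", "analytics-mcp"] });
  const client = new Client({ name: "funnel-journey-analyst", version: "0.1.0" });
  await client.connect(transport);

  // The capabilities tool returns the current authoritative tool list for this data source.
  const capabilities = await client.callTool({
    name: "posthog_query_capabilities",
    arguments: {},
  });

  await client.close();
  return capabilities.content; // read this before choosing which query tool to call
}
```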

QA discipline

QA discipline is baked into every subagent’s prompt rather than implemented as a separate agent. Each subagent must include the following in its output:

  • Confidence: low / medium / high, with a one-line reason
  • Data gaps: what is missing or unjoinable that affects the answer
  • Could-be-wrong-because: an alternative explanation worth considering, mapped through the synthesis meta-checklist in semantic.md (data integrity / cohort identity / confounder)

A separate QA agent gets added only if subagent prompt discipline turns out not to be enough.
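A sketch of how a subagent finding with these QA fields might be typed once the system moves to a code runtime; every field name here is illustrative rather than a committed schema:

```typescript
// Sketch: one possible shape for a subagent finding, with the required QA fields made explicit.
interface SubagentFinding {
  persona: "ads-creative" | "funnel-journey" | "page-ux" | "code";
  summary: string;                                  // the finding itself
  evidence: { source: string; dateWindow: string; fact: string }[];
  confidence: "low" | "medium" | "high";
  confidenceReason: string;                         // one-line reason for the rating
  dataGaps: string[];                               // missing or unjoinable data affecting the answer
  couldBeWrongBecause: {
    alternative: string;                            // a competing explanation worth considering
    checklistItem: "data integrity" | "cohort identity" | "confounder"; // per semantic.md
  };
}
```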

Output protocol

Subagents return structured findings to the orchestrator. The orchestrator synthesizes them into the six-section diagnosis artifact (a typed sketch follows the list):

  • Bottom line — 2–3 sentences with the strongest finding
  • Evidence — sourced facts; each cites tool + date window
  • Hypotheses — ranked, each naming the specific mechanism (page / creative / data / audience) and citing evidence for and against
  • Recommended test — change, target segment, primary metric, run length, stop rule, implementation notes
  • What to inspect manually — replay candidates, screenshots to compare, stakeholder questions
  • Data gaps — what blocked confidence; what tool / property / coverage gap to fix next
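A hedged sketch of the artifact as a typed structure, useful mainly as a checklist for the orchestrator prompt; the section names come from the list above, while the field-level breakdown is illustrative:

```typescript
// Sketch: the six-section diagnosis artifact as a type. Field names are illustrative.
interface DiagnosisArtifact {
  bottomLine: string;                               // 2–3 sentences with the strongest finding
  evidence: { fact: string; tool: string; dateWindow: string }[];
  hypotheses: {
    rank: number;
    mechanism: "page" | "creative" | "data" | "audience";
    evidenceFor: string[];
    evidenceAgainst: string[];
  }[];
  recommendedTest: {
    change: string;
    targetSegment: string;
    primaryMetric: string;
    runLength: string;
    stopRule: string;
    implementationNotes: string;
  };
  inspectManually: string[];                        // replay candidates, screenshots, stakeholder questions
  dataGaps: string[];                               // what blocked confidence; what to fix next
}
```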

Replay-coverage convergence rule

Any run whose hypothesis space includes a page-level mechanism (layout, content, value-prop, mobile UX, price clarity, ad-to-page match) must reach ≥20 replay-analyzer observations across the relevant cohorts before the orchestrator converges. This is the line between “observation” and “evidence.”

  • The fan-out can be executed by page-ux-analyst (its default home) or by the orchestrator directly when session_ids are available and page-ux failed to execute it — see .claude/agents/conversion-orchestrator.md step 4.
  • A run that converges below 20 replays without an explicit “instrumentation prevents it” justification (e.g. PostHog replay sample-rate too low, cohort genuinely n<20 at meaningful volume) is not converged. Surface the unmet target as the headline data gap, not as a footnote, and either re-iterate or document the gap as the next-action requirement.

Run storage

Diagnosis runs land in ../runs/ — one folder per run, named <YYYY-MM-DD>-<short-slug>/. Per-run files:

  • question.md — what was asked (date window, segments, topology used)
  • findings.md — the diagnosis artifact
  • feedback.md — human reaction post-run (what was useful, wrong, vague)
  • inputs.json — machine-readable parameters for reproducibility

Append-only. Future runs grep prior runs to avoid repeating analyses and to follow up on past hypotheses. See ../runs/README.md for conventions.
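A minimal sketch of creating a run folder under this convention; the file names come from this section, while the slug, the inputs.json fields, and the placeholder file contents are illustrative:

```typescript
// Sketch: scaffold one run folder under ../runs/ with the four per-run files.
// Relative path resolution depends on where the script runs from.
import { mkdir, writeFile } from "node:fs/promises";
import path from "node:path";

async function createRun(slug: string, inputs: Record<string, unknown>): Promise<void> {
  const date = new Date().toISOString().slice(0, 10);          // YYYY-MM-DD
  const dir = path.join("..", "runs", `${date}-${slug}`);
  await mkdir(dir, { recursive: true });

  await writeFile(path.join(dir, "question.md"), "# Question\n");  // date window, segments, topology used
  await writeFile(path.join(dir, "findings.md"), "# Findings\n");  // the diagnosis artifact
  await writeFile(path.join(dir, "feedback.md"), "# Feedback\n");  // filled in by a human post-run
  await writeFile(path.join(dir, "inputs.json"), JSON.stringify(inputs, null, 2));
}

createRun("choose-your-fish-dropoff", { dateWindow: "last-28d", segment: "mobile" }).catch(console.error);
```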

Runtime

PoC: Claude Code subagents with custom agent definitions under ../.claude/agents/. Subscription-based (no per-token cost), fastest to ship, and it leverages the existing MCP and skills.

Future migration target: Mastra.ai (TypeScript agent framework) — graph-based control, versioned in code, memory and observability. Decision deferred until the PoC validates the pattern.

Build approach

Each subagent ships in a deliberately minimal V1: fewest tools, narrowest primer, simplest prompt that can produce a usable finding. The first run’s value is the end-to-end loop working, not depth in any single persona. After the first run, iterate on the persona that most underperformed; resist trying to improve every persona in parallel.

Replay coverage target: for any high-volume cohort under investigation, the Page/UX persona must fan out to at least 20 replay-analyzer subagents in parallel, distributed across converter / near-converter / bouncer cohorts (at least 5 of each, more weight on the cohort the question anchors to). Three replays is observation; twenty is evidence. Cheap-Haiku fan-out makes this feasible at the cost of a single Sonnet call.
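A sketch of that coverage target as a gate the orchestrator could apply before converging; the cohort names and thresholds come from this document, the types are illustrative:

```typescript
// Sketch: the replay-coverage gate — at least 20 observations total, at least 5 per cohort.
type Cohort = "converter" | "near-converter" | "bouncer";

interface ReplayObservation {
  sessionId: string;
  cohort: Cohort;
  observation: string; // one observation per replay-analyzer call
}

function replayCoverageMet(observations: ReplayObservation[]): boolean {
  const counts: Record<Cohort, number> = { converter: 0, "near-converter": 0, bouncer: 0 };
  for (const o of observations) counts[o.cohort] += 1;

  const totalOk = observations.length >= 20;                    // below this, findings stay "observation"
  const perCohortOk = Object.values(counts).every((n) => n >= 5);
  return totalOk && perCohortOk;
}
```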

Stakeholder framing: lead with “this is a PoC of the loop; depth comes next.” Otherwise rough first-cut output gets read as the system’s ceiling.

First question

The PoC’s anchor question:

Why are visitors dropping at Choose-your-fish, and what should we test first to lift the pass-through 20%?

It is a strong anchor because it targets the dominant leak per the leading hypothesis, doesn’t pre-commit to a cause, and sets a concrete 20% lift target.

