Method
Direction for the agentic conversion system. Nothing here is a committed design — it states what we want to build and what we want to figure out by building it.
What we want to build
A pipeline that takes a conversion question, pulls relevant data from the sources mapped in `semantic.md`, surfaces what looks wrong, proposes hypotheses that pass the synthesis meta-checklist (data integrity / cohort identity / confounder), and produces concrete page / content / creative ideas to test on the new web.
The first versions can be rough. The point is a pipeline we can run, inspect, criticize, and improve — not a perfect first answer.
Pattern we are exploring
analytics → suggestions → execution
- Analytics: what is happening in the funnel right now.
- Suggestions: why it might be happening; what to test.
- Execution: the concrete change to make and how to evaluate it.
This is the shape we are starting with. It may evolve as we learn what works.
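As a rough sketch of the shape above (type and field names are hypothetical, not a committed interface from architecture.md), one run moving through the three stages could be typed like this:

```typescript
// Hypothetical shapes for one pipeline run. Illustrative only.

interface Analytics {
  question: string;       // the conversion question being asked
  observations: string[]; // what is happening in the funnel right now
}

interface Suggestion {
  hypothesis: string;     // why it might be happening
  evidenceFor: string[];
  evidenceAgainst: string[];
}

interface Execution {
  change: string;         // the concrete change to make
  evaluation: string;     // how to evaluate it
}

interface Run {
  analytics: Analytics;
  suggestions: Suggestion[];
  execution: Execution[];
}

// Example run object (contents invented for illustration):
const run: Run = {
  analytics: {
    question: "Why did checkout completion drop last week?",
    observations: ["Mobile checkout drop-off up 12% week over week"],
  },
  suggestions: [{
    hypothesis: "A payment widget regression on mobile",
    evidenceFor: ["Drop isolated to mobile"],
    evidenceAgainst: ["No error spike recorded"],
  }],
  execution: [{
    change: "Roll back the payment widget on mobile",
    evaluation: "Compare mobile checkout completion for 7 days",
  }],
};
```

The value of typing it, even loosely, is that each stage's output is inspectable on its own, which is what "run, inspect, criticize, and improve" requires.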
Capabilities the system should grow into
In rough order:
- Run a conversion question across the relevant sources without re-discovering the schema each time
- Identify the biggest funnel leak and the worst segment at meaningful volume
- Surface replay / session candidates worth a human review
- Name the specific mechanism behind each finding (no canonical taxonomy — describe the thing in plain language)
- Generate hypotheses with evidence for and against
- Produce a test brief: change, segment, success metric, stop rule
- Generate copy / page-section ideas concrete enough to hand to a designer or developer
- Improve from explicit feedback after each run
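The test brief in the list above has a fixed shape (change, segment, success metric, stop rule). A minimal sketch of that shape, with illustrative field names and example values not taken from any committed schema:

```typescript
// Hypothetical test-brief shape; example values are invented.
interface TestBrief {
  change: string;        // what to alter on the page
  segment: string;       // who sees it
  successMetric: string; // what decides the test
  stopRule: string;      // when to call it off
}

const brief: TestBrief = {
  change: "Move the size guide above the fold on product pages",
  segment: "Mobile paid-social visitors",
  successMetric: "Product-to-order rate for the segment",
  stopRule: "Stop after 2 weeks or 1,000 sessions per arm, whichever comes first",
};
```

Requiring a stop rule up front keeps a rough hypothesis from quietly becoming an open-ended experiment.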
Architecture
The committed design — agent topology, models, sources per persona, runtime, run storage, build approach — lives in `architecture.md`. This file stays direction-focused.
Tooling we need to build
Capabilities the system needs but does not yet have. New gaps are added when surfaced by a real run; entries are removed when delivered.
- PostHog `replay_recordings` zero-result behavior — the first PoC run found `replay_recordings` returned 0 sessions despite candidates being present in the same query window. Verify the tool's filter shape against PostHog's API and either fix the MCP wrapper or document the param shape that returns results. Surfaced from `runs/2026-05-08-cyf-pass-through`.
- Cloudflare bot filtering — Cloudflare Bot Management API for filtering bot traffic from funnel reads. Named in week-4 as a likely contributor to direct-traffic inflation — promote when the direct share looks anomalous in a run.
- Per-creative on-site progression — given a list of active Meta creatives, return on-site progression metrics (visit → product → order) per creative so the team can manually back-trace which creative concepts perform on the current web. Precondition for the creative-to-landing match analysis. Likely a PostHog query layered with Meta ad-name attribution.
- Sentry — verify, complete, integrate (near-term). Three steps in order:
  - Verify Sentry is the right error-tracking tool, or whether we need error tracking at all on the frontend.
  - If yes: `@sentry/react` is installed in `version1` but not `version2`; install on all variants so checkout and runtime errors in the variant under test are visible.
  - Expose recent errors as a tool inside the analytics MCP so the agent can query them when diagnosing technical drop-offs.
- Microsoft 365 Connector verification — Data Master is accessible in Claude.ai chats via the Microsoft 365 Connector. Confirm a Claude Code subagent (not a claude.ai chat) can read the live SharePoint workbook through the connector; document the access path. For the Mastra runtime later this won’t work — a Microsoft Graph API integration in the analytics MCP will be needed at that point.
- Cloudflare Browser Run setup — for the Page/UX subagent's live-site navigation. Confirm Cloudflare account, configure CDP endpoint, plug `chrome-devtools-mcp` into the subagent. Same primitive will serve the Bun/Mastra runtime later via `chromium.connectOverCDP()`.
- Conversion-patterns skill — niche-specific named pattern library so the Page/UX agent can identify what's missing on a page beyond generic UX heuristics. v0 (LLM synthesis) is generated; v1 will crawl reference pages with the Page/UX tooling (CEO input pending on brand list); v2 will be team-curated from patterns that recur in real runs. See `../skills/conversion-patterns/PLAN.md`.
- Frontend `af_internal=agent` opt-out — the Page/UX agent navigates the live site for diagnosis (~30 page-views per run); without an opt-out, that traffic pollutes PostHog / GTM / Plausible at a non-trivial share of low-volume days. Add a small handler in `shared/src/utils/`: when `?af_internal=agent` is present on initial page load, call `posthog.opt_out_capturing()` and short-circuit any GTM dataLayer pushes for the session. Once shipped, the agent skill (conversion-page-inspection) gets a rule to append the param on every `navigate_page`.
- Run-output indexing — once `runs/` grows beyond ~50 entries, a flat `index.md` or lightweight DB will help. Defer.
Open questions
A few design questions remain — we expect to answer them by running the system, not by writing answers up front:
- Whether named patterns emerge across runs that would be worth promoting into a small shared vocabulary (curation-driven, not pre-committed taxonomy)
- Where execution stops being manual and starts being agent-generated
Principles for whatever we build
- Facts and hypotheses are always separated; volatile numbers stay in the run output, not in the KB.
- Agent outputs are hypotheses, not findings. Every claim is evaluated for plausibility, then either falsified by data or tested at small scale — never absorbed as truth.
- The synthesis meta-checklist in `semantic.md` (data integrity / cohort identity / confounder) is applied before any finding is finalised — “data is misleading” is always on the table.
- Stakeholder-facing language stays plain.
- Rough is fine; opaque is not — the pipeline must be inspectable.