Semantic Layer

How a business question maps to data: the funnel definition, event taxonomy, segments, source map, and the meta-checklist that gates orchestrator synthesis.

This file changes when the funnel structure, event names, or the synthesis meta-checklist change. Other layers inherit from it.

Funnel

The German paid funnel, in stages:

  1. Visit — visitor lands on the site
  2. Selection — visitor reaches the fish/product selection experience
  3. Product — visitor reaches a product page or product-intent step
  4. Cart / checkout start — visitor adds to cart or begins checkout
  5. Payment info — visitor commits to payment
  6. Purchase — visitor completes a paid order

Reason from semantic stages, not from URL paths. Page names and routes change; the stages do not.
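The stage ordering above can be sketched as data. A minimal illustration, assuming a `counts` dict keyed by hypothetical stage labels (the labels and numbers are not from any source):

```python
# The six semantic stages, in order. Labels are illustrative shorthand
# for the stages above, not canonical identifiers from the KB.
FUNNEL_STAGES = [
    "visit", "selection", "product",
    "cart_checkout_start", "payment_info", "purchase",
]

def largest_drop(counts):
    """counts: dict mapping stage label -> visitor count for a window.
    Returns (from_stage, to_stage, absolute_drop) for the biggest leak."""
    worst = None
    for a, b in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        drop = counts[a] - counts[b]
        if worst is None or drop > worst[2]:
            worst = (a, b, drop)
    return worst
```

Because the list holds semantic stages rather than routes, the helper keeps working when page names change.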

Event taxonomy

The canonical event names are GA4-aligned. These are the only event names the KB references.

| Event | Stage | Meaning |
|---|---|---|
| page_view | Visit | Visitor viewed a page or route in the GTM / GA4-shaped dataLayer |
| view_item_list | Selection | Visitor viewed a list of products / fish choices |
| select_item | Selection | Visitor selected a product from the list |
| add_to_cart | Cart | Visitor added a product to the cart |
| begin_checkout | Checkout start | Visitor started checkout (order details step) |
| add_payment_info | Payment info | Visitor selected or submitted payment information |
| purchase | Purchase | Paid purchase confirmed |
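Since these are the only event names the KB references, agents can validate names against the taxonomy before querying. A small sketch (the mapping itself is from the table above; the helper is illustrative):

```python
# Canonical event -> funnel-stage mapping, taken from the taxonomy table.
EVENT_STAGE = {
    "page_view": "Visit",
    "view_item_list": "Selection",
    "select_item": "Selection",
    "add_to_cart": "Cart",
    "begin_checkout": "Checkout start",
    "add_payment_info": "Payment info",
    "purchase": "Purchase",
}

def stage_for(event_name):
    """Reject any event name outside the canonical taxonomy early,
    rather than silently querying for an event that never fires."""
    if event_name not in EVENT_STAGE:
        raise ValueError(f"non-canonical event name: {event_name}")
    return EVENT_STAGE[event_name]
```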

Stage-to-role intent

Each funnel stage has a user-journey role, not a fixed route. Roles change slowly; the routes and file paths that fill them change quickly. Diagnosis runs resolve role → current route by reading version1/src/router.tsx and the relevant page components on origin/main — never by looking up a route from the KB. The resolved binding lives in the run’s inputs.json under emission_map, not here.

| Stage | Role | How to resolve to current route |
|---|---|---|
| Visit | Ad lander (paid traffic) + organic landings | Meta get_meta_ad_* destination URLs (paid); router index route or $session_entry_pathname in PostHog (organic) |
| Selection | Ad-lander product surface (where paid visitors first see products) | Whichever component the ad-lander route renders; grep for product-list / product-showcase sections inside it |
| Product | Product detail | router.tsx route whose component imports useProductPageAnalytics (or an equivalent product-page analytics hook) |
| Cart | Cart / direct checkout entry | router.tsx route(s) whose components emit add_to_cart or begin_checkout |
| Payment info | Checkout form | router.tsx route(s) whose components emit add_payment_info |
| Purchase | Payment confirmation | router.tsx route(s) whose components emit purchase |

When a funnel event is emitted from a route that does not fill the role its stage implies, that is an instrumentation mismatch and must be surfaced as a candidate finding before any volume-based hypothesis is formed on that event.

Instrumentation mismatches are per-event, not per-stage. A stage typically has more than one event; one being mismatched does not invalidate the others. Test each event independently against the role, and follow the navigation graph — if the lander’s onClick navigates to a route that itself emits a funnel event, that downstream event is a valid signal for the stage even though the originating route does not emit it. The canonical walk-through (both view_item_list mismatched and select_item not-mismatched on the same Selection stage) lives in skills/deploy-history/SKILL.md → “Build a current emission map”.
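The per-event rule with the one-hop navigation follow can be sketched as below. The route names, `emissions`, and `nav_graph` structures are hypothetical illustrations, not the real emission map:

```python
# Hypothetical sketch of the per-event mismatch test: each event is
# checked independently against the role's route, and the navigation
# graph is followed, so a downstream emission still counts for the stage.

def event_is_valid_for_stage(event, role_route, emissions, nav_graph):
    """emissions: dict route -> set of event names emitted on that route.
    nav_graph: dict route -> set of routes reachable via onClick."""
    if event in emissions.get(role_route, set()):
        return True  # emitted directly on the role's route
    # Follow navigation: a route the role's route navigates to may emit
    # the event; that downstream signal is still valid for the stage.
    return any(
        event in emissions.get(dest, set())
        for dest in nav_graph.get(role_route, set())
    )
```

With a lander that emits nothing itself but navigates to a selection page emitting select_item, view_item_list tests as mismatched while select_item tests as valid on the same stage, matching the canonical walk-through's shape.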

Event envelope

Every event carries an envelope. In PostHog these are exposed as flat snake_case super-properties (in GA4 as their web.* / page.* nested equivalents):

  • web_version, web_variant — which web build and which A/B variant served the event
  • experiment_id, experiment_name — the active experiment, if any
  • page_path, page_location, page_title, page_type — page metadata for path-level breakdowns

These let cohorts filter by variant, version, or path directly without joins.
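A join-free envelope filter over flat snake_case properties might look like this (the event dicts are illustrative):

```python
# Sketch: filter a list of flat event dicts by envelope properties,
# e.g. web_variant or page_path, with no joins required.

def filter_events(events, **envelope):
    """Keep events whose envelope matches every given key=value pair."""
    return [
        e for e in events
        if all(e.get(k) == v for k, v in envelope.items())
    ]
```

Usage: `filter_events(events, web_variant="version2", page_type="product")` slices a cohort by variant and page type directly.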

PostHog pageview contract

PostHog has two pageview concepts in this workspace:

  • $pageview — PostHog-native pageview. Use this for PostHog DAU / WAU / retention widgets and PostHog product-analytics style “active users” reporting.
  • page_view — GA4/GTM-shaped dataLayer event. Use this for GTM Preview, GA4-shaped ecommerce funnel joins, and cross-source schema parity. Do not use it as the source for PostHog built-in DAU widgets.

The intended frontend contract once the AlaskanFishermanFrontend branch fix/posthog-native-pageviews merges on 2026-05-11: PostHog captures native $pageview via capture_pageview: 'history_change'; afDl.push('page_view') continues to publish the GTM/dataLayer page event but no longer mirrors it into PostHog. Ecommerce/business events still mirror into PostHog under the same names as their dataLayer events.
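The two-pageview contract can be encoded as a chooser so agents never pick the wrong event for a read. The purpose labels below are illustrative shorthand, not canonical identifiers:

```python
# Hypothetical helper encoding the contract: PostHog-native widgets read
# $pageview; GA4-shaped funnel joins and GTM checks read page_view.

def pageview_event(purpose):
    native = {"dau", "wau", "retention", "active_users"}
    ga4_shaped = {"gtm_preview", "funnel_join", "schema_parity"}
    if purpose in native:
        return "$pageview"
    if purpose in ga4_shaped:
        return "page_view"
    raise ValueError(f"unknown purpose: {purpose}")
```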

Required segments

Every conversion-diagnosis read should be segmentable by:

  • source / channel (Meta, Google, organic, direct, …)
  • Meta campaign / ad / ad name (where Meta is the source)
  • device (mobile, tablet, desktop)
  • web version / variant (version1 / version2 inside the new web)
  • new vs returning visitor — also a budget-allocation lever: repeat-converter dominance pushes retargeting weight up; first-visit converter dominance keeps the lever upstream (creative-to-landing match)
  • landing / entry path
  • checkout / payment method (where the question reaches that stage)

If a segment is not currently joinable in a given source, that is itself a finding — surface it as an instrumentation gap, not as zero.
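The "gap, not zero" rule can be made mechanical. A sketch, assuming a hypothetical availability map of which segments are joinable per source:

```python
# Sketch: a segment that cannot be joined in a source is reported as an
# instrumentation gap, never as a zero value.

def segment_value(source, segment, available, values):
    """available: dict source -> set of joinable segments.
    values: dict (source, segment) -> measured value."""
    if segment not in available.get(source, set()):
        return {"status": "instrumentation_gap",
                "detail": f"{segment} not joinable in {source}"}
    return {"status": "ok", "value": values[(source, segment)]}
```

The explicit status field forces downstream synthesis to treat the missing join as a finding rather than silently coercing it to 0.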

Source map — where to query

Different questions go to different sources. The KB describes who answers what; current values are pulled from the tool, not stored here.

| Source | Best for | Not for |
|---|---|---|
| Meta Ads | Spend, delivery, campaign / ad / creative performance, Meta-reported CPA | Onsite behaviour truth |
| PostHog | Funnel drop-off, user journeys, sessions before conversion, replay candidates, behaviour cohorts | Ad-platform delivery truth |
| GA4 | Channel / acquisition / event analysis where the relevant events are reliably configured | Friction diagnosis; full journey when events are incomplete |
| Plausible | Lightweight traffic, page / source summaries, custom-property checks | Definitive ecommerce funnel diagnosis |
| Clarity | Aggregate friction signals, rage / dead clicks, page-level UX | Person-level journey |
| Data Master | Daily business KPI reporting, created / paid orders, stakeholder-readable funnel | Granular event or replay analysis |
| Cloudflare | Routing, bot / edge signals where exposed | Conversion source of truth |

The analytics MCP server exposes tools for GA4, Meta, Plausible, Clarity, and PostHog. Use the MCP *_query_capabilities tools to discover the current authoritative tool list and parameter shapes per source — do not hard-code tool names in agent prompts.

PostHog tools cover funnel summary, paths, person journey, sessions / time before conversion, checkout drop-off, form friction, instrumentation health, and a family of replay tools (candidates, cohorts, analysis, recordings, stream, download plan). Connection setup is in ../skills/analytics-mcp/SKILL.md.

Source-selection rules

  • Use Meta for what Meta delivered and charged.
  • Use PostHog for onsite behaviour after PostHog launch.
  • Use GA4 when the question needs GA4 acquisition dimensions and the relevant events are reliably present.
  • Use Plausible for fast traffic / page / source reads.
  • Use Clarity only for aggregate friction and page-level UX signals; never for person-level journey claims.
  • Use Data Master for stakeholder-readable daily business KPIs.
  • Do not use noisy auto-captured form events as primary KPIs.
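The selection rules reduce to a lookup. The question-kind labels here are illustrative, not a fixed taxonomy from this document:

```python
# Sketch of the source-selection rules as a table; agents adapt the
# question-kind labels to the actual question.

SOURCE_FOR = {
    "ad_spend_delivery": "Meta Ads",
    "onsite_behaviour": "PostHog",
    "acquisition_dimensions": "GA4",
    "fast_traffic_read": "Plausible",
    "aggregate_friction": "Clarity",
    "daily_business_kpis": "Data Master",
}

def pick_source(question_kind):
    try:
        return SOURCE_FOR[question_kind]
    except KeyError:
        # No rule means no default: surface the routing gap explicitly.
        raise ValueError(f"no source rule for: {question_kind}")
```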

Cross-source workflows

Common diagnostic patterns that combine sources. Starter patterns, not fixed sequences — agents adapt the chain to the question.

  • Full funnel review — GA4 (traffic + acquisition) → PostHog (drop-off at each funnel stage) → Clarity (page-level friction on the worst-leak stage).
  • Campaign performance — Meta (spend, ROAS, creatives) → GA4 (sessions and conversions from paid) → PostHog (onsite progression of paid visitors, segmented by ad name).
  • Checkout investigation — PostHog checkout-dropoff tools (find the step) → Clarity (UX signals on that step) → PostHog replay candidates (sessions worth watching).
  • Variant comparison — segment GA4 and PostHog by variant. Never average across variants.
  • Default funnel review — when no specific question constrains it, the canonical pillar order is: (1) stage drop on lander → Choose-your-fish → product, (2) new vs returning split, (3) Meta-only channel slice.

Interpretation rules

  • Do not compare Meta-reported purchases directly to PostHog purchases without checking attribution window and instrumentation launch.
  • Do not average across web versions / variants or major flow changes without segmenting.
  • Created orders and paid orders answer different questions. CR and CPA are computed against paid orders for final reporting; created orders are operational signals.
  • CR and CPA are goals; intermediate funnel ratios are diagnostics.
  • Funnel drops are page-route transitions first, event counts second. When a question anchors on “biggest drop / between which stages”, lead with a route-to-route delta from get_posthog_paths_summary and back it with the event-based count. If the two disagree by more than ~2×, the route count is the user-presence floor and the event delta is the data gap — do not let an event-instrumentation gap hide a real route-to-route drop. The funnel being non-strict (visitors can reach begin_checkout without prior add_to_cart) is a special case of the same rule: trust the route, treat the event ratio as diagnostic.
  • Facebook vs Instagram separation can be imperfect; referrer, in-app browser, UTM, and privacy behaviour collapse some of it into a single Meta source or into direct.
  • If PostHog launched mid-window, treat that period as directional only.
  • A segment with too few sessions to be meaningful is itself a finding — surface the volume, do not silently report a ratio.
  • A reading more than ~30% off the rolling average is worth surfacing as an anomaly, not absorbing as the new normal.
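The route-first rule with its ~2× disagreement threshold can be sketched as a reconciliation check. The function and its output shape are illustrative:

```python
# Sketch: when route-to-route and event-based drops disagree by more
# than ~2x, the route count is the user-presence floor and the event
# delta is flagged as a data gap rather than hiding the real drop.

def reconcile_drop(route_drop, event_drop, factor=2.0):
    hi = max(route_drop, event_drop)
    lo = max(min(route_drop, event_drop), 1)  # guard against zero
    if hi / lo > factor:
        return {"lead": route_drop, "basis": "route_floor",
                "flag": "event_instrumentation_gap"}
    return {"lead": route_drop, "basis": "route", "flag": None}
```

Either way the route delta leads; the flag only decides whether the event ratio is reported as diagnostic or as a gap.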

Known instrumentation gaps

Data-quality limits that affect what the agentic system can answer. Updated when gaps are closed or new ones surface.

  • The funnel is non-strict. Visitors can reach begin_checkout without a prior add_to_cart — both because the checkout route can be reached directly (e.g. /shop/order-details-2) and because there are two separate begin_checkout firing paths (CartPage.tsx legacy and CheckoutPage.tsx structured) that aren’t aligned. Treat each stage as independent where useful.
  • Direct-traffic inflation. iOS strips referrer parameters, Google search no longer forwards query strings, Instagram routes via multiple subdomains, and in-app vs web-browser sessions surface differently — Meta and Google traffic regularly leak into “direct”. Cloudflare bot signals could recover some of this; without that, treat “direct” as a noisy bucket.
  • PostHog purchase count ≠ clean order count. PostHog includes test orders, duplicate fires, and repeat-buyer multi-counts. Reconcile against the Excel / Data Master clean count before reporting CR or CPA — difference is typically a few orders per day.
  • PostHog pageview transition window. PR #1 (feat/de-gtm-and-posthog-parity) merged to AlaskanFishermanFrontend origin/main at 2026-05-04 20:43 CET and changed PostHog init to capture_pageview: false while adding the custom page_view event. Result: May 4 is a partial custom-event day. The affected interval is 2026-05-05 through 2026-05-11, until the fix/posthog-native-pageviews fix is merged: during that interval, custom page_view is the reliable onsite funnel-entry event, while PostHog built-in DAU / WAU widgets based on $pageview undercount. Follow-up PR #3 (chore/posthog-acquisition-context, merged 2026-05-06) added UTM/Meta acquisition properties; PR #4 (feat/view-cart-datalayer, merged 2026-05-07) added view_cart; PR #7 (fix/internal-traffic-opt-out, merged 2026-05-09) removed agent/internal traffic from future analytics fires. The fix/posthog-native-pageviews frontend branch restores PostHog-native $pageview via SPA history_change mode and stops mirroring only the dataLayer page_view event into PostHog. For historical analysis, use custom page_view for onsite funnel entry during 2026-05-05..2026-05-11, and use $pageview again for PostHog DAU / WAU / retention after the fix merge.
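The purchase-count reconciliation can be sketched as a cleaning pass before comparing to the Data Master count. The field names (`order_id`, `is_test`) are assumptions for illustration, not the real event schema:

```python
# Hypothetical reconciliation sketch: strip test orders and duplicate
# fires from raw PostHog purchase events before comparing against the
# Excel / Data Master clean order count.

def clean_purchase_count(purchase_events):
    seen = set()
    count = 0
    for e in purchase_events:
        if e.get("is_test"):
            continue  # drop internal / test orders
        oid = e.get("order_id")
        if oid in seen:
            continue  # drop duplicate fires for the same order
        seen.add(oid)
        count += 1
    return count
```

The residual difference after cleaning (typically a few orders per day) is what gets reconciled, not the raw event count.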

Synthesis meta-checklist

There is no cause-bucket taxonomy. Buckets force premature labelling and invite hedging (“A4, secondary A3”); they crowd out the act of naming a specific, testable mechanism. Findings should describe what is wrong in plain language and at the smallest grain useful for action (e.g. “hero is 802px tall on a 390px viewport, pushing CTA below the fold”, not “A4”).

In place of buckets: before the orchestrator finalises any hypothesis, it must answer all three meta-checks below. Each is a forcing function against a real failure mode observed in past runs. A finding that doesn’t pass all three goes back for more evidence, not into the diagnosis.

  1. Data integrity. Could the underlying numbers or measurements be wrong? Examples to rule out: instrumentation mismatch (event wired to a route paid traffic doesn’t visit), sampling / volume too small, attribution leakage (iOS referrer stripping, in-app browsers), event-id coverage gaps, mid-window instrumentation changes.
  2. Cohort identity. Are the users in this analysis actually the population I’m reasoning about? Examples to rule out: replay-candidate filter returning the wrong cohort (e.g. returning-customer login flows masquerading as paid bouncers), new-vs-returning contamination, source-attribution collapsing Meta into direct, returning visitors carrying stale UTMs.
  3. Confounder. Is there a third variable that could explain the finding equally well? Examples to rule out: audience tier vs creative (IG retargeting vs FB cold prospecting); CBO budget allocation vs creative quality; rollout-window timing vs underlying trend; page-redesign vs concurrent traffic-mix shift.

If a finding survives all three with positive evidence (not assertion), it earns a place in the diagnosis. If any one fails, the orchestrator either re-queries to close the gap or downgrades the finding to a “what we can’t tell yet” entry in Data gaps.
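The gate can be expressed as a small function. A sketch, assuming a hypothetical finding dict whose `checks` entries record whether each meta-check passed with positive evidence:

```python
# Sketch of the synthesis gate: a finding enters the diagnosis only if
# all three meta-checks pass; otherwise it is routed back for re-query
# or downgraded to a "what we can't tell yet" data-gap entry.

CHECKS = ("data_integrity", "cohort_identity", "confounder")

def gate_finding(finding):
    """finding['checks']: dict check name -> True iff it passed
    with positive evidence (absence counts as a failure)."""
    failed = [c for c in CHECKS if not finding.get("checks", {}).get(c)]
    if not failed:
        return {"route": "diagnosis", "failed": []}
    return {"route": "requery_or_data_gap", "failed": failed}
```

Treating a missing check as a failure enforces "positive evidence, not assertion": an unexamined check can never pass by default.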

Useful questions the semantic layer is asked

  • Where is the largest absolute drop-off in the funnel?
  • Which segment has the worst drop-off at meaningful volume?
  • What specific page / creative / data mechanism is most consistent with the observed leak?
  • Do converters behave differently from non-converters before product selection?
  • How many sessions / days happen before purchase?
  • Which creative themes bring traffic that actually progresses?
  • Where is the data itself the problem (meta-check #1 fails)?
