Semantic Layer

How a business question maps to data: the funnel definition, event taxonomy, segments, source map, and the meta-checklist that gates orchestrator synthesis.

This file changes when the funnel structure, event names, or the synthesis meta-checklist change. Other layers inherit from it.

Funnel

The German paid funnel, in stages:

  1. Visit — visitor lands on the site
  2. Selection — visitor reaches the fish/product selection experience
  3. Product — visitor reaches a product page or product-intent step
  4. Cart / checkout start — visitor adds to cart or begins checkout
  5. Payment info — visitor commits to payment
  6. Purchase — visitor completes a paid order

Reason from semantic stages, not from URL paths. Page names and routes change; the stages do not.
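The stage ordering above can be sketched as data. A minimal illustration, assuming a `counts` dict keyed by hypothetical stage labels (the labels and numbers are not from any source):

```python
# The six semantic stages, in order. Labels are illustrative shorthand
# for the stages above, not canonical identifiers from the KB.
FUNNEL_STAGES = [
    "visit", "selection", "product",
    "cart_checkout_start", "payment_info", "purchase",
]

def largest_drop(counts):
    """counts: dict mapping stage label -> visitor count for a window.
    Returns (from_stage, to_stage, absolute_drop) for the biggest leak."""
    worst = None
    for a, b in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        drop = counts[a] - counts[b]
        if worst is None or drop > worst[2]:
            worst = (a, b, drop)
    return worst
```

Because the list holds semantic stages rather than routes, the helper keeps working when page names change.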

Event taxonomy

The canonical event names are GA4-aligned. These are the only event names the KB references.

| Event | Stage | Meaning |
|---|---|---|
| page_view | Visit | Visitor viewed a page or route in the GTM / GA4-shaped dataLayer |
| view_item_list | Selection | Visitor viewed a list of products / fish choices |
| select_item | Selection | Visitor selected a product from the list |
| add_to_cart | Cart | Visitor added a product to the cart |
| begin_checkout | Checkout start | Visitor started checkout (order details step) |
| add_payment_info | Payment info | Visitor selected or submitted payment information |
| purchase | Purchase | Paid purchase confirmed |
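Since these are the only event names the KB references, agents can validate names against the taxonomy before querying. A small sketch (the mapping itself is from the table above; the helper is illustrative):

```python
# Canonical event -> funnel-stage mapping, taken from the taxonomy table.
EVENT_STAGE = {
    "page_view": "Visit",
    "view_item_list": "Selection",
    "select_item": "Selection",
    "add_to_cart": "Cart",
    "begin_checkout": "Checkout start",
    "add_payment_info": "Payment info",
    "purchase": "Purchase",
}

def stage_for(event_name):
    """Reject any event name outside the canonical taxonomy early,
    rather than silently querying for an event that never fires."""
    if event_name not in EVENT_STAGE:
        raise ValueError(f"non-canonical event name: {event_name}")
    return EVENT_STAGE[event_name]
```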

Stage-to-role intent

Each funnel stage has a user-journey role, not a fixed route. Roles change slowly; the routes and file paths that fill them change quickly. Diagnosis runs resolve role → current route by reading version1/src/router.tsx and the relevant page components on origin/main — never by looking up a route from the KB. The resolved binding lives in the run’s inputs.json under emission_map, not here.

| Stage | Role | How to resolve to current route |
|---|---|---|
| Visit | Ad lander (paid traffic) + organic landings | Meta get_meta_ad_* destination URLs (paid); router index route or $session_entry_pathname in PostHog (organic) |
| Selection | Ad-lander product surface (where paid visitors first see products) | Whichever component the ad-lander route renders; grep for product-list / product-showcase sections inside it |
| Product | Product detail | router.tsx route whose component imports useProductPageAnalytics (or an equivalent product-page analytics hook) |
| Cart | Cart / direct checkout entry | router.tsx route(s) whose components emit add_to_cart or begin_checkout |
| Payment info | Checkout form | router.tsx route(s) whose components emit add_payment_info |
| Purchase | Payment confirmation | router.tsx route(s) whose components emit purchase |

When a funnel event is emitted from a route that does not fill the role its stage implies, that is an instrumentation mismatch and must be surfaced as a candidate finding before any volume-based hypothesis is formed on that event.

Instrumentation mismatches are per-event, not per-stage. A stage typically has more than one event; one being mismatched does not invalidate the others. Test each event independently against the role, and follow the navigation graph — if the lander’s onClick navigates to a route that itself emits a funnel event, that downstream event is a valid signal for the stage even though the originating route does not emit it. The canonical walk-through (both view_item_list mismatched and select_item not-mismatched on the same Selection stage) lives in skills/deploy-history/SKILL.md → “Build a current emission map”.
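The per-event rule with the one-hop navigation follow can be sketched as below. The route names, `emissions`, and `nav_graph` structures are hypothetical illustrations, not the real emission map:

```python
# Hypothetical sketch of the per-event mismatch test: each event is
# checked independently against the role's route, and the navigation
# graph is followed, so a downstream emission still counts for the stage.

def event_is_valid_for_stage(event, role_route, emissions, nav_graph):
    """emissions: dict route -> set of event names emitted on that route.
    nav_graph: dict route -> set of routes reachable via onClick."""
    if event in emissions.get(role_route, set()):
        return True  # emitted directly on the role's route
    # Follow navigation: a route the role's route navigates to may emit
    # the event; that downstream signal is still valid for the stage.
    return any(
        event in emissions.get(dest, set())
        for dest in nav_graph.get(role_route, set())
    )
```

With a lander that emits nothing itself but navigates to a selection page emitting select_item, view_item_list tests as mismatched while select_item tests as valid on the same stage, matching the canonical walk-through's shape.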

Event envelope

Every event carries an envelope. In PostHog these are exposed as flat snake_case super-properties (in GA4 as their web.* / page.* nested equivalents):

  • web_version, web_variant — which web build and which A/B variant served the event
  • experiment_id, experiment_name — the active experiment, if any
  • page_path, page_location, page_title, page_type — page metadata for path-level breakdowns

These let cohorts filter by variant, version, or path directly without joins.
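A join-free envelope filter over flat snake_case properties might look like this (the event dicts are illustrative):

```python
# Sketch: filter a list of flat event dicts by envelope properties,
# e.g. web_variant or page_path, with no joins required.

def filter_events(events, **envelope):
    """Keep events whose envelope matches every given key=value pair."""
    return [
        e for e in events
        if all(e.get(k) == v for k, v in envelope.items())
    ]
```

Usage: `filter_events(events, web_variant="version2", page_type="product")` slices a cohort by variant and page type directly.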

PostHog pageview contract

PostHog has two pageview concepts in this workspace:

  • $pageview — PostHog-native pageview. Use this for PostHog DAU / WAU / retention widgets and PostHog product-analytics style “active users” reporting.
  • page_view — GA4/GTM-shaped dataLayer event. Use this for GTM Preview, GA4-shaped ecommerce funnel joins, and cross-source schema parity. Do not use it as the source for PostHog built-in DAU widgets.

The intended frontend contract once the AlaskanFishermanFrontend branch fix/posthog-native-pageviews merges on 2026-05-11: PostHog captures native $pageview via capture_pageview: 'history_change'; afDl.push('page_view') continues to publish the GTM/dataLayer page event but no longer mirrors it into PostHog. Ecommerce/business events still mirror into PostHog under the same names as their dataLayer events.
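The two-pageview contract can be encoded as a chooser so agents never pick the wrong event for a read. The purpose labels below are illustrative shorthand, not canonical identifiers:

```python
# Hypothetical helper encoding the contract: PostHog-native widgets read
# $pageview; GA4-shaped funnel joins and GTM checks read page_view.

def pageview_event(purpose):
    native = {"dau", "wau", "retention", "active_users"}
    ga4_shaped = {"gtm_preview", "funnel_join", "schema_parity"}
    if purpose in native:
        return "$pageview"
    if purpose in ga4_shaped:
        return "page_view"
    raise ValueError(f"unknown purpose: {purpose}")
```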

Required segments

Every conversion-diagnosis read should be segmentable by:

  • source / channel (Meta, Google, organic, direct, …)
  • Meta campaign / ad / ad name (where Meta is the source)
  • device (mobile, tablet, desktop)
  • web version / variant (version1 / version2 inside the new web)
  • new vs returning visitor — also a budget-allocation lever: repeat-converter dominance pushes retargeting weight up; first-visit converter dominance keeps the lever upstream (creative-to-landing match)
  • landing / entry path
  • checkout / payment method (where the question reaches that stage)

If a segment is not currently joinable in a given source, that is itself a finding — surface it as an instrumentation gap, not as zero.
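The "gap, not zero" rule can be made mechanical. A sketch, assuming a hypothetical availability map of which segments are joinable per source:

```python
# Sketch: a segment that cannot be joined in a source is reported as an
# instrumentation gap, never as a zero value.

def segment_value(source, segment, available, values):
    """available: dict source -> set of joinable segments.
    values: dict (source, segment) -> measured value."""
    if segment not in available.get(source, set()):
        return {"status": "instrumentation_gap",
                "detail": f"{segment} not joinable in {source}"}
    return {"status": "ok", "value": values[(source, segment)]}
```

The explicit status field forces downstream synthesis to treat the missing join as a finding rather than silently coercing it to 0.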

Source map — where to query

Different questions go to different sources. The KB describes who answers what; current values are pulled from the tool, not stored here.

| Source | Best for | Not for |
|---|---|---|
| Meta Ads | Spend, delivery, campaign / ad / creative performance, Meta-reported CPA | Onsite behaviour truth |
| PostHog | Funnel drop-off, user journeys, sessions before conversion, replay candidates, behaviour cohorts | Ad-platform delivery truth |
| GA4 | Channel / acquisition / event analysis where the relevant events are reliably configured | Friction diagnosis; full journey when events are incomplete |
| Plausible | Lightweight traffic, page / source summaries, custom-property checks | Definitive ecommerce funnel diagnosis |
| Clarity | Aggregate friction signals, rage / dead clicks, page-level UX | Person-level journey |
| Data Master | Daily business KPI reporting, created / paid orders, stakeholder-readable funnel | Granular event or replay analysis |
| Cloudflare | Routing, bot / edge signals where exposed | Conversion source of truth |

The analytics MCP server exposes tools for GA4, Meta, Plausible, Clarity, and PostHog. Use the MCP *_query_capabilities tools to discover the current authoritative tool list and parameter shapes per source — do not hard-code tool names in agent prompts.

PostHog tools cover funnel summary, paths, person journey, sessions / time before conversion, checkout drop-off, form friction, instrumentation health, and a family of replay tools (candidates, cohorts, analysis, recordings, stream, download plan). Connection setup is in ../skills/analytics-mcp/SKILL.md.

Source-selection rules

  • Use Meta for what Meta delivered and charged.
  • Use PostHog for onsite behaviour after PostHog launch.
  • Use GA4 when the question needs GA4 acquisition dimensions and the relevant events are reliably present.
  • Use Plausible for fast traffic / page / source reads.
  • Use Clarity only for aggregate friction and page-level UX signals; never for person-level journey claims.
  • Use Data Master for stakeholder-readable daily business KPIs.
  • Do not use noisy auto-captured form events as primary KPIs.
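The selection rules reduce to a lookup. The question-kind labels here are illustrative, not a fixed taxonomy from this document:

```python
# Sketch of the source-selection rules as a table; agents adapt the
# question-kind labels to the actual question.

SOURCE_FOR = {
    "ad_spend_delivery": "Meta Ads",
    "onsite_behaviour": "PostHog",
    "acquisition_dimensions": "GA4",
    "fast_traffic_read": "Plausible",
    "aggregate_friction": "Clarity",
    "daily_business_kpis": "Data Master",
}

def pick_source(question_kind):
    try:
        return SOURCE_FOR[question_kind]
    except KeyError:
        # No rule means no default: surface the routing gap explicitly.
        raise ValueError(f"no source rule for: {question_kind}")
```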

Cross-source workflows

Common diagnostic patterns that combine sources. Starter patterns, not fixed sequences — agents adapt the chain to the question.

  • Full funnel review — GA4 (traffic + acquisition) → PostHog (drop-off at each funnel stage) → Clarity (page-level friction on the worst-leak stage).
  • Campaign performance — Meta (spend, ROAS, creatives) → GA4 (sessions and conversions from paid) → PostHog (onsite progression of paid visitors, segmented by ad name).
  • Checkout investigation — PostHog checkout-dropoff tools (find the step) → Clarity (UX signals on that step) → PostHog replay candidates (sessions worth watching).
  • Variant comparison — segment GA4 and PostHog by variant. Never average across variants.
  • Default funnel review — when no specific question constrains it, the canonical pillar order is: (1) stage drop on lander → Choose-your-fish → product, (2) new vs returning split, (3) Meta-only channel slice.

Interpretation rules

  • Do not compare Meta-reported purchases directly to PostHog purchases without checking attribution window and instrumentation launch.
  • Do not average across web versions / variants or major flow changes without segmenting.
  • Created orders and paid orders answer different questions. CR and CPA are computed against paid orders for final reporting; created orders are operational signals.
  • CR and CPA are goals; intermediate funnel ratios are diagnostics.
  • Funnel drops are page-route transitions first, event counts second. When a question anchors on “biggest drop / between which stages”, lead with a route-to-route delta from get_posthog_paths_summary and back it with the event-based count. If the two disagree by more than ~2×, the route count is the user-presence floor and the event delta is the data gap — do not let an event-instrumentation gap hide a real route-to-route drop. The funnel being non-strict (visitors can reach begin_checkout without prior add_to_cart) is a special case of the same rule: trust the route, treat the event ratio as diagnostic.
  • Facebook vs Instagram separation can be imperfect; referrer, in-app browser, UTM, and privacy behaviour collapse some of it into a single Meta source or into direct.
  • If PostHog launched mid-window, treat that period as directional only.
  • A segment with too few sessions to be meaningful is itself a finding — surface the volume, do not silently report a ratio.
  • A reading more than ~30% off the rolling average is worth surfacing as an anomaly, not absorbing as the new normal.
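The route-first rule with its ~2× disagreement threshold can be sketched as a reconciliation check. The function and its output shape are illustrative:

```python
# Sketch: when route-to-route and event-based drops disagree by more
# than ~2x, the route count is the user-presence floor and the event
# delta is flagged as a data gap rather than hiding the real drop.

def reconcile_drop(route_drop, event_drop, factor=2.0):
    hi = max(route_drop, event_drop)
    lo = max(min(route_drop, event_drop), 1)  # guard against zero
    if hi / lo > factor:
        return {"lead": route_drop, "basis": "route_floor",
                "flag": "event_instrumentation_gap"}
    return {"lead": route_drop, "basis": "route", "flag": None}
```

Either way the route delta leads; the flag only decides whether the event ratio is reported as diagnostic or as a gap.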

Known instrumentation gaps

Data-quality limits that affect what the agentic system can answer. Updated when gaps are closed or new ones surface.

  • The funnel is non-strict. Visitors can reach begin_checkout without a prior add_to_cart — both because the checkout route can be reached directly (e.g. /shop/order-details-2) and because there are two separate begin_checkout firing paths (CartPage.tsx legacy and CheckoutPage.tsx structured) that aren’t aligned. Treat each stage as independent where useful.
  • Direct-traffic inflation. iOS strips referrer parameters, Google search no longer forwards query strings, Instagram routes via multiple subdomains, and in-app vs web-browser sessions surface differently — Meta and Google traffic regularly leak into “direct”. Cloudflare bot signals could recover some of this; without that, treat “direct” as a noisy bucket.
  • PostHog purchase count ≠ clean order count. PostHog includes test orders, duplicate fires, and repeat-buyer multi-counts. Reconcile against the Excel / Data Master clean count before reporting CR or CPA — difference is typically a few orders per day.
  • PostHog pageview transition window. PR #1 (feat/de-gtm-and-posthog-parity) merged to AlaskanFishermanFrontend origin/main at 2026-05-04 20:43 CET and changed PostHog init to capture_pageview: false while adding the custom page_view event. Result: May 4 is a partial custom-event day. The affected interval is 2026-05-05 through 2026-05-11, until the fix/posthog-native-pageviews fix is merged: during that interval, custom page_view is the reliable onsite funnel-entry event, while PostHog built-in DAU / WAU widgets based on $pageview undercount. Follow-up PR #3 (chore/posthog-acquisition-context, merged 2026-05-06) added UTM/Meta acquisition properties; PR #4 (feat/view-cart-datalayer, merged 2026-05-07) added view_cart; PR #7 (fix/internal-traffic-opt-out, merged 2026-05-09) removed agent/internal traffic from future analytics fires. The fix/posthog-native-pageviews frontend branch restores PostHog-native $pageview via SPA history_change mode and stops mirroring only the dataLayer page_view event into PostHog. For historical analysis, use custom page_view for onsite funnel entry during 2026-05-05..2026-05-11, and use $pageview again for PostHog DAU / WAU / retention after the fix merge.
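The purchase-count reconciliation can be sketched as a cleaning pass before comparing to the Data Master count. The field names (`order_id`, `is_test`) are assumptions for illustration, not the real event schema:

```python
# Hypothetical reconciliation sketch: strip test orders and duplicate
# fires from raw PostHog purchase events before comparing against the
# Excel / Data Master clean order count.

def clean_purchase_count(purchase_events):
    seen = set()
    count = 0
    for e in purchase_events:
        if e.get("is_test"):
            continue  # drop internal / test orders
        oid = e.get("order_id")
        if oid in seen:
            continue  # drop duplicate fires for the same order
        seen.add(oid)
        count += 1
    return count
```

The residual difference after cleaning (typically a few orders per day) is what gets reconciled, not the raw event count.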

Synthesis meta-checklist

There is no cause-bucket taxonomy. Buckets force premature labelling and invite hedging (“A4, secondary A3”); they crowd out the act of naming a specific, testable mechanism. Findings should describe what is wrong in plain language and at the smallest grain useful for action (e.g. “hero is 802px tall on a 390px viewport, pushing CTA below the fold”, not “A4”).

In place of buckets: before the orchestrator finalises any hypothesis, it must answer all three meta-checks below. Each is a forcing function against a real failure mode observed in past runs. A finding that doesn’t pass all three goes back for more evidence, not into the diagnosis.

  1. Data integrity. Could the underlying numbers or measurements be wrong? Examples to rule out: instrumentation mismatch (event wired to a route paid traffic doesn’t visit), sampling / volume too small, attribution leakage (iOS referrer stripping, in-app browsers), event-id coverage gaps, mid-window instrumentation changes.
  2. Cohort identity. Are the users in this analysis actually the population I’m reasoning about? Examples to rule out: replay-candidate filter returning the wrong cohort (e.g. returning-customer login flows masquerading as paid bouncers), new-vs-returning contamination, source-attribution collapsing Meta into direct, returning visitors carrying stale UTMs.
  3. Confounder. Is there a third variable that could explain the finding equally well? Examples to rule out: audience tier vs creative (IG retargeting vs FB cold prospecting); CBO budget allocation vs creative quality; rollout-window timing vs underlying trend; page-redesign vs concurrent traffic-mix shift.

If a finding survives all three with positive evidence (not assertion), it earns a place in the diagnosis. If any one fails, the orchestrator either re-queries to close the gap or downgrades the finding to a “what we can’t tell yet” entry in Data gaps.
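The gate can be expressed as a small function. A sketch, assuming a hypothetical finding dict whose `checks` entries record whether each meta-check passed with positive evidence:

```python
# Sketch of the synthesis gate: a finding enters the diagnosis only if
# all three meta-checks pass; otherwise it is routed back for re-query
# or downgraded to a "what we can't tell yet" data-gap entry.

CHECKS = ("data_integrity", "cohort_identity", "confounder")

def gate_finding(finding):
    """finding['checks']: dict check name -> True iff it passed
    with positive evidence (absence counts as a failure)."""
    failed = [c for c in CHECKS if not finding.get("checks", {}).get(c)]
    if not failed:
        return {"route": "diagnosis", "failed": []}
    return {"route": "requery_or_data_gap", "failed": failed}
```

Treating a missing check as a failure enforces "positive evidence, not assertion": an unexamined check can never pass by default.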

Useful questions the semantic layer is asked

  • Where is the largest absolute drop-off in the funnel?
  • Which segment has the worst drop-off at meaningful volume?
  • What specific page / creative / data mechanism is most consistent with the observed leak?
  • Do converters behave differently from non-converters before product selection?
  • How many sessions / days happen before purchase?
  • Which creative themes bring traffic that actually progresses?
  • Where is the data itself the problem (meta-check #1 fails)?
