Semantic Layer
How a business question maps to data: the funnel definition, event taxonomy, segments, source map, and the meta-checklist that gates orchestrator synthesis.
This file changes when the funnel structure, event names, or the synthesis meta-checklist change. Other layers inherit from it.
Funnel
The German paid funnel, in stages:
- Visit — visitor lands on the site
- Selection — visitor reaches the fish/product selection experience
- Product — visitor reaches a product page or product-intent step
- Cart / checkout start — visitor adds to cart or begins checkout
- Payment info — visitor commits to payment
- Purchase — visitor completes a paid order
Reason from semantic stages, not from URL paths. Page names and routes change; the stages do not.
Event taxonomy
The canonical event names are GA4-aligned. These are the only event names the KB references.
| Event | Stage | Meaning |
|---|---|---|
| page_view | Visit | Visitor viewed a page or route in the GTM / GA4-shaped dataLayer |
| view_item_list | Selection | Visitor viewed a list of products / fish choices |
| select_item | Selection | Visitor selected a product from the list |
| add_to_cart | Cart | Visitor added a product / cart contents |
| begin_checkout | Checkout start | Visitor started checkout / order details |
| add_payment_info | Payment info | Visitor selected or submitted payment information |
| purchase | Purchase | Paid purchase confirmed |
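The taxonomy can be held as a small lookup so that queries reason from stages rather than raw event strings. A minimal Python sketch: the names EVENT_STAGE and stage_of are illustrative assumptions, not part of the KB or of any analytics tool.

```python
# Canonical GA4-aligned events and their funnel stages, copied from the taxonomy table.
# EVENT_STAGE and stage_of are hypothetical names used only for illustration.
EVENT_STAGE = {
    "page_view": "Visit",
    "view_item_list": "Selection",
    "select_item": "Selection",
    "add_to_cart": "Cart",
    "begin_checkout": "Checkout start",
    "add_payment_info": "Payment info",
    "purchase": "Purchase",
}


def stage_of(event_name: str) -> str:
    """Return the funnel stage for a canonical event; reject any other name."""
    try:
        return EVENT_STAGE[event_name]
    except KeyError:
        raise ValueError(f"{event_name!r} is not a canonical event name") from None
```

Rejecting unknown names, instead of returning a default stage, keeps non-canonical events from silently entering a funnel read.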
Stage-to-role intent
Each funnel stage has a user-journey role, not a fixed route. The route that fills a role changes slowly; the file path behind a route changes quickly. Diagnosis runs resolve role → current route by reading version1/src/router.tsx and the relevant page components on origin/main, never by looking up a route from the KB. The resolved binding lives in the run’s inputs.json under emission_map, not here.
| Stage | Role | How to resolve to current route |
|---|---|---|
| Visit | Ad lander (paid traffic) + organic landings | Meta get_meta_ad_* destination URLs (paid); router index route or $session_entry_pathname in PostHog (organic) |
| Selection | Ad-lander product surface (where paid visitors first see products) | Whichever component the ad-lander route renders; grep for product-list / product-showcase sections inside it |
| Product | Product detail | router.tsx route whose component imports useProductPageAnalytics (or equivalent product-page analytics hook) |
| Cart | Cart / direct checkout entry | router.tsx route(s) whose components emit add_to_cart or begin_checkout |
| Payment info | Checkout form | router.tsx route(s) whose components emit add_payment_info |
| Purchase | Payment confirmation | router.tsx route(s) whose components emit purchase |
When a funnel event is emitted from a route that does not fill the role its stage implies, that is an instrumentation mismatch and must be surfaced as a candidate finding before any volume-based hypothesis is formed on that event.
Instrumentation mismatches are per-event, not per-stage. A stage typically has more than one event; one being mismatched does not invalidate the others. Test each event independently against the role, and follow the navigation graph — if the lander’s onClick navigates to a route that itself emits a funnel event, that downstream event is a valid signal for the stage even though the originating route does not emit it. The canonical walk-through (both view_item_list mismatched and select_item not-mismatched on the same Selection stage) lives in skills/deploy-history/SKILL.md → “Build a current emission map”.
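The per-event test can be sketched in Python as below. Everything about the data shapes is an assumption made for illustration: emission_map is taken to map each event to the routes that emit it, nav_graph maps a route to the routes it navigates to, and the route strings are invented. The sketch follows the navigation graph for one hop only.

```python
def is_mismatched(event, emission_map, role_routes, nav_graph):
    """Return True when no route emitting `event` fills the stage's role.

    A route counts as valid if it fills the role itself or is reached by one
    navigation from a role route (one-hop simplification, an assumption).
    An event with no emitting route at all is also reported as mismatched.
    """
    emitting = set(emission_map.get(event, ()))
    valid = set(role_routes)
    for route in role_routes:
        valid.update(nav_graph.get(route, ()))
    return emitting.isdisjoint(valid)


# Hypothetical routes: the lander navigates to /product but never to /catalog.
emission_map = {"view_item_list": ["/catalog"], "select_item": ["/product"]}
role_routes = {"/lander"}
nav_graph = {"/lander": ["/product"]}

for event in ("view_item_list", "select_item"):
    print(event, is_mismatched(event, emission_map, role_routes, nav_graph))
```

Each event is tested on its own, so one mismatched event on a stage does not discard the other events of that stage.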
Event envelope
Every event carries an envelope. In PostHog these are exposed as flat snake_case super-properties (in GA4 as their web.* / page.* nested equivalents):
- web_version, web_variant: which web build and which A/B variant served the event
- experiment_id, experiment_name: the active experiment, if any
- page_path, page_location, page_title, page_type: page metadata for path-level breakdowns
These let cohorts filter by variant, version, or path directly without joins.
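As a rough illustration of filtering without joins, the Python sketch below counts one event per variant straight from the envelope. The event shape (a dict with an "event" name and a flat "properties" mapping) and the function name count_by_variant are assumptions for the example, not the PostHog export format.

```python
from collections import Counter


def count_by_variant(events, event_name):
    """Count occurrences of `event_name` per web_variant using only envelope fields."""
    counts = Counter()
    for e in events:
        if e["event"] == event_name:
            # Events without the envelope field are kept visible, not dropped.
            counts[e["properties"].get("web_variant", "(missing)")] += 1
    return counts


# Hypothetical sample rows.
sample = [
    {"event": "begin_checkout", "properties": {"web_variant": "version1"}},
    {"event": "begin_checkout", "properties": {"web_variant": "version2"}},
    {"event": "begin_checkout", "properties": {}},
    {"event": "purchase", "properties": {"web_variant": "version1"}},
]
print(count_by_variant(sample, "begin_checkout"))
```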
PostHog pageview contract
PostHog has two pageview concepts in this workspace:
- $pageview: PostHog-native pageview. Use this for PostHog DAU / WAU / retention widgets and PostHog product-analytics style “active users” reporting.
- page_view: GA4/GTM-shaped dataLayer event. Use this for GTM Preview, GA4-shaped ecommerce funnel joins, and cross-source schema parity. Do not use it as the source for PostHog built-in DAU widgets.
The intended frontend contract after AlaskanFishermanFrontend branch fix/posthog-native-pageviews is merged on 2026-05-11: PostHog captures native $pageview via capture_pageview: 'history_change'; afDl.push('page_view') continues to publish the GTM/dataLayer page event but no longer mirrors that page event into PostHog. Ecommerce/business events still mirror into PostHog with the same names as the dataLayer events.
Required segments
Every conversion-diagnosis read should be segmentable by:
- source / channel (Meta, Google, organic, direct, …)
- Meta campaign / ad / ad name (where Meta is the source)
- device (mobile, tablet, desktop)
- web version / variant (version1 / version2 inside the new web)
- new vs returning visitor, which is also a budget-allocation lever: repeat-converter dominance pushes retargeting weight up; first-visit converter dominance keeps the lever upstream (creative-to-landing match)
- landing / entry path
- checkout / payment method (where the question reaches that stage)
If a segment is not currently joinable in a given source, that is itself a finding — surface it as an instrumentation gap, not as zero.
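One way to keep a missing segment from reading as zero is to return an explicit gap marker. The Python sketch below assumes sessions are plain dicts and that a segment value of None means the source could not supply it; segment_breakdown and the status strings are hypothetical names.

```python
def segment_breakdown(sessions, key):
    """Count sessions per segment value, or flag the segment as an instrumentation gap."""
    missing = sum(1 for s in sessions if s.get(key) is None)
    if sessions and missing == len(sessions):
        # No session carries the segment: report the gap, never a zero.
        return {"status": "instrumentation_gap", "segment": key, "sessions": len(sessions)}
    counts = {}
    for s in sessions:
        value = s.get(key)
        label = "(unknown)" if value is None else value
        counts[label] = counts.get(label, 0) + 1
    return {"status": "ok", "segment": key, "counts": counts}
```

Partially missing values stay visible under an "(unknown)" label so the reader can judge how much of the segment is actually covered.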
Source map — where to query
Different questions go to different sources. The KB describes who answers what; current values are pulled from the tool, not stored here.
| Source | Best for | Not for |
|---|---|---|
| Meta Ads | Spend, delivery, campaign / ad / creative performance, Meta-reported CPA | Onsite behaviour truth |
| PostHog | Funnel drop-off, user journeys, sessions before conversion, replay candidates, behaviour cohorts | Ad-platform delivery truth |
| GA4 | Channel / acquisition / event analysis where the relevant events are reliably configured | Friction diagnosis; full journey when events are incomplete |
| Plausible | Lightweight traffic, page / source summaries, custom-property checks | Definitive ecommerce funnel diagnosis |
| Clarity | Aggregate friction signals, rage / dead clicks, page-level UX | Person-level journey |
| Data Master | Daily business KPI reporting, created / paid orders, stakeholder-readable funnel | Granular event or replay analysis |
| Cloudflare | Routing, bot / edge signals where exposed | Conversion source of truth |
The analytics MCP server exposes tools for GA4, Meta, Plausible, Clarity, and PostHog. Use the MCP *_query_capabilities tools to discover the current authoritative tool list and parameter shapes per source — do not hard-code tool names in agent prompts.
PostHog tools cover funnel summary, paths, person journey, sessions / time before conversion, checkout drop-off, form friction, instrumentation health, and a family of replay tools (candidates, cohorts, analysis, recordings, stream, download plan). Connection setup is in ../skills/analytics-mcp/SKILL.md.
Source-selection rules
- Use Meta for what Meta delivered and charged.
- Use PostHog for onsite behaviour after PostHog launch.
- Use GA4 when the question needs GA4 acquisition dimensions and the relevant events are reliably present.
- Use Plausible for fast traffic / page / source reads.
- Use Clarity only for aggregate friction and page-level UX signals; never for person-level journey claims.
- Use Data Master for stakeholder-readable daily business KPIs.
- Do not use noisy auto-captured form events as primary KPIs.
Cross-source workflows
Common diagnostic patterns that combine sources. Starter patterns, not fixed sequences — agents adapt the chain to the question.
- Full funnel review — GA4 (traffic + acquisition) → PostHog (drop-off at each funnel stage) → Clarity (page-level friction on the worst-leak stage).
- Campaign performance — Meta (spend, ROAS, creatives) → GA4 (sessions and conversions from paid) → PostHog (onsite progression of paid visitors, segmented by ad name).
- Checkout investigation — PostHog checkout-dropoff tools (find the step) → Clarity (UX signals on that step) → PostHog replay candidates (sessions worth watching).
- Variant comparison — segment GA4 and PostHog by variant. Never average across variants.
- Default funnel review — when no specific question constrains it, the canonical pillar order is: (1) stage drop on lander → Choose-your-fish → product, (2) new vs returning split, (3) Meta-only channel slice.
Interpretation rules
- Do not compare Meta-reported purchases directly to PostHog purchases without checking attribution window and instrumentation launch.
- Do not average across web versions / variants or major flow changes without segmenting.
- Created orders and paid orders answer different questions. CR and CPA are computed against paid orders for final reporting; created orders are operational signals.
- CR and CPA are goals; intermediate funnel ratios are diagnostics.
- Funnel drops are page-route transitions first, event counts second. When a question anchors on “biggest drop / between which stages”, lead with a route-to-route delta from get_posthog_paths_summary and back it with the event-based count. If the two disagree by more than ~2×, the route count is the user-presence floor and the event delta is the data gap; do not let an event-instrumentation gap hide a real route-to-route drop. The funnel being non-strict (visitors can reach begin_checkout without prior add_to_cart) is a special case of the same rule: trust the route, treat the event ratio as diagnostic.
- Facebook vs Instagram separation can be imperfect; referrer, in-app browser, UTM, and privacy behaviour collapse some of it into a single Meta source or into direct.
- If PostHog launched mid-window, treat that period as directional only.
- A segment with too few sessions to be meaningful is itself a finding — surface the volume, do not silently report a ratio.
- A reading more than ~30% off the rolling average is worth surfacing as an anomaly, not absorbing as the new normal.
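Two of the numeric rules above, the ~2× route-versus-event disagreement and the ~30% anomaly threshold, can be sketched in Python. The function names are hypothetical, and reading "disagree by more than ~2×" as the ratio of the larger count to the smaller one is an assumption, as is treating both thresholds as hard cut-offs when the text marks them as approximate.

```python
def reconcile_counts(route_count, event_count, max_ratio=2.0):
    """Compare a route-based count with the event-based count for the same stage.

    The route count always leads; `data_gap` is True when the two counts
    differ by more than `max_ratio` (larger / smaller, an assumed reading).
    """
    low, high = sorted((route_count, event_count))
    if low == 0:
        data_gap = high > 0
    else:
        data_gap = high / low > max_ratio
    return {"lead_with": "route", "route_count": route_count, "data_gap": data_gap}


def is_anomaly(value, rolling_average, tolerance=0.30):
    """Return True when `value` deviates from the rolling average by more than `tolerance`."""
    if rolling_average == 0:
        return value != 0
    return abs(value - rolling_average) / abs(rolling_average) > tolerance
```

For example, reconcile_counts(1000, 400) flags a data gap because 1000 / 400 = 2.5, while is_anomaly(80, 100) does not fire because the deviation is 20%.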
Known instrumentation gaps
Data-quality limits that affect what the agentic system can answer. Updated when gaps are closed or new ones surface.
- The funnel is non-strict. Visitors can reach begin_checkout without a prior add_to_cart, both because the checkout route can be reached directly (e.g. /shop/order-details-2) and because there are two separate begin_checkout firing paths (CartPage.tsx legacy and CheckoutPage.tsx structured) that aren’t aligned. Treat each stage as independent where useful.
- Direct-traffic inflation. iOS strips referrer parameters, Google search no longer forwards query strings, Instagram routes via multiple subdomains, and in-app vs web-browser sessions surface differently, so Meta and Google traffic regularly leak into “direct”. Cloudflare bot signals could recover some of this; without that, treat “direct” as a noisy bucket.
- PostHog purchase count ≠ clean order count. PostHog includes test orders, duplicate fires, and repeat-buyer multi-counts. Reconcile against the Excel / Data Master clean count before reporting CR or CPA — difference is typically a few orders per day.
- PostHog pageview transition window. PR #1 (feat/de-gtm-and-posthog-parity) merged to AlaskanFishermanFrontend origin/main at 2026-05-04 20:43 CET and changed PostHog init to capture_pageview: false while adding the custom page_view event. Result: May 4 is a partial custom-event day. The affected interval is 2026-05-05 through 2026-05-11, until the fix/posthog-native-pageviews fix is merged: during that interval, custom page_view is the reliable onsite funnel-entry event, while PostHog built-in DAU / WAU widgets based on $pageview undercount. Follow-up PR #3 (chore/posthog-acquisition-context, merged 2026-05-06) added UTM/Meta acquisition properties; PR #4 (feat/view-cart-datalayer, merged 2026-05-07) added view_cart; PR #7 (fix/internal-traffic-opt-out, merged 2026-05-09) removed agent/internal traffic from future analytics fires. The fix/posthog-native-pageviews frontend branch restores PostHog-native $pageview via SPA history_change mode and stops mirroring the dataLayer page_view event into PostHog (only that event; ecommerce/business events still mirror). For historical analysis, use custom page_view for onsite funnel entry during 2026-05-05..2026-05-11, and use $pageview again for PostHog DAU / WAU / retention after the fix merge.
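The transition window can be encoded as a date lookup so historical queries pick the right PostHog pageview event. This Python sketch is hedged in three places: it assumes the fix merged on 2026-05-11 as planned, it assumes $pageview was the working event before 2026-05-04, and it returns no event for the partial day. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

PARTIAL_DAY = date(2026, 5, 4)        # PR #1 merged at 20:43 CET on this day
CUSTOM_ONLY_START = date(2026, 5, 5)  # first full day with capture_pageview: false
CUSTOM_ONLY_END = date(2026, 5, 11)   # assumption: the fix merged on this date


def posthog_pageview_event(day: date) -> Optional[str]:
    """Return the PostHog event to use for onsite pageview counts on `day`.

    None means neither event covers the whole day (assumed handling of May 4).
    """
    if day == PARTIAL_DAY:
        return None
    if CUSTOM_ONLY_START <= day <= CUSTOM_ONLY_END:
        return "page_view"
    return "$pageview"
```

A query spanning the window would then switch event name per day instead of summing a single event across the whole range.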
Synthesis meta-checklist
There is no cause-bucket taxonomy. Buckets force premature labelling and invite hedging (“A4, secondary A3”); they crowd out the act of naming a specific, testable mechanism. Findings should describe what is wrong in plain language and at the smallest grain useful for action (e.g. “hero is 802px tall on a 390px viewport, pushing CTA below the fold”, not “A4”).
In place of buckets: before the orchestrator finalises any hypothesis, it must answer all three meta-checks below. Each is a forcing function against a real failure mode observed in past runs. A finding that doesn’t pass all three goes back for more evidence, not into the diagnosis.
1. Data integrity. Could the underlying numbers or measurements be wrong? Examples to rule out: instrumentation mismatch (event wired to a route paid traffic doesn’t visit), sampling / volume too small, attribution leakage (iOS referrer stripping, in-app browsers), event-id coverage gaps, mid-window instrumentation changes.
2. Cohort identity. Are the users in this analysis actually the population I’m reasoning about? Examples to rule out: replay-candidate filter returning the wrong cohort (e.g. returning-customer login flows masquerading as paid bouncers), new-vs-returning contamination, source-attribution collapsing Meta into direct, returning visitors carrying stale UTMs.
3. Confounder. Is there a third variable that could explain the finding equally well? Examples to rule out: audience tier vs creative (IG retargeting vs FB cold prospecting); CBO budget allocation vs creative quality; rollout-window timing vs underlying trend; page-redesign vs concurrent traffic-mix shift.
If a finding survives all three with positive evidence (not assertion), it earns a place in the diagnosis. If any one fails, the orchestrator either re-queries to close the gap or downgrades the finding to a “what we can’t tell yet” entry in Data gaps.
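The gate can be sketched as a small Python check that a finding carries non-empty evidence for every meta-check. The Finding class, the check keys, and the "diagnosis" / "data_gaps" labels are hypothetical names for illustration; what counts as positive evidence remains a judgment the code cannot make.

```python
from dataclasses import dataclass, field

# The three meta-checks, in the order listed above.
CHECKS = ("data_integrity", "cohort_identity", "confounder")


@dataclass
class Finding:
    statement: str
    evidence: dict = field(default_factory=dict)  # check name -> evidence text


def gate(finding):
    """Route a finding to the diagnosis or to data gaps, listing the checks it fails."""
    missing = [c for c in CHECKS if not (finding.evidence.get(c) or "").strip()]
    if missing:
        return ("data_gaps", missing)
    return ("diagnosis", [])
```

The returned list of failed checks tells the orchestrator which gap to re-query before deciding whether to downgrade the finding.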
Useful questions the semantic layer is asked
- Where is the largest absolute drop-off in the funnel?
- Which segment has the worst drop-off at meaningful volume?
- What specific page / creative / data mechanism is most consistent with the observed leak?
- Do converters behave differently from non-converters before product selection?
- How many sessions / days happen before purchase?
- Which creative themes bring traffic that actually progresses?
- Where is the data itself the problem (meta-check #1 fails)?