I've used Google Analytics since it was called Urchin — the web analytics software Google acquired in 2005. (I'm not sure how many people realize that UTM stands for Urchin Tracking Module.) Before the rebrand, before every marketing team on earth had a gtag.js snippet in their <head>. So when I finally ripped it out of this site — gone in one commit — I expected to feel something. Loss, maybe. Nostalgia for the old real-time dashboard with the live world map.
I felt relief.
GA4 had been dragging my Lighthouse score into the mid-80s. I even tried offloading it to a web worker with Partytown — limited improvement. The analytics tool was the biggest performance problem on the site. There's something deeply ironic about your measurement system being the thing that degrades the experience you're trying to measure.
But it wasn't just the performance tax. It was the realization that I'd been collecting data I never looked at, for an audience model that doesn't apply to a personal blog, using an interface redesigned for enterprise marketing teams who need attribution funnels and audience segments. I don't have funnels. I have essays.
I want a small, legible system that tells me what landed, what didn't, and what changed — without turning my site into spyware or slowing it down. This is a draft spec for that system.
#Problem framing
Most analytics tools optimize for one of two extremes:
- Marketing surveillance suites (powerful, heavy, identity-centric)
- Minimal privacy counters (lightweight, sometimes too shallow)
A personal blog needs a third thing:
- privacy-first,
- edge-native,
- cheap enough to run free,
- and queryable in natural language without outsourcing judgment.
The goal is not more dashboards. The goal is better questions.
#First principles
- Signal over exhaust. What posts are actually read? Which referrers produce meaningful attention?
- Trend shape over user identity. Momentum, decay curves, evergreen vs spike.
- No trust tax. No fingerprinting, no cross-site identifiers, no covert profiling.
- No performance tax. No Lighthouse regressions, no render-path blockers. The irony of GA failing this one still stings.
- No ops tax. Runs on Cloudflare's free tier for low/moderate traffic. Deploys via GitHub Actions on push. No servers to maintain, no containers to manage, no uptime to monitor.
- AI as interface, not authority. Conversational query layer over deterministic metrics.
#Prior art (and what to steal)
I looked at everything. The privacy-native tools — Plausible, Umami, GoatCounter, Ackee — get the lightweight collection right: small scripts, no cookies, respect for the reader. Plausible in particular would be the obvious choice if I didn't want to build my own. But I do. Matomo and PostHog are impressive and wildly overkill for a solo blog.
The interesting steals are from adjacent systems:
- OpenTelemetry: not the SDK (enormous) or the collector (a whole deployment), but the naming discipline. http.request.method, url.path, user_agent.original — standardized field names that make data portable without tribal knowledge. If the schema looks like OTel, any tool can query it later.
- Axiom.co: the ingest-first philosophy. Raw-ish event logs in a columnar format you can query ad hoc. This maps directly to the R2 monthly exports — NDJSON that DuckDB, BigQuery, or Axiom itself can ingest. If this system outgrows D1, Axiom is the natural escape hatch.
- Sentry: not the 180KB SDK, but the pattern. window.onerror + onunhandledrejection as a ~15-line error signal. Errors are analytics too — "this page throws a TypeError on Safari mobile" is useful data.
The synthesis: steal from the lightweight tools for collection posture, from the heavy tools for naming conventions and philosophy. Skip everyone's infrastructure.
#Architecture judgment
The system should reflect how I build things, not just what it does.
- TypeScript, Bun. Type safety everywhere. If a function can return the wrong shape, it will — at 3am, in production.
- Pure functions and composition over classes. classifyReferrer(host) → Source is a function. ReferrerClassifier is a class that exists so someone can write a constructor. Prefer the function.
- Never write directly to the database from the hot path. Every event goes through the DO aggregator first. The DO is the write buffer. D1 only sees batched, coalesced flushes — never raw request-time writes. If the flush fails, the DO retries. If D1 is contended, the DO backs off. The database is never the thing that breaks under load.
- Backpressure is not optional. If ingestion outpaces aggregation, the DO queues. If aggregation outpaces D1, the DO backs off exponentially. If everything's on fire, governor mode kicks in and sheds load gracefully. No silent data loss. No "it seemed fine in dev."
- Modularity through boundaries, not abstractions. Ingestion, aggregation, storage, query, and AI are separate Workers/DOs with typed interfaces between them. Not a monolith with "clean architecture" folders. Actual deployment boundaries.
- Idempotent everything. Every flush has a batch ID. Every write is safe to retry. The system should be correct after a crash, not just during normal operation.
- Append-only where it matters, mutable where it serves. Raw events are append-only at the ingestion boundary — never mutate what the client sent. Aggregates are mutable by design (upserts that coalesce counters). R2 exports are immutable — write-once NDJSON, never overwritten. Three different postures for three different concerns: integrity at the edge, efficiency in the middle, portability at the exit.
#Assumptions
- Cloudflare Workers + D1 + Durable Objects are available
- JavaScript may be blocked — baseline still works server-side for pageviews where possible
- Coarse-grained analytics (daily/hourly aggregates) are enough
- No individual user/session re-identification. Period.
- Reading quality and trend direction matter more than ad attribution precision
- Script budget: ≤1.5KB gzipped (aspirational), hard cap ≤2KB
#Requirements
Because this is a design exercise as much as a product spec, I want to be explicit about what this system must do versus how it must feel doing it. These aren't aspirational — they're the acceptance criteria I'll hold myself to.
#Functional requirements
These are the capabilities. What the system does when everything's working.
| ID | Requirement | Acceptance |
|---|---|---|
| FR-1 | Pageview collection — capture every non-bot page load with path, referrer, and timestamp | Ingestion Worker returns 204; fact table increments within flush interval |
| FR-2 | Engagement scoring — composite signal from scroll depth, dwell time, and outbound clicks | engaged event fires once per pageview when any threshold is met |
| FR-3 | Scroll milestone tracking — quartile depth markers (25/50/75/100%) per article | Four IntersectionObserver instances; each fires once and disconnects |
| FR-4 | Outbound click tracking — capture external link clicks from article body | outbound event with href and anchor text in meta |
| FR-5 | Referrer classification — normalize raw referrers into a bounded enum (≤15 sources) | classifyReferrer() maps hostnames; unknown → "other" |
| FR-6 | Traffic classification — categorize requests as human/bot/AI crawler | 5-class system using cf.botManagement.score + UA pattern matching |
| FR-7 | Core Web Vitals — LCP, INP, CLS at p75 per path per device class | web-vitals library → custom events → DDSketch aggregation in DO |
| FR-8 | JS error capture — window.onerror and onunhandledrejection as lightweight error signal | Error count per path per day; filename-only source (no full URLs) |
| FR-9 | Ranked queries — top posts, top referrers, period comparison, timeseries | Typed API endpoints returning deterministic results |
| FR-10 | AI query layer — natural language questions answered with cited evidence | POST /api/ask → planner → ≤8 tool calls → evidence-backed answer |
| FR-11 | MCP tools — agent-queryable analytics surface | 6 typed tools matching the query API |
| FR-12 | Snapshot page — glanceable /analytics dashboard (auth-gated) | 4 cards: 7-day pageviews, top posts, top referrers, week-over-week delta |
| FR-13 | Data export — monthly NDJSON snapshots to R2 | OTel-aligned field names; importable by DuckDB/BigQuery/Axiom |
#Nonfunctional requirements
These are the constraints. How the system behaves under pressure, at the edges, and over time — the stuff that determines whether you actually trust it.
| ID | Requirement | Target | Enforcement |
|---|---|---|---|
| NFR-1 | Client weight | ≤2KB gzipped (hard cap) | CI size gate fails the build |
| NFR-2 | Lighthouse impact | Zero measurable regression in LCP/INP/CLS p75 | No synchronous XHR, no DOM writes, no layout-affecting elements |
| NFR-3 | Privacy | No cookies, no localStorage IDs, no fingerprinting, no raw IP storage | Enforced at ingestion — geo derived server-side, then discarded |
| NFR-4 | Write integrity | Every flush is idempotent and transactional | flush_batch table; duplicate batch_id → skip |
| NFR-5 | Backpressure | Graceful degradation under load (governor mode) | 4-step ladder: disable custom events → drop breakdowns → sample → daily-only |
| NFR-6 | Cardinality bounds | No unbounded dimension growth | Hard limits: 10K paths, 1K referrers, 10 meta keys per event |
| NFR-7 | Storage cost | Proportional to content count, not traffic volume | rrdtool-inspired rollup: hourly → daily → core; Cron-enforced retention |
| NFR-8 | Free-tier operation | Runs entirely on Cloudflare's free plan at low/moderate traffic | Aggregate-first writes, bounded dimensions, query caching |
| NFR-9 | Validation | Every boundary has a runtime schema | Zod at ingestion, DO, and query layer; types derived from validators |
| NFR-10 | Portability | Data is always exportable in a standard format | Monthly R2 NDJSON with OTel-aligned naming |
| NFR-11 | Ops overhead | Zero servers to maintain | Deploys via wrangler in CI; no containers, no uptime monitoring |
| NFR-12 | Recovery | Correct after crash, not just during normal operation | Idempotent flushes, forward-only migrations, append-only raw events |
A few things to notice about this split: the FRs are roughly ordered by collection → query → export (data flows downhill). The NFRs are roughly ordered by what your reader cares about (weight, speed, privacy) → what your system cares about (integrity, pressure, bounds) → what future you cares about (portability, ops, recovery). That ordering is intentional — it mirrors who screams first when something breaks.
#Non-goals (v1)
- Session replay, heatmaps
- Fingerprinting
- Cross-device identity stitching
- Real-time per-user journey visualization
- Freeform SQL generated by the LLM
- Return visitor tracking — can't do it honestly without cookies or fingerprinting, and the proxy signals (growing direct traffic, RSS subscribers, stable engagement rates across posts) already tell you whether you have an audience. A post with 65% completion doesn't care whether the reader was new or returning
#RFC: AI-Native Analytics for Blogs (v0.1)
#1) High-level architecture
graph TD
subgraph Client
A["a.js (≤2KB)"]
end
subgraph Ingestion
B[Worker: /a/collect]
end
subgraph Aggregation
C[Durable Object] --> D[(D1)]
D -.->|monthly snapshot| E[(R2 exports)]
end
subgraph Query
F[Worker: /api/metrics]
G[MCP Tools]
H["analytics snapshot"]
end
A -->|sendBeacon| B
B -->|validate + normalize| C
D --> F & G & H
Ingestion plane
- Worker endpoint: POST /a/collect
- Accepts compact events (pv, engaged, outbound, optional custom)
- Validates payload via Zod, normalizes dimensions, forwards to per-site aggregator DO
Aggregation plane
- Durable Object per site_id
- Maintains short-lived counters in memory
- Flushes batched aggregates to D1 (interval and size thresholds)
- rrdtool-inspired: coalesce first, persist later
Storage plane
- D1 stores aggregate facts + compact dimensions
- Fixed-size storage inspired by rrdtool/Graphite whisper (see below)
- Optional: R2 monthly snapshots / exports
- Optional: KV for query cache
Query plane
- Worker endpoint: GET /api/metrics/*
- Serves typed metric queries from D1
- Caches common dashboard reads
AI plane
- POST /api/ask with constrained query planner
- MCP server exposes typed analytics tools
- LLM summarizes deterministic outputs only
#2) Collection protocol
Event envelope
export type EventKind = "pv" | "engaged" | "outbound" | "milestone" | "error" | "vital" | "custom";
export interface CollectEvent {
kind: EventKind;
ts: number; // epoch ms
path: string; // normalized pathname
ref?: string; // raw referrer (optional)
title?: string; // optional page title hashable
utm?: {
source?: string;
medium?: string;
campaign?: string;
};
meta?: Record<string, string | number | boolean | null>;
}
export interface CollectRequest {
site: string; // public site key
v: 1; // protocol version
events: CollectEvent[];
}
Default events (fire automatically)
The client script emits these without any configuration:
| Event | When | Data |
|---|---|---|
| pv | DOMContentLoaded | path, ref, title |
| engaged | Composite trigger (scroll/dwell/click) | maxDepth (0-100) |
| outbound | Click on an <a> with an external href | path, meta.href, meta.text |
| milestone | Each scroll quartile crossed | meta.depth (25/50/75/100) |
| error | window.onerror / onunhandledrejection | meta.message, meta.source, meta.line, meta.col |
| vital | web-vitals callback | meta.vital (lcp/inp/cls), meta.value |
JS error capture (Sentry-lite)
Not a full error monitoring product. Just a signal: "something broke on this page."
window.onerror = (message, source, line, col) => {
queue({
kind: "error" as EventKind,
ts: Date.now(),
path: location.pathname,
meta: {
message: String(message).slice(0, 256),
source: source?.split("/").pop() ?? "unknown", // filename only, no full URLs
line: line ?? 0,
col: col ?? 0,
},
});
};
window.onunhandledrejection = (event) => {
queue({
kind: "error" as EventKind,
ts: Date.now(),
path: location.pathname,
meta: {
message: String(event.reason).slice(0, 256),
source: "unhandled_rejection",
},
});
};
What this answers:
- "Is my site broken on Safari mobile?" — check error rate by device class
- "Did my last deploy introduce a regression?" — compare error rates across dates
- "Is this a real problem or a browser extension?" —
sourcefilename distinguishes your code from injected scripts
What this doesn't do: stack traces, source maps, breadcrumbs, user replay. For that, use Sentry. For a blog, error rate is the signal. Error detail is a debugging tool you reach for when the rate spikes.
Stored as custom events, aggregated into a simple error counter on fact_daily:
ALTER TABLE fact_daily ADD COLUMN error_count INTEGER NOT NULL DEFAULT 0;
Custom events (custom) require explicit instrumentation:
// Only if someone wants to track something specific
window.__a?.track("signup_click", { cta: "header" });
The API surface is one function. If window.__a doesn't exist (script blocked, not loaded yet), the call silently no-ops.
OTel-inspired semantic naming
We don't use the OpenTelemetry SDK. But we steal its naming discipline. When data leaves this system — via R2 export, MCP tool output, or the ask endpoint — field names should be recognizable to anyone who's worked with observability tooling.
| Our field | OTel semantic convention | Why it matters |
|---|---|---|
| path | url.path | Portable to any analytics tool |
| ref | http.request.header.referer | Standard, not invented |
| device_class | user_agent.device.type | DuckDB/BigQuery can join on this |
| country_code | client.geo.country_iso_code | ISO 3166-1 alpha-2, universally understood |
| traffic_class | http.request.bot_score_class | Our addition - no OTel equivalent yet |
The R2 monthly exports use OTel-aligned field names even if the internal D1 schema uses shorter column names. Export is the portability boundary — that's where naming discipline pays off.
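A sketch of that boundary: renaming internal columns to the OTel-aligned names from the table above at export time. The InternalRow shape and function names are assumptions about the internal schema, not part of the spec:

```typescript
// Hypothetical internal D1 row (short column names stay internal).
interface InternalRow {
  path: string;
  ref: string;
  device_class: string;
  country_code: string;
  traffic_class: string;
  pageviews: number;
}

// Rename to OTel semantic-convention field names at the export boundary.
function toExportRecord(row: InternalRow): Record<string, string | number> {
  return {
    "url.path": row.path,
    "http.request.header.referer": row.ref,
    "user_agent.device.type": row.device_class,
    "client.geo.country_iso_code": row.country_code,
    "http.request.bot_score_class": row.traffic_class, // our addition, no OTel equivalent
    pageviews: row.pageviews,
  };
}

// One NDJSON line per fact row, ready for the monthly R2 object.
const ndjsonLine = (row: InternalRow): string => JSON.stringify(toExportRecord(row));
```

The rename lives in exactly one function, so the internal schema can stay terse while everything that leaves the system speaks the standard vocabulary.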
Lightweight RUM (Real User Monitoring)
Core Web Vitals are too useful to skip, and too cheap to collect to justify skipping. Google's web-vitals library is ~1.5KB and gives you LCP, INP, and CLS — the three metrics that determine whether your analytics script is harming the site it's measuring.
import { onLCP, onINP, onCLS } from "web-vitals"; // tree-shaken
onLCP((metric) => queue({ kind: "custom", ts: Date.now(), path, meta: { vital: "lcp", value: metric.value } }));
onINP((metric) => queue({ kind: "custom", ts: Date.now(), path, meta: { vital: "inp", value: metric.value } }));
onCLS((metric) => queue({ kind: "custom", ts: Date.now(), path, meta: { vital: "cls", value: metric.value } }));
This isn't a full RUM product. It's a canary:
- "Did my last deploy regress LCP?" — check the trend
- "Is the analytics script itself causing CLS?" — it better not be
- "What's my real INP on mobile?" — useful for any interactive elements
Stored as custom events, aggregated into a fact_daily_vitals table:
CREATE TABLE IF NOT EXISTS fact_daily_vitals (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL,
path_id INTEGER NOT NULL,
device_class TEXT NOT NULL DEFAULT 'unknown',
lcp_p75 REAL, -- milliseconds
inp_p75 REAL, -- milliseconds
cls_p75 REAL, -- unitless score
sample_count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id, device_class),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
p75 aggregation happens in the DO — not by storing raw values, but by maintaining a DDSketch or simpler quantile estimator in memory and flushing the percentile. Fixed memory regardless of sample count.
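As a sketch of the "simpler quantile estimator" option (not DDSketch): a fixed-size reservoir sampler. Memory stays constant no matter how many vitals arrive, and the p75 is computed from the reservoir at flush time. Class and capacity are illustrative, not part of the spec.

```typescript
// Sketch: fixed-memory quantile estimation for the DO aggregator.
// Accuracy degrades gracefully at high volume; memory never grows.
class ReservoirQuantile {
  private samples: number[] = [];
  private seen = 0;
  constructor(private capacity = 256) {}

  add(value: number): void {
    this.seen++;
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
    } else {
      // classic reservoir sampling: keep with probability capacity/seen
      const i = Math.floor(Math.random() * this.seen);
      if (i < this.capacity) this.samples[i] = value;
    }
  }

  // q in [0, 1]; returns null if no samples were recorded
  quantile(q: number): number | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const idx = Math.min(sorted.length - 1, Math.floor(q * sorted.length));
    return sorted[idx];
  }
}
```

At flush, the DO would call quantile(0.75) per (path, device_class) key and write that single number into fact_daily_vitals.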
Budget impact: web-vitals is ~1.5KB gzipped. Combined with the core client (~500 bytes), total stays under the 2KB hard cap.
Protocol compatibility:
- server currently accepts v=1
- unknown fields are ignored (forward-compatible reads)
- future protocol bumps should be explicit (v=2 handling path + migration notes)
Engagement definition
An engaged event is a composite boolean, true when any of these conditions hold:
- Reader crosses 60% article depth (single IntersectionObserver marker)
- Reader keeps the tab visible for >= 20s on an article route
- Reader clicks an outbound link from the article body
Why not only time-on-page? It lies. A tab left open in the background inflates engagement. Why not only scroll depth? It can miss accessibility and non-scroll reading patterns. Composite engagement is still simple, but less brittle.
This signal also helps bot filtering, but it is not the only bot defense.
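The composite definition above reduces to a pure predicate, which keeps it testable (per the "pure functions over classes" principle). Field names here are illustrative:

```typescript
// Signals accumulated by the client during one page visit.
interface EngagementSignals {
  maxDepthPct: number;   // deepest scroll milestone reached (0-100)
  visibleMs: number;     // dwell time accumulated while the tab was visible
  outboundClicks: number;
}

// True when any of the three conditions in the list above holds.
// The client fires the `engaged` event once, on the first true result.
function isEngaged(s: EngagementSignals): boolean {
  return s.maxDepthPct >= 60 || s.visibleMs >= 20_000 || s.outboundClicks > 0;
}
```

Note that visibleMs must only accrue while document.visibilityState is "visible", which is exactly what keeps the background-tab case from inflating engagement.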
Scroll milestones
Beyond the binary engagement signal, we track how far readers get. Four IntersectionObserver instances on invisible markers at article quartiles (25%, 50%, 75%, 100%). Each fires exactly once and disconnects.
const MILESTONES = [25, 50, 75, 100] as const;
type ScrollMilestone = typeof MILESTONES[number];
function observeScrollDepth(article: HTMLElement) {
const height = article.offsetHeight;
let maxDepth = 0;
for (const pct of MILESTONES) {
const marker = document.createElement("div");
marker.style.cssText =
"height:1px;width:1px;position:absolute;pointer-events:none;";
marker.style.top = `${(pct / 100) * height}px`;
article.style.position ||= "relative";
article.appendChild(marker);
const observer = new IntersectionObserver(
([entry]) => {
if (entry.isIntersecting) {
maxDepth = Math.max(maxDepth, pct);
observer.disconnect();
}
},
{ threshold: 0 },
);
observer.observe(marker);
}
return () => maxDepth;
}
Why IntersectionObserver instead of scroll listeners:
- Zero main-thread work during scroll. Runs on the compositor thread — can't cause jank.
- No throttle/debounce decision to get wrong. The browser tells you when the marker enters the viewport.
- No getBoundingClientRect() calls. No forced reflow, no layout thrashing.
- Four callbacks total for the entire page visit. Then silence.
What this unlocks for a writer:
- "My analytics RFC has 90% reach-25 but only 30% reach-75 — people bail at the schema section"
- "The pervasive AI essay has 65% completion rate — the long version holds better than expected"
- "Posts under 1,500 words: 80% completion. Over 2,500: 45%. But the 45% that finish have 3x the outbound click rate"
The engagement event becomes: { kind: "engaged", maxDepth: 75, ... } instead of just a boolean.
Add max_depth to the daily fact:
ALTER TABLE fact_daily ADD COLUMN avg_max_depth REAL; -- 0-100
ALTER TABLE fact_daily ADD COLUMN completions INTEGER NOT NULL DEFAULT 0; -- reached 100%
Transport
let pending: CollectEvent[] = [];
let flushTimer: number | null = null;
function queue(event: CollectEvent) {
pending.push(event);
if (!flushTimer) {
flushTimer = setTimeout(flush, 2000); // 2s debounce
}
if (pending.length >= 10) flush(); // size cap
}
function flush() {
if (!pending.length) return;
navigator.sendBeacon(
"/a/collect",
JSON.stringify({ site: SITE_KEY, v: 1, events: pending }),
);
pending = [];
if (flushTimer) clearTimeout(flushTimer);
flushTimer = null;
}
document.addEventListener("visibilitychange", () => {
if (document.visibilityState === "hidden") flush();
});
- sendBeacon on visibilitychange with fetch({ keepalive: true }) fallback
- 2-second debounce to batch rapid events (pageview + milestone + engagement)
- Hard cap at 10 events per batch
- Never blocks navigation
Why no Service Worker
A Service Worker could queue failed beacons offline and retry via Background Sync. Sounds responsible. Not worth it.
- sendBeacon already survives page navigation — the only failure case is total network loss, and losing one pageview during a subway tunnel is not a data integrity problem for a blog
- Service Workers add a whole lifecycle (install, activate, update, cache versioning) for a client that's supposed to be <2KB and understandable in 60 seconds
- Service Workers can't call sendBeacon — you'd reimplement the same reliability with fetch, wrapping a reliable API in a more complex one for the same result
- The analytics script is <2KB with Cache-Control: immutable — the browser's HTTP cache handles this without a SW cache layer
- Cloudflare Workers already are the server-side "service worker" — intelligence on both ends creates two places to debug the same problem
The entire client architecture:
a.js (< 2KB gzipped)
├── IntersectionObserver × 4 (scroll milestones)
├── event queue (plain array)
├── debounced flush (2s / 10 events)
├── sendBeacon on visibilitychange
└── that's it
No Service Worker. No IndexedDB. No Background Sync. No scroll listeners. Understandable in 60 seconds.
#3) Privacy model
- No cookies by default
- No localStorage identifiers
- No fingerprinting
- No raw IP storage (derive coarse geo server-side then discard)
- Configurable retention by table
- Optional per-site "strict mode" that disables custom events entirely
Collection integrity (anti-poisoning):
- signed payload mode for first-party script (HMAC(body + ts))
- replay window enforcement (reject stale timestamps)
- optional origin allowlist for collection endpoint
#4) Data model (D1)
Schema strategy is intentionally split:
- v1: optimize for low cardinality and cheap queries (one row per site, date, path)
- v2: add breakdown tables for referrer/country/device when needed
-- Sites
CREATE TABLE IF NOT EXISTS site (
id TEXT PRIMARY KEY,
slug TEXT NOT NULL UNIQUE,
created_at INTEGER NOT NULL
);
-- Normalized dimensions
CREATE TABLE IF NOT EXISTS dim_path (
id INTEGER PRIMARY KEY AUTOINCREMENT,
site_id TEXT NOT NULL,
path TEXT NOT NULL,
path_hash TEXT NOT NULL,
UNIQUE(site_id, path_hash),
FOREIGN KEY(site_id) REFERENCES site(id)
);
CREATE TABLE IF NOT EXISTS dim_referrer (
id INTEGER PRIMARY KEY AUTOINCREMENT,
site_id TEXT NOT NULL,
source TEXT NOT NULL, -- normalized enum (see below)
raw_host TEXT, -- optional debug host
referrer_query TEXT, -- AI search query context (Phase 2, nullable)
UNIQUE(site_id, source, raw_host),
FOREIGN KEY(site_id) REFERENCES site(id)
);
-- Referrer source enum (≤15 values, normalize at ingestion):
-- direct | google | bing | duckduckgo | x | hn | reddit | linkedin
-- mastodon | bluesky | newsletter | rss | github | chatgpt | perplexity | other
--
-- Normalization: classifyReferrer(host) maps hostnames to sources.
-- e.g. t.co, twitter.com, x.com → "x"
-- e.g. news.ycombinator.com → "hn"
-- e.g. chatgpt.com, chat.openai.com → "chatgpt"
-- e.g. perplexity.ai → "perplexity"
-- Unknown hosts → "other".
-- raw_host is retained only for bounded debugging/reporting, not unbounded growth.
--
-- AI search referrer enrichment (Phase 2):
-- Some AI search engines pass query context in URL params (?q=, ?query=, etc.)
-- When referrer is an AI search source AND query params are present,
-- extract and store as referrer_query in dim_referrer.
-- This answers "why did someone land here?" - not just "where from?"
-- Bounded: only captured for AI search sources, not all referrers.
-- v1 core daily fact (low cardinality)
CREATE TABLE IF NOT EXISTS fact_daily (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL, -- YYYY-MM-DD
path_id INTEGER NOT NULL,
pageviews INTEGER NOT NULL DEFAULT 0,
engaged_visits INTEGER NOT NULL DEFAULT 0,
outbound_clicks INTEGER NOT NULL DEFAULT 0,
custom_events INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
-- v2 optional breakdown tables
CREATE TABLE IF NOT EXISTS fact_daily_referrer (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL,
path_id INTEGER NOT NULL,
referrer_id INTEGER NOT NULL,
pageviews INTEGER NOT NULL DEFAULT 0,
engaged_visits INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id, referrer_id),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id),
FOREIGN KEY(referrer_id) REFERENCES dim_referrer(id)
);
CREATE TABLE IF NOT EXISTS fact_daily_country (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL,
path_id INTEGER NOT NULL,
country_code TEXT NOT NULL DEFAULT 'XX',
pageviews INTEGER NOT NULL DEFAULT 0,
engaged_visits INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id, country_code),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
CREATE TABLE IF NOT EXISTS fact_daily_device (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL,
path_id INTEGER NOT NULL,
device_class TEXT NOT NULL DEFAULT 'unknown', -- mobile|desktop|tablet|unknown
pageviews INTEGER NOT NULL DEFAULT 0,
engaged_visits INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id, device_class),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
-- Hourly optional short window (e.g. last 7-14 days)
CREATE TABLE IF NOT EXISTS fact_hourly (
site_id TEXT NOT NULL,
hour_utc TEXT NOT NULL, -- YYYY-MM-DDTHH
path_id INTEGER NOT NULL,
pageviews INTEGER NOT NULL DEFAULT 0,
engaged_visits INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, hour_utc, path_id),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
-- Flush idempotency
CREATE TABLE IF NOT EXISTS flush_batch (
batch_id TEXT PRIMARY KEY,
site_id TEXT NOT NULL,
created_at INTEGER NOT NULL,
FOREIGN KEY(site_id) REFERENCES site(id)
);
CREATE INDEX IF NOT EXISTS idx_fact_daily_site_date ON fact_daily(site_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_daily_path ON fact_daily(site_id, path_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_hourly_site_hour ON fact_hourly(site_id, hour_utc);
Idempotency rule: every aggregator flush includes a batch_id; writes occur in one transaction that first inserts into flush_batch. If batch_id already exists, skip (already applied).
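The classifyReferrer(host) normalization described in the dim_referrer comments above can be sketched as a plain lookup plus a couple of suffix rules. The host map here is deliberately partial; only the examples from the schema comments are filled in:

```typescript
// Sketch of referrer normalization: bounded enum, unknown hosts
// collapse to "other". HOST_MAP entries beyond the schema-comment
// examples are elided.
type Source =
  | "direct" | "google" | "bing" | "duckduckgo" | "x" | "hn" | "reddit"
  | "linkedin" | "mastodon" | "bluesky" | "newsletter" | "rss" | "github"
  | "chatgpt" | "perplexity" | "other";

const HOST_MAP: Record<string, Source> = {
  "t.co": "x",
  "twitter.com": "x",
  "x.com": "x",
  "news.ycombinator.com": "hn",
  "chatgpt.com": "chatgpt",
  "chat.openai.com": "chatgpt",
  "perplexity.ai": "perplexity",
  "github.com": "github",
  // ... remaining hosts elided
};

function classifyReferrer(host: string | null): Source {
  if (!host) return "direct"; // no referrer header at all
  const h = host.toLowerCase().replace(/^www\./, "");
  if (HOST_MAP[h]) return HOST_MAP[h];
  // country/subdomain variants of a known engine
  if (h === "google.com" || h.endsWith(".google.com")) return "google";
  return "other"; // the cardinality bound: everything else is one bucket
}
```

Because the return type is a closed enum, the dim_referrer table can never grow past the bounded source list no matter what referrers arrive.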
#5) Durable Object aggregator contract
export interface AggregateKey {
siteId: string;
dateUtc: string;
hourUtc?: string;
pathKey: string;
refKey?: string;
countryCode?: string;
deviceClass?: "mobile" | "desktop" | "tablet" | "unknown";
}
export interface AggregateDelta {
pageviews?: number;
engagedVisits?: number;
outboundClicks?: number;
customEvents?: number;
}
export interface AggregatorMessage {
key: AggregateKey;
delta: AggregateDelta;
}
export interface FlushPolicy {
maxBuffered: number; // e.g. 1_000
flushEveryMs: number; // e.g. 10_000
}Behavior:
- batch id required per flush (idempotent apply via flush_batch)
- coalesce in memory
- periodic/alarm-based flush to D1
- backpressure mode: if D1 contention, exponential backoff + queue in DO
- flush is transactional: register batch -> upsert facts -> commit
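The "register batch -> upsert facts -> commit" sequence can be sketched as a pure statement builder; the DO would hand this list to D1's batch(), which applies statements atomically, after first checking flush_batch for a duplicate batch_id. The SqlStmt shape and function name are illustrative:

```typescript
interface SqlStmt {
  sql: string;
  params: Array<string | number>;
}

// Build the atomic statement list for one flush: the flush_batch
// registration goes first, then one upsert per coalesced fact row.
// If the batch_id already exists, the caller skips the whole list.
function buildFlushBatch(
  siteId: string,
  batchId: string,
  rows: Array<{ dateUtc: string; pathId: number; pageviews: number }>,
  now: number,
): SqlStmt[] {
  return [
    {
      sql: "INSERT INTO flush_batch (batch_id, site_id, created_at) VALUES (?, ?, ?)",
      params: [batchId, siteId, now],
    },
    ...rows.map((r) => ({
      sql:
        "INSERT INTO fact_daily (site_id, date_utc, path_id, pageviews) VALUES (?, ?, ?, ?) " +
        "ON CONFLICT(site_id, date_utc, path_id) " +
        "DO UPDATE SET pageviews = pageviews + excluded.pageviews",
      params: [siteId, r.dateUtc, r.pathId, r.pageviews],
    })),
  ];
}
```

Keeping the builder pure means the idempotency and ordering logic is unit-testable without a database; only the thin batch() call touches D1.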
#6) Query API interfaces
export interface Period {
from: string; // YYYY-MM-DD
to: string; // YYYY-MM-DD
}
export interface TopPost {
path: string;
pageviews: number;
engagedVisits: number;
engagementRate: number;
}
export interface TopPostsResponse {
period: Period;
rows: TopPost[];
}
export interface ReferrerRow {
source: string;
pageviews: number;
engagedVisits: number;
}
export interface ComparePeriodsResponse {
current: {
pageviews: number;
engagedVisits: number;
};
previous: {
pageviews: number;
engagedVisits: number;
};
delta: {
pageviewsPct: number;
engagedVisitsPct: number;
};
}
Endpoints:
- GET /api/metrics/top-posts?from=...&to=...&limit=...
- GET /api/metrics/referrers?from=...&to=...
- GET /api/metrics/compare?from=...&to=...&prevFrom=...&prevTo=...
- GET /api/metrics/timeseries?path=...&granularity=day|hour
API constraints (hard limits):
- max query range: 365 days
- max limit for ranked endpoints: 200
- hourly granularity range cap: 14 days
- per-IP/token rate limiting on analytics read and ask endpoints
- cache-by-shape with TTL for repeated queries
#7) AI query layer
The AI calls typed metric functions. It does not write SQL.
Deterministic analysis functions
export interface AnalyticsService {
topPosts(period: Period, limit: number): Promise<TopPostsResponse>;
referrerBreakdown(period: Period): Promise<ReferrerRow[]>;
comparePeriods(current: Period, previous: Period): Promise<ComparePeriodsResponse>;
postTrend(path: string, period: Period): Promise<Array<{ date: string; pageviews: number }>>;
evergreenScore(path: string, trailingDays: number): Promise<number>;
}
Ask endpoint
export interface AskRequest {
question: string;
siteId: string;
}
export interface AskResponse {
answer: string;
evidence: Array<{
metric: string;
value: number | string;
source: string; // function + params
}>;
}
Guardrails:
- planner maps question -> allowed function calls
- result summaries must cite evidence blocks
- no speculative claims without corresponding metric
- answer claims are tagged with confidence classes (high, medium, low) based on evidence directness
#8) MCP surface
Tools for agent clients:
- analytics.get_top_posts
- analytics.get_referrer_breakdown
- analytics.compare_periods
- analytics.get_post_trend
- analytics.get_evergreen_score
- analytics.ask (optional convenience wrapper)
Tool outputs should be machine-readable first, narrative second.
#9) Performance / Lighthouse constraints
Hard requirements:
- Script is defer and non-blocking
- No synchronous XHR
- No DOM writes from analytics script
- No layout-affecting injected elements
- Beacon dispatch on visibilitychange and pagehide fallback
- JS parse/execute budget tracked in CI
Perf acceptance targets:
- No measurable regression in LCP/INP/CLS p75 beyond noise
- CLS impact from analytics: zero
#10) Free-tier operating mode
Staying on the free tier:
- aggregate-first writes (no raw event warehouse)
- bounded cardinality dimensions
- query caching for common dashboard reads
- optional sampling when approaching quota
- hourly table retention cap (e.g. 14 days)
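The hourly retention cap is the kind of thing a Cron Trigger enforces. A sketch, assuming hour_utc is stored as YYYY-MM-DDTHH (so lexicographic comparison matches chronological order); the function name is illustrative:

```typescript
// Compute the pruning statement for the hourly table's retention cap.
// A scheduled() handler would run the returned SQL against D1 with
// the cutoff bound as the single parameter.
function hourlyRetentionSql(
  retentionDays: number,
  now: Date,
): { sql: string; cutoff: string } {
  const cutoffMs = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  // hour_utc is YYYY-MM-DDTHH, so string comparison works for ordering
  const cutoff = new Date(cutoffMs).toISOString().slice(0, 13);
  return { sql: "DELETE FROM fact_hourly WHERE hour_utc < ?", cutoff };
}
```

Storage stays proportional to content count, per NFR-7: the daily table keeps one bounded row per path, and the hourly table is continuously trimmed to its window.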
Safety valve: "governor mode" step-down ladder under sustained high volume:
- disable custom event ingestion
- drop country/device breakdown writes
- increase sampling factor (for pageviews only)
- disable hourly writes (daily only)
Each transition emits an ops event so reduced fidelity is explicit, not silent.
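The ladder is small enough to model as an ordered list of levels. A sketch, with hypothetical level names standing in for the four step-down transitions above:

```typescript
// Ordered fidelity levels: index 0 is full fidelity, the last index is
// the floor (daily-only writes). Each step down drops one capability.
const GOVERNOR_LEVELS = [
  "normal",
  "no_custom_events", // step 1: disable custom event ingestion
  "no_breakdowns",    // step 2: drop country/device breakdown writes
  "sampled",          // step 3: sample pageviews
  "daily_only",       // step 4: disable hourly writes
] as const;
type GovernorLevel = (typeof GOVERNOR_LEVELS)[number];

// Move one rung down the ladder; the floor is absorbing.
function stepDown(current: GovernorLevel): GovernorLevel {
  const i = GOVERNOR_LEVELS.indexOf(current);
  return GOVERNOR_LEVELS[Math.min(i + 1, GOVERNOR_LEVELS.length - 1)];
}

// Decide the next level from buffer pressure. Stepping back up would be
// a separate, slower path so the ladder doesn't oscillate. The caller
// emits an ops event whenever the returned level differs from current.
function nextLevel(
  current: GovernorLevel,
  bufferedEvents: number,
  maxBuffered: number,
): GovernorLevel {
  return bufferedEvents > maxBuffered ? stepDown(current) : current;
}
```

Encoding the ladder as data rather than branching logic keeps the ordering auditable and makes "which step are we on?" a single index lookup.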
#11) Migration strategy
Phase 1: pageviews + engaged + top posts + referrers + MCP tools + tiny /analytics snapshot
Phase 2: period compare + trend series + cache/rate-limit hardening
Phase 3: conversational ask with evidence citations
Phase 4: optional exports (R2) + scheduled digests
#12) Enforced constraints (not advisory)
Advisory constraints get ignored under pressure. These are enforced in CI, at build time, or at runtime — not in a README.
Schema validation (Zod)
Every boundary gets a Zod schema. Not just "for documentation" — for runtime parsing.
const CollectEventSchema = z.object({
kind: z.enum(["pv", "engaged", "outbound", "custom"]),
ts: z.number().int().positive(),
path: z.string().min(1).max(2048).startsWith("/"),
ref: z.string().max(2048).optional(),
meta: z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])).optional(),
});
const CollectRequestSchema = z.object({
site: z.string().min(1).max(64),
v: z.literal(1),
events: z.array(CollectEventSchema).min(1).max(50),
});
The ingestion Worker parse()s with Zod before anything else. Malformed payloads get a 400 and a counter bump — never silent drops, never partial ingestion.
Linting and formatting
Biome. Not ESLint. Single tool for lint + format. Runs in CI as a gate, not a suggestion. Zero warnings policy — if it warns, either fix it or configure it out. No // biome-ignore without a comment explaining why.
Type boundaries
Every Worker ↔ DO ↔ D1 boundary has an explicit type. AggregatorMessage isn't just an interface in a file — it's the actual shape that gets validated at the DO's fetch() handler. If the ingestion Worker sends the wrong shape, it fails loudly at the boundary, not silently deep inside the aggregator.
// This is a type AND a runtime validator
const AggregatorMessageSchema = z.object({ ... });
type AggregatorMessage = z.infer<typeof AggregatorMessageSchema>;
Types and validators derived from the same source. No drift.
Client script budget
The analytics client (a.js) has a hard size gate in CI:
# In CI pipeline
GZIP_SIZE=$(gzip -c dist/a.js | wc -c)
if [ "$GZIP_SIZE" -gt 2048 ]; then
echo "❌ a.js exceeds 2KB gzipped ($GZIP_SIZE bytes)"
exit 1
fi
Not aspirational. A gate. If the script grows past 2KB gzipped, the build fails.
#13) Development process
Testing strategy
Three layers, each with a clear job:
| Layer | Tool | What it tests | When it runs |
|---|---|---|---|
| Unit | Vitest | Pure functions: classifyReferrer(), coalesceDeltas(), computeEngagement() | Every commit (CI) |
| Integration | Miniflare | Worker ↔ DO ↔ D1 round-trips: "event in → aggregate out → query returns correct count" | Every PR |
| Contract | Zod assertions | Boundary shapes: "the DO accepts exactly this shape and rejects everything else" | Every commit (CI) |
No E2E browser tests for v1. The client script is small enough to unit test the beacon logic. The server is tested through integration. Browser testing is cost that doesn't pay for itself at this scale.
What doesn't get tested: D1 SQL syntax (trust SQLite), Cloudflare routing config (tested by deployment), UI layout (there's barely any UI).
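The unit layer's job, sketched with classifyReferrer. This implementation is hypothetical, written only to show the shape of a pure, trivially testable function; the real enum and host rules live in the data model.

```typescript
// Hypothetical sketch of classifyReferrer: raw referrer host in, small
// source enum out. Pure function, so the unit layer covers it completely.
function classifyReferrer(host: string): string {
  if (host === "") return "direct";
  if (/(^|\.)google\.|(^|\.)bing\.|(^|\.)duckduckgo\./.test(host)) return "search";
  if (/(^|\.)news\.ycombinator\.com$/.test(host)) return "hn";
  return "other"; // unknown hosts collapse into the catch-all
}
```

Functions like this are why the unit layer carries most of the weight here: no mocks, no Workers runtime, just inputs and outputs on every commit.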
Environments
| Env | Purpose | D1 instance | URL |
|---|---|---|---|
| local | Development + integration tests | Miniflare (in-memory) | localhost:8787 |
| production | Live | Production D1 | analytics.bristanback.com |
Why not ephemeral deploys per branch? Solo developer, one blog. Miniflare replicates D1/DO behavior well enough for integration tests. Ephemeral preview environments are team infrastructure — real value when multiple people need to review deployments, overhead when it's just you. If this grows beyond a solo project, add a preview env with branch-scoped D1. The wrangler --env flag makes it trivial to add later.
CI pipeline
biome check → tsc --noEmit → vitest run → size gate (a.js) →
wrangler deploy --env preview (on PR)
wrangler deploy --env production (on merge to main)
Every step is a gate. No "allowed to fail" steps.
Secrets management
No .env files in the repo. Ever. Wrangler secrets for API keys. Environment-specific config (site IDs, rate limits) via wrangler.jsonc env blocks. The only thing that differs between preview and production is the D1 binding and the domain — everything else is code.
#14) Data evolution and governance
Schema migrations
D1 doesn't have a runtime migration runner (Wrangler's migrations are CLI-driven, not applied at startup). So we build one — minimal:
const MIGRATIONS = [
{ version: 1, sql: `CREATE TABLE IF NOT EXISTS site ...` },
{ version: 2, sql: `CREATE TABLE IF NOT EXISTS fact_daily_referrer ...` },
// ...
] as const;
async function migrate(db: D1Database): Promise<void> {
await db.exec(`CREATE TABLE IF NOT EXISTS _migrations (
version INTEGER PRIMARY KEY,
applied_at INTEGER NOT NULL
)`);
const applied = await db.prepare(`SELECT version FROM _migrations`).all();
const appliedSet = new Set(applied.results.map(r => r.version));
for (const m of MIGRATIONS) {
if (!appliedSet.has(m.version)) {
await db.exec(m.sql);
await db.prepare(`INSERT INTO _migrations (version, applied_at) VALUES (?, ?)`)
.bind(m.version, Date.now()).run();
}
}
}
Runs on Worker startup (cached after first run). Forward-only — no down migrations. If a migration is wrong, ship a new forward migration that fixes it.
The rrdtool principle: fixed storage, automatic rollup
The oldest good idea in time-series storage is rrdtool's Round Robin Database (1999): allocate fixed space, and as data ages, roll it into coarser resolution. Graphite's whisper format does the same thing. The insight is that you never need per-minute granularity from six months ago — you need the shape, not the points.
This system applies the same principle to D1:
Resolution tiers:
0-14 days → hourly (fact_hourly)
0-90 days → daily + breakdowns (fact_daily_referrer, _country, _device)
0-∞ → daily core (fact_daily) — just path × pageviews × engaged
As data ages, it loses dimensions but never disappears. The Cron Trigger isn't just "deleting old data" — it's the rollup engine. Before it deletes fact_hourly rows, it verifies they've been absorbed into fact_daily. Before it drops breakdown tables, it confirms the core fact still holds their totals.
The result: storage cost is roughly proportional to the number of paths (content), not the number of events (traffic). A blog with 50 posts and 5 years of history fits comfortably in D1's free tier. That's the rrdtool promise: time passes, storage doesn't grow.
Retention policy
| Table | Retention | Rollup target | Enforcement |
|---|---|---|---|
| fact_hourly | 14 days | fact_daily | Cron: verify daily absorbed, then delete |
| fact_daily_referrer | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_country | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_device | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_vitals | 90 days | - (historical vitals less useful) | Cron trigger |
| fact_daily | Indefinite | - | None needed; cost ≈ paths × days, tiny rows |
| dim_referrer.raw_host | 30 days | source enum preserved | Cron nullifies raw_host |
| flush_batch | 7 days | - | Cron trigger |
Retention is enforced by a scheduled Worker (Cron Trigger), not by hope. Each purge logs what it deleted for auditability. The Cron runs daily and is itself idempotent — safe to re-run, safe to miss a day.
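The verify-then-delete contract for fact_hourly reduces to one pure check plus three statements. A sketch: the SQL strings are assumptions layered over the spec's table names, and the pure predicate is what the unit layer would test.

```typescript
// Sketch of the hourly purge. Verify absorption first, delete second;
// if the check fails, skip the day and let the next (idempotent) run retry.
const SUM_HOURLY_SQL = `SELECT COALESCE(SUM(pageviews), 0) AS pv FROM fact_hourly WHERE date_utc = ?`;
const SUM_DAILY_SQL = `SELECT COALESCE(SUM(pageviews), 0) AS pv FROM fact_daily WHERE date_utc = ?`;
const DELETE_HOURLY_SQL = `DELETE FROM fact_hourly WHERE date_utc = ?`;

// Pure decision: hourly rows for a day may be deleted only once fact_daily
// holds at least the same pageview total, i.e. the rollup absorbed them.
function safeToPurgeHourly(hourlyTotal: number, dailyTotal: number): boolean {
  return dailyTotal >= hourlyTotal;
}
```

Keeping the decision pure and the SQL dumb is what makes the cron safe to re-run and safe to miss a day: every run recomputes the check from the tables themselves, not from state.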
Cardinality governance
Unbounded dimensions are how analytics systems die. Hard limits:
- dim_path: max 10,000 per site (after that, new paths map to /other)
- dim_referrer: max 1,000 per site (unknown hosts → other)
- custom event meta keys: max 10 per event, values max 256 chars
These aren't advisory. The ingestion Worker enforces them at write time. If a dimension table is full, new values get bucketed into the catch-all. An ops event fires so you know it happened.
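At write time the bucketing can be this simple. A sketch: the in-memory Map stands in for a dim_* table, and the catch-all key and ops hook are illustrative.

```typescript
// Sketch: bounded dimension resolution. Known values resolve normally;
// once the limit is hit, new values collapse into the catch-all bucket
// and an ops event fires so the bucketing is visible, not silent.
function makeDimResolver(limit: number, onOverflow: () => void) {
  const ids = new Map<string, number>([["/other", 0]]); // catch-all is pre-seeded
  return function resolve(value: string): number {
    const existing = ids.get(value);
    if (existing !== undefined) return existing;
    if (ids.size >= limit) {
      onOverflow();
      return ids.get("/other")!; // table full: bucket, don't grow
    }
    const id = ids.size;
    ids.set(value, id);
    return id;
  };
}
```

The invariant worth noticing: reads never fail and writes never grow past the cap, so a crawler inventing ten million URLs costs one dimension row, not ten million.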
Traffic classification
Cloudflare's Bot Management gives us a cf.botManagement.score (1-99, higher means more likely human) on every request. We should use it, but not as a binary filter — as a dimension.
CREATE TABLE IF NOT EXISTS fact_daily_traffic_class (
site_id TEXT NOT NULL,
date_utc TEXT NOT NULL,
path_id INTEGER NOT NULL,
traffic_class TEXT NOT NULL, -- human | likely_human | likely_bot | verified_bot | ai_crawler
pageviews INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (site_id, date_utc, path_id, traffic_class),
FOREIGN KEY(site_id) REFERENCES site(id),
FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
Classification logic:
type TrafficClass = "human" | "likely_human" | "likely_bot" | "verified_bot" | "ai_crawler";
function classifyTraffic(req: Request): TrafficClass {
const score = req.cf?.botManagement?.score ?? 50;
const verifiedBot = req.cf?.botManagement?.verifiedBot ?? false;
const ua = req.headers.get("user-agent") ?? "";
// AI crawlers: GPTBot, ClaudeBot, Google-Extended, Bytespider, CCBot, etc.
if (/GPTBot|ClaudeBot|Claude-Web|Google-Extended|Bytespider|CCBot|PerplexityBot|Amazonbot/i.test(ua)) {
return "ai_crawler";
}
if (verifiedBot) return "verified_bot"; // Googlebot, Bingbot, etc.
if (score >= 80) return "human";
if (score >= 30) return "likely_human";
return "likely_bot";
}
This gives you answers to questions that actually matter in 2026:
- "How much of my traffic is AI crawlers vs humans?"
- "Is GPTBot reading my posts? Which ones?"
- "What's my real human readership after filtering bots?"
- "Are search engines still sending me traffic, or has AI eaten that?"
The traffic_class dimension is a v2 breakdown table — same pattern as referrer/country/device. Not in the core fact table, not on the hot path, but available when you want to ask the question.
Data export and portability
Monthly R2 snapshots (Phase 4) export the full D1 state as NDJSON. This is your escape hatch — if you outgrow D1, if Cloudflare changes pricing, if you want to move to ClickHouse someday. The data is always yours, always portable, always in a format that any system can ingest.
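The export format itself is trivially simple, which is the point. A sketch: tagging each line with its source table is an assumption, made so one snapshot file stays self-describing.

```typescript
// Sketch: serialize table rows as NDJSON, one JSON object per line,
// each tagged with its source table so the snapshot needs no sidecar schema.
function toNdjson(table: string, rows: Array<Record<string, unknown>>): string {
  return rows.map((row) => JSON.stringify({ _table: table, ...row })).join("\n");
}
```

Any target system (ClickHouse, DuckDB, a shell pipeline with jq) can stream this line by line, which is what "always portable" means in practice.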
#Decisions
- Engagement: Composite boolean (scroll depth OR visible dwell OR outbound click), emitted once per pageview. (See collection protocol.)
- Bot filtering: Edge heuristics first (Cloudflare bot score + UA), engagement as secondary quality signal. No custom bot signature maintenance.
- Referrer normalization: Small enum (≤15 values), classified at ingestion. (See data model.)
- Multi-site: Shared DB with site_id partitioning. One blog now, column's already there if that changes.
- Fact cardinality: v1 uses a low-cardinality core fact table; breakdowns are separate optional tables.
- Write integrity: Aggregator flushes are idempotent and transactional via flush_batch.
- Validation: Zod at every boundary. Types and validators derived from the same schema — no drift.
- Testing: Unit (pure functions) + integration (Miniflare round-trips) + contract (Zod). No E2E browser tests at this scale.
- Environments: Miniflare for local dev and integration tests. No ephemeral deploys until the project outgrows solo development.
- Migrations: Forward-only, versioned, runs on startup. No down migrations.
- Traffic classification: 5-class system (human → ai_crawler) using cf.botManagement.score + UA patterns. Stored as a v2 breakdown dimension, not in core fact.
- Data portability: Monthly NDJSON exports to R2. The data is always yours.
#Open questions
- How much hourly granularity is genuinely useful for bloggers? (14-day window feels right — enough for "is this post spiking?" without long-term storage cost.)
- Should ask run only against precomputed views to prevent query abuse? Decided: hard budget of 8 tool calls per ask request. The AI analyst gets a fixed number of moves — enough for compare + breakdown + follow-up, not enough to accidentally run up the D1 bill. If it can't answer in 8 calls, the question is too vague; ask the human to narrow it.
- Dashboard surface: should this be MCP-only, or include a web UI? Decision: MCP-first + tiny snapshot page.
- Phase 1 ships MCP tools as the primary interface (Telegram/OpenClaw queries)
- Also ship a minimal /analytics snapshot page (auth-gated) with 4 cards:
- 7-day pageviews
- top posts
- top referrers
- week-over-week delta
Rationale: MCP-only is fastest and matches current workflow, but a tiny snapshot preserves glanceability without adding a heavy dashboard surface. Both MCP and snapshot use the same typed metrics/query layer, so no duplicate logic.
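The hard tool-call budget for ask is mechanical to enforce. A sketch: the wrapper shape and error message are assumptions; the budget of 8 comes from the decision above.

```typescript
// Sketch: a budgeted caller handed to the ask planner. Every tool call
// spends one unit; past the budget, the run fails loudly instead of
// quietly accumulating D1 reads.
function makeBudgetedCaller(budget: number) {
  let used = 0;
  return {
    call<T>(tool: () => T): T {
      if (used >= budget) {
        throw new Error(`ask budget exhausted (${budget} tool calls); narrow the question`);
      }
      used++;
      return tool();
    },
    remaining(): number {
      return budget - used;
    },
  };
}
```

Because the planner only ever sees the wrapper, there is no code path where a vague question turns into an unbounded query fan-out.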
#What the client actually looks like
<script defer src="/a.js" data-site="bri-blog"></script>
a.js does three things: capture a pageview, queue an engagement event, flush via beacon when the tab hides. Nothing else.
The point of this system is not to know everything.
It's to know enough, quickly, privately, and truthfully — then use AI to ask better questions over that evidence.
Not surveillance. Not theater. Just instrumentation with taste.
One of the first things I ever built for the web was a hit counter — Perl/CGI, flat file, incremented on every page load. I still get emails about it. Twenty-something years later, after the entire Google Analytics era — Urchin to UA to GA4 to ripping it all out — I'm back to building my own way to answer "did anyone read this?" The tools are different. The question hasn't changed.
#A note on scale and honesty
This system is for a personal blog that gets maybe 200 visits a day. At that volume, most of this architecture is overkill. You could write directly to D1 from the ingestion Worker with upserts and it'd be fine for years. And yes — writing a 3,000-word spec with Zod schemas, a Mermaid diagram, non-goals, and a migration plan for something most people would solve with Plausible and a weekend is peak RFC-maxxing. Noted.
I know that. The Durable Object aggregation layer, the governor mode, the flush batching — none of it is justified by current traffic. A blog doesn't need backpressure. It needs a database and a query.
But this is also a learning exercise. I want to understand Cloudflare's primitives — DOs, D1, Workers, Cron Triggers, R2 — by building something real with them, not by reading docs. And "real" means making architectural decisions that would matter at scale, even if they don't matter yet. The DO isn't here because I need a write buffer. It's here because I want to understand how write buffers behave on this platform — what the alarm API feels like, how DO ↔ D1 coordination works under contention, what happens when you actually need backpressure.
One known tradeoff: DO cold starts. At low traffic, the DO will frequently be evicted and wake cold for the first event of a session — maybe ~50ms of latency. But the client fires beacons and doesn't wait for responses, so the user never feels it. The cost is invisible; the buffer is just slightly lazier waking up.
The tradeoff is honest: I'm trading simplicity for education. The system is designed to handle tens to hundreds of thousands of events responsibly and resource-efficiently — not because my blog will generate them, but because the next thing I build on this stack might, and I'd rather learn the failure modes now with low stakes.
Where this gets dangerous is if the learning exercise calcifies into production complexity. The commitment: if the DO layer turns out to be pure overhead with no educational payoff, rip it out and go direct-to-D1. Architecture should serve the system, not the architect's curiosity. But right now, the curiosity is the point.
This is a working spec — part of the workshop. I'll follow up with what actually survived contact with reality once it's built.