
Toward AI-Native Analytics for Personal Publishing

By Bri Stanback · 20 min read

I've used Google Analytics since it was called Urchin — the web analytics software Google acquired in 2005 — before the rebrand, before every marketing team on earth had a gtag.js snippet in their <head>. (I'm not sure how many people realize that UTM stands for Urchin Tracking Module.) So when I finally ripped it out of this site — gone in one commit — I expected to feel something. Loss, maybe. Nostalgia for the old real-time dashboard with the live world map.

I felt relief.

GA4 had been dragging my Lighthouse score into the mid-80s. I even tried offloading it to a service worker with Partytown — limited improvement. The analytics tool was the biggest performance problem on the site. There's something deeply ironic about your measurement system being the thing that degrades the experience you're trying to measure.

But it wasn't just the performance tax. It was the realization that I'd been collecting data I never looked at, for an audience model that doesn't apply to a personal blog, using an interface redesigned for enterprise marketing teams who need attribution funnels and audience segments. I don't have funnels. I have essays.

I want a small, legible system that tells me what landed, what didn't, and what changed — without turning my site into spyware or slowing it down. This is a draft spec for that system.


#Problem framing

Most analytics tools optimize for one of two extremes:

  1. Marketing surveillance suites (powerful, heavy, identity-centric)
  2. Minimal privacy counters (lightweight, sometimes too shallow)

A personal blog needs a third thing:

The goal is not more dashboards. The goal is better questions.


#First principles

  1. Signal over exhaust. What posts are actually read? Which referrers produce meaningful attention?
  2. Trend shape over user identity. Momentum, decay curves, evergreen vs spike.
  3. No trust tax. No fingerprinting, no cross-site identifiers, no covert profiling.
  4. No performance tax. No Lighthouse regressions, no render-path blockers. The irony of GA failing this one still stings.
  5. No ops tax. Runs on Cloudflare's free tier for low/moderate traffic. Deploys via GitHub Actions on push. No servers to maintain, no containers to manage, no uptime to monitor.
  6. AI as interface, not authority. Conversational query layer over deterministic metrics.

#Prior art (and what to steal)

I looked at everything. The privacy-native tools — Plausible, Umami, GoatCounter, Ackee — get the lightweight collection right: small scripts, no cookies, respect for the reader. Plausible in particular would be the obvious choice if I didn't want to build my own. But I do. Matomo and PostHog are impressive and wildly overkill for a solo blog.

The interesting steals are from adjacent systems:

The synthesis: steal from the lightweight tools for collection posture, from the heavy tools for naming conventions and philosophy. Skip everyone's infrastructure.


#Architecture judgment

The system should reflect how I build things, not just what it does.


#Assumptions

  1. Cloudflare Workers + D1 + Durable Objects are available
  2. JavaScript may be blocked — baseline still works server-side for pageviews where possible
  3. Coarse-grained analytics (daily/hourly aggregates) are enough
  4. No individual user/session re-identification. Period.
  5. Reading quality and trend direction matter more than ad attribution precision
  6. Script budget: ≤1.5KB gzipped (aspirational), hard cap ≤2KB

#Requirements

Because this is a design exercise as much as a product spec, I want to be explicit about what this system must do versus how it must feel doing it. These aren't aspirational — they're the acceptance criteria I'll hold myself to.

#Functional requirements

These are the capabilities. What the system does when everything's working.

| ID | Requirement | Acceptance |
|---|---|---|
| FR-1 | Pageview collection — capture every non-bot page load with path, referrer, and timestamp | Ingestion Worker returns 204; fact table increments within flush interval |
| FR-2 | Engagement scoring — composite signal from scroll depth, dwell time, and outbound clicks | engaged event fires once per pageview when any threshold is met |
| FR-3 | Scroll milestone tracking — quartile depth markers (25/50/75/100%) per article | Four IntersectionObserver instances; each fires once and disconnects |
| FR-4 | Outbound click tracking — capture external link clicks from article body | outbound event with href and anchor text in meta |
| FR-5 | Referrer classification — normalize raw referrers into a bounded enum (≤15 sources) | classifyReferrer() maps hostnames; unknown → "other" |
| FR-6 | Traffic classification — categorize requests as human/bot/AI crawler | 5-class system using cf.botManagement.score + UA pattern matching |
| FR-7 | Core Web Vitals — LCP, INP, CLS at p75 per path per device class | web-vitals library → custom events → DDSketch aggregation in DO |
| FR-8 | JS error capture — window.onerror and onunhandledrejection as lightweight error signal | Error count per path per day; filename-only source (no full URLs) |
| FR-9 | Ranked queries — top posts, top referrers, period comparison, timeseries | Typed API endpoints returning deterministic results |
| FR-10 | AI query layer — natural language questions answered with cited evidence | POST /api/ask → planner → ≤8 tool calls → evidence-backed answer |
| FR-11 | MCP tools — agent-queryable analytics surface | 6 typed tools matching the query API |
| FR-12 | Snapshot page — glanceable /analytics dashboard (auth-gated) | 4 cards: 7-day pageviews, top posts, top referrers, week-over-week delta |
| FR-13 | Data export — monthly NDJSON snapshots to R2 | OTel-aligned field names; importable by DuckDB/BigQuery/Axiom |

#Nonfunctional requirements

These are the constraints. How the system behaves under pressure, at the edges, and over time — the stuff that determines whether you actually trust it.

| ID | Requirement | Target | Enforcement |
|---|---|---|---|
| NFR-1 | Client weight | ≤2KB gzipped (hard cap) | CI size gate fails the build |
| NFR-2 | Lighthouse impact | Zero measurable regression in LCP/INP/CLS p75 | No synchronous XHR, no DOM writes, no layout-affecting elements |
| NFR-3 | Privacy | No cookies, no localStorage IDs, no fingerprinting, no raw IP storage | Enforced at ingestion — geo derived server-side, then discarded |
| NFR-4 | Write integrity | Every flush is idempotent and transactional | flush_batch table; duplicate batch_id → skip |
| NFR-5 | Backpressure | Graceful degradation under load (governor mode) | 4-step ladder: disable custom events → drop breakdowns → sample → daily-only |
| NFR-6 | Cardinality bounds | No unbounded dimension growth | Hard limits: 10K paths, 1K referrers, 10 meta keys per event |
| NFR-7 | Storage cost | Proportional to content count, not traffic volume | rrdtool-inspired rollup: hourly → daily → core; Cron-enforced retention |
| NFR-8 | Free-tier operation | Runs entirely on Cloudflare's free plan at low/moderate traffic | Aggregate-first writes, bounded dimensions, query caching |
| NFR-9 | Validation | Every boundary has a runtime schema | Zod at ingestion, DO, and query layer; types derived from validators |
| NFR-10 | Portability | Data is always exportable in a standard format | Monthly R2 NDJSON with OTel-aligned naming |
| NFR-11 | Ops overhead | Zero servers to maintain | Deploys via wrangler in CI; no containers, no uptime monitoring |
| NFR-12 | Recovery | Correct after crash, not just during normal operation | Idempotent flushes, forward-only migrations, append-only raw events |

A few things to notice about this split: the FRs are roughly ordered by collection → query → export (data flows downhill). The NFRs are roughly ordered by what your reader cares about (weight, speed, privacy) → what your system cares about (integrity, pressure, bounds) → what future you cares about (portability, ops, recovery). That ordering is intentional — it mirrors who screams first when something breaks.

#Non-goals (v1)


#RFC: AI-Native Analytics for Blogs (v0.1)

#1) High-level architecture

graph TD
  subgraph Client
    A["a.js (≤2KB)"]
  end

  subgraph Ingestion
    B[Worker: /a/collect]
  end

  subgraph Aggregation
    C[Durable Object] --> D[(D1)]
    D -.->|monthly snapshot| E[(R2 exports)]
  end

  subgraph Query
    F[Worker: /api/metrics]
    G[MCP Tools]
    H["analytics snapshot"]
  end

  A -->|sendBeacon| B
  B -->|validate + normalize| C
  D --> F & G & H

- Ingestion plane
- Aggregation plane
- Storage plane
- Query plane
- AI plane


#2) Collection protocol

Event envelope

export type EventKind = "pv" | "engaged" | "outbound" | "error" | "vital" | "custom";

export interface CollectEvent {
  kind: EventKind;
  ts: number; // epoch ms
  path: string; // normalized pathname
  ref?: string; // raw referrer (optional)
  title?: string; // optional page title hashable
  utm?: {
    source?: string;
    medium?: string;
    campaign?: string;
  };
  meta?: Record<string, string | number | boolean | null>;
}

export interface CollectRequest {
  site: string;         // public site key
  v: 1;                 // protocol version
  events: CollectEvent[];
}

Default events (fire automatically)

The client script emits these without any configuration:

| Event | When | Data |
|---|---|---|
| pv | DOMContentLoaded | path, ref, title |
| engaged | Composite trigger (scroll/dwell/click) | maxDepth (0-100) |
| outbound | Click on `<a>` with external href | path, meta.href, meta.text |
| milestone | Each scroll quartile crossed | meta.depth (25/50/75/100) |
| error | window.onerror / onunhandledrejection | meta.message, meta.source, meta.line, meta.col |
| vital | web-vitals callback | meta.vital (lcp/inp/cls), meta.value |

JS error capture (Sentry-lite)

Not a full error monitoring product. Just a signal: "something broke on this page."

window.onerror = (message, source, line, col) => {
  queue({
    kind: "error" as EventKind,
    ts: Date.now(),
    path: location.pathname,
    meta: {
      message: String(message).slice(0, 256),
      source: source?.split("/").pop() ?? "unknown", // filename only, no full URLs
      line: line ?? 0,
      col: col ?? 0,
    },
  });
};

window.onunhandledrejection = (event) => {
  queue({
    kind: "error" as EventKind,
    ts: Date.now(),
    path: location.pathname,
    meta: {
      message: String(event.reason).slice(0, 256),
      source: "unhandled_rejection",
    },
  });
};

What this answers:

What this doesn't do: stack traces, source maps, breadcrumbs, user replay. For that, use Sentry. For a blog, error rate is the signal. Error detail is a debugging tool you reach for when the rate spikes.

Stored as error events, aggregated into a simple error counter on fact_daily:

ALTER TABLE fact_daily ADD COLUMN error_count INTEGER NOT NULL DEFAULT 0;

Custom events (custom) require explicit instrumentation:

// Only if someone wants to track something specific
window.__a?.track("signup_click", { cta: "header" });

The API surface is one function. If window.__a doesn't exist (script blocked, not loaded yet), the call silently no-ops.

OTel-inspired semantic naming

We don't use the OpenTelemetry SDK. But we steal its naming discipline. When data leaves this system — via R2 export, MCP tool output, or the ask endpoint — field names should be recognizable to anyone who's worked with observability tooling.

| Our field | OTel semantic convention | Why it matters |
|---|---|---|
| path | url.path | Portable to any analytics tool |
| ref | http.request.header.referer | Standard, not invented |
| device_class | user_agent.device.type | DuckDB/BigQuery can join on this |
| country_code | client.geo.country_iso_code | ISO 3166-1 alpha-2, universally understood |
| traffic_class | http.request.bot_score_class | Our addition - no OTel equivalent yet |

The R2 monthly exports use OTel-aligned field names even if the internal D1 schema uses shorter column names. Export is the portability boundary — that's where naming discipline pays off.

Lightweight RUM (Real User Monitoring)

Core Web Vitals are too useful to skip, and collecting them costs too little to justify skipping. Google's web-vitals library is ~1.5KB and gives you LCP, INP, and CLS — the three metrics that determine whether your analytics script is harming the site it's measuring.

import { onLCP, onINP, onCLS } from "web-vitals"; // tree-shaken

onLCP((metric) => queue({ kind: "vital", ts: Date.now(), path: location.pathname, meta: { vital: "lcp", value: metric.value } }));
onINP((metric) => queue({ kind: "vital", ts: Date.now(), path: location.pathname, meta: { vital: "inp", value: metric.value } }));
onCLS((metric) => queue({ kind: "vital", ts: Date.now(), path: location.pathname, meta: { vital: "cls", value: metric.value } }));

This isn't a full RUM product. It's a canary:

Stored as vital events, aggregated into a fact_daily_vitals table:

CREATE TABLE IF NOT EXISTS fact_daily_vitals (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  device_class TEXT NOT NULL DEFAULT 'unknown',
  lcp_p75 REAL,     -- milliseconds
  inp_p75 REAL,     -- milliseconds
  cls_p75 REAL,     -- unitless score
  sample_count INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, device_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

p75 aggregation happens in the DO — not by storing raw values, but by maintaining a DDSketch or simpler quantile estimator in memory and flushing the percentile. Fixed memory regardless of sample count.
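One possible "simpler quantile estimator" is a fixed-capacity reservoir: memory stays constant no matter how many samples arrive, and p75 is read off the sorted reservoir at flush. A sketch, under that assumption (the `ReservoirQuantile` name and capacity are illustrative; DDSketch gives stronger tail-accuracy guarantees):

```typescript
// Fixed-memory streaming quantiles via classic reservoir sampling.
class ReservoirQuantile {
  private samples: number[] = [];
  private seen = 0;

  constructor(private readonly capacity = 256) {}

  add(value: number): void {
    this.seen++;
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
    } else {
      // Replace a random slot with probability capacity/seen
      const slot = Math.floor(Math.random() * this.seen);
      if (slot < this.capacity) this.samples[slot] = value;
    }
  }

  // q in [0, 1]; returns null until at least one sample has arrived
  quantile(q: number): number | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const idx = Math.min(sorted.length - 1, Math.floor(q * sorted.length));
    return sorted[idx];
  }
}
```

The DO would hold one instance per (path, device_class, vital) key and call `quantile(0.75)` at flush time.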

Budget impact: web-vitals is ~1.5KB gzipped. Combined with the core client (~500 bytes), total stays under the 2KB hard cap.

Protocol compatibility:

Engagement definition

An engaged event is a composite boolean, true when any of these conditions hold:

  1. Reader crosses 60% article depth (single IntersectionObserver marker)
  2. Reader keeps tab visible for >= 20s on article route
  3. Reader clicks an outbound link from the article body

Why not only time-on-page? It lies. A tab left open in the background inflates engagement. Why not only scroll depth? It can miss accessibility and non-scroll reading patterns. Composite engagement is still simple, but less brittle.

This signal also helps bot filtering, but it is not the only bot defense.
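The three conditions reduce to a small pure predicate. A sketch using the thresholds above (the names and the signal-union shape are illustrative, not part of the spec):

```typescript
const DEPTH_THRESHOLD = 60;        // percent of article height
const DWELL_THRESHOLD_MS = 20_000; // visible time on an article route

interface EngagementState {
  fired: boolean; // engaged fires at most once per pageview
}

type EngagementSignal =
  | { type: "depth"; depth: number }     // scroll marker crossed
  | { type: "dwell"; visibleMs: number } // accumulated *visible* time, not wall time
  | { type: "outbound" };                // outbound click from article body

function shouldFireEngaged(state: EngagementState, signal: EngagementSignal): boolean {
  if (state.fired) return false;
  switch (signal.type) {
    case "depth":
      return signal.depth >= DEPTH_THRESHOLD;
    case "dwell":
      return signal.visibleMs >= DWELL_THRESHOLD_MS;
    case "outbound":
      return true; // any outbound click counts as engagement
  }
}
```

Keeping the predicate pure makes the once-per-pageview rule trivially unit-testable, separate from timers and observers.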

Scroll milestones

Beyond the binary engagement signal, we track how far readers get. Four IntersectionObserver instances on invisible markers at article quartiles (25%, 50%, 75%, 100%). Each fires exactly once and disconnects.

const MILESTONES = [25, 50, 75, 100] as const;
type ScrollMilestone = typeof MILESTONES[number];

function observeScrollDepth(article: HTMLElement) {
  const height = article.offsetHeight;
  let maxDepth = 0;

  for (const pct of MILESTONES) {
    const marker = document.createElement("div");
    marker.style.cssText =
      "height:1px;width:1px;position:absolute;pointer-events:none;";
    marker.style.top = `${(pct / 100) * height}px`;
    article.style.position ||= "relative";
    article.appendChild(marker);

    const observer = new IntersectionObserver(
      ([entry]) => {
        if (entry.isIntersecting) {
          maxDepth = Math.max(maxDepth, pct);
          observer.disconnect();
        }
      },
      { threshold: 0 },
    );
    observer.observe(marker);
  }

  return () => maxDepth;
}

Why IntersectionObserver instead of scroll listeners:

What this unlocks for a writer:

The engagement event becomes: { kind: "engaged", maxDepth: 75, ... } instead of just a boolean.

Add max_depth to the daily fact:

ALTER TABLE fact_daily ADD COLUMN avg_max_depth REAL; -- 0-100
ALTER TABLE fact_daily ADD COLUMN completions INTEGER NOT NULL DEFAULT 0; -- reached 100%

Transport

let pending: CollectEvent[] = [];
let flushTimer: number | null = null;

function queue(event: CollectEvent) {
  pending.push(event);
  if (!flushTimer) {
    flushTimer = setTimeout(flush, 2000); // 2s debounce
  }
  if (pending.length >= 10) flush(); // size cap
}

function flush() {
  if (!pending.length) return;
  navigator.sendBeacon(
    "/a/collect",
    JSON.stringify({ site: SITE_KEY, v: 1, events: pending }),
  );
  pending = [];
  if (flushTimer) clearTimeout(flushTimer);
  flushTimer = null;
}

document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") flush();
});

// pagehide fallback (per the hard requirements): some browsers end a page
// without a final visibilitychange
window.addEventListener("pagehide", flush);

Why no Service Worker

A Service Worker could queue failed beacons offline and retry via Background Sync. Sounds responsible. Not worth it.

The entire client architecture:

a.js (< 2KB gzipped)
├── IntersectionObserver × 4 (scroll milestones)
├── event queue (plain array)
├── debounced flush (2s / 10 events)
├── sendBeacon on visibilitychange
└── that's it

No Service Worker. No IndexedDB. No Background Sync. No scroll listeners. Understandable in 60 seconds.


#3) Privacy model

Collection integrity (anti-poisoning):


#4) Data model (D1)

Schema strategy is intentionally split:

-- Sites
CREATE TABLE IF NOT EXISTS site (
  id TEXT PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  created_at INTEGER NOT NULL
);

-- Normalized dimensions
CREATE TABLE IF NOT EXISTS dim_path (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  site_id TEXT NOT NULL,
  path TEXT NOT NULL,
  path_hash TEXT NOT NULL,
  UNIQUE(site_id, path_hash),
  FOREIGN KEY(site_id) REFERENCES site(id)
);

CREATE TABLE IF NOT EXISTS dim_referrer (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  site_id TEXT NOT NULL,
  source TEXT NOT NULL,      -- normalized enum (see below)
  raw_host TEXT,             -- optional debug host
  referrer_query TEXT,       -- AI search query context (Phase 2, nullable)
  UNIQUE(site_id, source, raw_host),
  FOREIGN KEY(site_id) REFERENCES site(id)
);

-- Referrer source enum (≤15 values, normalize at ingestion):
-- direct | google | bing | duckduckgo | x | hn | reddit | linkedin
-- mastodon | bluesky | newsletter | rss | github | chatgpt | perplexity | other
--
-- Normalization: classifyReferrer(host) maps hostnames to sources.
-- e.g. t.co, twitter.com, x.com → "x"
-- e.g. news.ycombinator.com → "hn"
-- e.g. chatgpt.com, chat.openai.com → "chatgpt"
-- e.g. perplexity.ai → "perplexity"
-- Unknown hosts → "other".
-- raw_host is retained only for bounded debugging/reporting, not unbounded growth.
--
-- AI search referrer enrichment (Phase 2):
-- Some AI search engines pass query context in URL params (?q=, ?query=, etc.)
-- When referrer is an AI search source AND query params are present,
-- extract and store as referrer_query in dim_referrer.
-- This answers "why did someone land here?" - not just "where from?"
-- Bounded: only captured for AI search sources, not all referrers.

-- v1 core daily fact (low cardinality)
CREATE TABLE IF NOT EXISTS fact_daily (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,    -- YYYY-MM-DD
  path_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  outbound_clicks INTEGER NOT NULL DEFAULT 0,
  custom_events INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- v2 optional breakdown tables
CREATE TABLE IF NOT EXISTS fact_daily_referrer (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  referrer_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, referrer_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id),
  FOREIGN KEY(referrer_id) REFERENCES dim_referrer(id)
);

CREATE TABLE IF NOT EXISTS fact_daily_country (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  country_code TEXT NOT NULL DEFAULT 'XX',
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, country_code),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

CREATE TABLE IF NOT EXISTS fact_daily_device (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  device_class TEXT NOT NULL DEFAULT 'unknown', -- mobile|desktop|tablet|unknown
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, device_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- Hourly optional short window (e.g. last 7-14 days)
CREATE TABLE IF NOT EXISTS fact_hourly (
  site_id TEXT NOT NULL,
  hour_utc TEXT NOT NULL,    -- YYYY-MM-DDTHH
  path_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, hour_utc, path_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- Flush idempotency
CREATE TABLE IF NOT EXISTS flush_batch (
  batch_id TEXT PRIMARY KEY,
  site_id TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY(site_id) REFERENCES site(id)
);

CREATE INDEX IF NOT EXISTS idx_fact_daily_site_date ON fact_daily(site_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_daily_path ON fact_daily(site_id, path_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_hourly_site_hour ON fact_hourly(site_id, hour_utc);

Idempotency rule: every aggregator flush includes a batch_id; writes occur in one transaction that first inserts into flush_batch. If batch_id already exists, skip (already applied).
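The referrer normalization described in the schema comments can be sketched as a plain hostname lookup. The host map below is illustrative and deliberately not exhaustive; unknown and malformed referrers fall through to "other", and a missing referrer is "direct":

```typescript
// Sketch of classifyReferrer() per the enum comments above.
const HOST_MAP: Record<string, string> = {
  "t.co": "x", "twitter.com": "x", "x.com": "x",
  "news.ycombinator.com": "hn",
  "chatgpt.com": "chatgpt", "chat.openai.com": "chatgpt",
  "perplexity.ai": "perplexity", "www.perplexity.ai": "perplexity",
  "google.com": "google", "www.google.com": "google",
  "duckduckgo.com": "duckduckgo",
  "github.com": "github",
};

function classifyReferrer(rawRef: string | undefined): string {
  if (!rawRef) return "direct";
  let host: string;
  try {
    host = new URL(rawRef).hostname.toLowerCase();
  } catch {
    return "other"; // malformed referrer header
  }
  return HOST_MAP[host] ?? "other";
}
```

A real implementation would also strip `www.` generically and match subdomains (e.g. `*.reddit.com`), but the bounded-enum guarantee is the point: whatever arrives, at most ~15 values come out.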


#5) Durable Object aggregator contract

export interface AggregateKey {
  siteId: string;
  dateUtc: string;
  hourUtc?: string;
  pathKey: string;
  refKey?: string;
  countryCode?: string;
  deviceClass?: "mobile" | "desktop" | "tablet" | "unknown";
}

export interface AggregateDelta {
  pageviews?: number;
  engagedVisits?: number;
  outboundClicks?: number;
  customEvents?: number;
}

export interface AggregatorMessage {
  key: AggregateKey;
  delta: AggregateDelta;
}

export interface FlushPolicy {
  maxBuffered: number;   // e.g. 1_000
  flushEveryMs: number;  // e.g. 10_000
}

Behavior:
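A minimal sketch of the buffering side of this contract, assuming the interfaces above (repeated here so the sketch is self-contained): coalesce deltas in memory by serialized key, flush when the `FlushPolicy` size limit is hit, and let the DO alarm cover the `flushEveryMs` case.

```typescript
interface AggregateKey {
  siteId: string; dateUtc: string; hourUtc?: string; pathKey: string;
  refKey?: string; countryCode?: string;
  deviceClass?: "mobile" | "desktop" | "tablet" | "unknown";
}
interface AggregateDelta {
  pageviews?: number; engagedVisits?: number;
  outboundClicks?: number; customEvents?: number;
}
interface AggregatorMessage { key: AggregateKey; delta: AggregateDelta; }
interface FlushPolicy { maxBuffered: number; flushEveryMs: number; }

const DELTA_FIELDS = ["pageviews", "engagedVisits", "outboundClicks", "customEvents"] as const;

// A burst of events for the same (site, date, path, ...) becomes one row write.
function keyOf(k: AggregateKey): string {
  return [k.siteId, k.dateUtc, k.hourUtc ?? "", k.pathKey, k.refKey ?? "",
          k.countryCode ?? "", k.deviceClass ?? ""].join("|");
}

class AggregateBuffer {
  private buffer = new Map<string, AggregatorMessage>();

  add(msg: AggregatorMessage): void {
    const id = keyOf(msg.key);
    const entry = this.buffer.get(id);
    if (!entry) {
      this.buffer.set(id, { key: msg.key, delta: { ...msg.delta } });
      return;
    }
    for (const f of DELTA_FIELDS) {
      entry.delta[f] = (entry.delta[f] ?? 0) + (msg.delta[f] ?? 0);
    }
  }

  shouldFlush(policy: FlushPolicy): boolean {
    return this.buffer.size >= policy.maxBuffered;
  }

  drain(): AggregatorMessage[] {
    const rows = [...this.buffer.values()];
    this.buffer.clear();
    return rows;
  }
}
```

On flush, the drained rows go to D1 in one transaction tagged with a fresh batch_id, per the idempotency rule above.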


#6) Query API interfaces

export interface Period {
  from: string; // YYYY-MM-DD
  to: string;   // YYYY-MM-DD
}

export interface TopPost {
  path: string;
  pageviews: number;
  engagedVisits: number;
  engagementRate: number;
}

export interface TopPostsResponse {
  period: Period;
  rows: TopPost[];
}

export interface ReferrerRow {
  source: string;
  pageviews: number;
  engagedVisits: number;
}

export interface ComparePeriodsResponse {
  current: {
    pageviews: number;
    engagedVisits: number;
  };
  previous: {
    pageviews: number;
    engagedVisits: number;
  };
  delta: {
    pageviewsPct: number;
    engagedVisitsPct: number;
  };
}

Endpoints:

API constraints (hard limits):


#7) AI query layer

The AI calls typed metric functions. It does not write SQL.

Deterministic analysis functions

export interface AnalyticsService {
  topPosts(period: Period, limit: number): Promise<TopPostsResponse>;
  referrerBreakdown(period: Period): Promise<ReferrerRow[]>;
  comparePeriods(current: Period, previous: Period): Promise<ComparePeriodsResponse>;
  postTrend(path: string, period: Period): Promise<Array<{ date: string; pageviews: number }>>;
  evergreenScore(path: string, trailingDays: number): Promise<number>;
}
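The arithmetic behind comparePeriods() is worth pinning down, since percentage deltas against a zero baseline are where dashboards usually lie. A sketch of just the delta math (the zero-previous convention of reporting 100% is my assumption, not in the spec; the SQL totals are presumed already fetched):

```typescript
interface Totals { pageviews: number; engagedVisits: number; }

// Percent change, guarded against a zero previous period.
function pctDelta(current: number, previous: number): number {
  if (previous === 0) return current === 0 ? 0 : 100;
  return ((current - previous) / previous) * 100;
}

function buildComparison(current: Totals, previous: Totals) {
  return {
    current,
    previous,
    delta: {
      pageviewsPct: pctDelta(current.pageviews, previous.pageviews),
      engagedVisitsPct: pctDelta(current.engagedVisits, previous.engagedVisits),
    },
  };
}
```

Whatever convention you pick for the zero baseline, the AI layer inherits it, so it belongs in one deterministic function rather than in a prompt.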

Ask endpoint

export interface AskRequest {
  question: string;
  siteId: string;
}

export interface AskResponse {
  answer: string;
  evidence: Array<{
    metric: string;
    value: number | string;
    source: string; // function + params
  }>;
}

Guardrails:


#8) MCP surface

Tools for agent clients:

Tool outputs should be machine-readable first, narrative second.


#9) Performance / Lighthouse constraints

Hard requirements:

  1. Script is defer and non-blocking
  2. No synchronous XHR
  3. No render-affecting DOM writes from the analytics script (the scroll markers are 1px, absolutely positioned, pointer-events: none)
  4. No layout-affecting injected elements
  5. Beacon dispatch on visibilitychange and pagehide fallback
  6. JS parse/execute budget tracked in CI

Perf acceptance targets:


#10) Free-tier operating mode

Staying on the free tier:

Safety valve: "governor mode" step-down ladder under sustained high volume:

  1. disable custom event ingestion
  2. drop country/device breakdown writes
  3. increase sampling factor (for pageviews only)
  4. disable hourly writes (daily only)

Each transition emits an ops event so reduced fidelity is explicit, not silent.
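The ladder can be modeled as a tiny state machine that steps one rung at a time and emits an ops event on every transition. A sketch; the rate thresholds and function names here are illustrative assumptions, not part of the spec:

```typescript
type GovernorLevel = 0 | 1 | 2 | 3 | 4; // 0 = full fidelity

const LADDER = [
  "full",
  "no_custom_events",
  "no_breakdowns",
  "sampled_pageviews",
  "daily_only",
] as const;

// Illustrative thresholds: step up one rung under sustained overload,
// step back down one rung when volume recedes. One rung per check keeps
// the system from oscillating wildly.
function nextLevel(current: GovernorLevel, eventsPerMinute: number): GovernorLevel {
  const OVERLOAD = 5_000;
  const RECOVERED = 1_000;
  if (eventsPerMinute > OVERLOAD && current < 4) return (current + 1) as GovernorLevel;
  if (eventsPerMinute < RECOVERED && current > 0) return (current - 1) as GovernorLevel;
  return current;
}

function emitOpsEvent(from: GovernorLevel, to: GovernorLevel): void {
  if (from !== to) {
    // Reduced fidelity is explicit, never silent
    console.log(`governor: ${LADDER[from]} -> ${LADDER[to]}`);
  }
}
```

The hysteresis gap between the two thresholds is the important design choice: without it, traffic hovering near a single cutoff would flap between fidelity levels.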


#11) Migration strategy

Phase 1: pageviews + engaged + top posts + referrers + MCP tools + tiny /analytics snapshot
Phase 2: period compare + trend series + cache/rate-limit hardening
Phase 3: conversational ask with evidence citations
Phase 4: optional exports (R2) + scheduled digests


#12) Enforced constraints (not advisory)

Advisory constraints get ignored under pressure. These are enforced in CI, at build time, or at runtime — not in a README.

Schema validation (Zod)

Every boundary gets a Zod schema. Not just "for documentation" — for runtime parsing.

const CollectEventSchema = z.object({
  kind: z.enum(["pv", "engaged", "outbound", "error", "vital", "custom"]),
  ts: z.number().int().positive(),
  path: z.string().min(1).max(2048).startsWith("/"),
  ref: z.string().max(2048).optional(),
  meta: z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])).optional(),
});

const CollectRequestSchema = z.object({
  site: z.string().min(1).max(64),
  v: z.literal(1),
  events: z.array(CollectEventSchema).min(1).max(50),
});

The ingestion Worker parse()s with Zod before anything else. Malformed payloads get a 400 and a counter bump — never silent drops, never partial ingestion.
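The control flow is small enough to sketch without the library. The `Validator` interface below mirrors Zod's `safeParse` shape so the example is dependency-free; in the real Worker, `schema` would be `CollectRequestSchema` itself, and `handleCollect`, `bumpMalformedCounter`, and `ingest` are illustrative names:

```typescript
// Shapes mirroring Zod's safeParse result.
interface ParseResult<T> { success: boolean; data?: T; }
interface Validator<T> { safeParse(input: unknown): ParseResult<T>; }

// Returns the HTTP status the Worker should respond with.
function handleCollect<T>(
  body: unknown,
  schema: Validator<T>,
  bumpMalformedCounter: () => void, // malformed payloads are counted, never silently dropped
  ingest: (req: T) => void,         // all-or-nothing: no partial ingestion
): number {
  const parsed = schema.safeParse(body);
  if (!parsed.success || parsed.data === undefined) {
    bumpMalformedCounter();
    return 400;
  }
  ingest(parsed.data);
  return 204;
}
```

Validation happens before classification, before the DO hop, before anything that costs money: a malformed payload should be the cheapest possible request.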

Linting and formatting

Biome. Not ESLint. Single tool for lint + format. Runs in CI as a gate, not a suggestion. Zero warnings policy — if it warns, either fix it or configure it out. No // biome-ignore without a comment explaining why.

Type boundaries

Every Worker ↔ DO ↔ D1 boundary has an explicit type. AggregatorMessage isn't just an interface in a file — it's the actual shape that gets validated at the DO's fetch() handler. If the ingestion Worker sends the wrong shape, it fails loudly at the boundary, not silently deep inside the aggregator.

// This is a type AND a runtime validator
const AggregatorMessageSchema = z.object({ ... });
type AggregatorMessage = z.infer<typeof AggregatorMessageSchema>;

Types and validators derived from the same source. No drift.

Client script budget

The analytics client (a.js) has a hard size gate in CI:

# In CI pipeline
GZIP_SIZE=$(gzip -c dist/a.js | wc -c)
if [ "$GZIP_SIZE" -gt 2048 ]; then
  echo "❌ a.js exceeds 2KB gzipped ($GZIP_SIZE bytes)"
  exit 1
fi

Not aspirational. A gate. If the script grows past 2KB gzipped, the build fails.


#13) Development process

Testing strategy

Three layers, each with a clear job:

| Layer | Tool | What it tests | When it runs |
|---|---|---|---|
| Unit | Vitest | Pure functions: classifyReferrer(), coalesceDeltas(), computeEngagement() | Every commit (CI) |
| Integration | Miniflare | Worker ↔ DO ↔ D1 round-trips: "event in → aggregate out → query returns correct count" | Every PR |
| Contract | Zod assertions | Boundary shapes: "the DO accepts exactly this shape and rejects everything else" | Every commit (CI) |

No E2E browser tests for v1. The client script is small enough to unit test the beacon logic. The server is tested through integration. Browser testing is cost that doesn't pay for itself at this scale.

What doesn't get tested: D1 SQL syntax (trust SQLite), Cloudflare routing config (tested by deployment), UI layout (there's barely any UI).

Environments

| Env | Purpose | D1 instance | URL |
|---|---|---|---|
| local | Development + integration tests | Miniflare (in-memory) | localhost:8787 |
| production | Live | Production D1 | analytics.bristanback.com |

Why not ephemeral deploys per branch? Solo developer, one blog. Miniflare replicates D1/DO behavior well enough for integration tests. Ephemeral preview environments are team infrastructure — real value when multiple people need to review deployments, overhead when it's just you. If this grows beyond a solo project, add a preview env with branch-scoped D1. The wrangler --env flag makes it trivial to add later.

CI pipeline

biome check → tsc --noEmit → vitest run → size gate (a.js) →
  wrangler deploy --env production (on merge to main)

Every step is a gate. No "allowed to fail" steps.

Secrets management

No .env files in the repo. Ever. Wrangler secrets for API keys. Environment-specific config (site IDs, rate limits) via wrangler.jsonc env blocks. The only thing that differs between environments is the D1 binding and the domain — everything else is code.


#14) Data evolution and governance

Schema migrations

D1 doesn't have a migration runner. So we build one — minimal:

const MIGRATIONS = [
  { version: 1, sql: `CREATE TABLE IF NOT EXISTS site ...` },
  { version: 2, sql: `CREATE TABLE IF NOT EXISTS fact_daily_referrer ...` },
  // ...
] as const;

async function migrate(db: D1Database): Promise<void> {
  await db.exec(`CREATE TABLE IF NOT EXISTS _migrations (
    version INTEGER PRIMARY KEY,
    applied_at INTEGER NOT NULL
  )`);
  const applied = await db.prepare(`SELECT version FROM _migrations`).all<{ version: number }>();
  const appliedSet = new Set(applied.results.map((r) => r.version));
  for (const m of MIGRATIONS) {
    if (!appliedSet.has(m.version)) {
      await db.exec(m.sql);
      await db.prepare(`INSERT INTO _migrations (version, applied_at) VALUES (?, ?)`)
        .bind(m.version, Date.now()).run();
    }
  }
}

Runs on Worker startup (cached after first run). Forward-only — no down migrations. If a migration is wrong, ship a new forward migration that fixes it.

The rrdtool principle: fixed storage, automatic rollup

The oldest good idea in time-series storage is rrdtool's Round Robin Database (1999): allocate fixed space, and as data ages, roll it into coarser resolution. Graphite's whisper format does the same thing. The insight is that you never need per-minute granularity from six months ago — you need the shape, not the points.

This system applies the same principle to D1:

Resolution tiers:
  0-14 days  → hourly   (fact_hourly)
  0-90 days  → daily + breakdowns (fact_daily_referrer, _country, _device)
  0-∞        → daily core (fact_daily) — just path × pageviews × engaged

As data ages, it loses dimensions but never disappears. The Cron Trigger isn't just "deleting old data" — it's the rollup engine. Before deleting fact_hourly rows, it verifies they've been absorbed into fact_daily. Before deleting breakdown tables, the core fact still holds the totals.

The result: storage cost is roughly proportional to the number of paths (content), not the number of events (traffic). A blog with 50 posts and 5 years of history fits comfortably in D1's free tier. That's the rrdtool promise: time passes, storage doesn't grow.

Retention policy

| Table | Retention | Rollup target | Enforcement |
|---|---|---|---|
| fact_hourly | 14 days | fact_daily | Cron: verify daily absorbed, then delete |
| fact_daily_referrer | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_country | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_device | 90 days | fact_daily (totals preserved) | Cron trigger |
| fact_daily_vitals | 90 days | - (historical vitals less useful) | Cron trigger |
| fact_daily | Indefinite | - | Fixed cost ≈ paths × days |
| dim_referrer.raw_host | 30 days | source enum preserved | Cron nullifies raw_host |
| flush_batch | 7 days | - | Cron trigger |

Retention is enforced by a scheduled Worker (Cron Trigger), not by hope. Each purge logs what it deleted for auditability. The Cron runs daily and is itself idempotent — safe to re-run, safe to miss a day.
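The Cron body is easiest to keep idempotent if statement generation is pure. A sketch under that assumption; `retentionStatements` is a hypothetical helper, the statement text is illustrative, and the 30-day raw_host nulling is omitted because the v1 dim_referrer schema has no timestamp column to age against:

```typescript
// Given today's UTC date, produce the purge statements the scheduled Worker
// would run (in order) against D1. Re-running is safe: every statement is a
// no-op once the rows are gone.
function retentionStatements(todayUtc: string): string[] {
  const cutoff = (days: number): string => {
    const d = new Date(`${todayUtc}T00:00:00Z`);
    d.setUTCDate(d.getUTCDate() - days);
    return d.toISOString().slice(0, 10); // YYYY-MM-DD
  };
  return [
    // Rollup safety: only delete hours whose day already exists in fact_daily
    `DELETE FROM fact_hourly
       WHERE substr(hour_utc, 1, 10) < '${cutoff(14)}'
         AND substr(hour_utc, 1, 10) IN (SELECT date_utc FROM fact_daily)`,
    `DELETE FROM fact_daily_referrer WHERE date_utc < '${cutoff(90)}'`,
    `DELETE FROM fact_daily_country WHERE date_utc < '${cutoff(90)}'`,
    `DELETE FROM fact_daily_device WHERE date_utc < '${cutoff(90)}'`,
    `DELETE FROM fact_daily_vitals WHERE date_utc < '${cutoff(90)}'`,
    `DELETE FROM flush_batch WHERE created_at < ${Date.parse(`${cutoff(7)}T00:00:00Z`)}`,
  ];
}
```

The real Worker would run each statement, log the deleted row count, and move on; a failed statement leaves everything it hasn't reached for the next day's run.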

Cardinality governance

Unbounded dimensions are how analytics systems die. Hard limits:

These aren't advisory. The ingestion Worker enforces them at write time. If a dimension table is full, new values get bucketed into the catch-all. An ops event fires so you know it happened.
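The write-time enforcement can be sketched as an in-memory mirror of a dimension table with a hard cap. `BoundedDimension` is a hypothetical helper; the cap and catch-all name are illustrative:

```typescript
// Id 0 is reserved for the catch-all bucket, so overflow never allocates.
class BoundedDimension {
  private readonly ids = new Map<string, number>();
  private nextId = 1;
  overflowCount = 0; // real system: fire an ops event when this increments

  constructor(private readonly maxCardinality: number, catchAll = "__overflow__") {
    this.ids.set(catchAll, 0);
  }

  idFor(value: string): number {
    const existing = this.ids.get(value);
    if (existing !== undefined) return existing;
    // +1 accounts for the reserved catch-all entry
    if (this.ids.size >= this.maxCardinality + 1) {
      this.overflowCount++;
      return 0; // bucket the new value into the catch-all
    }
    const id = this.nextId++;
    this.ids.set(value, id);
    return id;
  }
}
```

Known values keep resolving to their ids even after the table fills; only genuinely new values get bucketed, which is exactly the failure mode you want under a cardinality attack.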

Traffic classification

Cloudflare's Bot Management gives us a cf.botManagement.score (1-99) on every request. We should use it, but not as a binary filter — as a dimension.

CREATE TABLE IF NOT EXISTS fact_daily_traffic_class (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  traffic_class TEXT NOT NULL, -- human | likely_human | likely_bot | verified_bot | ai_crawler
  pageviews INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, traffic_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

Classification logic:

type TrafficClass = "human" | "likely_human" | "likely_bot" | "verified_bot" | "ai_crawler";

function classifyTraffic(req: Request): TrafficClass {
  const score = req.cf?.botManagement?.score ?? 50; // no score (e.g. local dev) lands in likely_human
  const verifiedBot = req.cf?.botManagement?.verifiedBot ?? false;
  const ua = req.headers.get("user-agent") ?? "";

  // AI crawlers: GPTBot, ClaudeBot, Google-Extended, Bytespider, CCBot, etc.
  if (/GPTBot|ClaudeBot|Claude-Web|Google-Extended|Bytespider|CCBot|PerplexityBot|Amazonbot/i.test(ua)) {
    return "ai_crawler";
  }
  if (verifiedBot) return "verified_bot";   // Googlebot, Bingbot, etc.
  if (score >= 80) return "human";
  if (score >= 30) return "likely_human";
  return "likely_bot";
}

This gives you answers to questions that actually matter in 2026: how much of your traffic is human readers, and how much is AI crawlers harvesting your posts for training data.

The traffic_class dimension is a v2 breakdown table — same pattern as referrer/country/device. Not in the core fact table, not on the hot path, but available when you want to ask the question.
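At flush time the class just becomes another part of the aggregation key. A sketch of the upsert, using SQLite's (and therefore D1's) `ON CONFLICT ... DO UPDATE`; the named parameters are illustrative:

```sql
INSERT INTO fact_daily_traffic_class
  (site_id, date_utc, path_id, traffic_class, pageviews)
VALUES (:site_id, :date_utc, :path_id, :traffic_class, :n)
ON CONFLICT (site_id, date_utc, path_id, traffic_class)
  DO UPDATE SET pageviews = pageviews + excluded.pageviews;
```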

Data export and portability

Monthly R2 snapshots (Phase 4) export the full D1 state as NDJSON. This is your escape hatch — if you outgrow D1, if Cloudflare changes pricing, if you want to move to ClickHouse someday. The data is always yours, always portable, always in a format that any system can ingest.
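NDJSON is just one JSON object per line, which is what makes it the universal ingestion format. A minimal sketch of the serialization, tagging each record with its source table so one file can hold the whole export (function name and record shape are assumptions):

```typescript
// Serialize rows as NDJSON: one JSON object per line, newline-terminated.
// Tagging each record with its table lets a single file hold the full export.
function toNdjson(table: string, rows: Record<string, unknown>[]): string {
  return rows.map((row) => JSON.stringify({ table, ...row }) + "\n").join("");
}
```

The monthly export would stream each table through this and PUT the result to R2.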


#Decisions

  1. Engagement: Composite boolean (scroll depth OR visible dwell OR outbound click), emitted once per pageview. (See collection protocol.)
  2. Bot filtering: Edge heuristics first (Cloudflare bot score + UA), engagement as secondary quality signal. No custom bot signature maintenance.
  3. Referrer normalization: Small enum (≤15 values), classified at ingestion. (See data model.)
  4. Multi-site: Shared DB with site_id partitioning. One blog now, column's already there if that changes.
  5. Fact cardinality: v1 uses low-cardinality core fact table; breakdowns are separate optional tables.
  6. Write integrity: Aggregator flushes are idempotent and transactional via flush_batch.
  7. Validation: Zod at every boundary. Types and validators derived from the same schema — no drift.
  8. Testing: Unit (pure functions) + integration (Miniflare round-trips) + contract (Zod). No E2E browser tests at this scale.
  9. Environments: Miniflare for local dev and integration tests. No ephemeral deploys until the project outgrows solo development.
  10. Migrations: Forward-only, versioned, runs on startup. No down migrations.
  11. Traffic classification: 5-class system (human → ai_crawler) using cf.botManagement.score + UA patterns. Stored as a v2 breakdown dimension, not in core fact.
  12. Data portability: Monthly NDJSON exports to R2. The data is always yours.
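Decision 6 hinges on flush_batch acting as a dedupe ledger: each flush carries a unique batch id, and a batch is applied at most once. A sketch of the idea, with in-memory stand-ins for D1 (in the real system the seen-set is the flush_batch table and apply() is the transactional write):

```typescript
// Idempotent flush: a retried batch with a known id is a no-op.
function applyFlush(
  batchId: string,
  seen: Set<string>,    // stands in for the flush_batch table
  apply: () => void,    // stands in for the transactional D1 writes
): boolean {
  if (seen.has(batchId)) return false; // already applied; retry changes nothing
  apply();
  seen.add(batchId);
  return true;
}
```

In D1 the ledger insert and the fact writes go in the same batch, so "applied" and "recorded" can't diverge.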

#Open questions

  1. How much hourly granularity is genuinely useful for bloggers? (14-day window feels right — enough for "is this post spiking?" without long-term storage cost.)
  2. Should ask run only against precomputed views to prevent query abuse? Decided: Hard budget of 8 tool calls per ask request. The AI analyst gets a fixed number of moves — enough for compare + breakdown + follow-up, not enough to accidentally run up the D1 bill. If it can't answer in 8 calls, the question is too vague; ask the human to narrow it.
  3. Dashboard surface: should this be MCP-only, or include a web UI?

Decision: MCP-first + tiny snapshot page.

Rationale: MCP-only is fastest and matches current workflow, but a tiny snapshot preserves glanceability without adding a heavy dashboard surface. Both MCP and snapshot use the same typed metrics/query layer, so no duplicate logic.


#What the client actually looks like

<script defer src="/a.js" data-site="bri-blog"></script>

a.js does three things: capture a pageview, queue an engagement event, flush via beacon when the tab hides. Nothing else.
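A sketch of what that client could look like. The /ingest endpoint, payload fields, and the 50% scroll threshold are assumptions, not the shipped file; the DOM wiring is guarded so the snippet is inert outside a browser:

```javascript
// Hypothetical a.js sketch. /ingest, the payload shape, and the 0.5
// scroll threshold are illustrative assumptions.
function isDeepScroll(scrollTop, viewportH, pageH) {
  return (scrollTop + viewportH) / pageH > 0.5;
}

if (typeof document !== "undefined") {
  const site = document.currentScript?.dataset.site;
  const queue = [{ t: "pageview", site, path: location.pathname, ref: document.referrer }];
  let engaged = false;

  // Deep scroll is one of the composite engagement signals; emit it once.
  addEventListener("scroll", () => {
    if (!engaged && isDeepScroll(scrollY, innerHeight, document.documentElement.scrollHeight)) {
      engaged = true;
      queue.push({ t: "engaged", site, path: location.pathname });
    }
  }, { passive: true });

  // Flush via beacon on hide; sendBeacon survives page teardown.
  document.addEventListener("visibilitychange", () => {
    if (document.visibilityState === "hidden" && queue.length) {
      navigator.sendBeacon("/ingest", JSON.stringify(queue.splice(0)));
    }
  });
}
```

The client never awaits a response, which is why the DO cold-start cost discussed below stays invisible to readers.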


The point of this system is not to know everything.

It's to know enough, quickly, privately, and truthfully — then use AI to ask better questions over that evidence.

Not surveillance. Not theater. Just instrumentation with taste.

One of the first things I ever built for the web was a hit counter — Perl/CGI, flat file, incremented on every page load. I still get emails about it. Twenty-something years later, after the entire Google Analytics era — Urchin to UA to GA4 to ripping it all out — I'm back to building my own way to answer "did anyone read this?" The tools are different. The question hasn't changed.

#A note on scale and honesty

This system is for a personal blog that gets maybe 200 visits a day. At that volume, most of this architecture is overkill. You could write directly to D1 from the ingestion Worker with upserts and it'd be fine for years. And yes — writing a 3,000-word spec with Zod schemas, a Mermaid diagram, non-goals, and a migration plan for something most people would solve with Plausible and a weekend is peak RFC-maxxing. Noted.

I know that. The Durable Object aggregation layer, the governor mode, the flush batching — none of it is justified by current traffic. A blog doesn't need backpressure. It needs a database and a query.

But this is also a learning exercise. I want to understand Cloudflare's primitives — DOs, D1, Workers, Cron Triggers, R2 — by building something real with them, not by reading docs. And "real" means making architectural decisions that would matter at scale, even if they don't matter yet. The DO isn't here because I need a write buffer. It's here because I want to understand how write buffers behave on this platform — what the alarm API feels like, how DO ↔ D1 coordination works under contention, what happens when you actually need backpressure.

One known tradeoff: DO cold starts. At low traffic, the DO will frequently be evicted and wake cold for the first event of a session — maybe ~50ms of latency. But the client fires beacons and doesn't wait for responses, so the user never feels it. The cost is invisible; the buffer is just slightly lazier waking up.

The tradeoff is honest: I'm trading simplicity for education. The system is designed to handle tens to hundreds of thousands of events responsibly and resource-efficiently — not because my blog will generate them, but because the next thing I build on this stack might, and I'd rather learn the failure modes now with low stakes.

Where this gets dangerous is if the learning exercise calcifies into production complexity. The commitment: if the DO layer turns out to be pure overhead with no educational payoff, rip it out and go direct-to-D1. Architecture should serve the system, not the architect's curiosity. But right now, the curiosity is the point.

This is a working spec — part of the workshop. I'll follow up with what actually survived contact with reality once it's built.

Tagged

  • architecture
  • ai
  • systems
  • tools
Last updated: February 27, 2026
On the trail: Building in Public