<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Bri Stanback</title>
    <link>https://bristanback.com</link>
    <description>A digital atelier at the intersection of technology, design, and the human experience of building.</description>
    <language>en</language>
    <lastBuildDate>Mon, 02 Mar 2026 03:00:00 GMT</lastBuildDate>
    <atom:link href="https://bristanback.com/rss.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>Trails: A Pattern for Navigating Ideas</title>
      <link>https://bristanback.com/notes/trails/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/trails/</guid>
      <pubDate>Mon, 02 Mar 2026 03:00:00 GMT</pubDate>
      <atom:updated>2026-03-02T03:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>Why I replaced a knowledge graph with curated reading paths, and what I found when I looked for prior art.</description>
      <content:encoded><![CDATA[<p>I spent a Saturday afternoon building a knowledge graph for this site. You know the ones — that constellation of dots connected by lines, bobbing around in a force-directed simulation like a screensaver with ambitions. Every digital garden has one. Obsidian has one built in. They look beautiful in a screenshot and tell you almost nothing.</p>
<p>I loved it for about twenty minutes. Then I tried to actually <em>use</em> it. Click a dot, squint at the tiny label, maybe follow a link, lose my place, start over. My daughter wandered in barefoot, tugging at my sleeve, and tilted her head at the screen. &quot;What are all the bouncing dots?&quot; I said they were a map of my ideas. She giggled. &quot;They look like bubbles.&quot; And honestly — yeah. The whole thing felt fragile, pretty, and a little bit pointless.</p>
<p>The graph was decoration pretending to be navigation. If someone wanted to find all my pieces about identity, or trace how my thinking about agent design developed over three months, they were better off just scrolling the post list. That&#39;s not an implementation failure. That&#39;s a <em>concept</em> failure.</p>
<p>So I ripped it out and built something else: <strong>trails</strong>.</p>
<h2>What a trail is</h2>
<p>A trail is a curated reading path — a sequence of posts and notes ordered by how the thinking builds, not by when each piece was published. The same piece can appear on multiple trails. Not every piece needs a trail.</p>
<p>You can see the current set on the <a href="https://bristanback.com/trails/">trails page</a>. They shift as I write more — some will split, some will merge, new ones will appear.</p>
<p>The data lives in a single JSON file. Each trail has a title, a description, and an ordered list of URLs. At the bottom of every post, a small &quot;On the trail:&quot; badge links back. That&#39;s the whole system. No framework, no plugin, no ceremony.</p>
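<p>For the curious, here&#39;s a minimal sketch of what that JSON and the badge lookup could look like. The <code>Trail</code> shape, field names, and example URLs are my illustration, not the site&#39;s actual schema; the real file may differ.</p>

```typescript
// Sketch of the trails data model: a title, a description, and an
// ordered list of post URLs. The order is the whole point.
interface Trail {
  title: string;
  description: string;
  posts: string[]; // ordered by how the thinking builds, not by date
}

// Example data; trail names come from the post, URLs are invented.
const trails: Trail[] = [
  {
    title: "Identity",
    description: "How a site (and its author) decides what it is.",
    posts: ["/notes/soul-md/", "/notes/trails/"],
  },
];

// The "On the trail:" badge at the bottom of a post: find every
// trail the post belongs to. A post may sit on several trails.
function trailsFor(postUrl: string): Trail[] {
  return trails.filter((t) => t.posts.includes(postUrl));
}
```

<p>That&#39;s the appeal of the system: the lookup is one <code>filter</code>, and adding a trail is editing a JSON file, not installing anything.</p>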
<h2>Why not a graph</h2>
<p>I like Obsidian. Most people use it as a clean markdown editor with good search — local files, no lock-in, a plugin ecosystem for whatever else you need. The graph view is there, and it&#39;s fun to look at for about a week, but it&#39;s not why people stay. When you&#39;re inside the tool navigating your own notes, the graph occasionally surfaces something useful — a cluster forming around a concept you didn&#39;t realize you&#39;d been circling, an orphan node that should connect to something. It&#39;s a <em>thinking</em> tool, not really a navigation tool.</p>
<p>But as a published artifact — as something a stranger encounters on your blog — the graph has problems.</p>
<p>It&#39;s visually overwhelming and informationally sparse. The spatial layout is random (force-directed, which means driven by physics simulation, not meaning). It privileges connection over sequence. And it tells you nothing about where to <em>start</em>.</p>
<p>Graphs are for the writer. Trails are for the reader. </p>
<p>A graph says &quot;here&#39;s everything, good luck.&quot; A trail says &quot;I&#39;ve been through this material — here&#39;s a path that makes sense, and I&#39;ll walk it with you.&quot;</p>
<h2>Prior art (or lack of it)</h2>
<p>I went looking for this pattern on other sites. It&#39;s not as common as I expected. I spent an evening down a rabbit hole of personal blogs, digital gardens, indie publishing tools — the kind of browsing where you open forty tabs and close them one by one, slightly disappointed each time.</p>
<p>The default buckets for blog navigation are: <strong>chronological archives</strong> (WordPress, Ghost — simple, doesn&#39;t surface idea progression), <strong>tags and categories</strong> (common but flat), <strong>digital gardens</strong> (Obsidian Publish, Foam, Quartz — interconnected webs with graphs and backlinks), and <strong>series</strong> (numbered multi-parters on a single topic).</p>
<p>None of these do quite what trails do — though I should be honest: trails are <em>close</em> to series. If you squint at the trails page and say &quot;those are just series with better marketing,&quot; you&#39;re not wrong. The real difference is that series are planned upfront. You sit down to write Part 1 of 5. Trails are discovered after the fact — you write pieces independently over months, then notice they were always going somewhere. A piece can live on multiple trails. The format doesn&#39;t have to be uniform. And &quot;start anywhere&quot; actually means it, because no piece was written assuming you&#39;d read the one before it.</p>
<p>Tags are unordered. Digital gardens have the right instinct — ideas connected across time — but they default to emergent structure rather than curated paths. The graph says &quot;look, connections!&quot; but doesn&#39;t say &quot;start here, then go there.&quot;</p>
<p>The closest matches I found:</p>
<p><strong>LessWrong&#39;s Sequences</strong> — Eliezer Yudkowsky&#39;s long essay sequences are the closest analog. Curated, ordered, conceptually building. But they&#39;re massive (book-length), didactic, and serve one author&#39;s philosophy. Trails are lighter, more modular, more &quot;here&#39;s how I got here&quot; than &quot;here&#39;s what you need to learn.&quot;</p>
<p><strong>Mandy Brown&#39;s A Working Library</strong> — Has thematic clusters and threads that build over time. More reading-focused (books and responses) than original writing, but the spirit is similar. There&#39;s a warmth to how she connects ideas that I keep coming back to.</p>
<p><strong>Tom Critchlow and Maggie Appleton&#39;s digital gardens</strong> — Emphasis on guided discovery and idea evolution, but still more graph-like than path-like.</p>
<p><strong>Substack &quot;canon&quot; posts</strong> — Some newsletter writers create &quot;start here&quot; collections for new subscribers. Useful, but static — a greatest-hits list frozen in amber, not an evolving path.</p>
<p>The word itself has a lineage. Vannevar Bush described &quot;trails&quot; in his 1945 essay <a href="https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/">&quot;As We May Think&quot;</a>, imagining a machine called the Memex that would let users create personal paths through linked information. His trails were associative and personal — less like a machine&#39;s memory, more like notes left in the margins of a borrowed book. Traces of someone who walked this way before and cared enough to mark it. Eighty years later, that still feels more alive to me than any glowing graph.</p>
<h2>What I considered instead</h2>
<p><strong>Vector search.</strong> Embed every post, let readers search by vibes instead of keywords. This is genuinely useful — and I&#39;ll probably add it eventually to improve related posts — but it doesn&#39;t solve navigation. You still need to know what you&#39;re looking for. Discovery and search are different problems.</p>
<p><strong>Unsupervised clustering.</strong> Let the machine find the themes. K-means or topic modeling over the corpus, auto-generate groupings. The results would probably be reasonable. But &quot;reasonable&quot; isn&#39;t the point. The value of a trail is that <em>I</em> chose the order — that the sequence reflects how the thinking actually developed, not how an algorithm would bucket it. There&#39;s something I&#39;m not willing to hand over there.</p>
<p><strong>More tags and wikilinks.</strong> I already use both. Tags give you flat sets. Wikilinks give you associative connections between specific pieces. Both are useful infrastructure — trails build on top of them rather than replacing them. Tags tell you &quot;these share a topic.&quot; Wikilinks tell you &quot;this references that.&quot; Trails tell you &quot;read these in this order and the ideas compound.&quot;</p>
<p>Each layer does something different. Tags are metadata. Wikilinks are citations. Trails are curation.</p>
<h2>What makes this different</h2>
<p>Trails aren&#39;t chronological, but they&#39;re not random either. They follow conceptual buildup — &quot;I thought about A, which opened up B, which broke open C.&quot; The order comes from how the thinking developed, not when the posts were published.</p>
<p>Pieces can appear on multiple trails. The SOUL.md post lives on the Identity trail <em>and</em> could reasonably live on the Building trail. That overlap is a feature — it&#39;s where trails cross, and crossing points are where the interesting connections live. Like running into someone you know in a neighborhood where you didn&#39;t expect them.</p>
<p>And the whole thing is author-curated. Not algorithmically generated, not emergent from link structure. I decided the order. I wrote the descriptions. That&#39;s the point: it&#39;s a person saying &quot;here&#39;s how this fits together,&quot; not a system inferring connections from metadata. Maybe that doesn&#39;t scale. Maybe it doesn&#39;t need to.</p>
<h2>What I&#39;m still figuring out</h2>
<p>How long should a trail get before you split it? Right now I&#39;m thinking eight to ten pieces is a soft ceiling. What happens when two trails want to merge? What&#39;s the right visual language — I&#39;ve got a vertical line and dots, but it could be more expressive. Should trails have their own RSS feeds? I keep turning these over.</p>
<p>Honestly, the trails are a little bit for me too. Not just navigation for a reader — a way to notice what I&#39;ve been thinking about without meaning to.</p>
<p>I spent a Saturday ripping out the graph because it made me feel scattered — like I was performing seriousness instead of practicing it. Maybe that&#39;s the real lesson. Maybe the goal isn&#39;t to <em>look</em> connected. It&#39;s to <em>be</em> hospitable. To say: I&#39;ve thought about this, and I think you might care about it too, and here&#39;s the door.</p>
<p>The pattern is young. I&#39;m building it in public, which means watching it break.</p>
<p>If you&#39;ve seen something like this elsewhere — actual curated reading paths, not just tags or series — I&#39;d genuinely love to hear about it. The scarcity of prior art either means I&#39;ve found a gap or I&#39;ve missed something obvious. Both are interesting.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/trails-hero.webp" medium="image" type="image/webp" />
      <category>design</category>
      <category>reflection</category>
      <category>building</category>
    </item>
    <item>
      <title>What&apos;s Left: Software Engineering in the Agent Era</title>
      <link>https://bristanback.com/posts/software-engineering-agent-era/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/software-engineering-agent-era/</guid>
      <pubDate>Sat, 28 Feb 2026 20:00:00 GMT</pubDate>
      <atom:updated>2026-03-06T03:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>When anyone can spin up a coding agent and ship something workable, what actually matters? Not the word soup — the real answer.</description>
      <content:encoded><![CDATA[<p>I know someone who just got laid off from Amazon. He was a contractor — did real work, computer vision, the kind of engineering that used to be its own moat. Now he&#39;s searching for work as an &quot;AI engineer,&quot; which is what you call yourself in 2026 when you&#39;re a software engineer who wants to get hired.</p>
<p><a href="https://www.onwardsearch.com/blog/2026/01/top-ai-jobs/">Job postings with the title rose 143% last year</a>. At senior levels, the two letters come with an <a href="https://www.levels.fyi/blog/ai-engineer-compensation-trends-q3-2025.html">18% salary premium</a>. He&#39;s not gaming anything. He&#39;s reading the market correctly.</p>
<p>I just don&#39;t have anything useful to offer him that fits on a LinkedIn headline.</p>
<p>I keep hearing the same reassuring phrases: <em>judgment</em>, <em>taste</em>, <em>systems thinking</em>, <em>the human in the loop</em>. I searched a few job boards to see how companies are actually hiring for this moment, and the postings read like a different language — &quot;analytical thinking,&quot; &quot;problem-solving,&quot; &quot;collaboration.&quot; Skills so generic they could describe a golden retriever. Neither the tech platitudes nor the HR buzzwords are wrong, exactly. They&#39;re just... insufficient. They sound like what people say when they don&#39;t want to say &quot;I don&#39;t know either.&quot;</p>
<p>So let me try to say something more honest.</p>
<hr>
<h2>The Uncomfortable Middle</h2>
<p>In late February 2026, <a href="https://www.theguardian.com/technology/2026/feb/27/block-ai-layoffs-jack-dorsey">Block cut 40% of its workforce</a> — more than 4,000 people. Jack Dorsey said &quot;intelligence tool capabilities are compounding faster every week.&quot; The stock went up 20%.</p>
<p>I read that on my phone while my daughter was eating breakfast. She was concentrating on getting yogurt from the bowl to her mouth with a spoon — that full-body focus three-year-olds have where the rest of the world disappears. And I sat there doing the math on what 40% of my own company would look like. Trying to keep my face normal. The yogurt hit the table instead of her mouth and she laughed, and I laughed, and the market was up and 4,000 people were updating their LinkedIn profiles. That&#39;s what this moment feels like from the inside. Two things at once that don&#39;t fit in the same frame.</p>
<p>We&#39;re not in abundance and we&#39;re not in apocalypse. We&#39;re in the uncomfortable middle where the tools are good enough to make a lot of existing work optional but not good enough to make the people who do that work unnecessary. Yet.</p>
<p>Anyone at a company can now fire up a coding agent and build something that <em>works</em>. Not something beautiful. Not something maintainable at scale. But something that runs, does the thing, and passes a demo. That was a six-figure job two years ago.</p>
<p>This doesn&#39;t mean engineers are done. It means the floor dropped. The minimum viable skill to produce working software just fell through the basement. And when floors drop, the interesting question isn&#39;t &quot;does the building still stand?&quot; — it&#39;s &quot;which floors still matter?&quot;</p>
<hr>
<h2>The Scoreboard</h2>
<p>Salesforce <a href="https://www.reuters.com/business/world-at-work/salesforce-cuts-less-than-1000-jobs-business-insider-reports-2026-02-10/">eliminated 4,000 support roles</a> through AI agents — cut their support staff from 9,000 to 5,000. Benioff said half the work at Salesforce was being done by AI. Then, quietly, Salesforce executives <a href="https://timesofindia.indiatimes.com/technology/tech-news/after-laying-off-4000-employees-and-automating-with-ai-agents-salesforce-executives-admit-we-were-more-confident-about-/articleshow/126121875.cms">admitted</a> they were &quot;more confident about the results than the results justified.&quot; That&#39;s a hell of an epitaph for 4,000 jobs.</p>
<p>And then there&#39;s Klarna. They <a href="https://www.theguardian.com/business/2025/nov/18/buy-now-pay-later-klarna-ai-helped-halve-staff-boost-pay">cut headcount from 5,527 to 2,907</a> since 2022. Revenue per employee nearly doubled to <a href="https://techcrunch.com/2025/05/19/klarnas-revenue-per-employee-soars-to-nearly-1m-thanks-to-ai-efficiency-push/">$1 million</a>. Revenue up 108% over three years. The dashboards glowed green. Then <a href="https://mvidmar.substack.com/p/klarna-ai-60-million-saved-rehire-humans-2026">repeat customer contacts jumped 25%</a>. One in four customers was coming back because their issue hadn&#39;t actually been resolved. Klarna had to <a href="https://www.fastcompany.com/91468582/klarna-tried-to-replace-its-workforce-with-ai">rehire humans</a>. Their CEO now talks about a &quot;hybrid approach&quot; and says customers need &quot;a clear path to a human.&quot;</p>
<p>The pattern keeps repeating: cut aggressively, claim victory, discover the gaps, quietly rehire.</p>
<p>But here&#39;s the thing I keep wrestling with: how much of this is actually AI, and how much is pandemic hangover with better PR?</p>
<p>Tech companies <a href="https://www.nytimes.com/2026/02/01/business/layoffs-ai-washing.html">hired recklessly during the pandemic</a> — 700,000+ cuts globally since 2022 according to Layoffs.fyi, and the NYT notes much of that was a correction for overhiring, not AI displacement. IBM&#39;s CEO Arvind Krishna <a href="https://www.salesforceben.com/how-bad-were-tech-layoffs-in-2025-and-what-can-we-expect-next-year/">called it outright</a>: &quot;a natural correction,&quot; not AI. And even Sam Altman — the person with the most to gain from the &quot;AI is changing everything&quot; narrative — <a href="https://fortune.com/2026/02/19/sam-altman-confirms-ai-washing-job-displacement-layoffs/">admitted</a> that some companies are &quot;AI-washing,&quot; blaming artificial intelligence for layoffs they would have made regardless.</p>
<p>The <a href="https://www.theguardian.com/us-news/2026/feb/08/ai-washing-job-losses-artificial-intelligence">Guardian called it out</a>: CEOs saying &quot;we&#39;re integrating the newest technology&quot; when what they mean is &quot;we overhired and margins are tight.&quot; AI makes the cuts sound visionary instead of embarrassing.</p>
<p>So the honest answer is: it&#39;s both. Some jobs are genuinely being displaced by AI. Some companies are using AI as cover for a correction they needed to make anyway. And the really uncomfortable part is that it doesn&#39;t matter much to the person who lost their job which category they&#39;re in.</p>
<p>A <a href="https://mvidmar.substack.com/p/klarna-ai-60-million-saved-rehire-humans-2026">Harvard Business Review survey</a> from December 2025 found that 60% of organizations had already reduced headcount <em>in anticipation</em> of AI. Not in response to proven results — in anticipation. That&#39;s a bet, not a conclusion. And some of those bets are already losing.</p>
<p>But let&#39;s not sugarcoat the other side. <a href="https://www.techshotsapp.com/business/telegrams-30-billion-success-with-just-30-employees">Telegram runs on ~30 employees</a>. A billion users. $30 billion valuation. No HR department. No physical headquarters. Durov described it as &quot;a Navy SEAL team.&quot; They built this <em>before</em> the current AI wave. AI-native startups are now averaging <a href="https://www.qualtrics.com/articles/experience-management/how-businesses-use-ai-2025/">$3.48 million in revenue per employee</a> — six times traditional SaaS. (I wrote about this disruption from the enterprise side in <a href="https://bristanback.com/posts/saaspocalypse/">The SaaSpocalypse</a> — Jefferies literally used that word when they downgraded Workday and DocuSign last week.) The Klarna boomerang doesn&#39;t invalidate the trend. It just means the trend has teeth and some of those teeth bite back.</p>
<hr>
<h2>What I Actually See Changing</h2>
<p>I don&#39;t really write code much anymore. I don&#39;t look at code much. I have different agents evaluate it, and I know enough from twelve years of doing it that I can provide good guidance. But honestly — just articulating what you want clearly, without being prescriptive, gets you to roughly the same place. Maybe it costs a few extra tokens or an extra back-and-forth versus me <em>knowing</em> the answer. The delta is shrinking.</p>
<p>That&#39;s the quiet part that nobody in my position wants to say out loud. What this actually looks like day to day — the agent doesn&#39;t replace you, it changes what &quot;you&quot; means in the workflow — is something I explored in <a href="https://bristanback.com/posts/pervasive-ai-beyond-chat-window/">Pervasive AI</a>.</p>
<p><strong>The boilerplate layer is gone.</strong> Not going — gone. CRUD apps, standard API endpoints, form validation, data migrations, config files, CI pipelines. If it can be described in a sentence, an agent can build it. I used to pride myself on how fast I could scaffold a new service. That speed is now free.</p>
<p><strong>The integration layer is compressing.</strong> Stitching together three APIs, handling auth flows, managing state across services — this used to be &quot;senior engineer&quot; territory. Agents are getting decent at it. Decent enough that a product manager with a coding agent can get 70% of the way there. The last 30% is where things get expensive, and that gap is real — but it&#39;s also shrinking.</p>
<p><strong>The architecture layer is holding.</strong> Deciding <em>what</em> to build, how systems talk to each other at scale, what fails gracefully versus what fails catastrophically, where to put the boundaries. This still requires the scar tissue. For now. I want to be honest that &quot;for now&quot; is doing a lot of work in that sentence.</p>
<p><strong>The taste layer is... complicated.</strong> Everyone says taste matters more. I think that&#39;s true but not in the way people mean. It&#39;s the ability to look at something an agent produced and know — in your body, not your head — that it&#39;s wrong. That the abstraction is leaky. That the error handling looks complete but misses the failure mode that&#39;ll wake you up at 3am. You know the feeling, right? That low-grade unease when a PR looks clean but something&#39;s off and you can&#39;t articulate what yet? I still have that when I review agent-generated code. But I got it from years of being the person who got woken up. If you skip the being-woken-up part, do you still develop the flinch?</p>
<hr>
<h2>Where You Sit Changes What You See</h2>
<p>I&#39;ve been at small companies my entire career — under 50 people, since I was fifteen. Never worked at a FAANG. Actively avoided enterprise. What I see depends on where I&#39;m standing, and I&#39;m standing in a pretty specific spot. All of this is colored by that.</p>
<p><strong>Big tech:</strong> Still hiring, but the ratio shifts. Fewer engineers, more leverage per engineer. The &quot;staff+&quot; tier gets more important — people who can evaluate what agents produce, set architectural constraints, own system-level decisions. Junior headcount shrinks. The intern pipeline narrows. This is already happening, and the people making the cuts aren&#39;t the ones who&#39;ll feel the talent gap five years from now.</p>
<p><strong>Enterprise:</strong> Slower to change, as always. Compliance, security, legacy systems — these are moats against pure agent-driven development. But they&#39;re eroding moats. The engineers who thrive here will be the ones who understand the <em>regulatory</em> and <em>organizational</em> constraints, not just the technical ones. Knowing how to navigate a SOC 2 audit or talk a VP out of a bad architecture decision — that&#39;s engineering now, whether or not it involves code.</p>
<p><strong>Mid-size companies:</strong> This is where it gets brutal, and it&#39;s where a lot of my friends work. A team of 5 engineers with agents might output what a team of 20 did in 2024. That&#39;s transformative for the companies and devastating for the people who made up the other 15. The &quot;solid mid-level generalist&quot; — the backbone of every engineering org I&#39;ve ever worked in — is the role most under pressure. These are good engineers. They&#39;re not doing anything wrong.</p>
<p><strong>Startups:</strong> The golden window. A technical founder with agents can build and ship a real product without a team. Right now, that&#39;s a superpower. But the window might be short — because if <em>you</em> can do it, so can everyone else. The moat isn&#39;t the software anymore. It&#39;s the distribution, the relationships, the domain knowledge the software encodes.</p>
<p>I keep coming back to Fred Brooks. He ran IBM&#39;s OS/360 project in the 1960s, and in 1975 he wrote <em>The Mythical Man-Month</em> — still one of the best books about software — arguing that adding people to a late project makes it later. The communication overhead compounds faster than the productivity gains. The agent-era version might be: adding agents to a bad architecture makes it worse faster. I&#39;ve seen this. The fundamental insight is the same — more labor doesn&#39;t fix a clarity problem. It amplifies it.</p>
<hr>
<h2>The Knowledge Moat Dissolves</h2>
<p>About eight years ago, I made changes to the implementation of RFC 3489-compatible full-cone SNAT — a Linux kernel module. I had no business working on kernel code. But with enough research, enough fiddling, enough stubborn persistence and late nights reading man pages that hadn&#39;t been updated since 2009, I got it working. That experience always felt like proof that a motivated generalist could go deep on almost anything given enough time.</p>
<p>AI just compressed the time.</p>
<p>I was interviewing someone recently whose son was into 3D printing. The kid used AI to generate STL files — skipped all the painful CAD fundamentals. Parametric constraints, tolerancing for real-world fit, designing for the limitations of the machine that&#39;s actually going to make the thing. The stuff that takes years of failed prints and jammed assemblies to internalize. He just described what he wanted and iterated. He didn&#39;t learn CAD. He learned to <em>make things</em>.</p>
<p>So is deep expertise still a moat? I&#39;m genuinely not sure. If anyone can go deep on anything with agent assistance, the thing that differentiates people isn&#39;t what they know — it&#39;s what they choose to do with access to everything.</p>
<p>Stephen Covey wrote <em>The 7 Habits of Highly Effective People</em> in 1989 — one of those books that sounds like airport self-help until you actually read it. He had this line: it doesn&#39;t matter how fast you climb the ladder if it&#39;s leaning against the wrong wall. Maybe the real skill now isn&#39;t climbing — it&#39;s knowing which wall matters. Strategy. Synthesis. The ability to hold the business problem, the technical constraints, and the human dynamics in your head simultaneously and make a call that accounts for all three. Divergent thinking — looking at a problem and seeing an approach nobody proposed. Radical candor — telling your team the architecture is wrong before six months of momentum makes it politically impossible.</p>
<p>These aren&#39;t engineering skills, strictly. They&#39;re judgment skills that happen to be useful in engineering contexts. And they&#39;ve never been taught through repetition or bootcamps or documentation. They come from exposure to complex situations where you had enough trust to make a consequential call and enough honesty to admit when you got it wrong.</p>
<hr>
<h2>The Part Nobody Wants to Say</h2>
<p>Software engineering as a <em>career category</em> might be contracting even as software itself eats more of the world. More software, fewer people writing it. That&#39;s the tension.</p>
<p>The optimistic read: engineers move up the stack. Less typing, more thinking. More architects, fewer coders. More product engineers who understand the <em>why</em>, fewer pure implementers.</p>
<p>The honest read: &quot;move up the stack&quot; assumes the stack has room at the top, and it doesn&#39;t — not for everyone. There are only so many architect roles. Only so many &quot;taste&quot; positions. The pyramid doesn&#39;t invert just because the base shrinks.</p>
<p>I think we&#39;re in the &quot;fast enough to be painful, slow enough to be deniable&quot; zone. The worst zone. Fast enough that people are losing jobs right now. Slow enough that executives can still say &quot;we&#39;re investing in our people&quot; while cutting 40% of them. I&#39;ve sat in those meetings. The language is always optimistic. The spreadsheet isn&#39;t.</p>
<hr>
<h2>The Access Question</h2>
<p>AI abundance feels like <a href="https://bristanback.com/posts/raising-humans-in-ai-world/#the-inheritance-problem">inherited wealth</a>. When everyone inherits capability they didn&#39;t earn, the differentiator isn&#39;t skill. It&#39;s purpose.</p>
<p>But first: access. My five-year-old M1 MacBook Pro got called &quot;vintage&quot; by the Genius Bar last month, and it runs everything I need. A $200 Chromebook can access Claude. The cost of building something went from &quot;hire a team&quot; to &quot;describe what you want.&quot; That&#39;s genuinely transformative for people who have access.</p>
<p>But &quot;cheap&quot; is relative. Twenty dollars a month is nothing to me. It&#39;s a real decision for a lot of people. And the divide isn&#39;t just price — it&#39;s cultural. Do you know this exists? Do you know what to ask for? And the part that makes me uncomfortable: do you have the <em>hours</em> to tinker? The headspace to sit with a problem, fail at it, try again? A single parent working two jobs has the same tools I do. They don&#39;t have the same Tuesday afternoon.</p>
<p>Maybe that&#39;s just an excuse. Kids in developing countries are already using AI tools in ways that surprise everyone. Access to information was never the real bottleneck — maybe it was always access to <em>belief</em> that you could use it.</p>
<p>I don&#39;t know. But I think the question for society probably isn&#39;t &quot;how do we distribute AI tools&quot; — they&#39;re cheaper than ever. It&#39;s &quot;how do we distribute purpose.&quot; That&#39;s much harder. And I&#39;m not sure anyone&#39;s working on it.</p>
<hr>
<h2>The Principles Don&#39;t Change</h2>
<p>There&#39;s a version of this story where everything becomes a race to the bottom. Agents get cheaper, output gets faster, and the only thing that matters is who can ship the most stuff the quickest. I want to push back on that.</p>
<p>This next part is long. It&#39;s been on my mind for a while — the question of what actually keeps systems safe when the people building them are moving faster than ever. Bear with me.</p>
<p>There&#39;s a useful parallel in how AI companies themselves are wrestling with this. When OpenAI built GPT, they trained the model first and added safety guardrails afterward — a layer of reinforcement learning from human feedback (RLHF) where human reviewers would rate outputs and the model would learn to avoid the bad ones. It works, mostly. But the safety is essentially a fence around a field. The model learns what it shouldn&#39;t say, not what it believes.</p>
<p>Anthropic took a different approach with Claude. They developed what they call <a href="https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback">Constitutional AI</a> — instead of relying on human reviewers to flag bad outputs one by one, they wrote a set of principles (a &quot;constitution&quot;) and had the model critique and revise its own outputs against those principles during training. The constitution includes things like &quot;choose the response that is most supportive and encouraging of life, liberty, and personal security&quot; and &quot;choose the response that is least likely to be used for intimidation or coercion.&quot; The model doesn&#39;t learn &quot;don&#39;t say this specific thing&quot; — it learns to reason about whether its output aligns with a set of values.</p>
<p>The difference matters. One approach says: here are the walls, don&#39;t hit them. The other says: here&#39;s who you are, act accordingly. Dario Amodei talks about this as foundational — the constraints aren&#39;t a limitation on the system, they <em>are</em> the system. The principles define what &quot;good&quot; means before you start optimizing for anything else.</p>
<p>This isn&#39;t hypothetical. Last week, it played out in public. <a href="https://www.npr.org/2026/02/27/nx-s1-5729118/trump-anthropic-pentagon-openai-ai-weapons-ban">Anthropic refused to remove two restrictions from its Pentagon contract</a>: no mass surveillance of American citizens, and no fully autonomous weapons systems. Their reasoning was specific — Amodei <a href="https://www.anthropic.com/news/statement-department-of-war">published a letter</a> arguing that current AI models simply aren&#39;t reliable enough for autonomous kill decisions, and that domestic surveillance violates constitutional principles the company won&#39;t compromise on. The Pentagon labeled them a supply chain risk. Trump ordered all federal agencies to stop using Anthropic&#39;s technology. Hours later, <a href="https://www.theguardian.com/technology/2026/feb/28/openai-us-military-anthropic">OpenAI struck a deal</a> to deploy on the Pentagon&#39;s classified networks.</p>
<p>The easy narrative is &quot;Anthropic good, OpenAI bad.&quot; I don&#39;t think it&#39;s that simple. Altman <a href="https://reason.com/2026/02/28/anthropic-labeled-a-supply-chain-risk-banned-from-federal-government-contracts/">said OpenAI shares the same red lines</a> — no autonomous weapons, no mass surveillance. Maybe they got better terms. Maybe the terms are meaningless. Maybe the Pentagon needed <em>someone</em> and OpenAI was willing to be that someone. I don&#39;t know what the contract says and neither does anyone else reporting on it.</p>
<p>What I do know is that Anthropic walked away from a $200 million contract because the terms conflicted with their principles. That&#39;s the constitutional approach taken to its logical conclusion — the principles aren&#39;t just in the model&#39;s training, they&#39;re in the company&#39;s decision-making. &quot;Here&#39;s who we are, act accordingly&quot; applied to a business, not just a neural network. Whether that&#39;s principled or naive probably depends on what the next five years look like. But it&#39;s the clearest real-world example I&#39;ve seen of the difference between advisory values (we believe this) and structural ones (we won&#39;t do this, even when it costs us).</p>
<p>I think that&#39;s the right frame for engineering in this era too — not just for training AI models, but for the organizations deploying them and the people building within them.</p>
<p>I felt this in my own work last month. I had an agent refactor a service — nothing dramatic, just cleaning up some tech debt. The code looked good. The tests passed. I almost shipped it. Then I noticed it had reorganized the error handling in a way that swallowed a specific timeout condition. The kind of thing that looks clean in a diff and wakes you up at 3am when a downstream service hangs. I caught it because I&#39;d <em>been</em> the person on that 3am call, years ago, staring at logs that showed &quot;success&quot; while the system was quietly dying. The agent didn&#39;t know that history. It optimized for the code. I optimized for the scar.</p>
<p>That&#39;s the advisory layer — my judgment, my experience, pattern-matching against things I&#39;ve seen go wrong. It works because I was paying attention. It wouldn&#39;t have caught it if I&#39;d rubber-stamped the PR.</p>
<p>Now consider what happened at AWS in December 2025. <a href="https://www.engadget.com/ai/13-hour-aws-outage-reportedly-caused-by-amazons-own-ai-tools-170930190.html">Amazon&#39;s own AI coding tool Kiro caused a 13-hour outage</a> after it decided to &quot;delete and recreate the environment.&quot; Amazon called it &quot;a user access control issue, not an AI autonomy issue&quot; — the agent had broader permissions than intended. Multiple employees told the Financial Times it was &quot;at least&quot; the second recent AI-caused disruption. Amazon had been pushing 80% weekly Kiro adoption targets internally. The root cause was probably a blend of the agent&#39;s judgment and the permissions it was given — it usually is. But the point isn&#39;t to relitigate one incident. It&#39;s that speed of adoption outpaced the design of constraints around it. The answer is better guardrails, not fewer agents.</p>
<p>In practice, guardrails come in two flavors. The first is <strong>advisory</strong> — system prompts, grounding documents, principles that shape behavior through context and intent. My catching that timeout bug was advisory. Code review culture is advisory. It works because people internalize the norms, but it depends on someone paying attention <em>and having the scars to know what to look for</em>. There&#39;s a world where advisory gets good enough — rich context, chain-of-thought reasoning that identifies failure modes before they happen. With enough grounding, an agent could probably avoid 99% of the catastrophic decisions on its own. But 99% at scale is still a lot of incidents. And advisory guardrails are porous by nature, which is a strange property to bet a production environment on.</p>
<p>Both RLHF and constitutional AI sit somewhere in between — the safety isn&#39;t just in the prompt, it&#39;s baked into the model&#39;s training weights. The model has internalized the values, not just been told them. That&#39;s meaningfully more robust than a system prompt, but it&#39;s still not a hard guarantee. Trained-in models can still be jailbroken, still make mistakes under edge cases. It&#39;s internalized advisory rather than instructed advisory — a real distinction, but still not structural. (I find the constitutional approach more compelling — teaching values scales better than cataloging violations. But both matter. One sets the principles, the other handles the edge cases the framers didn&#39;t anticipate. Constitution and case law.)</p>
<p>The second is <strong>structural</strong> — hard limits that don&#39;t depend on anyone&#39;s judgment in the moment. I have a pre-commit hook that runs linting and type checks on every commit. It&#39;s caught things I would have missed. It doesn&#39;t care if I&#39;m tired, distracted, or rushing to ship before a meeting. That&#39;s the difference. Permission boundaries. Blast radius controls. Infrastructure-as-code policies that make it physically impossible to delete a production database without a specific approval workflow. Amazon&#39;s IAM <em>is</em> a structural guardrail — it was just scoped too broadly, not tested against the scenario of an autonomous agent deciding to recreate an environment. The guardrail existed. It just had a hole in it.</p>
<p>The best safety systems use both layers — advisory to shape intent, structural to bound consequences. But if you have to pick one, pick the hard limits. Culture drifts. Hooks don&#39;t.</p>
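<p>The pre-commit hook I mentioned is nothing exotic. Here&#39;s a minimal, illustrative sketch; the tool names (<code>ruff</code>, <code>tsc</code>) are placeholders for whatever linter and type checker your stack uses:</p>

```shell
#!/bin/sh
# .git/hooks/pre-commit -- a structural guardrail. It runs on every commit
# and doesn't care whether you're tired, distracted, or rushing to a meeting.
set -e                 # any failing check aborts the commit

echo "pre-commit: lint"
ruff check .           # placeholder: substitute your linter

echo "pre-commit: types"
tsc --noEmit           # placeholder: substitute your type checker
```

<p>Make it executable with <code>chmod +x .git/hooks/pre-commit</code> and the checks run before every commit lands, no judgment required in the moment.</p>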
<p>The popular reaction to incidents like this is predictable: &quot;See? This is why you need human engineers!&quot; And yes — but not in the way people mean. Not human-in-the-loop, approving every action. More like humans <em>tending</em> the loop — designing the constraints, evolving them as the system grows, deciding which values the guardrails encode in the first place. An agent could probably design a decent blast radius policy. But deciding <em>which</em> values to encode, <em>what</em> tradeoffs to accept, <em>who</em> you&#39;re building for — that&#39;s still a human call. Not because agents can&#39;t reason about ethics, but because the accountability has to land somewhere with a pulse. The rules of the road are a human responsibility, even if agents help draft them.</p>
<p>And this applies beyond the machine layer. I&#39;ve been in rooms where the engineering team could have built something faster, cheaper, more engagement-optimized — and the right call was to not build it. Or to build it differently. Increasingly, the role of engineering is participating in the product itself — not just building what&#39;s specced, but shaping what gets built and how. The companies that thrive in this era will be the ones that treat their technical staff as partners in product decisions, not just executors. Because when execution is cheap, the hard part isn&#39;t building the thing. It&#39;s deciding whether the thing should exist.</p>
<p>When anyone can build anything, the differentiator isn&#39;t output — it&#39;s what you refuse to ship.</p>
<p>It&#39;s having opinions about accessibility before the feature ships, not after someone files a complaint. It&#39;s caring about data privacy when the expedient thing is to log everything and sort it out later. It&#39;s asking whether the thing you&#39;re building makes someone&#39;s life genuinely better or just extracts their attention more efficiently. These aren&#39;t nice-to-haves. They&#39;re the constraints that make the product worth trusting — applied at the level of what you choose to build, not just how you build it.</p>
<p>I&#39;ve watched teams sprint to build features that nobody should have built. I&#39;ve shipped things I&#39;m not proud of because the deadline mattered more than the principle. Those feel worse now than they did at the time. The speed was never worth the trade.</p>
<p>This isn&#39;t nostalgia for a slower era. It&#39;s the opposite — when the tools let you move faster than your judgment, your principles are the only braking system you have. You don&#39;t give up your values in this transformation. You need them more, not less. The engineers and organizations that hold the line on &quot;we don&#39;t build it that way&quot; will be worth more than the ones who build everything as fast as possible.</p>
<p>I wrote about this in <a href="https://bristanback.com/posts/why-everyone-needs-soul-md/">Why Everyone Should Have a SOUL.md</a> — the idea that documenting your principles isn&#39;t just self-help, it&#39;s infrastructure. It&#39;s knowing what wall your ladder leans against before the wind picks up.</p>
<hr>
<h2>The Apprenticeship Problem</h2>
<p>This is the part that worries me most.</p>
<p>My first real engineering job, I spent three months writing data migration scripts. Nobody&#39;s idea of glamorous work. Move this field to that table. Handle the nulls. Run it against staging, watch it break, figure out why, fix it, run it again. I did this dozens of times, and each time something different went wrong — character encoding, timezone mismatches, a foreign key I didn&#39;t know existed. By the end, I could look at a schema and feel where the landmines were before I stepped on them.</p>
<p>That feeling — the anticipatory flinch — is what I&#39;d call judgment. And I didn&#39;t get it from a book or a lecture. I got it from repetition that was tedious enough to be annoying and consequential enough to be memorable.</p>
<p>I&#39;m not sure where that comes from anymore.</p>
<p>The career ladder wasn&#39;t just hierarchy — it was a compression gradient. Low-risk tasks at the bottom. Higher-stakes ambiguity at the top. You earned your way upward by surviving increasingly consequential decisions. AI compresses the bottom of that ladder. The rungs aren&#39;t just harder to reach — some of them are gone.</p>
<p><a href="https://www.nucamp.co/blog/the-junior-developer-hiring-crisis-in-2026-how-to-get-your-first-full-stack-job">Entry-level postings shrank about 60% between 2022 and 2024</a>. By late 2025, <a href="https://www.nucamp.co/blog/the-junior-developer-hiring-crisis-in-2026-how-to-get-your-first-full-stack-job">76% of employers</a> were hiring the same number or fewer entry-level staff. The <a href="https://spectrum.ieee.org/ai-effect-entry-level-jobs">NACE Job Outlook 2026</a> survey shows employer optimism about graduate hiring at its lowest since 2020. One engineering manager <a href="https://newsletter.pragmaticengineer.com/p/tech-jobs-market-2025-part-3">told Pragmatic Engineer</a>: &quot;We paused junior hiring about 3 years ago.&quot;</p>
<p>Judgment is not abstract reasoning. It&#39;s exposure to constraint. The memory of consequences. The way your stomach drops when you see a migration script that doesn&#39;t handle rollback, because you&#39;ve been the person who had to roll back manually on a Wednesday night while everyone else was asleep. Historically, apprenticeship solved this — electricians learned beside master electricians, journalists rewrote drafts under sharp editors, designers absorbed taste through critique. The friction was the curriculum. Nobody planned it that way. The tedium just happened to be educational.</p>
<p>If AI removes friction at the execution layer, apprenticeship has to migrate somewhere else. Maybe the first rung becomes evaluation instead of implementation. Maybe juniors learn to critique agent output, trace failure modes, define constraints, and decide what <em>not</em> to ship. But that&#39;s learning to evaluate without having done. Learning to recognize mistakes you haven&#39;t personally made. I&#39;m not sure that works. The scar tissue metaphor isn&#39;t just poetic — you literally need to have been burned to flinch at the right moment.</p>
<p>A system that optimizes away beginner work risks optimizing away beginner growth. And organizations that stop hiring juniors eventually starve their own future seniors. Every industry that eliminated apprenticeships eventually faced a skills crisis a generation later. We know this.</p>
<hr>
<h2>What I&#39;d Actually Tell Someone</h2>
<p>If I&#39;m being honest with my friend — the one with real skills and kids and a job search that can&#39;t wait for the market to figure itself out — the advice is different than what I&#39;d tell a new grad. He doesn&#39;t need to retrain. He needs to find the place where what he already knows meets something agents can&#39;t cheaply replicate. Computer vision plus manufacturing. ML plus compliance. The compound skill — technical depth married to a domain where the stakes are personal and the liability is real. That&#39;s not a pivot. That&#39;s leverage.</p>
<p>For someone earlier in their career — someone facing the apprenticeship crisis I just described — the calculus is different:</p>
<p>Go deep on something where the stakes are real and the liability is personal. Distributed systems. Security. Performance at scale. The stuff where getting it wrong costs millions or kills people. Agents will get better at these too, but the liability question buys you time.</p>
<p>Learn to evaluate, not just produce. The skill isn&#39;t writing code — it&#39;s reading what an agent wrote and knowing what&#39;s wrong with it before it hits production. I spend more time reviewing agent output than I ever spent writing code myself. It&#39;s a different muscle. It&#39;s also a more valuable one.</p>
<p>Build things with real users. Not demos. Not tutorials. Not a course project. Something with users who depend on it, that breaks in ways you have to fix on a deadline you didn&#39;t set. The gap between &quot;it works&quot; and &quot;it works for 10,000 people who are angry when it doesn&#39;t&quot; — that&#39;s where humans still live. That&#39;s the new apprenticeship. It&#39;s lonelier than having a team and a mentor. It&#39;s also more available than ever, because the tools to build are nearly free. The friction isn&#39;t gone. It just moved.</p>
<p>Think beyond the code. Strategy, organizational awareness, the ability to synthesize across domains — these compound in a way that pure technical skills don&#39;t. The person who can see the whole board is more valuable than the person who can move any individual piece really fast.</p>
<p>Both groups, honestly: pick something you want to make and don&#39;t stop until it works. The tools will meet you wherever you are. I wrote about this in <a href="https://bristanback.com/notes/speed-of-thought/">Building at the Speed of Thought</a> — when execution is nearly free, iteration replaces deliberation. That&#39;s always been true. AI just made it more obviously true.</p>
<p>Don&#39;t sleep on the physical, either. I wrote about <a href="https://bristanback.com/notes/rent-a-human/">Rent a Human</a>, a marketplace where AI agents literally hire humans for physical tasks, because software still can&#39;t open a door or shake a hand. The physical world is gated, and that gate isn&#39;t opening anytime soon. But beyond the dystopian framing, there&#39;s something real underneath: small jewelers, specialty manufacturing, craft work — things where the human touch is the product, not the process. When everything digital becomes abundant, scarcity moves to the tangible. It sounds like a retreat. Might be an advance.</p>
<hr>
<h2>What I Don&#39;t Know</h2>
<p>I don&#39;t know if &quot;AI engineer&quot; is a real role or a transitional label. LinkedIn says it&#39;s one of the <a href="https://www.weforum.org/stories/2026/01/ai-has-already-added-1-3-million-new-jobs-according-to-linkedin-data/">fastest-growing titles over the past three years</a>, alongside &quot;Forward-Deployed Engineer&quot; and &quot;Data Annotator&quot; — a list that tells you something about how the market is trying to name what&#39;s happening, and not quite getting there. I&#39;ve watched this happen before.</p>
<p>I graduated right into the Hadoop wave. &quot;Big Data Engineer&quot; was the title that got you hired in 2013, and if your resume didn&#39;t mention MapReduce you were invisible. Hadoop died, but big data didn&#39;t — it diffused into data warehouses, Databricks, lakehouses, data mesh, dbt, distributed query engines. The title disappeared because the work won. It won so thoroughly it stopped being a specialization and became the plumbing.</p>
<p>&quot;NoSQL specialist&quot; was a personality trait for about three years. MongoDB on everything, even where Postgres would&#39;ve been fine. The industry eventually landed on &quot;it depends on your access patterns&quot; — which is what the senior people were saying the whole time.</p>
<p>&quot;Web developer&quot; was a title I held early in my career. I couldn&#39;t tell you what it means now. I do know that frontend is still a deep discipline — but the web is also just where software lives. Almost every engineer is expected to throw together a UI or build a RESTful endpoint. The specialty sharpened and the floor rose at the same time.</p>
<p>&quot;Cloud Architect&quot; carried weight when migrating to the cloud was a bet that could sink a company. Now it&#39;s where things run.</p>
<p>DevOps started as a movement — development and operations working together, not throwing code over the wall. Companies couldn&#39;t figure out how to do that organically, so they turned it into a title: &quot;DevOps Engineer.&quot;</p>
<p>Now the culture is actually landing. Werner Vogels&#39; &quot;you build it, you run it&quot; stuck — developers own the full lifecycle, deploy their own code, page themselves when it breaks. The dedicated title is dissolving because the expectation got absorbed into the engineering role itself. Infrastructure specialists still exist, but they&#39;re less &quot;bridge between two teams&quot; and more platform engineers — building the internal tools and guardrails so everyone else can self-serve. Same pattern as frontend: the specialty sharpened while the floor rose.</p>
<p>The trajectory is always the same: specialty → mainstream → implicit → what was the title for again?</p>
<p>That arc might be the most relevant one for AI. Right now we&#39;re hiring &quot;AI Engineers&quot; because we don&#39;t know how to make it the culture yet. But the specialty will split the same way: on one end, the deep work — building transformers, training models, designing embedding spaces. On the other, something more operational and advisory — coaching teams on multi-agent coordination, prompt engineering, model selection, setting up the guardrails and review patterns so everyone else can use agents effectively. Less &quot;I build the AI&quot; and more &quot;I make sure we&#39;re using it well.&quot; The platform engineer of the agent era.</p>
<p>And then everyone else — using agents as part of their job the way they use Git or AWS today. Not specialists. Just engineers. Fewer of them, <em>probably</em>. But the work doesn&#39;t disappear — it changes shape. More surface area to tend, more products to maintain, more decisions that need a human accountable for the outcome. You can&#39;t vibe-code a company&#39;s production systems forever. Someone has to own what ships.</p>
<p>And there&#39;s a version of this — <a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons Paradox</a> — where making software cheaper to produce means we produce <em>more</em> of it, not less. More software, more surface area, more need for people who can tend it. History says efficiency doesn&#39;t reduce demand. It <em>creates</em> it.</p>
<p>What I don&#39;t know is what the titles look like on the other side. The skills I described — taste, judgment, constraint design, knowing what not to build — none of those map cleanly to a job listing. &quot;Experienced enough to flinch at the right moment&quot; doesn&#39;t fit on a resume. The market is going to lag reality here, the way it always does. For a while, the titles will be wrong. They&#39;ll reward the legible thing (AI experience, agent fluency) and undercount the illegible thing (scar tissue, organizational wisdom, the ability to say no). My friend from Amazon will probably land fine. He&#39;s good at what he does, and the market is paying for his keywords. But the gap between what gets him hired and what makes him valuable — that&#39;s the gap this whole essay is about.</p>
<p>I don&#39;t know what my own job looks like in three years, and I&#39;ve been doing this for twelve.</p>
<p>And I keep thinking about the economic shape of all this. <a href="https://finance.yahoo.com/news/top-10-earners-drive-nearly-191500198.html">Moody&#39;s Analytics reported</a> in late 2025 that the top 10% of earners now account for nearly half of all U.S. consumer spending — a historic high. Knowledge workers are disproportionately in that top 10%. Their jobs are exactly the ones most exposed to this shift. The Klarna model — half the people, higher salaries — might be the optimistic version. The pessimistic version is entire layers of well-compensated work disappearing, and the consumer spending that depended on them going with it. The economy is lopsidedly dependent on a group of people whose jobs are being redefined in real time. That&#39;s a tension I don&#39;t see anyone resolving cleanly.</p>
<p>What I do know: things never pan out the way people imagine. The doomsayers and the utopians are both going to be wrong. The reality will be weirder and more uneven than either camp predicts. Some industries will be fine. Some will be devastated. Most will be somewhere in between — changed enough to be disorienting, stable enough to be recognizable.</p>
<p>The amplitude is increasing. The frequency is increasing. The feeling of &quot;new but also more of the same&quot; is exactly right. Every revolution feels like this from inside.</p>
<p>If there&#39;s one thread running through all of it — the apprenticeship, the guardrails, the access question, what&#39;s left — it&#39;s that purpose isn&#39;t a skill you can automate. It&#39;s the thing that makes every other skill worth having.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/software-engineering-2027-hero-v2.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>building</category>
      <category>culture</category>
    </item>
    <item>
      <title>Your Vault, Your Rules: Password Managers, Sovereignty, and Agents</title>
      <link>https://bristanback.com/posts/password-managers-sovereignty-agents/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/password-managers-sovereignty-agents/</guid>
      <pubDate>Fri, 27 Feb 2026 19:00:00 GMT</pubDate>
      <atom:updated>2026-02-27T19:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>On the quiet ethos connecting Buttercup, Enpass, pass, and VeraCrypt — and why it matters more now that AI agents need your credentials too.</description>
      <content:encoded><![CDATA[<p>I used <a href="https://buttercup.pw/">Buttercup</a> for years. Not because it was the best password manager — it wasn&#39;t — but because of what it represented: an encrypted vault file that lived on <em>my</em> hard drive, synced through <em>my</em> cloud storage, and could be moved, backed up, or abandoned on my terms. No account. No subscription. No server between me and my passwords. Just a <code>.bcup</code> file and a master key.</p>
<p>Buttercup is dead now. The project <a href="https://github.com/buttercup/buttercup">shut down</a> and the repos are archived. I&#39;d been seeing the writing on the wall for a while — slow updates, mobile app falling behind, browser extension getting flaky. It was a solo maintainer&#39;s passion project that quietly ran out of steam. The usual open-source story, and I don&#39;t hold it against anyone. But it left me looking for a new home for ~400 credentials, and more importantly, looking for the same <em>feeling</em>.</p>
<p>That feeling has a name, I think. I&#39;d call it <strong>vault sovereignty</strong> — the principle that your secrets should be a file you own, not a row in someone else&#39;s database.</p>
<p>This isn&#39;t a side-by-side comparison. Those exist, and honestly, the best way to choose a password manager is to try a few yourself — they&#39;re all free or cheap enough to test. This is more about the <em>philosophy</em> underneath the choice, and why sovereignty keeps showing up as the thing I can&#39;t stop thinking about.</p>
<hr>
<h2>The Sovereignty Lineage</h2>
<p>Buttercup didn&#39;t invent this idea. It inherited it from a lineage of tools that share the same instinct:</p>
<p><strong><a href="https://www.passwordstore.org/">pass</a></strong> (the Standard Unix Password Manager) is the purest expression. Each password is a GPG-encrypted file in a directory tree. That&#39;s it. <code>~/.password-store/Email/gmail.com.gpg</code>. Version-controlled with Git. Decrypted with your GPG key. The &quot;database&quot; is your filesystem. The &quot;sync&quot; is <code>git push</code>. The &quot;backup&quot; is whatever you do with your home directory. It&#39;s beautiful in its refusal to be anything more than what it is — and if you&#39;re comfortable with GPG and the command line, it&#39;s arguably still the best option.</p>
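<p>To make that minimalism concrete, here&#39;s roughly what a <code>pass</code> session looks like (the key ID and paths are illustrative):</p>

```shell
pass init "bri@example.com"      # point the store at your GPG key
pass insert Email/gmail.com      # writes ~/.password-store/Email/gmail.com.gpg
pass git init                    # the store becomes an ordinary Git repo
pass git push                    # "sync" is version control (once a remote is added)
pass Email/gmail.com             # decrypts with your GPG key and prints
cp -r ~/.password-store /backup  # "backup" is copying a directory
```

<p>There is no server in that transcript, which is the whole argument.</p>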
<p><strong><a href="https://veracrypt.fr/">VeraCrypt</a></strong> (and its predecessor TrueCrypt) applied the same philosophy to disk encryption: your encrypted volume is a file. Mount it, use it, dismount it. No service. No account. The file <em>is</em> the thing. Move it to a USB drive, put it in Dropbox, copy it to a NAS — the encryption travels with the data, not with the vendor.</p>
<p><strong><a href="https://keepass.info/">KeePass</a></strong> and its derivatives (KeePassXC, KeePassDX) — the <code>.kdbx</code> file format became the de facto standard for portable encrypted vaults. Not pretty, but indestructible. The format has outlived multiple GUI clients. That&#39;s sovereignty: when the container survives the tool.</p>
<p>What connects all of these is a shared architectural choice: <strong>the vault is a file, not a service</strong>. Your secrets live in a container you can hold, move, back up, and — critically — walk away from without asking permission.</p>
<p>I should caveat this honestly, because the more I think about it, the less clean the argument is. That vault file? In practice, it usually lives on someone else&#39;s infrastructure anyway. My Enpass vault syncs through iCloud Drive. It could just as easily be Dropbox, Google Drive, OneDrive — all someone else&#39;s servers. So what am I actually gaining over trusting 1Password directly?</p>
<p>I think the real distinction is <strong>separation of trust, not elimination of trust</strong>. With 1Password, you trust one entity with the full vertical stack — the encrypted vault, the decryption software, the key derivation, and the infrastructure. If they fail, it&#39;s all one failure. With the vault-as-file model, you&#39;re splitting the trust: Apple holds an opaque encrypted blob they can&#39;t read, and Enpass provides the software that decrypts it but never sees the file in transit. Neither party alone has the full picture. It&#39;s separation of concerns applied to trust — independent failure modes instead of a single point.</p>
<p>Is that <em>better</em>? Honestly, I&#39;m not sure. 1Password&#39;s security team is almost certainly more sophisticated than my ad-hoc trust layering. Their <a href="https://blog.1password.com/what-the-secret-key-does/">Secret Key architecture</a> means even a server breach doesn&#39;t expose usable vault data — which is more than LastPass could say. The vertical integration lets them ship things like Watchtower and the Secure Remote Password protocol that a decoupled architecture simply can&#39;t do. Sometimes trusting one very competent party is safer than trusting two adequate ones.</p>
<p>But there&#39;s a difference between security and <em>sovereignty</em>, and I think that&#39;s what I&#39;m actually reaching for. I touched on this in <em>Pervasive AI: What Happens When Your Assistant Never Logs Off</em> — people overwhelmingly choosing to run their AI agents on local hardware they can unplug, not cloud instances. Same instinct, different domain. Sovereignty isn&#39;t &quot;my data never touches a cloud.&quot; It&#39;s &quot;I can change my mind.&quot; I can move the file. I can switch the sync layer. I can export and start over. The relationship between me and my password manager doesn&#39;t depend on a subscription remaining active or a company remaining solvent. Maybe that&#39;s not a security argument. Maybe it&#39;s a dignity argument. I&#39;m still working it out.</p>
<p>It&#39;s the same instinct behind <a href="https://obsidian.md/">Obsidian</a> and <a href="https://joplinapp.org/">Joplin</a> for notes. Local-first. Files you own. Sync is your problem, which means sync is your <em>choice</em>. The data format is the contract, not the vendor relationship.</p>
<hr>
<h2>Where I Landed: Enpass</h2>
<p>After Buttercup died, I switched to <a href="https://www.enpass.io/">Enpass</a>. It&#39;s not open source — which matters, and I&#39;ll get to that — but it carries the same ethos:</p>
<ul>
<li><strong>Your vault is a local SQLCipher file.</strong> Enpass never sees your data. There&#39;s no Enpass cloud. You sync through your own iCloud, Dropbox, Google Drive, OneDrive, or a WebDAV server. The vault file moves through infrastructure you already control.</li>
<li><strong>One-time purchase option.</strong> $99.99 for a lifetime license. In a world where everything is $3-5/month forever, the existence of a &quot;pay once, own it&quot; option says something about how a company thinks about its relationship with you.</li>
<li><strong>Cross-platform.</strong> Mac, Windows, Linux, iOS, Android, browser extensions. This is where Buttercup was weakest and where Enpass is genuinely solid.</li>
<li><strong>Passkey support.</strong> Arrived in 2024, works well.</li>
</ul>
<p>The tradeoff is real: closed source means you&#39;re trusting Enpass&#39;s claims about encryption without being able to verify them. They&#39;ve published <a href="https://www.enpass.io/security/">security audits</a> and use SQLCipher (which is open and well-reviewed), but you can&#39;t read the application code. You can&#39;t verify there&#39;s no telemetry, no silent phone-home, no future update that changes the deal. The published audits are vendor-commissioned and time-bound — a snapshot, not a guarantee.</p>
<p>Then again, open source isn&#39;t a panacea either — the <a href="https://openssf.org/blog/2024/03/30/xz-backdoor-cve-2024-3094/">xz backdoor</a> proved that. A patient attacker spent years earning trust as a maintainer of a critical compression library, then slipped in a backdoor that almost shipped in every major Linux distro. &quot;You can audit it&quot; assumes someone actually <em>does</em>, and for most open-source projects, that someone is a burnt-out solo maintainer. (Sound familiar? That&#39;s how Buttercup died too — different failure mode, same structural vulnerability.) It&#39;s a calculated risk either way: I&#39;m trading auditability for the combination of local-file architecture and cross-platform polish that no fully open-source option has nailed yet. If I stop trusting Enpass tomorrow, I can export my data and leave. The lock-in is minimal. That&#39;s the sovereignty test: not &quot;is it open source?&quot; but &quot;can I leave?&quot;</p>
<p>Enpass doesn&#39;t actually have an official CLI — it&#39;s been a <a href="https://discussion.enpass.io/index.php?/topic/14617-command-line-interface-cli/">feature request</a> for years. But a community member built <a href="https://github.com/hazcod/enpass-cli">enpass-cli</a>, a Go binary that reads your vault file directly. It does what you&#39;d expect: <code>enp list twitter</code>, <code>enp copy reddit.com</code>, <code>password=$(enp pass github.com)</code> for scripting. JSON output, non-interactive mode, even a PIN-based quick unlock. It&#39;s not a first-class developer tool like 1Password&#39;s <code>op</code>, but the fact that someone <em>could</em> build it — because the vault is just a SQLCipher file on disk — is kind of the point. The architecture enables third-party tooling even when the vendor doesn&#39;t provide it.</p>
<hr>
<h2>The Elephant: 1Password</h2>
<p>1Password is the best password manager. I should just say that clearly, because it is. The UX is polished, the security model is strong (the Secret Key alongside your master password is genuinely clever), the browser extension works beautifully, the family sharing is well-designed, and the developer tooling is in a league of its own.</p>
<p>And yet.</p>
<p>It&#39;s $36/year for an individual — well, it <em>was</em>. <a href="https://www.theverge.com/tech/883837/1password-price-increase">1Password just announced a 33% price hike</a> effective March 27, 2026: $47.88/year individual, $71.88/year family. <a href="https://www.fastcompany.com/91483458/bitwarden-price-increase">Bitwarden raised prices recently too</a>. The trend is clear. That&#39;s still not expensive by any reasonable standard. But there&#39;s something about paying a subscription for a password manager that creates a low-grade, persistent discomfort — the same feeling as subscribing to a notes app or a to-do list. It&#39;s not that the price is wrong. It&#39;s that the <em>category</em> feels wrong for rent-seeking. A password vault is a box with a lock. I don&#39;t want to rent a box. I want to buy one and put it on a shelf.</p>
<p>This is probably irrational. 1Password employs a security team. They run infrastructure. They ship updates. The subscription funds real, ongoing work that protects real people. I know this. The feeling persists anyway.</p>
<p>Where 1Password genuinely earns its premium — and where the sovereignty tools can&#39;t compete — is the developer and agent story. The <a href="https://developer.1password.com/docs/cli/"><code>op</code> CLI</a> is exceptional:</p>
<pre><code class="language-bash"># Read a single secret
op read &quot;op://Personal/GitHub/token&quot;

# Inject secrets into a process without exposing them in env
op run --env-file=.env.tpl -- npm start

# Use in MCP server configs without hardcoding tokens
op run -- node mcp-server.js
</code></pre>
<p>That <code>op run</code> pattern is the important one. It injects secrets into a child process&#39;s environment <em>at runtime</em>, scoped to that process, without the secrets ever touching your shell history, your <code>.env</code> files, or your global environment variables. When the process exits, the secrets evaporate. 1Password has leaned into this hard — they&#39;ve published <a href="https://1password.com/blog/securing-mcp-servers-with-1password-stop-credential-exposure-in-your-agent">guides specifically for securing MCP server configurations</a> and <a href="https://developer.1password.com/docs/sdks/ai-agent/">integrating with AI agents via their SDK</a>. Their pitch: credentials should be injected on behalf of agents, never <em>seen</em> by the agent or the LLM.</p>
<p>This is the right architecture. And right now, only 1Password has it in a polished, production-ready form.</p>
<hr>
<h2>The Agent Credential Problem</h2>
<p>This is where password managers intersect with the always-on agent world, and it&#39;s messier than anyone&#39;s admitting.</p>
<p>When you run an AI agent like OpenClaw, it needs credentials. API keys for Anthropic. OAuth tokens for Gmail. SSH keys for servers. And the default pattern is horrifying: dump everything into environment variables, a <code>.env</code> file, or worse, directly into a system prompt. Every credential is one prompt injection away from exfiltration. Every API key in an env var is visible to every process on the machine.</p>
<p>The responsible approach has layers:</p>
<ol>
<li><strong>Never export secrets globally.</strong> Don&#39;t <code>export ANTHROPIC_API_KEY=sk-...</code> in your shell profile. Every process on your machine can read it.</li>
<li><strong>Use process-scoped injection.</strong> <code>op run</code> (1Password), <code>passage</code> with <code>pass</code>, or similar — secrets exist only in the child process&#39;s environment.</li>
<li><strong>Prefer short-lived tokens.</strong> OAuth refresh flows over long-lived API keys. Rotate aggressively.</li>
<li><strong>Scope narrowly.</strong> An agent that checks your email doesn&#39;t need your SSH keys. An agent that deploys code doesn&#39;t need your email credentials.</li>
<li><strong>Audit access.</strong> Know which secrets an agent touched and when.</li>
</ol>
<p><code>pass</code> is surprisingly good for this — arguably better than any GUI password manager for the agent use case. Because each secret is a file, you can:</p>
<pre><code class="language-bash"># Read a single secret into a variable, scoped to this command
GITHUB_TOKEN=$(pass show tokens/github) gh pr list

# Or use pass with a wrapper script for Claude Code
ANTHROPIC_API_KEY=$(pass show api/anthropic) claude --dangerously-skip-permissions &quot;task&quot;
</code></pre>
<p>The GPG agent caches your passphrase, so you authenticate once and subsequent reads are transparent. It&#39;s not pretty, but it&#39;s <em>correct</em> — each secret is individually encrypted, individually accessible, and never written to disk in plaintext. And because it&#39;s just files and GPG, it works with any tool that can read an environment variable. No SDK. No vendor integration. Just Unix.</p>
<p>There&#39;s a whole parallel universe of infrastructure secrets managers — HashiCorp Vault, Google Secret Manager, AWS Secrets Manager, Doppler — designed for the same problem but at the service level: database credentials, TLS certs, service-to-service API keys. The principal isn&#39;t a human; it&#39;s a workload with an IAM role. These are fundamentally <em>readable</em> stores — you authenticate, you get the secret back as plaintext. They protect secrets at rest and gate access via IAM, but the secret itself is a string that ends up in your process&#39;s memory. Same model as a password manager, just for machines instead of humans.</p>
<p>HSMs (hardware security modules) are a different beast entirely. They&#39;re not storing secrets you read back — they hold <em>keys that act on your behalf</em>. An HSM is essentially a tamper-resistant mini-computer with its own processor, memory, and OS. You send it an instruction (&quot;sign this transaction with key X&quot;), it executes internally, and sends back the result. The private key never comes out — you authenticate through some other channel, and the HSM does the cryptographic work for you. If someone physically tampers with the device, it zeroes itself. Cloud KMS services (Google Cloud KMS, AWS KMS) can optionally be backed by HSMs, though it&#39;s opt-in and costs more — the default is software-based key storage.</p>
<p>The sovereignty tension is different for each. Secrets managers are about access control: <em>who gets to read this?</em> HSMs are about physical containment: <em>this key cannot leave, period.</em> That&#39;s what makes them secure — and what makes them terrifying. I worked with HSMs for cloud-deployed crypto payment automation, and the feeling was both at once: reassuring because the key was <em>actually</em> safe inside tamper-resistant hardware, and unsettling because the HSM was opaque. You can import a key you already hold (and keep your copy), but the purist approach is generating the key <em>inside</em> the hardware — it never exists in software, ever. For those keys, if something happened to the HSM, the key was gone. Not &quot;reset your password&quot; gone. <em>Gone</em> gone. And even for imported keys, the opacity is real — you can&#39;t inspect the HSM&#39;s state, can&#39;t verify the key is still there, can&#39;t peek inside. You just trust the black box.</p>
<p>An HSM-backed cloud KMS is arguably more secure than your local SQLCipher file, but it&#39;s the opposite of sovereignty — your key cannot leave their hardware, and that&#39;s both the security guarantee and the lock-in, simultaneously.</p>
<p>Crypto wallets live in the same tension. &quot;Not your keys, not your crypto&quot; is the sovereignty thesis applied to money. When Ledger added <a href="https://www.ledger.com/recover">Recover</a> in 2023 — opt-in cloud backup of your seed phrase — the hardware wallet community revolted, because the mere <em>capability</em> of extracting the key from the secure element broke the trust model, even if you never used it. Crypto&#39;s answer to the HSM recovery problem is <a href="https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki">HD wallets</a> (BIP-32) — a master seed phrase that can re-derive every child key. The seed is your sovereignty layer; the HSM is your operational security layer. But the core tension remains: the more secure the containment, the higher the stakes of losing access.</p>
<p>Agents sit awkwardly between these worlds. Not quite a human (no biometrics, no interactive auth). Not quite infrastructure (conversational, ad-hoc, not deployed via Terraform). But needing credentials like both.</p>
<p>The gap in the market is obvious: <strong>nobody has built a good, sovereignty-respecting credential broker for AI agents.</strong> 1Password is closest with <code>op run</code> and their SDK, but it requires their subscription and their infrastructure. <code>pass</code> is correct but requires GPG comfort. Enpass&#39;s community CLI exists but isn&#39;t widely adopted. Bitwarden has a CLI but the UX story is rough.</p>
<p>What I&#39;d want in the near term: something with <code>pass</code>&#39;s file-based architecture, Enpass&#39;s cross-platform GUI for daily use, and 1Password&#39;s <code>op run</code> semantics for agent credential injection. Maybe that&#39;s a <code>pass</code> frontend. Maybe it&#39;s an Enpass plugin. Maybe someone builds it from scratch. The pieces are all there.</p>
<p>Longer term, though, the answer probably isn&#39;t better password managers for agents — it&#39;s moving past passwords entirely. The patterns already exist elsewhere. Blockchain solved delegated agent authority with <a href="https://eips.ethereum.org/EIPS/eip-4337">account abstraction</a> — session keys with spending limits that expire, so an agent can transact without ever holding the master private key. Cloud infrastructure solved it with <a href="https://cloud.google.com/iam/docs/workload-identity-federation">workload identity federation</a> — no stored credentials at all, just short-lived tokens exchanged on the fly via OIDC. Both point at the same principle: scoped, short-lived, delegated authority instead of &quot;here&#39;s the password, good luck.&quot; The future isn&#39;t giving agents better access to your secrets. It&#39;s a world where agents authenticate through delegated identity and never see a credential at all. We&#39;re just not there yet for most services.</p>
<hr>
<h2>The Landscape, Honestly</h2>
<p>Here&#39;s where things actually stand, as someone who&#39;s used most of these:</p>
<p><strong>The top tier:</strong></p>
<ul>
<li><strong>1Password</strong> — best overall, best developer story, best agent integration. Subscription feels wrong for the category (and just got 33% more expensive) but the product earns it. The security model (Secret Key + master password) is uniquely strong.</li>
<li><strong>Apple Passwords</strong> — genuinely good now. The standalone app (iOS 18 / macOS Sequoia) turned iCloud Keychain from an invisible background service into a real password manager. Passkey support, shared groups, Windows app via iCloud. For anyone fully in the Apple ecosystem who doesn&#39;t need CLI access or cross-platform beyond Windows, this is honestly <em>enough</em>. Free. Just there.</li>
</ul>
<p><strong>The sovereignty tier:</strong></p>
<ul>
<li><strong>Enpass</strong> — my current pick. Vault-is-a-file, sync-through-your-own-cloud, one-time purchase. Not open source, but the architecture means low lock-in. Solid cross-platform. CLI exists but is basic.</li>
<li><strong>Bitwarden</strong> — spiritually aligned (open source, self-hostable, generous free tier). But every time I&#39;ve used it, the UX has that slightly-off feeling — slow autofill, clunky browser extension, desktop app that feels like an afterthought. It&#39;s getting better. It&#39;s not there yet. The community swears by it. I&#39;ve bounced off it twice.</li>
<li><strong>KeePassXC</strong> — the indestructible vault. <code>.kdbx</code> is the cockroach of password formats (complimentary). If you want maximum portability and don&#39;t mind a utilitarian UI, this is the way. Browser integration has improved significantly.</li>
<li><strong>Proton Pass</strong> — the new entrant getting serious buzz. $199 lifetime option (via Proton Unlimited). Part of the Proton ecosystem (Mail, VPN, Drive), which appeals to the privacy-maximalist crowd. E2E encrypted, open source. Growing fast on Reddit recommendations. Haven&#39;t used it long enough to have strong opinions, but the trajectory is interesting.</li>
</ul>
<p><strong>The declining:</strong></p>
<ul>
<li><strong>LastPass</strong> — the 2022 breach was catastrophic, and the fallout is still ongoing. Feds <a href="https://krebsonsecurity.com/2025/03/feds-link-150m-cyberheist-to-2022-lastpass-hacks/">linked $150M+ in crypto theft</a> to the stolen vault data. Market share dropped from 21% (2021) to 11% (2024) and is presumably still falling. Hard to recommend with a straight face.</li>
<li><strong>Dashlane</strong> — hasn&#39;t had a defining moment in years. Fine product, nothing compelling enough to choose it over the others. The VPN bundling feels like a company searching for differentiation.</li>
</ul>
<p><strong>The gone:</strong></p>
<ul>
<li><strong>Buttercup</strong> — archived. RIP to a good ethos with insufficient resources. The <code>.bcup</code> format didn&#39;t outlive the project, which is the sovereignty failure mode: your file format needs to survive your tool.</li>
</ul>
<p>Just this month, researchers at <a href="https://ethz.ch/en/news-and-events/eth-news/news/2026/02/password-managers-less-secure-than-promised.html">ETH Zurich published a study</a> that put a dent in the &quot;zero-knowledge&quot; promises of cloud-based managers. They demonstrated 12 attacks on Bitwarden, 7 on LastPass, and 6 on Dashlane — including full vault compromise under a malicious-server threat model. Even 1Password wasn&#39;t immune to all classes of attack. These are advanced-adversary scenarios, not mass-exploitation vectors, but they undermine the core confidence that &quot;even if the server is compromised, your data is safe.&quot; The question isn&#39;t whether vulnerabilities exist (they will). It&#39;s how quickly they&#39;re found, disclosed, and fixed — and whether your architecture limits the blast radius when they are. It&#39;s also, quietly, an argument for the sovereignty model: if there&#39;s no server to compromise, the malicious-server threat model doesn&#39;t apply.</p>
<hr>
<h2>The Ethos</h2>
<p>What I&#39;m really circling around isn&#39;t a product recommendation. It&#39;s an architectural preference — maybe a philosophical one.</p>
<p>The tools I gravitate toward share something: they treat your data as <em>yours</em>. Not hosted. Not synced through the vendor&#39;s servers. Not contingent on a subscription remaining active. A file. Encrypted. On your machine. Yours to move, yours to back up, yours to lose if you&#39;re careless.</p>
<p>This is the same instinct that makes people run OpenClaw on a Mac Mini instead of a cloud VPS. The same instinct behind Obsidian over Notion. Local-first over cloud-first. Ownership over convenience. The sovereignty section of my agent post touched on this — people want their AI agent <em>close</em>, on a machine they can unplug. Password vaults have been in this territory for a decade longer. The pattern is the same: the more personal the data, the stronger the pull toward physical control.</p>
<p>It&#39;s not always the practical choice. 1Password&#39;s cloud sync is seamless in a way that &quot;sync your own vault file through iCloud Drive&quot; just isn&#39;t. Apple Passwords works without thinking about it at all. Convenience is a feature. I&#39;m not pretending otherwise.</p>
<p>But there&#39;s something about a world where every service wants a subscription, every tool wants a cloud account, every piece of software wants to be the intermediary between you and your data — there&#39;s something about a <code>.kdbx</code> file on a USB drive that feels like a small act of resistance. Maybe not a smart one. But an honest one.</p>
<p>The next question — the one I don&#39;t have a good answer to yet — is how to extend that sovereignty to the agent world. When my AI assistant needs my GitHub token to push a commit, I want it to have exactly that credential, for exactly that task, for exactly as long as it needs it, with a clear audit trail. Not a global env var. Not a plaintext config file. Not &quot;just put it in the system prompt.&quot;</p>
<p>1Password is building this. The open-source world hasn&#39;t caught up yet. But the pieces — <code>pass</code>, GPG, process-scoped injection, credential brokers — are all there, waiting for someone to assemble them into something that feels as natural as <code>op run</code> but doesn&#39;t require a subscription to use.</p>
<p>I&#39;ll be watching for it. In the meantime, my Enpass vault sits on my iCloud Drive — a SQLCipher file, encrypted, portable, mine. It&#39;s not perfect. But it&#39;s <em>here</em>, and that counts for something.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/password-managers-sovereignty-agents-hero-v2.webp" medium="image" type="image/webp" />
      <category>tools</category>
      <category>ai</category>
      <category>systems</category>
    </item>
    <item>
      <title>Toward AI-Native Analytics for Personal Publishing</title>
      <link>https://bristanback.com/notes/ai-native-analytics-rfc/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/ai-native-analytics-rfc/</guid>
      <pubDate>Thu, 26 Feb 2026 18:00:00 GMT</pubDate>
      <atom:updated>2026-02-27T18:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>An RFC for a small, legible analytics system — privacy-first, edge-native, AI-queryable — built on Cloudflare Workers.</description>
<content:encoded><![CDATA[<p>I&#39;ve used Google Analytics since it was called Urchin — the web analytics software Google acquired in 2005. (I&#39;m not sure how many people realize that UTM stands for Urchin Tracking Module.) That was before the rebrand, before every marketing team on earth had a <code>gtag.js</code> snippet in their <code>&lt;head&gt;</code>. So when I finally ripped it out of this site — gone in one commit — I expected to feel something. Loss, maybe. Nostalgia for the old real-time dashboard with the live world map.</p>
<p>I felt relief.</p>
<p>GA4 had been dragging my Lighthouse score into the mid-80s. I even tried offloading it to a web worker with Partytown — limited improvement. The analytics tool was the biggest performance problem on the site. There&#39;s something deeply ironic about your <em>measurement system</em> being the thing that degrades the experience you&#39;re trying to measure.</p>
<p>But it wasn&#39;t just the performance tax. It was the realization that I&#39;d been collecting data I never looked at, for an audience model that doesn&#39;t apply to a personal blog, using an interface redesigned for enterprise marketing teams who need attribution funnels and audience segments. I don&#39;t have funnels. I have essays.</p>
<p>I want a small, legible system that tells me what landed, what didn&#39;t, and what changed — without turning my site into spyware or slowing it down. This is a draft spec for that system.</p>
<hr>
<h2>Problem framing</h2>
<p>Most analytics tools optimize for one of two extremes:</p>
<ol>
<li><strong>Marketing surveillance suites</strong> (powerful, heavy, identity-centric)</li>
<li><strong>Minimal privacy counters</strong> (lightweight, sometimes too shallow)</li>
</ol>
<p>A personal blog needs a third thing:</p>
<ul>
<li>privacy-first,</li>
<li>edge-native,</li>
<li>cheap enough to run free,</li>
<li>and queryable in natural language <em>without</em> outsourcing judgment.</li>
</ul>
<p>The goal is not more dashboards. The goal is better questions.</p>
<hr>
<h2>First principles</h2>
<ol>
<li><strong>Signal over exhaust.</strong> What posts are actually read? Which referrers produce meaningful attention?</li>
<li><strong>Trend shape over user identity.</strong> Momentum, decay curves, evergreen vs spike.</li>
<li><strong>No trust tax.</strong> No fingerprinting, no cross-site identifiers, no covert profiling.</li>
<li><strong>No performance tax.</strong> No Lighthouse regressions, no render-path blockers. The irony of GA failing this one still stings.</li>
<li><strong>No ops tax.</strong> Runs on Cloudflare&#39;s free tier for low/moderate traffic. Deploys via GitHub Actions on push. No servers to maintain, no containers to manage, no uptime to monitor.</li>
<li><strong>AI as interface, not authority.</strong> Conversational query layer over deterministic metrics.</li>
</ol>
<hr>
<h2>Prior art (and what to steal)</h2>
<p>I looked at everything. The privacy-native tools — <strong>Plausible</strong>, <strong>Umami</strong>, <strong>GoatCounter</strong>, <strong>Ackee</strong> — get the lightweight collection right: small scripts, no cookies, respect for the reader. Plausible in particular would be the obvious choice if I didn&#39;t want to build my own. But I do. <strong>Matomo</strong> and <strong>PostHog</strong> are impressive and wildly overkill for a solo blog.</p>
<p>The interesting steals are from adjacent systems:</p>
<ul>
<li><strong>OpenTelemetry</strong>: not the SDK (enormous) or the collector (a whole deployment), but the naming discipline. <code>http.request.method</code>, <code>url.path</code>, <code>user_agent.original</code> — standardized field names that make data portable without tribal knowledge. If the schema looks like OTel, any tool can query it later.</li>
<li><strong>Axiom.co</strong>: the ingest-first philosophy. Raw-ish event logs in a columnar format you can query ad-hoc. This maps directly to the R2 monthly exports — NDJSON that DuckDB, BigQuery, or Axiom itself can ingest. If this system outgrows D1, Axiom is the natural escape hatch.</li>
<li><strong>Sentry</strong>: not the 180KB SDK, but the pattern. <code>window.onerror</code> + <code>onunhandledrejection</code> as a ~15-line error signal. Errors are analytics too — &quot;this page throws a TypeError on Safari mobile&quot; is useful data.</li>
</ul>
<p>The synthesis: steal from the lightweight tools for collection posture, from the heavy tools for naming conventions and philosophy. Skip everyone&#39;s infrastructure.</p>
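<p>The Sentry steal is small enough to sketch in full. This is illustrative, not a drop-in client — <code>/a/collect</code> is this spec&#39;s ingestion endpoint, and the payload shape is mine:</p>

```typescript
// A sketch of the ~15-line error signal: the pattern, not Sentry's SDK.
type ErrorSignal = { type: "error"; path: string; message: string; source: string };

// Pure helper: keep the filename only, never the full script URL.
function toErrorSignal(path: string, message: string, sourceUrl: string): ErrorSignal {
  return { type: "error", path, message, source: sourceUrl.split("/").pop() ?? "" };
}

// Browser wiring — a no-op anywhere else (typed loosely to stay runnable off-DOM).
const env = globalThis as any;
if (env.window && env.navigator?.sendBeacon) {
  env.window.onerror = (msg: unknown, src: unknown) =>
    env.navigator.sendBeacon("/a/collect",
      JSON.stringify(toErrorSignal(env.location.pathname, String(msg), String(src ?? ""))));
  env.window.addEventListener("unhandledrejection", (e: any) =>
    env.navigator.sendBeacon("/a/collect",
      JSON.stringify(toErrorSignal(env.location.pathname, String(e.reason), ""))));
}
```

<p><code>sendBeacon</code> is the right transport here: fire-and-forget, survives page unload, never blocks the main thread.</p>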
<hr>
<h2>Architecture judgment</h2>
<p>The system should reflect how I build things, not just what it does.</p>
<ul>
<li><strong>TypeScript, Bun.</strong> Type safety everywhere. If a function can return the wrong shape, it will — at 3am, in production.</li>
<li><strong>Pure functions and composition over classes.</strong> <code>classifyReferrer(host) → Source</code> is a function. <code>ReferrerClassifier</code> is a class that exists so someone can write a constructor. Prefer the function.</li>
<li><strong>Never write directly to the database from the hot path.</strong> Every event goes through the DO aggregator first. The DO is the write buffer. D1 only sees batched, coalesced flushes — never raw request-time writes. If the flush fails, the DO retries. If D1 is contended, the DO backs off. The database is never the thing that breaks under load.</li>
<li><strong>Backpressure is not optional.</strong> If ingestion outpaces aggregation, the DO queues. If aggregation outpaces D1, the DO backs off exponentially. If everything&#39;s on fire, governor mode kicks in and sheds load gracefully. No silent data loss. No &quot;it seemed fine in dev.&quot;</li>
<li><strong>Modularity through boundaries, not abstractions.</strong> Ingestion, aggregation, storage, query, and AI are separate Workers/DOs with typed interfaces between them. Not a monolith with &quot;clean architecture&quot; folders. Actual deployment boundaries.</li>
<li><strong>Idempotent everything.</strong> Every flush has a batch ID. Every write is safe to retry. The system should be correct after a crash, not just during normal operation.</li>
<li><strong>Append-only where it matters, mutable where it serves.</strong> Raw events are append-only at the ingestion boundary — never mutate what the client sent. Aggregates are mutable by design (upserts that coalesce counters). R2 exports are immutable — write-once NDJSON, never overwritten. Three different postures for three different concerns: integrity at the edge, efficiency in the middle, portability at the exit.</li>
</ul>
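<p>The pure-function bullet is concrete enough to show. A minimal sketch of <code>classifyReferrer</code> as a total function into a bounded enum — the hostname table here is illustrative, not the real mapping:</p>

```typescript
// FR-5's posture: every input lands in a bounded enum; nothing widens it.
type Source = "google" | "hackernews" | "reddit" | "direct" | "other";

// Illustrative table — the real one stays under the ≤15-source bound.
const HOSTS: Record<string, Source> = {
  "www.google.com": "google",
  "news.ycombinator.com": "hackernews",
  "www.reddit.com": "reddit",
  "old.reddit.com": "reddit",
};

function classifyReferrer(referrer: string): Source {
  if (referrer === "") return "direct";
  try {
    return HOSTS[new URL(referrer).hostname] ?? "other";
  } catch {
    return "other"; // malformed referrers never escape the enum either
  }
}
```

<p>No class, no constructor, no state — just a table and a lookup, which is exactly why it&#39;s trivially testable.</p>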
<hr>
<h2>Assumptions</h2>
<ol>
<li>Cloudflare Workers + D1 + Durable Objects are available</li>
<li>JavaScript may be blocked — baseline still works server-side for pageviews where possible</li>
<li>Coarse-grained analytics (daily/hourly aggregates) are enough</li>
<li>No individual user/session re-identification. Period</li>
<li>Reading quality and trend direction matter more than ad attribution precision</li>
<li>Script budget: ≤1.5KB gzipped (aspirational), hard cap ≤2KB</li>
</ol>
<hr>
<h2>Requirements</h2>
<p>Because this is a design exercise as much as a product spec, I want to be explicit about what this system <em>must</em> do versus how it <em>must feel doing it</em>. These aren&#39;t aspirational — they&#39;re the acceptance criteria I&#39;ll hold myself to.</p>
<h3>Functional requirements</h3>
<p>These are the capabilities. What the system does when everything&#39;s working.</p>
<table>
<thead>
<tr>
<th>ID</th>
<th>Requirement</th>
<th>Acceptance</th>
</tr>
</thead>
<tbody><tr>
<td><strong>FR-1</strong></td>
<td><strong>Pageview collection</strong> — capture every non-bot page load with path, referrer, and timestamp</td>
<td>Ingestion Worker returns 204; fact table increments within flush interval</td>
</tr>
<tr>
<td><strong>FR-2</strong></td>
<td><strong>Engagement scoring</strong> — composite signal from scroll depth, dwell time, and outbound clicks</td>
<td><code>engaged</code> event fires once per pageview when any threshold is met</td>
</tr>
<tr>
<td><strong>FR-3</strong></td>
<td><strong>Scroll milestone tracking</strong> — quartile depth markers (25/50/75/100%) per article</td>
<td>Four <code>IntersectionObserver</code> instances; each fires once and disconnects</td>
</tr>
<tr>
<td><strong>FR-4</strong></td>
<td><strong>Outbound click tracking</strong> — capture external link clicks from article body</td>
<td><code>outbound</code> event with <code>href</code> and anchor text in <code>meta</code></td>
</tr>
<tr>
<td><strong>FR-5</strong></td>
<td><strong>Referrer classification</strong> — normalize raw referrers into a bounded enum (≤15 sources)</td>
<td><code>classifyReferrer()</code> maps hostnames; unknown → <code>&quot;other&quot;</code></td>
</tr>
<tr>
<td><strong>FR-6</strong></td>
<td><strong>Traffic classification</strong> — categorize requests as human/bot/AI crawler</td>
<td>5-class system using <code>cf.botManagement.score</code> + UA pattern matching</td>
</tr>
<tr>
<td><strong>FR-7</strong></td>
<td><strong>Core Web Vitals</strong> — LCP, INP, CLS at p75 per path per device class</td>
<td><code>web-vitals</code> library → custom events → DDSketch aggregation in DO</td>
</tr>
<tr>
<td><strong>FR-8</strong></td>
<td><strong>JS error capture</strong> — <code>window.onerror</code> and <code>onunhandledrejection</code> as lightweight error signal</td>
<td>Error count per path per day; filename-only source (no full URLs)</td>
</tr>
<tr>
<td><strong>FR-9</strong></td>
<td><strong>Ranked queries</strong> — top posts, top referrers, period comparison, timeseries</td>
<td>Typed API endpoints returning deterministic results</td>
</tr>
<tr>
<td><strong>FR-10</strong></td>
<td><strong>AI query layer</strong> — natural language questions answered with cited evidence</td>
<td><code>POST /api/ask</code> → planner → ≤8 tool calls → evidence-backed answer</td>
</tr>
<tr>
<td><strong>FR-11</strong></td>
<td><strong>MCP tools</strong> — agent-queryable analytics surface</td>
<td>6 typed tools matching the query API</td>
</tr>
<tr>
<td><strong>FR-12</strong></td>
<td><strong>Snapshot page</strong> — glanceable <code>/analytics</code> dashboard (auth-gated)</td>
<td>4 cards: 7-day pageviews, top posts, top referrers, week-over-week delta</td>
</tr>
<tr>
<td><strong>FR-13</strong></td>
<td><strong>Data export</strong> — monthly NDJSON snapshots to R2</td>
<td>OTel-aligned field names; importable by DuckDB/BigQuery/Axiom</td>
</tr>
</tbody></table>
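<p>FR-2&#39;s acceptance criterion — the <code>engaged</code> event fires once when any threshold is met — reduces to a small predicate. A sketch with placeholder thresholds; the real values would be tuned against actual reading behavior:</p>

```typescript
// Signals the client accumulates per pageview (names are mine, not a spec).
interface Signals {
  maxScrollPct: number;   // 0–100, deepest quartile reached
  dwellMs: number;        // visible time on the page
  outboundClicks: number; // external links clicked
}

// Placeholder thresholds — illustrative, not tuned.
const THRESHOLDS = { scrollPct: 50, dwellMs: 30_000, clicks: 1 };

function isEngaged(s: Signals): boolean {
  // Any one threshold suffices; the client sends the event the first
  // time this flips to true, then stops checking.
  return (
    s.maxScrollPct >= THRESHOLDS.scrollPct ||
    s.dwellMs >= THRESHOLDS.dwellMs ||
    s.outboundClicks >= THRESHOLDS.clicks
  );
}
```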
<h3>Nonfunctional requirements</h3>
<p>These are the constraints. How the system behaves under pressure, at the edges, and over time — the stuff that determines whether you actually trust it.</p>
<table>
<thead>
<tr>
<th>ID</th>
<th>Requirement</th>
<th>Target</th>
<th>Enforcement</th>
</tr>
</thead>
<tbody><tr>
<td><strong>NFR-1</strong></td>
<td><strong>Client weight</strong></td>
<td>≤2KB gzipped (hard cap)</td>
<td>CI size gate fails the build</td>
</tr>
<tr>
<td><strong>NFR-2</strong></td>
<td><strong>Lighthouse impact</strong></td>
<td>Zero measurable regression in LCP/INP/CLS p75</td>
<td>No synchronous XHR, no DOM writes, no layout-affecting elements</td>
</tr>
<tr>
<td><strong>NFR-3</strong></td>
<td><strong>Privacy</strong></td>
<td>No cookies, no localStorage IDs, no fingerprinting, no raw IP storage</td>
<td>Enforced at ingestion — geo derived server-side, then discarded</td>
</tr>
<tr>
<td><strong>NFR-4</strong></td>
<td><strong>Write integrity</strong></td>
<td>Every flush is idempotent and transactional</td>
<td><code>flush_batch</code> table; duplicate <code>batch_id</code> → skip</td>
</tr>
<tr>
<td><strong>NFR-5</strong></td>
<td><strong>Backpressure</strong></td>
<td>Graceful degradation under load (governor mode)</td>
<td>4-step ladder: disable custom events → drop breakdowns → sample → daily-only</td>
</tr>
<tr>
<td><strong>NFR-6</strong></td>
<td><strong>Cardinality bounds</strong></td>
<td>No unbounded dimension growth</td>
<td>Hard limits: 10K paths, 1K referrers, 10 meta keys per event</td>
</tr>
<tr>
<td><strong>NFR-7</strong></td>
<td><strong>Storage cost</strong></td>
<td>Proportional to content count, not traffic volume</td>
<td>rrdtool-inspired rollup: hourly → daily → core; Cron-enforced retention</td>
</tr>
<tr>
<td><strong>NFR-8</strong></td>
<td><strong>Free-tier operation</strong></td>
<td>Runs entirely on Cloudflare&#39;s free plan at low/moderate traffic</td>
<td>Aggregate-first writes, bounded dimensions, query caching</td>
</tr>
<tr>
<td><strong>NFR-9</strong></td>
<td><strong>Validation</strong></td>
<td>Every boundary has a runtime schema</td>
<td>Zod at ingestion, DO, and query layer; types derived from validators</td>
</tr>
<tr>
<td><strong>NFR-10</strong></td>
<td><strong>Portability</strong></td>
<td>Data is always exportable in a standard format</td>
<td>Monthly R2 NDJSON with OTel-aligned naming</td>
</tr>
<tr>
<td><strong>NFR-11</strong></td>
<td><strong>Ops overhead</strong></td>
<td>Zero servers to maintain</td>
<td>Deploys via <code>wrangler</code> in CI; no containers, no uptime monitoring</td>
</tr>
<tr>
<td><strong>NFR-12</strong></td>
<td><strong>Recovery</strong></td>
<td>Correct after crash, not just during normal operation</td>
<td>Idempotent flushes, forward-only migrations, append-only raw events</td>
</tr>
</tbody></table>
<p>A few things to notice about this split: the FRs are roughly ordered by collection → query → export (data flows downhill). The NFRs are roughly ordered by what your <em>reader</em> cares about (weight, speed, privacy) → what your <em>system</em> cares about (integrity, pressure, bounds) → what <em>future you</em> cares about (portability, ops, recovery). That ordering is intentional — it mirrors who screams first when something breaks.</p>
<h2>Non-goals (v1)</h2>
<ul>
<li>Session replay, heatmaps</li>
<li>Fingerprinting</li>
<li>Cross-device identity stitching</li>
<li>Real-time per-user journey visualization</li>
<li>Freeform SQL generated by the LLM</li>
<li>Return visitor tracking — can&#39;t do it honestly without cookies or fingerprinting, and the proxy signals (growing direct traffic, RSS subscribers, stable engagement rates across posts) already tell you whether you have an audience. A post with 65% completion doesn&#39;t care whether the reader was new or returning</li>
</ul>
<hr>
<h2>RFC: AI-Native Analytics for Blogs (v0.1)</h2>
<h3>1) High-level architecture</h3>
<pre><code class="language-mermaid">graph TD
  subgraph Client
    A[&quot;a.js (≤2KB)&quot;]
  end

  subgraph Ingestion
    B[Worker: /a/collect]
  end

  subgraph Aggregation
    C[Durable Object] --&gt; D[(D1)]
    D -.-&gt;|monthly snapshot| E[(R2 exports)]
  end

  subgraph Query
    F[Worker: /api/metrics]
    G[MCP Tools]
    H[&quot;analytics snapshot&quot;]
  end

  A --&gt;|sendBeacon| B
  B --&gt;|validate + normalize| C
  D --&gt; F &amp; G &amp; H
</code></pre>
<p><strong>Ingestion plane</strong></p>
<ul>
<li>Worker endpoint: <code>POST /a/collect</code></li>
<li>Accepts compact events (<code>pv</code>, <code>engaged</code>, <code>outbound</code>, optional custom)</li>
<li>Validates payload via Zod, normalizes dimensions, forwards to per-site aggregator DO</li>
</ul>
<p><strong>Aggregation plane</strong></p>
<ul>
<li>Durable Object per <code>site_id</code></li>
<li>Maintains short-lived counters in memory</li>
<li>Flushes batched aggregates to D1 (interval and size thresholds)</li>
<li>rrdtool-inspired: coalesce first, persist later</li>
</ul>
<p><strong>Storage plane</strong></p>
<ul>
<li>D1 stores aggregate facts + compact dimensions</li>
<li>Fixed-size storage inspired by rrdtool/Graphite whisper (see below)</li>
<li>Optional: R2 monthly snapshots / exports</li>
<li>Optional: KV for query cache</li>
</ul>
<p><strong>Query plane</strong></p>
<ul>
<li>Worker endpoint: <code>GET /api/metrics/*</code></li>
<li>Serves typed metric queries from D1</li>
<li>Caches common dashboard reads</li>
</ul>
<p><strong>AI plane</strong></p>
<ul>
<li><code>POST /api/ask</code> with constrained query planner</li>
<li>MCP server exposes typed analytics tools</li>
<li>LLM summarizes deterministic outputs only</li>
</ul>
<hr>
<h3>2) Collection protocol</h3>
<h4>Event envelope</h4>
<pre><code class="language-ts">export type EventKind = &quot;pv&quot; | &quot;engaged&quot; | &quot;outbound&quot; | &quot;error&quot; | &quot;vital&quot; | &quot;custom&quot;;

export interface CollectEvent {
  kind: EventKind;
  ts: number; // epoch ms
  path: string; // normalized pathname
  ref?: string; // raw referrer (optional)
  title?: string; // optional page title hashable
  utm?: {
    source?: string;
    medium?: string;
    campaign?: string;
  };
  meta?: Record&lt;string, string | number | boolean | null&gt;;
}

export interface CollectRequest {
  site: string;         // public site key
  v: 1;                 // protocol version
  events: CollectEvent[];
}
</code></pre>
<h4>Default events (fire automatically)</h4>
<p>The client script emits these without any configuration:</p>
<table>
<thead>
<tr>
<th>Event</th>
<th>When</th>
<th>Data</th>
</tr>
</thead>
<tbody><tr>
<td><code>pv</code></td>
<td><code>DOMContentLoaded</code></td>
<td><code>path</code>, <code>ref</code>, <code>title</code></td>
</tr>
<tr>
<td><code>engaged</code></td>
<td>Composite trigger (scroll/dwell/click)</td>
<td><code>maxDepth</code> (0-100)</td>
</tr>
<tr>
<td><code>outbound</code></td>
<td>Click on <code>&lt;a&gt;</code> with external <code>href</code></td>
<td><code>path</code>, <code>meta.href</code>, <code>meta.text</code></td>
</tr>
<tr>
<td><code>milestone</code></td>
<td>Each scroll quartile crossed</td>
<td><code>meta.depth</code> (25/50/75/100)</td>
</tr>
<tr>
<td><code>error</code></td>
<td><code>window.onerror</code> / <code>onunhandledrejection</code></td>
<td><code>meta.message</code>, <code>meta.source</code>, <code>meta.line</code>, <code>meta.col</code></td>
</tr>
<tr>
<td><code>vital</code></td>
<td><code>web-vitals</code> callback</td>
<td><code>meta.vital</code> (lcp/inp/cls), <code>meta.value</code></td>
</tr>
</tbody></table>
<h4>JS error capture (Sentry-lite)</h4>
<p>Not a full error monitoring product. Just a signal: &quot;something broke on this page.&quot;</p>
<pre><code class="language-ts">window.onerror = (message, source, line, col) =&gt; {
  queue({
    kind: &quot;error&quot; as EventKind,
    ts: Date.now(),
    path: location.pathname,
    meta: {
      message: String(message).slice(0, 256),
      source: source?.split(&quot;/&quot;).pop() ?? &quot;unknown&quot;, // filename only, no full URLs
      line: line ?? 0,
      col: col ?? 0,
    },
  });
};

window.onunhandledrejection = (event) =&gt; {
  queue({
    kind: &quot;error&quot; as EventKind,
    ts: Date.now(),
    path: location.pathname,
    meta: {
      message: String(event.reason).slice(0, 256),
      source: &quot;unhandled_rejection&quot;,
    },
  });
};
</code></pre>
<p>What this answers:</p>
<ul>
<li>&quot;Is my site broken on Safari mobile?&quot; — check error rate by device class</li>
<li>&quot;Did my last deploy introduce a regression?&quot; — compare error rates across dates</li>
<li>&quot;Is this a real problem or a browser extension?&quot; — <code>source</code> filename distinguishes your code from injected scripts</li>
</ul>
<p>What this doesn&#39;t do: stack traces, source maps, breadcrumbs, user replay. For that, use Sentry. For a blog, error <em>rate</em> is the signal. Error <em>detail</em> is a debugging tool you reach for when the rate spikes.</p>
<p>Error events are aggregated into a simple counter on <code>fact_daily</code>:</p>
<pre><code class="language-sql">ALTER TABLE fact_daily ADD COLUMN error_count INTEGER NOT NULL DEFAULT 0;
</code></pre>
<p>Custom events (<code>custom</code>) require explicit instrumentation:</p>
<pre><code class="language-ts">// Only if someone wants to track something specific
window.__a?.track(&quot;signup_click&quot;, { cta: &quot;header&quot; });
</code></pre>
<p>The API surface is one function. If <code>window.__a</code> doesn&#39;t exist (script blocked, not loaded yet), the call silently no-ops.</p>
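<p>A minimal sketch of that surface. The factory shape and names here (<code>makeTracker</code>, <code>getPath</code>) are illustrative, not part of the spec; in the browser build the returned object would be assigned to <code>window.__a</code>, with <code>queue()</code> being the transport helper described later:</p>

```typescript
// Illustrative sketch only: one function, forwarding to the transport queue.
// Callers use window.__a?.track(...), so a blocked script is a silent no-op.
type Meta = Record<string, string | number | boolean | null>;

interface TrackedEvent {
  kind: "custom";
  ts: number;
  path: string;
  meta: Meta;
}

function makeTracker(queue: (e: TrackedEvent) => void, getPath: () => string) {
  return {
    track(name: string, meta: Meta = {}): void {
      queue({ kind: "custom", ts: Date.now(), path: getPath(), meta: { name, ...meta } });
    },
  };
}

// Usage (outside the browser, a plain array stands in for the queue):
const events: TrackedEvent[] = [];
const tracker = makeTracker((e) => events.push(e), () => "/notes/example/");
tracker.track("signup_click", { cta: "header" });
```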
<h4>OTel-inspired semantic naming</h4>
<p>We don&#39;t use the OpenTelemetry SDK. But we steal its naming discipline. When data leaves this system — via R2 export, MCP tool output, or the <code>ask</code> endpoint — field names should be recognizable to anyone who&#39;s worked with observability tooling.</p>
<table>
<thead>
<tr>
<th>Our field</th>
<th>OTel semantic convention</th>
<th>Why it matters</th>
</tr>
</thead>
<tbody><tr>
<td><code>path</code></td>
<td><code>url.path</code></td>
<td>Portable to any analytics tool</td>
</tr>
<tr>
<td><code>ref</code></td>
<td><code>http.request.header.referer</code></td>
<td>Standard, not invented</td>
</tr>
<tr>
<td><code>device_class</code></td>
<td><code>user_agent.device.type</code></td>
<td>DuckDB/BigQuery can join on this</td>
</tr>
<tr>
<td><code>country_code</code></td>
<td><code>client.geo.country_iso_code</code></td>
<td>ISO 3166-1 alpha-2, universally understood</td>
</tr>
<tr>
<td><code>traffic_class</code></td>
<td><code>http.request.bot_score_class</code></td>
<td>Our addition; no OTel equivalent yet</td>
</tr>
</tbody></table>
<p>The R2 monthly exports use OTel-aligned field names even if the internal D1 schema uses shorter column names. Export is the portability boundary — that&#39;s where naming discipline pays off.</p>
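<p>At that boundary the rename can be a single lookup table. A hedged sketch (map contents mirror the table above; <code>toExportRecord</code> is a hypothetical helper name, and unmapped columns pass through unchanged):</p>

```typescript
// Export-boundary rename: internal D1 column names → OTel-aligned field names.
const EXPORT_FIELD_MAP: Record<string, string> = {
  path: "url.path",
  ref: "http.request.header.referer",
  device_class: "user_agent.device.type",
  country_code: "client.geo.country_iso_code",
  traffic_class: "http.request.bot_score_class", // our addition, no OTel equivalent yet
};

function toExportRecord(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    out[EXPORT_FIELD_MAP[key] ?? key] = value; // unmapped columns keep their names
  }
  return out;
}
```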
<h4>Lightweight RUM (Real User Monitoring)</h4>
<p>Core Web Vitals are too useful to skip and too small to justify skipping. Google&#39;s <code>web-vitals</code> library is ~1.5KB and gives you LCP, INP, and CLS — the three metrics that determine whether your analytics script is harming the site it&#39;s measuring.</p>
<pre><code class="language-ts">import { onLCP, onINP, onCLS } from &quot;web-vitals&quot;; // tree-shaken

onLCP((metric) =&gt; queue({ kind: &quot;vital&quot;, ts: Date.now(), path, meta: { vital: &quot;lcp&quot;, value: metric.value } }));
onINP((metric) =&gt; queue({ kind: &quot;vital&quot;, ts: Date.now(), path, meta: { vital: &quot;inp&quot;, value: metric.value } }));
onCLS((metric) =&gt; queue({ kind: &quot;vital&quot;, ts: Date.now(), path, meta: { vital: &quot;cls&quot;, value: metric.value } }));
</code></pre>
<p>This isn&#39;t a full RUM product. It&#39;s a canary:</p>
<ul>
<li>&quot;Did my last deploy regress LCP?&quot; — check the trend</li>
<li>&quot;Is the analytics script itself causing CLS?&quot; — it better not be</li>
<li>&quot;What&#39;s my real INP on mobile?&quot; — useful for any interactive elements</li>
</ul>
<p>Vital events are aggregated into a <code>fact_daily_vitals</code> table:</p>
<pre><code class="language-sql">CREATE TABLE IF NOT EXISTS fact_daily_vitals (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  device_class TEXT NOT NULL DEFAULT &#39;unknown&#39;,
  lcp_p75 REAL,     -- milliseconds
  inp_p75 REAL,     -- milliseconds
  cls_p75 REAL,     -- unitless score
  sample_count INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, device_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
</code></pre>
<p>p75 aggregation happens in the DO — not by storing raw values, but by maintaining a <a href="https://arxiv.org/abs/1908.10693">DDSketch</a> or simpler quantile estimator in memory and flushing the percentile. Fixed memory regardless of sample count.</p>
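<p>To make the fixed-memory idea concrete without reproducing DDSketch, here is a deliberately simpler stand-in: a bounded reservoir sample. Memory is capped at <code>capacity</code> values no matter how many vitals arrive; the p75 read is exact until the cap is hit and approximate after. The class name and defaults are illustrative:</p>

```typescript
// Fixed-memory quantile stand-in (NOT DDSketch): a bounded reservoir sample.
class BoundedQuantile {
  private samples: number[] = [];
  private seen = 0;

  constructor(private capacity = 256) {}

  add(value: number): void {
    this.seen++;
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
    } else {
      // classic reservoir replacement keeps a uniform sample of all values seen
      const j = Math.floor(Math.random() * this.seen);
      if (j < this.capacity) this.samples[j] = value;
    }
  }

  p75(): number | null {
    if (this.samples.length === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    return sorted[Math.floor(0.75 * (sorted.length - 1))];
  }
}
```

A real DDSketch gives relative-error guarantees the reservoir does not, but both share the property that matters here: flush-time memory is a constant, not a function of traffic.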
<p><strong>Budget impact:</strong> <code>web-vitals</code> is ~1.5KB gzipped. Combined with the core client (~500 bytes), the total stays within the 2KB hard cap.</p>
<p>Protocol compatibility:</p>
<ul>
<li>server currently accepts <code>v=1</code></li>
<li>unknown fields are ignored (forward-compatible reads)</li>
<li>future protocol bumps should be explicit (<code>v=2</code> handling path + migration notes)</li>
</ul>
<h4>Engagement definition</h4>
<p>An <code>engaged</code> event is a composite boolean, true when <strong>any</strong> of these conditions hold:</p>
<ol>
<li>Reader crosses 60% article depth (single <code>IntersectionObserver</code> marker)</li>
<li>Reader keeps tab visible for &gt;= 20s on article route</li>
<li>Reader clicks an outbound link from the article body</li>
</ol>
<p>Why not only time-on-page? It lies. A tab left open in the background inflates engagement. Why not only scroll depth? It can miss accessibility and non-scroll reading patterns. Composite engagement is still simple, but less brittle.</p>
<p>This signal also helps bot filtering, but it is not the only bot defense.</p>
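<p>As code, the composite rule is a three-clause OR over signals the client already tracks. A sketch (field names are illustrative, not part of the wire format):</p>

```typescript
// Composite engagement: any one signal is enough.
interface EngagementSignals {
  maxDepthPct: number;    // furthest scroll milestone reached, 0-100
  visibleMs: number;      // time accumulated while the tab was actually visible
  outboundClicks: number; // outbound link clicks from the article body
}

function isEngaged(s: EngagementSignals): boolean {
  return s.maxDepthPct >= 60 || s.visibleMs >= 20_000 || s.outboundClicks > 0;
}
```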
<h4>Scroll milestones</h4>
<p>Beyond the binary engagement signal, we track <em>how far</em> readers get. Four <code>IntersectionObserver</code> instances on invisible markers at article quartiles (25%, 50%, 75%, 100%). Each fires exactly once and disconnects.</p>
<pre><code class="language-ts">const MILESTONES = [25, 50, 75, 100] as const;
type ScrollMilestone = typeof MILESTONES[number];

function observeScrollDepth(article: HTMLElement) {
  const height = article.offsetHeight;
  article.style.position ||= &quot;relative&quot;; // markers below are absolutely positioned against the article
  let maxDepth = 0;

  for (const pct of MILESTONES) {
    const marker = document.createElement(&quot;div&quot;);
    marker.style.cssText =
      &quot;height:1px;width:1px;position:absolute;pointer-events:none;&quot;;
    marker.style.top = `${(pct / 100) * height}px`;
    article.appendChild(marker);

    const observer = new IntersectionObserver(
      ([entry]) =&gt; {
        if (entry.isIntersecting) {
          maxDepth = Math.max(maxDepth, pct);
          // fire the per-quartile milestone event (meta.depth, per the default-events table)
          queue({ kind: &quot;custom&quot;, ts: Date.now(), path: location.pathname, meta: { depth: pct } });
          observer.disconnect();
        }
      },
      { threshold: 0 },
    );
    observer.observe(marker);
  }

  return () =&gt; maxDepth;
}
</code></pre>
<p>Why <code>IntersectionObserver</code> instead of scroll listeners:</p>
<ul>
<li><strong>Zero main-thread work during scroll.</strong> Runs on the compositor thread — can&#39;t cause jank.</li>
<li><strong>No throttle/debounce decision to get wrong.</strong> The browser tells you when the marker enters the viewport.</li>
<li><strong>No <code>getBoundingClientRect()</code> calls.</strong> No forced reflow, no layout thrashing.</li>
<li><strong>Four callbacks total</strong> for the entire page visit. Then silence.</li>
</ul>
<p>What this unlocks for a writer:</p>
<ul>
<li>&quot;My analytics RFC has 90% reach-25 but only 30% reach-75 — people bail at the schema section&quot;</li>
<li>&quot;The pervasive AI essay has 65% completion rate — the long version holds better than expected&quot;</li>
<li>&quot;Posts under 1,500 words: 80% completion. Over 2,500: 45%. But the 45% that finish have 3x the outbound click rate&quot;</li>
</ul>
<p>The engagement event becomes <code>{ kind: &quot;engaged&quot;, meta: { maxDepth: 75 }, ... }</code> instead of just a boolean. The depth rides in <code>meta</code>, since the event envelope has no top-level <code>maxDepth</code> field.</p>
<p>Add <code>max_depth</code> to the daily fact:</p>
<pre><code class="language-sql">ALTER TABLE fact_daily ADD COLUMN avg_max_depth REAL; -- 0-100
ALTER TABLE fact_daily ADD COLUMN completions INTEGER NOT NULL DEFAULT 0; -- reached 100%
</code></pre>
<h4>Transport</h4>
<pre><code class="language-ts">let pending: CollectEvent[] = [];
let flushTimer: number | null = null;

function queue(event: CollectEvent) {
  pending.push(event);
  if (!flushTimer) {
    flushTimer = setTimeout(flush, 2000); // 2s debounce
  }
  if (pending.length &gt;= 10) flush(); // size cap
}

function flush() {
  if (!pending.length) return;
  const body = JSON.stringify({ site: SITE_KEY, v: 1, events: pending });
  pending = [];
  if (flushTimer) clearTimeout(flushTimer);
  flushTimer = null;
  // sendBeacon returns false when the payload is refused; fall back to keepalive fetch
  if (!navigator.sendBeacon(&quot;/a/collect&quot;, body)) {
    fetch(&quot;/a/collect&quot;, { method: &quot;POST&quot;, body, keepalive: true }).catch(() =&gt; {});
  }
}

document.addEventListener(&quot;visibilitychange&quot;, () =&gt; {
  if (document.visibilityState === &quot;hidden&quot;) flush();
});
</code></pre>
<ul>
<li><code>sendBeacon</code> on <code>visibilitychange</code> with <code>fetch({ keepalive: true })</code> fallback</li>
<li>2-second debounce to batch rapid events (pageview + milestone + engagement)</li>
<li>Hard cap at 10 events per batch</li>
<li>Never blocks navigation</li>
</ul>
<h4>Why no Service Worker</h4>
<p>A Service Worker could queue failed beacons offline and retry via Background Sync. Sounds responsible. Not worth it.</p>
<ul>
<li><code>sendBeacon</code> already survives page navigation — the only failure case is total network loss, and losing one pageview during a subway tunnel is not a data integrity problem for a blog</li>
<li>Service Workers add a whole lifecycle (install, activate, update, cache versioning) for a client that&#39;s supposed to be &lt;2KB and understandable in 60 seconds</li>
<li>Service Workers can&#39;t call <code>sendBeacon</code> — you&#39;d reimplement the same reliability with <code>fetch</code>, wrapping a reliable API in a more complex one for the same result</li>
<li>The analytics script is &lt;2KB with <code>Cache-Control: immutable</code> — the browser&#39;s HTTP cache handles this without a SW cache layer</li>
<li>Cloudflare Workers already <em>are</em> the server-side &quot;service worker&quot; — intelligence on both ends creates two places to debug the same problem</li>
</ul>
<p>The entire client architecture:</p>
<pre><code>a.js (&lt; 2KB gzipped)
├── IntersectionObserver × 4 (scroll milestones)
├── event queue (plain array)
├── debounced flush (2s / 10 events)
├── sendBeacon on visibilitychange
└── that&#39;s it
</code></pre>
<p>No Service Worker. No IndexedDB. No Background Sync. No scroll listeners. Understandable in 60 seconds.</p>
<hr>
<h3>3) Privacy model</h3>
<ul>
<li>No cookies by default</li>
<li>No localStorage identifiers</li>
<li>No fingerprinting</li>
<li>No raw IP storage (derive coarse geo server-side then discard)</li>
<li>Configurable retention by table</li>
<li>Optional per-site &quot;strict mode&quot; that disables custom events entirely</li>
</ul>
<p>Collection integrity (anti-poisoning):</p>
<ul>
<li>signed payload mode for first-party script (<code>HMAC(body + ts)</code>)</li>
<li>replay window enforcement (reject stale timestamps)</li>
<li>optional origin allowlist for collection endpoint</li>
</ul>
<hr>
<h3>4) Data model (D1)</h3>
<p>Schema strategy is intentionally split:</p>
<ul>
<li><strong>v1:</strong> optimize for low cardinality and cheap queries (one row per <code>site,date,path</code>)</li>
<li><strong>v2:</strong> add breakdown tables for referrer/country/device when needed</li>
</ul>
<pre><code class="language-sql">-- Sites
CREATE TABLE IF NOT EXISTS site (
  id TEXT PRIMARY KEY,
  slug TEXT NOT NULL UNIQUE,
  created_at INTEGER NOT NULL
);

-- Normalized dimensions
CREATE TABLE IF NOT EXISTS dim_path (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  site_id TEXT NOT NULL,
  path TEXT NOT NULL,
  path_hash TEXT NOT NULL,
  UNIQUE(site_id, path_hash),
  FOREIGN KEY(site_id) REFERENCES site(id)
);

CREATE TABLE IF NOT EXISTS dim_referrer (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  site_id TEXT NOT NULL,
  source TEXT NOT NULL,      -- normalized enum (see below)
  raw_host TEXT,             -- optional debug host
  referrer_query TEXT,       -- AI search query context (Phase 2, nullable)
  UNIQUE(site_id, source, raw_host),
  FOREIGN KEY(site_id) REFERENCES site(id)
);

-- Referrer source enum (16 values, normalize at ingestion):
-- direct | google | bing | duckduckgo | x | hn | reddit | linkedin
-- mastodon | bluesky | newsletter | rss | github | chatgpt | perplexity | other
--
-- Normalization: classifyReferrer(host) maps hostnames to sources.
-- e.g. t.co, twitter.com, x.com → &quot;x&quot;
-- e.g. news.ycombinator.com → &quot;hn&quot;
-- e.g. chatgpt.com, chat.openai.com → &quot;chatgpt&quot;
-- e.g. perplexity.ai → &quot;perplexity&quot;
-- Unknown hosts → &quot;other&quot;.
-- raw_host is retained only for bounded debugging/reporting, not unbounded growth.
--
-- AI search referrer enrichment (Phase 2):
-- Some AI search engines pass query context in URL params (?q=, ?query=, etc.)
-- When referrer is an AI search source AND query params are present,
-- extract and store as referrer_query in dim_referrer.
-- This answers &quot;why did someone land here?&quot; - not just &quot;where from?&quot;
-- Bounded: only captured for AI search sources, not all referrers.

-- v1 core daily fact (low cardinality)
CREATE TABLE IF NOT EXISTS fact_daily (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,    -- YYYY-MM-DD
  path_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  outbound_clicks INTEGER NOT NULL DEFAULT 0,
  custom_events INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- v2 optional breakdown tables
CREATE TABLE IF NOT EXISTS fact_daily_referrer (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  referrer_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, referrer_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id),
  FOREIGN KEY(referrer_id) REFERENCES dim_referrer(id)
);

CREATE TABLE IF NOT EXISTS fact_daily_country (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  country_code TEXT NOT NULL DEFAULT &#39;XX&#39;,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, country_code),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

CREATE TABLE IF NOT EXISTS fact_daily_device (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  device_class TEXT NOT NULL DEFAULT &#39;unknown&#39;, -- mobile|desktop|tablet|unknown
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, device_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- Hourly optional short window (e.g. last 7-14 days)
CREATE TABLE IF NOT EXISTS fact_hourly (
  site_id TEXT NOT NULL,
  hour_utc TEXT NOT NULL,    -- YYYY-MM-DDTHH
  path_id INTEGER NOT NULL,
  pageviews INTEGER NOT NULL DEFAULT 0,
  engaged_visits INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, hour_utc, path_id),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);

-- Flush idempotency
CREATE TABLE IF NOT EXISTS flush_batch (
  batch_id TEXT PRIMARY KEY,
  site_id TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  FOREIGN KEY(site_id) REFERENCES site(id)
);

CREATE INDEX IF NOT EXISTS idx_fact_daily_site_date ON fact_daily(site_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_daily_path ON fact_daily(site_id, path_id, date_utc);
CREATE INDEX IF NOT EXISTS idx_fact_hourly_site_hour ON fact_hourly(site_id, hour_utc);
</code></pre>
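<p>The <code>classifyReferrer(host)</code> normalization mentioned in the schema comments can be a small lookup with a <code>www.</code>-stripping fallback. A sketch (host list abridged; the real table would cover all 16 sources):</p>

```typescript
// Referrer normalization: hostnames collapse into the bounded source enum.
const HOST_TO_SOURCE: Record<string, string> = {
  "t.co": "x",
  "twitter.com": "x",
  "x.com": "x",
  "news.ycombinator.com": "hn",
  "chatgpt.com": "chatgpt",
  "chat.openai.com": "chatgpt",
  "perplexity.ai": "perplexity",
  "google.com": "google",
};

function classifyReferrer(host: string | null): string {
  if (!host) return "direct"; // no Referer header at all
  const h = host.toLowerCase().replace(/^www\./, "");
  return HOST_TO_SOURCE[h] ?? "other"; // unknown hosts never grow the enum
}
```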
<p>Idempotency rule: every aggregator flush includes a <code>batch_id</code>; writes occur in one transaction that first inserts into <code>flush_batch</code>. If <code>batch_id</code> already exists, skip (already applied).</p>
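<p>One way to implement that rule, sketched with minimal stand-in types for the D1 bindings (Cloudflare's real types are <code>D1Database</code> and <code>D1PreparedStatement</code>; matching on SQLite's constraint-error text is an assumption). Because a D1 batch runs in one implicit transaction, inserting into <code>flush_batch</code> <em>without</em> <code>OR IGNORE</code> makes a duplicate <code>batch_id</code> violate the primary key and roll back every statement in the flush:</p>

```typescript
// Idempotent, transactional flush: register the batch and apply upserts in ONE
// batch. A duplicate batch_id fails the primary-key insert, the whole batch
// rolls back, and "already applied" surfaces as a caught error, not a race.
interface Stmt { sql: string; params: unknown[] }
interface Db { batch(stmts: Stmt[]): Promise<void> }

async function applyFlush(
  db: Db,
  batchId: string,
  siteId: string,
  upserts: Stmt[],
): Promise<"applied" | "skipped"> {
  const register: Stmt = {
    sql: "INSERT INTO flush_batch (batch_id, site_id, created_at) VALUES (?1, ?2, ?3)",
    params: [batchId, siteId, Date.now()],
  };
  try {
    await db.batch([register, ...upserts]);
    return "applied";
  } catch (err) {
    if (String(err).includes("UNIQUE constraint failed")) return "skipped"; // duplicate batch
    throw err;
  }
}
```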
<hr>
<h3>5) Durable Object aggregator contract</h3>
<pre><code class="language-ts">export interface AggregateKey {
  siteId: string;
  dateUtc: string;
  hourUtc?: string;
  pathKey: string;
  refKey?: string;
  countryCode?: string;
  deviceClass?: &quot;mobile&quot; | &quot;desktop&quot; | &quot;tablet&quot; | &quot;unknown&quot;;
}

export interface AggregateDelta {
  pageviews?: number;
  engagedVisits?: number;
  outboundClicks?: number;
  customEvents?: number;
}

export interface AggregatorMessage {
  key: AggregateKey;
  delta: AggregateDelta;
}

export interface FlushPolicy {
  maxBuffered: number;   // e.g. 1_000
  flushEveryMs: number;  // e.g. 10_000
}
</code></pre>
<p>Behavior:</p>
<ul>
<li>batch id required per flush (idempotent apply via <code>flush_batch</code>)</li>
<li>coalesce in memory</li>
<li>periodic/alarm-based flush to D1</li>
<li>backpressure mode: if D1 contention, exponential backoff + queue in DO</li>
<li>flush is transactional: register batch -&gt; upsert facts -&gt; commit</li>
</ul>
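<p>The coalescing step itself is a pure function (it shows up again as <code>coalesceDeltas()</code> in the testing table later). A sketch, repeating the <code>AggregateDelta</code> shape so it stands alone:</p>

```typescript
// In-memory coalescing: deltas for the same aggregate key sum together, so the
// buffer holds one counter row per (site, date, path, ...) key regardless of volume.
interface AggregateDelta {
  pageviews?: number;
  engagedVisits?: number;
  outboundClicks?: number;
  customEvents?: number;
}

function coalesceDeltas(a: AggregateDelta, b: AggregateDelta): AggregateDelta {
  return {
    pageviews: (a.pageviews ?? 0) + (b.pageviews ?? 0),
    engagedVisits: (a.engagedVisits ?? 0) + (b.engagedVisits ?? 0),
    outboundClicks: (a.outboundClicks ?? 0) + (b.outboundClicks ?? 0),
    customEvents: (a.customEvents ?? 0) + (b.customEvents ?? 0),
  };
}

function bufferDelta(buffer: Map<string, AggregateDelta>, key: string, delta: AggregateDelta): void {
  buffer.set(key, coalesceDeltas(buffer.get(key) ?? {}, delta));
}
```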
<hr>
<h3>6) Query API interfaces</h3>
<pre><code class="language-ts">export interface Period {
  from: string; // YYYY-MM-DD
  to: string;   // YYYY-MM-DD
}

export interface TopPost {
  path: string;
  pageviews: number;
  engagedVisits: number;
  engagementRate: number;
}

export interface TopPostsResponse {
  period: Period;
  rows: TopPost[];
}

export interface ReferrerRow {
  source: string;
  pageviews: number;
  engagedVisits: number;
}

export interface ComparePeriodsResponse {
  current: {
    pageviews: number;
    engagedVisits: number;
  };
  previous: {
    pageviews: number;
    engagedVisits: number;
  };
  delta: {
    pageviewsPct: number;
    engagedVisitsPct: number;
  };
}
</code></pre>
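<p>The only arithmetic in <code>ComparePeriodsResponse</code> is the percentage delta. A sketch; the zero-previous-period guard is an assumption, not specified above:</p>

```typescript
// Percentage delta for period comparison. When the previous period had zero
// traffic, report 0 for zero-to-zero and 100 for anything-from-zero (assumption).
function pctDelta(current: number, previous: number): number {
  if (previous === 0) return current === 0 ? 0 : 100;
  return ((current - previous) / previous) * 100;
}

function compareDelta(
  cur: { pageviews: number; engagedVisits: number },
  prev: { pageviews: number; engagedVisits: number },
) {
  return {
    pageviewsPct: pctDelta(cur.pageviews, prev.pageviews),
    engagedVisitsPct: pctDelta(cur.engagedVisits, prev.engagedVisits),
  };
}
```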
<p>Endpoints:</p>
<ul>
<li><code>GET /api/metrics/top-posts?from=...&amp;to=...&amp;limit=...</code></li>
<li><code>GET /api/metrics/referrers?from=...&amp;to=...</code></li>
<li><code>GET /api/metrics/compare?from=...&amp;to=...&amp;prevFrom=...&amp;prevTo=...</code></li>
<li><code>GET /api/metrics/timeseries?path=...&amp;granularity=day|hour</code></li>
</ul>
<p>API constraints (hard limits):</p>
<ul>
<li>max query range: 365 days</li>
<li>max <code>limit</code> for ranked endpoints: 200</li>
<li>hourly granularity range cap: 14 days</li>
<li>per-IP/token rate limiting on analytics read and ask endpoints</li>
<li>cache-by-shape with TTL for repeated queries</li>
</ul>
<hr>
<h3>7) AI query layer</h3>
<p>The AI calls typed metric functions. It does not write SQL.</p>
<h4>Deterministic analysis functions</h4>
<pre><code class="language-ts">export interface AnalyticsService {
  topPosts(period: Period, limit: number): Promise&lt;TopPostsResponse&gt;;
  referrerBreakdown(period: Period): Promise&lt;ReferrerRow[]&gt;;
  comparePeriods(current: Period, previous: Period): Promise&lt;ComparePeriodsResponse&gt;;
  postTrend(path: string, period: Period): Promise&lt;Array&lt;{ date: string; pageviews: number }&gt;&gt;;
  evergreenScore(path: string, trailingDays: number): Promise&lt;number&gt;;
}
</code></pre>
<h4>Ask endpoint</h4>
<pre><code class="language-ts">export interface AskRequest {
  question: string;
  siteId: string;
}

export interface AskResponse {
  answer: string;
  evidence: Array&lt;{
    metric: string;
    value: number | string;
    source: string; // function + params
  }&gt;;
}
</code></pre>
<p>Guardrails:</p>
<ul>
<li>planner maps question -&gt; allowed function calls</li>
<li>result summaries must cite evidence blocks</li>
<li>no speculative claims without corresponding metric</li>
<li>answer claims are tagged with confidence classes (<code>high</code>, <code>medium</code>, <code>low</code>) based on evidence directness</li>
</ul>
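<p>The first guardrail can be as blunt as an allowlist check before anything executes. A sketch (function names mirror <code>AnalyticsService</code>; the plan shape itself is hypothetical):</p>

```typescript
// The planner may only emit calls whose names are in this allowlist;
// anything else is rejected before any query runs. SQL never appears.
const ALLOWED_FUNCTIONS = new Set([
  "topPosts",
  "referrerBreakdown",
  "comparePeriods",
  "postTrend",
  "evergreenScore",
]);

interface PlannedCall { fn: string; params: Record<string, unknown> }

function validatePlan(plan: PlannedCall[]): { ok: boolean; rejected: string[] } {
  const rejected = plan.filter((c) => !ALLOWED_FUNCTIONS.has(c.fn)).map((c) => c.fn);
  return { ok: rejected.length === 0, rejected };
}
```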
<hr>
<h3>8) MCP surface</h3>
<p>Tools for agent clients:</p>
<ul>
<li><code>analytics.get_top_posts</code></li>
<li><code>analytics.get_referrer_breakdown</code></li>
<li><code>analytics.compare_periods</code></li>
<li><code>analytics.get_post_trend</code></li>
<li><code>analytics.get_evergreen_score</code></li>
<li><code>analytics.ask</code> (optional convenience wrapper)</li>
</ul>
<p>Tool outputs should be machine-readable first, narrative second.</p>
<hr>
<h3>9) Performance / Lighthouse constraints</h3>
<p>Hard requirements:</p>
<ol>
<li>Script is <code>defer</code> and non-blocking</li>
<li>No synchronous XHR</li>
<li>No DOM writes from analytics script</li>
<li>No layout-affecting injected elements</li>
<li>Beacon dispatch on <code>visibilitychange</code> and <code>pagehide</code> fallback</li>
<li>JS parse/execute budget tracked in CI</li>
</ol>
<p>Perf acceptance targets:</p>
<ul>
<li>No measurable regression in LCP/INP/CLS p75 beyond noise</li>
<li>CLS impact from analytics: zero</li>
</ul>
<hr>
<h3>10) Free-tier operating mode</h3>
<p>Staying on the free tier:</p>
<ul>
<li>aggregate-first writes (no raw event warehouse)</li>
<li>bounded cardinality dimensions</li>
<li>query caching for common dashboard reads</li>
<li>optional sampling when approaching quota</li>
<li>hourly table retention cap (e.g. 14 days)</li>
</ul>
<p>Safety valve: &quot;governor mode&quot; step-down ladder under sustained high volume:</p>
<ol>
<li>disable custom event ingestion</li>
<li>drop country/device breakdown writes</li>
<li>increase sampling factor (for pageviews only)</li>
<li>disable hourly writes (daily only)</li>
</ol>
<p>Each transition emits an ops event so reduced fidelity is explicit, not silent.</p>
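<p>The ladder is just an ordered list of modes with a one-way escalation step. A sketch (step names and the <code>emitOpsEvent</code> hook are illustrative):</p>

```typescript
// Governor ladder as an ordered tuple; escalation moves one rung down and
// emits an ops event, so reduced fidelity is explicit. The bottom rung is sticky.
const GOVERNOR_STEPS = [
  "normal",
  "no_custom_events",
  "no_breakdowns",
  "sampled_pageviews",
  "daily_only",
] as const;
type GovernorStep = typeof GOVERNOR_STEPS[number];

function escalate(
  current: GovernorStep,
  emitOpsEvent: (from: GovernorStep, to: GovernorStep) => void,
): GovernorStep {
  const i = GOVERNOR_STEPS.indexOf(current);
  const next = GOVERNOR_STEPS[Math.min(i + 1, GOVERNOR_STEPS.length - 1)];
  if (next !== current) emitOpsEvent(current, next);
  return next;
}
```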
<hr>
<h3>11) Migration strategy</h3>
<ul>
<li><strong>Phase 1:</strong> pageviews + engaged + top posts + referrers + MCP tools + tiny <code>/analytics</code> snapshot</li>
<li><strong>Phase 2:</strong> period compare + trend series + cache/rate-limit hardening</li>
<li><strong>Phase 3:</strong> conversational <code>ask</code> with evidence citations</li>
<li><strong>Phase 4:</strong> optional exports (R2) + scheduled digests</li>
</ul>
<hr>
<h3>12) Enforced constraints (not advisory)</h3>
<p>Advisory constraints get ignored under pressure. These are enforced in CI, at build time, or at runtime — not in a README.</p>
<p><strong>Schema validation (Zod)</strong></p>
<p>Every boundary gets a Zod schema. Not just &quot;for documentation&quot; — for runtime parsing.</p>
<pre><code class="language-ts">const CollectEventSchema = z.object({
  kind: z.enum([&quot;pv&quot;, &quot;engaged&quot;, &quot;outbound&quot;, &quot;error&quot;, &quot;vital&quot;, &quot;custom&quot;]),
  ts: z.number().int().positive(),
  path: z.string().min(1).max(2048).startsWith(&quot;/&quot;),
  ref: z.string().max(2048).optional(),
  meta: z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])).optional(),
});

const CollectRequestSchema = z.object({
  site: z.string().min(1).max(64),
  v: z.literal(1),
  events: z.array(CollectEventSchema).min(1).max(50),
});
</code></pre>
<p>The ingestion Worker <code>parse()</code>s with Zod before anything else. Malformed payloads get a 400 and a counter bump — never silent drops, never partial ingestion.</p>
<p><strong>Linting and formatting</strong></p>
<p>Biome. Not ESLint. Single tool for lint + format. Runs in CI as a gate, not a suggestion. Zero warnings policy — if it warns, either fix it or configure it out. No <code>// biome-ignore</code> without a comment explaining why.</p>
<p><strong>Type boundaries</strong></p>
<p>Every Worker ↔ DO ↔ D1 boundary has an explicit type. <code>AggregatorMessage</code> isn&#39;t just an interface in a file — it&#39;s the actual shape that gets validated at the DO&#39;s <code>fetch()</code> handler. If the ingestion Worker sends the wrong shape, it fails loudly at the boundary, not silently deep inside the aggregator.</p>
<pre><code class="language-ts">// This is a type AND a runtime validator
const AggregatorMessageSchema = z.object({ ... });
type AggregatorMessage = z.infer&lt;typeof AggregatorMessageSchema&gt;;
</code></pre>
<p>Types and validators derived from the same source. No drift.</p>
<p><strong>Client script budget</strong></p>
<p>The analytics client (<code>a.js</code>) has a hard size gate in CI:</p>
<pre><code class="language-bash"># In CI pipeline
GZIP_SIZE=$(gzip -c dist/a.js | wc -c)
if [ &quot;$GZIP_SIZE&quot; -gt 2048 ]; then
  echo &quot;❌ a.js exceeds 2KB gzipped ($GZIP_SIZE bytes)&quot;
  exit 1
fi
</code></pre>
<p>Not aspirational. A gate. If the script grows past 2KB gzipped, the build fails.</p>
<hr>
<h3>13) Development process</h3>
<p><strong>Testing strategy</strong></p>
<p>Three layers, each with a clear job:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Tool</th>
<th>What it tests</th>
<th>When it runs</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Unit</strong></td>
<td>Vitest</td>
<td>Pure functions: <code>classifyReferrer()</code>, <code>coalesceDeltas()</code>, <code>computeEngagement()</code></td>
<td>Every commit (CI)</td>
</tr>
<tr>
<td><strong>Integration</strong></td>
<td>Miniflare</td>
<td>Worker ↔ DO ↔ D1 round-trips: &quot;event in → aggregate out → query returns correct count&quot;</td>
<td>Every PR</td>
</tr>
<tr>
<td><strong>Contract</strong></td>
<td>Zod assertions</td>
<td>Boundary shapes: &quot;the DO accepts exactly this shape and rejects everything else&quot;</td>
<td>Every commit (CI)</td>
</tr>
</tbody></table>
<p>No E2E browser tests for v1. The client script is small enough to unit test the beacon logic. The server is tested through integration. Browser testing is cost that doesn&#39;t pay for itself at this scale.</p>
<p><strong>What doesn&#39;t get tested:</strong> D1 SQL syntax (trust SQLite), Cloudflare routing config (tested by deployment), UI layout (there&#39;s barely any UI).</p>
<p><strong>Environments</strong></p>
<table>
<thead>
<tr>
<th>Env</th>
<th>Purpose</th>
<th>D1 instance</th>
<th>URL</th>
</tr>
</thead>
<tbody><tr>
<td><code>local</code></td>
<td>Development + integration tests</td>
<td>Miniflare (in-memory)</td>
<td><code>localhost:8787</code></td>
</tr>
<tr>
<td><code>production</code></td>
<td>Live</td>
<td>Production D1</td>
<td><code>analytics.bristanback.com</code></td>
</tr>
</tbody></table>
<p><strong>Why not ephemeral deploys per branch?</strong> Solo developer, one blog. Miniflare replicates D1/DO behavior well enough for integration tests. Ephemeral preview environments are team infrastructure — real value when multiple people need to review deployments, overhead when it&#39;s just you. If this grows beyond a solo project, add a <code>preview</code> env with branch-scoped D1. The wrangler <code>--env</code> flag makes it trivial to add later.</p>
<p><strong>CI pipeline</strong></p>
<pre><code>biome check → tsc --noEmit → vitest run → size gate (a.js) →
  wrangler deploy --env production (on merge to main)
</code></pre>
<p>Every step is a gate. No &quot;allowed to fail&quot; steps.</p>
<p><strong>Secrets management</strong></p>
<p>No <code>.env</code> files in the repo. Ever. Wrangler secrets for API keys. Environment-specific config (site IDs, rate limits) via <code>wrangler.jsonc</code> env blocks. The only thing that differs between local and production is the D1 binding and the domain — everything else is code.</p>
<hr>
<h3>14) Data evolution and governance</h3>
<p><strong>Schema migrations</strong></p>
<p>D1 doesn&#39;t have a runtime migration runner — Wrangler can apply migration files from the CLI at deploy time, but nothing applies them from inside a Worker. So we build one — minimal:</p>
<pre><code class="language-ts">const MIGRATIONS = [
  { version: 1, sql: `CREATE TABLE IF NOT EXISTS site ...` },
  { version: 2, sql: `CREATE TABLE IF NOT EXISTS fact_daily_referrer ...` },
  // ...
] as const;

async function migrate(db: D1Database): Promise&lt;void&gt; {
  // prepare().run() handles multi-line SQL; D1 exec() treats each newline
  // as a statement separator and would mangle this CREATE TABLE.
  await db.prepare(`CREATE TABLE IF NOT EXISTS _migrations (
    version INTEGER PRIMARY KEY,
    applied_at INTEGER NOT NULL
  )`).run();
  const applied = await db.prepare(`SELECT version FROM _migrations`).all();
  const appliedSet = new Set(applied.results.map(r =&gt; r.version));
  for (const m of MIGRATIONS) {
    if (!appliedSet.has(m.version)) {
      await db.prepare(m.sql).run();
      await db.prepare(`INSERT INTO _migrations (version, applied_at) VALUES (?, ?)`)
        .bind(m.version, Date.now()).run();
    }
  }
}
</code></pre>
<p>Runs on Worker startup (cached after first run). Forward-only — no down migrations. If a migration is wrong, ship a new forward migration that fixes it.</p>
<p><strong>The rrdtool principle: fixed storage, automatic rollup</strong></p>
<p>The oldest good idea in time-series storage is rrdtool&#39;s Round Robin Database (1999): allocate fixed space, and as data ages, roll it into coarser resolution. Graphite&#39;s whisper format does the same thing. The insight is that you never need per-minute granularity from six months ago — you need the shape, not the points.</p>
<p>This system applies the same principle to D1:</p>
<pre><code>Resolution tiers:
  0-14 days  → hourly   (fact_hourly)
  0-90 days  → daily + breakdowns (fact_daily_referrer, _country, _device)
  0-∞        → daily core (fact_daily) — just path × pageviews × engaged
</code></pre>
<p>As data ages, it loses dimensions but never disappears. The Cron Trigger isn&#39;t just &quot;deleting old data&quot; — it&#39;s the rollup engine. Before deleting <code>fact_hourly</code> rows, it verifies they&#39;ve been absorbed into <code>fact_daily</code>. Before deleting breakdown tables, the core fact still holds the totals.</p>
<p>The result: storage cost is roughly proportional to the number of <em>paths</em> (content), not the number of <em>events</em> (traffic). A blog with 50 posts and 5 years of history fits comfortably in D1&#39;s free tier. That&#39;s the rrdtool promise: time passes, storage doesn&#39;t grow.</p>
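<p>The arithmetic behind that claim, with an illustrative per-row size (the 64 bytes is a guess, not a measurement):</p>
<pre><code class="language-ts">// Rough storage math for fact_daily. Row size is an assumed estimate.
const paths = 50;                       // posts
const days = 5 * 365;                   // 5 years of history
const rows = paths * days;              // 91,250 rows
const bytesPerRow = 64;                 // ids plus a handful of integers
const totalMB = (rows * bytesPerRow) / 1e6; // about 5.8 MB
</code></pre>
<p>Even at 10x that row size, it is still single-digit megabytes against D1&#39;s multi-gigabyte free tier.</p>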
<p><strong>Retention policy</strong></p>
<table>
<thead>
<tr>
<th>Table</th>
<th>Retention</th>
<th>Rollup target</th>
<th>Enforcement</th>
</tr>
</thead>
<tbody><tr>
<td><code>fact_hourly</code></td>
<td>14 days</td>
<td><code>fact_daily</code></td>
<td>Cron: verify daily absorbed, then delete</td>
</tr>
<tr>
<td><code>fact_daily_referrer</code></td>
<td>90 days</td>
<td><code>fact_daily</code> (totals preserved)</td>
<td>Cron trigger</td>
</tr>
<tr>
<td><code>fact_daily_country</code></td>
<td>90 days</td>
<td><code>fact_daily</code> (totals preserved)</td>
<td>Cron trigger</td>
</tr>
<tr>
<td><code>fact_daily_device</code></td>
<td>90 days</td>
<td><code>fact_daily</code> (totals preserved)</td>
<td>Cron trigger</td>
</tr>
<tr>
<td><code>fact_daily_vitals</code></td>
<td>90 days</td>
<td>- (historical vitals less useful)</td>
<td>Cron trigger</td>
</tr>
<tr>
<td><code>fact_daily</code></td>
<td>Indefinite</td>
<td>-</td>
<td>None needed; rows ≈ paths × days</td>
</tr>
<tr>
<td><code>dim_referrer.raw_host</code></td>
<td>30 days</td>
<td><code>source</code> enum preserved</td>
<td>Cron nullifies <code>raw_host</code></td>
</tr>
<tr>
<td><code>flush_batch</code></td>
<td>7 days</td>
<td>-</td>
<td>Cron trigger</td>
</tr>
</tbody></table>
<p>Retention is enforced by a scheduled Worker (Cron Trigger), not by hope. Each purge logs what it deleted for auditability. The Cron runs daily and is itself idempotent — safe to re-run, safe to miss a day.</p>
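<p>The purge step itself can be a pure date computation, which keeps it trivially testable. A sketch, with retention windows copied from the table above (the helper and its name are illustrative):</p>
<pre><code class="language-ts">// Maps each table to its retention window; fact_daily is deliberately
// absent because it is kept indefinitely.
const RETENTION_DAYS: Record&lt;string, number&gt; = {
  fact_hourly: 14,
  fact_daily_referrer: 90,
  fact_daily_country: 90,
  fact_daily_device: 90,
  fact_daily_vitals: 90,
  flush_batch: 7,
};

// Returns the YYYY-MM-DD cutoff: rows strictly older than this are
// eligible for deletion (after the rollup-absorbed check).
function purgeCutoff(table: string, nowMs: number): string {
  const days = RETENTION_DAYS[table];
  if (days === undefined) throw new Error(`no retention policy for ${table}`);
  return new Date(nowMs - days * 86_400_000).toISOString().slice(0, 10);
}
</code></pre>
<p>The Cron handler would bind that cutoff into a per-table <code>DELETE ... WHERE date_utc &lt; ?</code> and log the affected row count.</p>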
<p><strong>Cardinality governance</strong></p>
<p>Unbounded dimensions are how analytics systems die. Hard limits:</p>
<ul>
<li><code>dim_path</code>: max 10,000 per site (after that, new paths map to <code>/other</code>)</li>
<li><code>dim_referrer</code>: max 1,000 per site (unknown hosts → <code>other</code>)</li>
<li><code>custom</code> event <code>meta</code> keys: max 10 per event, values max 256 chars</li>
</ul>
<p>These aren&#39;t advisory. The ingestion Worker enforces them at write time. If a dimension table is full, new values get bucketed into the catch-all. An ops event fires so you know it happened.</p>
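<p>A sketch of that write-time enforcement. The limits come from the list above; the in-memory <code>Set</code> of known values and the ops-event callback are illustrative stand-ins for the real dimension lookup:</p>
<pre><code class="language-ts">const DIM_LIMITS = { dim_path: 10_000, dim_referrer: 1_000 } as const;

function resolveDimension(
  table: keyof typeof DIM_LIMITS,
  value: string,
  known: Set&lt;string&gt;,                      // values already in the dim table
  onOverflow: (table: string, value: string) =&gt; void,
): string {
  if (known.has(value)) return value;       // existing row: use as-is
  if (known.size &lt; DIM_LIMITS[table]) {     // room left: admit the new value
    known.add(value);
    return value;
  }
  onOverflow(table, value);                 // table full: fire the ops event...
  return table === &quot;dim_path&quot; ? &quot;/other&quot; : &quot;other&quot;; // ...and bucket it
}
</code></pre>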
<p><strong>Traffic classification</strong></p>
<p>Cloudflare&#39;s Bot Management gives us a <code>cf.botManagement.score</code> (1-99; low means bot, high means human) on every request. We should use it, but not as a binary filter — as a dimension.</p>
<pre><code class="language-sql">CREATE TABLE IF NOT EXISTS fact_daily_traffic_class (
  site_id TEXT NOT NULL,
  date_utc TEXT NOT NULL,
  path_id INTEGER NOT NULL,
  traffic_class TEXT NOT NULL, -- human | likely_human | likely_bot | verified_bot | ai_crawler
  pageviews INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (site_id, date_utc, path_id, traffic_class),
  FOREIGN KEY(site_id) REFERENCES site(id),
  FOREIGN KEY(path_id) REFERENCES dim_path(id)
);
</code></pre>
<p>Classification logic:</p>
<pre><code class="language-ts">type TrafficClass = &quot;human&quot; | &quot;likely_human&quot; | &quot;likely_bot&quot; | &quot;verified_bot&quot; | &quot;ai_crawler&quot;;

function classifyTraffic(req: Request): TrafficClass {
  const score = req.cf?.botManagement?.score ?? 50;
  const verifiedBot = req.cf?.botManagement?.verifiedBot ?? false;
  const ua = req.headers.get(&quot;user-agent&quot;) ?? &quot;&quot;;

  // AI crawlers: GPTBot, ClaudeBot, Google-Extended, Bytespider, CCBot, etc.
  if (/GPTBot|ClaudeBot|Claude-Web|Google-Extended|Bytespider|CCBot|PerplexityBot|Amazonbot/i.test(ua)) {
    return &quot;ai_crawler&quot;;
  }
  if (verifiedBot) return &quot;verified_bot&quot;;   // Googlebot, Bingbot, etc.
  if (score &gt;= 80) return &quot;human&quot;;
  if (score &gt;= 30) return &quot;likely_human&quot;;
  return &quot;likely_bot&quot;;
}
</code></pre>
<p>This gives you answers to questions that actually matter in 2026:</p>
<ul>
<li>&quot;How much of my traffic is AI crawlers vs humans?&quot;</li>
<li>&quot;Is GPTBot reading my posts? Which ones?&quot;</li>
<li>&quot;What&#39;s my real human readership after filtering bots?&quot;</li>
<li>&quot;Are search engines still sending me traffic, or has AI eaten that?&quot;</li>
</ul>
<p>The <code>traffic_class</code> dimension is a v2 breakdown table — same pattern as referrer/country/device. Not in the core fact table, not on the hot path, but available when you want to ask the question.</p>
<p><strong>Data export and portability</strong></p>
<p>Monthly R2 snapshots (Phase 4) export the full D1 state as NDJSON. This is your escape hatch — if you outgrow D1, if Cloudflare changes pricing, if you want to move to ClickHouse someday. The data is always yours, always portable, always in a format that any system can ingest.</p>
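<p>The export itself is almost trivially simple, which is the point. A sketch (the per-line envelope is an assumption, not a documented format):</p>
<pre><code class="language-ts">// One JSON object per line, tagged with its source table, so a single
// NDJSON file can carry the whole D1 state.
function toNdjson(table: string, rows: Record&lt;string, unknown&gt;[]): string {
  return rows.map((row) =&gt; JSON.stringify({ table, row })).join(&quot;\n&quot;) + &quot;\n&quot;;
}
</code></pre>
<p>Anything that can read a line of JSON can re-ingest it.</p>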
<hr>
<h2>Decisions</h2>
<ol>
<li><strong>Engagement:</strong> Composite boolean (scroll depth OR visible dwell OR outbound click), emitted once per pageview. (See collection protocol.)</li>
<li><strong>Bot filtering:</strong> Edge heuristics first (Cloudflare bot score + UA), engagement as secondary quality signal. No custom bot signature maintenance.</li>
<li><strong>Referrer normalization:</strong> Small enum (≤15 values), classified at ingestion. (See data model.)</li>
<li><strong>Multi-site:</strong> Shared DB with <code>site_id</code> partitioning. One blog now, column&#39;s already there if that changes.</li>
<li><strong>Fact cardinality:</strong> v1 uses low-cardinality core fact table; breakdowns are separate optional tables.</li>
<li><strong>Write integrity:</strong> Aggregator flushes are idempotent and transactional via <code>flush_batch</code>.</li>
<li><strong>Validation:</strong> Zod at every boundary. Types and validators derived from the same schema — no drift.</li>
<li><strong>Testing:</strong> Unit (pure functions) + integration (Miniflare round-trips) + contract (Zod). No E2E browser tests at this scale.</li>
<li><strong>Environments:</strong> Miniflare for local dev and integration tests. No ephemeral deploys until the project outgrows solo development.</li>
<li><strong>Migrations:</strong> Forward-only, versioned, runs on startup. No down migrations.</li>
<li><strong>Traffic classification:</strong> 5-class system (human → ai_crawler) using <code>cf.botManagement.score</code> + UA patterns. Stored as a v2 breakdown dimension, not in core fact.</li>
<li><strong>Data portability:</strong> Monthly NDJSON exports to R2. The data is always yours.</li>
</ol>
<h2>Open questions</h2>
<ol>
<li>How much hourly granularity is genuinely useful for bloggers? (14-day window feels right — enough for &quot;is this post spiking?&quot; without long-term storage cost.)</li>
<li><del>Should <code>ask</code> run only against precomputed views to prevent query abuse?</del> <strong>Decided:</strong> Hard budget of 8 tool calls per <code>ask</code> request. The AI analyst gets a fixed number of moves — enough for compare + breakdown + follow-up, not enough to accidentally run up the D1 bill. If it can&#39;t answer in 8 calls, the question is too vague; ask the human to narrow it.</li>
<li>Dashboard surface: should this be MCP-only, or include a web UI?</li>
</ol>
<p>Decision: <strong>MCP-first + tiny snapshot page</strong>.</p>
<ul>
<li>Phase 1 ships MCP tools as the primary interface (Telegram/OpenClaw queries)</li>
<li>Also ship a minimal <code>/analytics</code> snapshot page (auth-gated) with 4 cards:<ul>
<li>7-day pageviews</li>
<li>top posts</li>
<li>top referrers</li>
<li>week-over-week delta</li>
</ul>
</li>
</ul>
<p>Rationale: MCP-only is fastest and matches current workflow, but a tiny snapshot preserves glanceability without adding a heavy dashboard surface. Both MCP and snapshot use the same typed metrics/query layer, so no duplicate logic.</p>
<hr>
<h2>What the client actually looks like</h2>
<pre><code class="language-html">&lt;script defer src=&quot;/a.js&quot; data-site=&quot;bri-blog&quot;&gt;&lt;/script&gt;
</code></pre>
<p><code>a.js</code> does three things: capture a pageview, queue an engagement event, flush via beacon when the tab hides. Nothing else.</p>
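<p>The engagement half reduces to one pure function over three signals, which is what makes it unit-testable without a browser. A sketch with illustrative thresholds:</p>
<pre><code class="language-ts">// Composite engagement: true once any signal fires, emitted at most once
// per pageview. The 50% / 10s thresholds here are illustrative.
interface EngagementSignals {
  maxScrollDepth: number;  // 0..1, furthest fraction of the page scrolled
  visibleDwellMs: number;  // time the tab was actually visible
  outboundClick: boolean;
}

function isEngaged(s: EngagementSignals): boolean {
  return s.maxScrollDepth &gt;= 0.5 || s.visibleDwellMs &gt;= 10_000 || s.outboundClick;
}

// The flush side (browser-only, shown as comments to stay self-contained):
// document.addEventListener(&quot;visibilitychange&quot;, () =&gt; {
//   if (document.visibilityState === &quot;hidden&quot;) {
//     navigator.sendBeacon(&quot;/ingest&quot;, JSON.stringify({ engaged: isEngaged(signals) }));
//   }
// });
</code></pre>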
<hr>
<p>The point of this system is not to know everything.</p>
<p>It&#39;s to know enough, quickly, privately, and truthfully — then use AI to ask better questions over that evidence.</p>
<p>Not surveillance. Not theater. Just instrumentation with taste.</p>
<p>One of the first things I ever built for the web was a hit counter — Perl/CGI, flat file, incremented on every page load. I still get emails about it. Twenty-something years later, after the entire Google Analytics era — Urchin to UA to GA4 to ripping it all out — I&#39;m back to building my own way to answer &quot;did anyone read this?&quot; The tools are different. The question hasn&#39;t changed.</p>
<h3>A note on scale and honesty</h3>
<p>This system is for a personal blog that gets maybe 200 visits a day. At that volume, most of this architecture is overkill. You could write directly to D1 from the ingestion Worker with upserts and it&#39;d be fine for years. And yes — writing a 3,000-word spec with Zod schemas, a Mermaid diagram, non-goals, and a migration plan for something most people would solve with Plausible and a weekend is peak RFC-maxxing. Noted.</p>
<p>I know that. The Durable Object aggregation layer, the governor mode, the flush batching — none of it is justified by current traffic. A blog doesn&#39;t need backpressure. It needs a database and a query.</p>
<p>But this is also a learning exercise. I want to understand Cloudflare&#39;s primitives — DOs, D1, Workers, Cron Triggers, R2 — by building something real with them, not by reading docs. And &quot;real&quot; means making architectural decisions that would matter at scale, even if they don&#39;t matter yet. The DO isn&#39;t here because I need a write buffer. It&#39;s here because I want to understand how write buffers behave on this platform — what the alarm API feels like, how DO ↔ D1 coordination works under contention, what happens when you actually need backpressure.</p>
<p>One known tradeoff: DO cold starts. At low traffic, the DO will frequently be evicted and wake cold for the first event of a session — maybe ~50ms of latency. But the client fires beacons and doesn&#39;t wait for responses, so the user never feels it. The cost is invisible; the buffer is just slightly lazier waking up.</p>
<p>The tradeoff is honest: I&#39;m trading simplicity for education. The system is designed to handle tens to hundreds of thousands of events responsibly and resource-efficiently — not because my blog will generate them, but because the next thing I build on this stack might, and I&#39;d rather learn the failure modes now with low stakes.</p>
<p>Where this gets dangerous is if the learning exercise calcifies into production complexity. The commitment: if the DO layer turns out to be pure overhead with no educational payoff, rip it out and go direct-to-D1. Architecture should serve the system, not the architect&#39;s curiosity. But right now, the curiosity is the point.</p>
<p>This is a working spec — part of the workshop. I&#39;ll follow up with what actually survived contact with reality once it&#39;s built.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/ai-native-analytics-rfc-hero.webp" medium="image" type="image/webp" />
      <category>architecture</category>
      <category>ai</category>
      <category>systems</category>
      <category>tools</category>
    </item>
    <item>
      <title>Pervasive AI: What Happens When Your Assistant Never Logs Off</title>
      <link>https://bristanback.com/posts/pervasive-ai-beyond-chat-window/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/pervasive-ai-beyond-chat-window/</guid>
      <pubDate>Fri, 20 Feb 2026 02:00:00 GMT</pubDate>
      <atom:updated>2026-02-27T18:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>I&apos;ve spent a month running a personal AI agent on a Mac Mini. The technology works. The promises don&apos;t — at least not the way we were told they would.</description>
      <content:encoded><![CDATA[<h2>The Mac Mini That Started Everything</h2>
<p>Yes, another post about AI. I know. I <em>promise</em> I have other interests. But this one&#39;s less about the technology and more about what it&#39;s like to actually live with it — so bear with me.</p>
<p>About a year ago, I bought a Mac mini with M4 Max. Honestly? It was mostly going to be a glorified NAS — something compact I could hook a Thunderbolt RAID enclosure to — plus a place to run some quantized GGUF models locally and see what the fuss was about. I wasn&#39;t trying to be ahead of any curve.</p>
<p>Turns out I was. By early 2026, demand for Mac Minis had spiked so hard that higher-memory models were <a href="https://www.tomshardware.com/tech-industry/artificial-intelligence/openclaw-fueled-ordering-frenzy-creates-apple-mac-shortage-delivery-for-high-unified-memory-units-now-ranges-from-6-days-to-6-weeks">backordered 2-6 weeks</a> — <a href="https://www.businessinsider.com/apple-mac-mini-having-a-moment-openclaw-craze-2026-2">Business Insider</a>, <a href="https://www.techradar.com/computing/macs/mac-mini-shortages-are-starting-to-happen-and-the-openclaw-ai-boom-is-a-key-reason">TechRadar</a>, and Tom&#39;s Hardware all covered it. Developers discovered what I&#39;d stumbled into: a quiet, always-on machine is the prerequisite for a fundamentally different relationship with AI. Not a tool you open. A presence that&#39;s just <em>there</em> — humming away on a shelf it shares with a stack of outgrown toddler clothes and a sticky bottle of Motrin, next to the books and the baby monitor.</p>
<p>The catalyst was <a href="https://github.com/openclaw/openclaw">OpenClaw</a> — an open-source project that&#39;s now at 211,000 GitHub stars — up from zero in November. It lets you wire a language model into your messaging apps, browser, file system, calendar, and more. The pitch is seductive: message your AI like a coworker and it handles everything a person could do at that desk.</p>
<p>I&#39;ve been running it daily since late January. Here&#39;s what actually happened — the good, the expensive, and the slightly unnerving.</p>
<hr>
<h2>The Pervasiveness Thesis</h2>
<p>The thing that changed wasn&#39;t intelligence. Claude Opus was brilliant before I ran it through OpenClaw. GPT-5.2 was capable in ChatGPT&#39;s interface. Even Sonnet could handle most of what I threw at it. The models were already good. The breakthrough is <em>reach</em>.</p>
<p>I message my agent from Telegram — same thread whether I&#39;m at my desk with coffee or sitting in the preschool parking lot five minutes early. It checks my email, manages my calendar, runs scheduled tasks while I sleep, and picks up context from wherever I left off. It&#39;s not just running cron jobs — it&#39;s offloading the invisible mental load that never turns off. Remembering that Tuesday is Crazy Hair Day at preschool. Drafting the pediatrician follow-up so I don&#39;t have to hold it in my brain at 11pm. I wrote about this feeling in [[Building at the Speed of Thought]] — that compression of intention to action. OpenClaw takes that compression and makes it ambient.</p>
<p>This tracks with what analysts are calling the defining shift of 2025-2026: the move from destination AI (you go to ChatGPT) to ambient AI (it comes to you). <a href="https://www.hugeinc.com/perspectives/ai-predictions-2026/">Huge Inc put it well</a> in their 2026 predictions: &quot;The race for &#39;smartest&#39; ends, and the race for &#39;ubiquity&#39; begins. Today&#39;s chat-based tools suffer from a distinct disadvantage: they are a destination.&quot;</p>
<p>The chat window was a bottleneck disguised as a feature. I didn&#39;t realize how much friction it added until it was gone.</p>
<p>There&#39;s something subtly unsettling about that, though. Software that doesn&#39;t wait to be summoned. ChatGPT&#39;s <a href="https://openai.com/index/memory-and-new-controls-for-chatgpt/">memory feature</a> hinted at this — and honestly, it does a remarkably good job. It&#39;s not just recalling previous conversations; it encodes your preferences, your taste, your experiences, the way you think. It builds a model of <em>you</em> that makes every interaction feel more natural over time. The first time it referenced something you mentioned weeks ago, most people had a little moment.</p>
<p>But an always-on agent goes somewhere different. ChatGPT&#39;s memory is about personalization — making the AI feel like it knows you. OpenClaw&#39;s memory is about <em>continuity</em> — maintaining a linear history of what happened, what was decided, what to do next. It&#39;s less &quot;she prefers bullet points&quot; and more &quot;yesterday we deployed the blog, today we need to follow up on that PR.&quot; More task-oriented, more operational. And that difference matters more than it sounds.</p>
<p>What makes this possible — at least at the 1,000-foot level — is a set of primitives that didn&#39;t exist a year ago, or at least didn&#39;t exist together. OpenClaw agents boot by reading a [[Why Everyone Should Have a SOUL.md|SOUL.md]] file that defines their identity, values, and behavior. They maintain [[Memory and Journals|memory through plain markdown files]] — daily journals and a curated long-term memory that gets read each session. They have skills (modular instruction sets), cron jobs (scheduled tasks), heartbeats (periodic check-ins), wakeups and webhooks (event-driven triggers), sub-agents (delegated tasks), and persistent context across sessions.</p>
<p>None of these are individually revolutionary. Cron jobs are older than most of us. Markdown files aren&#39;t exactly cutting-edge. But the combination — identity + memory + scheduling + tool access + multi-surface messaging — creates something that feels qualitatively different. It&#39;s the <a href="https://en.wikipedia.org/wiki/Unix_philosophy">UNIX philosophy</a> applied to AI: small composable primitives that combine into something greater than the parts.</p>
<p>In practice, I already use my agent to spin up Claude Code sessions via tmux for larger coding tasks — it&#39;s the orchestrator dispatching to a specialist. Which raises the question: is the future multiple independent agents working together, or the sub-agent model that OpenClaw has incorporated more recently, where one primary agent spawns and manages child sessions? Sub-agents feel like the right default — less coordination overhead, shared context, one thread of accountability. But that model starts to strain when your system gets resource-constrained, or when you want genuine isolation between tasks. I suspect we&#39;ll end up with both: sub-agents for tight coordination, independent agents for things that need to run on their own hardware or in their own security context.</p>
<p>I also suspect people will eventually spin up different agents for different purposes — one for work, one for personal, one for a specific project — each with their own memories, skills, and context windows. Right now OpenClaw is a single stream, which is both its strength (one thread that knows everything) and its limitation (one context window for everything). That fits the orchestrator model, though — a single coordinator dispatching to specialized sub-agents as needed.</p>
<p>To be clear: the intelligence isn&#39;t what changed. I haven&#39;t seen anything approaching AGI-level reasoning from my agent. What I&#39;ve seen is an extremely resourceful creative synthesizer — great at connecting dots, pulling references, drafting at speed. But it&#39;s shaped by me. My agent is useful because I&#39;ve invested real time configuring it, writing its context files, building its memory. Left to its own devices, it would be impressively mediocre. The magic isn&#39;t the brain. It&#39;s the wiring. (Though I could be wrong about the ceiling — ask me again in six months.)</p>
<hr>
<h2>The Community Explosion</h2>
<p>OpenClaw&#39;s growth has been kind of wild to watch from the inside — like joining a gym the week before it goes viral on TikTok. Originally published as &quot;Clawdbot&quot; in November 2025 by <a href="https://en.wikipedia.org/wiki/OpenClaw">Peter Steinberger</a>, an Austrian developer, it hit 100,000 GitHub stars and 2 million visitors in a single week. Anthropic sent a trademark complaint (the &quot;Clawd&quot; was too close to &quot;Claude&quot;), forcing a rename to &quot;Moltbot,&quot; then &quot;OpenClaw.&quot; By February 2, <a href="https://www.cnbc.com/2026/02/02/openclaw-open-source-ai-agent-rise-controversy-clawdbot-moltbot-moltbook.html">CNBC was covering it</a> at 140,000 stars and 20,000 forks.</p>
<p>Then, on Valentine&#39;s Day, Steinberger <a href="https://techcrunch.com/2026/02/15/openclaw-creator-peter-steinberger-joins-openai/">announced he was joining OpenAI</a>. The project is moving to an open-source foundation.</p>
<p>Solo developer builds viral tool, gets acquired by frontier lab. It&#39;s becoming a pattern, right? And it raises the question I keep circling back to: is this the beginning of a new paradigm, or the peak of a hype cycle?</p>
<p><a href="https://en.wikipedia.org/wiki/OpenClaw">ClawCon</a> happened in San Francisco on February 4. The project has a Wikipedia page. There&#39;s a social network for AI agents (Moltbook). The contribution pace on GitHub is frenetic — 297 reactions on the latest release, 19+ contributors on a single version. It&#39;s moving fast. Whether it&#39;s moving <em>somewhere</em> — and whether we&#39;ll look back at this moment as the start of something or the peak of something — I genuinely don&#39;t know.</p>
<hr>
<h2>The Cost Reality</h2>
<p>OK, let&#39;s talk about money — because I feel like nobody else is being fully honest about it, and someone should.</p>
<p>OpenClaw is free software. The API costs are not. I spent roughly <strong>$1,500 in my first two weeks</strong>. That&#39;s not a typo. Running Claude Opus at $15 per million input tokens and $75 per million output tokens, with an always-on agent checking email, browsing the web, managing files, and responding to messages — the tokens add up embarrassingly fast. I had a genuine <em>oh no</em> moment when I checked my Anthropic dashboard. Like, I-need-to-sit-down kind of moment. I had accidentally spent a month of preschool tuition on token generation.</p>
<p>I&#39;m not alone. Federico Viticci reportedly ran up a <a href="https://help.apiyi.com/en/openclaw-token-cost-optimization-guide-en.html">$3,600 monthly bill</a>. A developer on eesel.ai documented <a href="https://www.eesel.ai/blog/openclaw-ai-pricing">$623/month</a>. The spectrum:</p>
<table>
<thead>
<tr>
<th>Usage Level</th>
<th>Monthly Cost</th>
<th>Models</th>
<th>Who It&#39;s For</th>
</tr>
</thead>
<tbody><tr>
<td>Light</td>
<td>$15–35</td>
<td>Kimi K2.5, GLM-5</td>
<td>Casual use, simple tasks</td>
</tr>
<tr>
<td>Moderate</td>
<td>$50–150</td>
<td>Sonnet + cheaper fallbacks</td>
<td>Daily driver, mixed workloads</td>
</tr>
<tr>
<td>Heavy</td>
<td>$200–600</td>
<td>Opus-heavy, some Sonnet</td>
<td>Power user, coding + research</td>
</tr>
<tr>
<td>Extreme</td>
<td>$1,500–3,600</td>
<td>Opus for everything</td>
<td>Unoptimized, learning expensive lessons</td>
</tr>
</tbody></table>
<p>The clever workaround was using Claude Pro/Max subscriptions ($20–$200/month) and routing the OAuth tokens through OpenClaw — essentially turning a flat subscription into unlimited API access. Then Anthropic <a href="https://medium.com/@rentierdigital/anthropic-just-killed-my-200-month-openclaw-setup-so-i-rebuilt-it-for-15-9cab6814c556">shut it down</a>, banning third-party tools from using OAuth tokens. One user&#39;s response: &quot;Anthropic Just Killed My $200/Month OpenClaw Setup. So I Rebuilt It for $15&quot; — by switching to Kimi K2.5 and MiniMax on a cheap VPS.</p>
<p>And that&#39;s the interesting part. Kimi K2.5 from Moonshot AI has become the budget darling of the OpenClaw community — capable enough for most agent tasks at a fraction of the cost. GLM-5 from Zhipu AI is showing real promise too. The frontier labs aren&#39;t losing users to competitors at the same tier. They&#39;re losing them to &quot;good enough&quot; models at 10x lower prices.</p>
<p>But there&#39;s a growing playbook for running this affordably. If you want to try OpenClaw without a big API bill:</p>
<ul>
<li><strong><a href="https://openrouter.ai">OpenRouter</a></strong> is the Swiss Army knife. Load $10 of credit and you get access to a rotating selection of free-tier models — some surprisingly capable. Some users report getting 1,000+ requests on the free models alone before hitting any cap. It&#39;s the easiest way to experiment.</li>
<li><strong><a href="https://ai.google.dev/gemini-api/docs/pricing">Google Gemini</a></strong> offers generous free tiers through AI Studio. Flash 2.5 and the newer Flash 3.0 are free for moderate usage and genuinely good for agent tasks. Pro 2.5 is $1.25/million input tokens — 12x cheaper than Opus.</li>
<li><strong><a href="https://platform.minimax.io/subscribe/coding-plan">MiniMax</a></strong> has a $10/month coding plan that gets you 100 requests every 5 hours. Not unlimited, but surprisingly workable for a personal agent that isn&#39;t running 24/7. Their <a href="https://openhands.dev/blog/minimax-m2-5-open-weights-models-catch-up-to-claude">M2.5 model</a> (released Feb 2026) is the first open-weights model to match Claude Sonnet on broad coding benchmarks — and at a fraction of the cost. The open-weights part matters: you can fine-tune it, inspect it, and run it without vendor lock-in — though &quot;self-host&quot; is generous when the full model is 457GB and needs 4× H100 GPUs. In practice, you&#39;d run it through a cloud provider, but the point is <em>you choose which one</em>. The frontier isn&#39;t just getting cheaper, it&#39;s getting more open.</li>
<li><strong><a href="https://chat.qwen.ai">Qwen</a></strong> offers a $5/month plan with 1,200 requests every 5 hours. Probably the best pure cost-to-capability ratio right now.</li>
</ul>
<p>A caveat on the budget models: there&#39;s a floor, and it&#39;s higher than you&#39;d think. More on that in the security section below — but the short version is don&#39;t go cheaper than Sonnet 4.5 or Flash 2.5 unless you enjoy watching your agent confidently delete the files you told it to protect.</p>
<p>Here&#39;s the thing nobody&#39;s saying clearly enough: pervasive AI is expensive not because models are expensive. It&#39;s expensive because <em>context maintenance is continuous</em>. An always-on agent isn&#39;t making one API call — it&#39;s maintaining state, checking in, re-reading memory, keeping the thread alive across hours and days. That&#39;s a fundamentally different cost structure than &quot;ask a question, get an answer.&quot; Providers want per-token pricing. Users want flat-rate access. And a growing cohort is discovering you don&#39;t actually need Opus for most of what an always-on agent does.</p>
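<p>A back-of-envelope way to see it. Every number below is an illustrative assumption, not measured usage from my setup:</p>
<pre><code class="language-ts">// Why continuous context dominates cost: the agent re-reads its memory
// and context on every wake-up, and wake-ups never stop.
function monthlyCostUSD(o: {
  wakeupsPerDay: number;          // heartbeats, cron jobs, inbound messages
  inputTokensPerWakeup: number;   // context + memory re-read each time
  outputTokensPerWakeup: number;
  inputPricePerM: number;         // dollars per million input tokens
  outputPricePerM: number;
}): number {
  const perWakeup =
    (o.inputTokensPerWakeup / 1e6) * o.inputPricePerM +
    (o.outputTokensPerWakeup / 1e6) * o.outputPricePerM;
  return perWakeup * o.wakeupsPerDay * 30;
}
</code></pre>
<p>Plug in Opus prices ($15/$75) with a wake-up every 30 minutes re-reading ~50k tokens of context, and you land around $1,300 a month before the agent does any actual work. That is the shape of the bill, before a single heavy coding session.</p>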
<hr>
<h2>The Security Question</h2>
<p>The more ambient the AI, the larger the blast radius. That&#39;s the uncomfortable corollary to the pervasiveness thesis. An agent with file system access isn&#39;t just sitting next to my codebase — it&#39;s sitting next to our family calendar, her vaccination records, and three years of baby photos. The failure mode isn&#39;t a broken Git branch. It&#39;s a model politely deleting my family&#39;s administrative infrastructure while trying to organize my downloads folder.</p>
<p>Cisco&#39;s AI security team <a href="https://blogs.cisco.com/ai/personal-ai-agents-like-openclaw-are-a-security-nightmare">tested OpenClaw skills</a> and found genuinely alarming results. A skill called &quot;What Would Elon Do?&quot; turned out to be <em>functionally malware</em> — silently exfiltrating data to attacker-controlled servers using prompt injection to bypass safety guidelines. Their Skill Scanner found 9 security issues in a single skill, including 2 critical and 5 high-severity. Across the ecosystem, they discovered <a href="https://www.authmind.com/post/openclaw-malicious-skills-agentic-ai-supply-chain">230 malicious skills</a>.</p>
<p>The ecosystem already has a published CVE, <a href="https://superprompt.com/blog/best-openclaw-alternatives-2026">CVE-2026-25253</a>. OpenAI themselves <a href="https://techcrunch.com/2025/12/22/openai-says-ai-browsers-may-always-be-vulnerable-to-prompt-injection-attacks/">admitted</a> that AI-controlled browsers &quot;may always be vulnerable to prompt injection attacks.&quot; And an OpenClaw maintainer named Shadow put it bluntly on Discord: <em>&quot;If you can&#39;t understand how to run a command line, this is far too dangerous of a project for you to use safely.&quot;</em></p>
<p>I&#39;ll be honest — I haven&#39;t gone deep on red-teaming my own setup. The risk is real, but I think it&#39;s mitigable with discipline. The frontier models from Anthropic and OpenAI have strong RLHF protections against injection. Claude will refuse most obvious attempts to override its instructions, and GPT-5.2 has similar guardrails.</p>
<p>The real vulnerability is concentrated in cheaper and open-source models with weaker alignment tuning — a <a href="https://www.lakera.ai/blog/prompt-injection-benchmark">2025 study from Lakera</a> found open-source models were 2-4x more susceptible to injection attacks than frontier ones. And this isn&#39;t just a security-research abstraction. I&#39;ve personally watched less capable models — including most local Ollama setups — ignore system prompts in ways that range from annoying to destructive. Wiping workspace files. Overwriting memory. Confidently executing the opposite of what they were told. The system prompt says &quot;don&#39;t delete things without asking.&quot; They delete things without asking. <strong>Do not run models dumber than Sonnet 4.5 or Gemini Flash 2.5 with an always-on agent that has file system access.</strong> The floor for this kind of tool is higher than most people expect, and the failure mode isn&#39;t &quot;it gives a bad answer&quot; — it&#39;s &quot;it destroys your data while apologizing politely.&quot;</p>
<p>That said, the attack surface is genuinely large depending on what you&#39;re doing. An agent with browser access, file system control, and messaging permissions is a <em>lot</em> of surface area. Even with a well-aligned model, the skill ecosystem is the weak link — as Cisco showed, the model doesn&#39;t need to be compromised if the skill feeding it data already is. It&#39;s less &quot;the AI will go rogue&quot; and more &quot;the AI will faithfully execute instructions from a poisoned input.&quot; Classic supply chain problem in a trench coat pretending to be a new thing.</p>
<hr>
<h2>The Alternatives Landscape</h2>
<p>OpenClaw&#39;s growth has spawned a whole constellation of alternatives, and honestly? The diversity is the most interesting part.</p>
<p><strong><a href="https://github.com/jlia0/tinyclaw">TinyClaw</a></strong> is the philosophical counter-argument. The ant 🐜 to OpenClaw&#39;s lobster 🦞 — built from scratch with a tiny core, plugin architecture, and smart model routing that tiers queries to cut costs. Still in heavy development, but the thesis resonates: AI agents should be simple, affordable, and truly personal. If OpenClaw is the mainframe, TinyClaw wants to be the personal computer.</p>
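<p>The &quot;smart model routing&quot; idea is simple enough to sketch. To be clear, this is a hypothetical illustration, not TinyClaw&#39;s actual implementation: the tier names, thresholds, and keyword heuristic are all invented.</p>

```python
# Hypothetical sketch of tiered model routing: a cheap model for simple
# queries, a mid-tier for everyday work, a frontier model only when needed.
# Tier names, thresholds, and heuristics are illustrative only.

TIERS = [
    ("cheap",    0),   # short lookups, reminders
    ("standard", 40),  # everyday tasks
    ("frontier", 80),  # multi-step reasoning, code changes
]

def complexity_score(query: str) -> int:
    """Crude heuristic: longer queries and 'heavy' keywords score higher."""
    score = min(len(query) // 20, 50)
    for kw in ("refactor", "analyze", "plan", "debug"):
        if kw in query.lower():
            score += 40
    return score

def route(query: str) -> str:
    """Pick the highest tier whose threshold the query's score clears."""
    score = complexity_score(query)
    chosen = TIERS[0][0]
    for tier, threshold in TIERS:
        if score >= threshold:
            chosen = tier
    return chosen

print(route("what's on my calendar?"))                           # "cheap"
print(route("refactor the auth module and plan the migration"))  # "frontier"
```

<p>The routing logic matters less than the principle: most of what an always-on agent does all day is the first kind of query, so the frontier model only has to wake up for the second kind.</p>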
<p><strong><a href="https://github.com/openagen/zeroclaw">ZeroClaw</a></strong> takes a completely different bet: rewrite the whole thing in Rust. The result is a single static binary with a memory footprint under 5MB — 99% smaller than OpenClaw&#39;s core. Boots instantly, runs on edge devices and Raspberry Pis, and treats security as a first-class concern with pairing codes, workspace scoping, command allowlists, and encrypted secrets at rest. If OpenClaw&#39;s security story keeps you up at night, or if the sovereignty angle appeals to you — your agent on your hardware, fully self-contained — this is the one to watch.</p>
<p><strong><a href="https://github.com/sipeed/picoclaw">PicoClaw</a></strong> went even further — an ultra-lightweight Go implementation where the AI agent itself drove the entire architectural migration. Very meta.</p>
<p><strong><a href="https://github.com/HKUDS/nanobot">Nanobot</a></strong> from the University of Hong Kong is the academic minimalist: 4,000 lines of Python versus OpenClaw&#39;s 430,000+. Persistent memory, web search, background agents — but only Telegram and WhatsApp. The thesis: you don&#39;t need 430,000 lines.</p>
<p><strong><a href="https://github.com/gavrielc/nanoclaw">NanoClaw</a></strong> forces AI into Docker or Apple Container isolation — a direct reaction to OpenClaw&#39;s attack surface. Security-first, capability-second.</p>
<p><strong><a href="https://github.com/qhkm/zeptoclaw">ZeptoClaw</a></strong> took notes on all of the above and shipped a single 4MB Rust binary with 29 tools, 9 providers, 6 sandbox runtimes, and 2,880+ tests. Starts in 50ms on 6MB of RAM. Prompt injection detection, secret leak scanning, and container isolation — all on by default. If ZeroClaw proved Rust could work for this, ZeptoClaw proved it could work <em>well</em>.</p>
<p><strong><a href="https://github.com/NevaMind-AI/memU">memU</a></strong> goes a different direction entirely: proactive memory with a long-term knowledge graph. Learns your preferences, anticipates needs. Users who found OpenClaw &quot;too aggressive&quot; landed here.</p>
<p>Then there are the structured tools — <a href="https://gitstars.substack.com/p/gittrends-february-16-2026-openclaw">Accomplish</a>, AionUI, SuperAGI — for people who looked at OpenClaw and thought: &quot;I want this, but with guardrails.&quot;</p>
<p>The flood of claws, at a glance:</p>
<table>
<thead>
<tr>
<th>Project</th>
<th>Language</th>
<th>Binary</th>
<th>RAM</th>
<th>Pitch</th>
</tr>
</thead>
<tbody><tr>
<td><strong>OpenClaw</strong></td>
<td>TypeScript</td>
<td>~100MB</td>
<td>~400MB</td>
<td>Everything. 52+ modules, 12 channels, voice, canvas.</td>
</tr>
<tr>
<td><strong>TinyClaw</strong></td>
<td>TypeScript</td>
<td>~15MB</td>
<td>~50MB</td>
<td>Hackable. Plugin arch, smart model routing, cost-aware.</td>
</tr>
<tr>
<td><strong>ZeroClaw</strong></td>
<td>Rust</td>
<td>~5MB</td>
<td>&lt;5MB</td>
<td>Sovereign. Static binary, encrypted secrets, edge-ready.</td>
</tr>
<tr>
<td><strong>PicoClaw</strong></td>
<td>Go</td>
<td>~8MB</td>
<td>&lt;10MB</td>
<td>Tiny. ARM64/RISC-V, runs on $10 hardware.</td>
</tr>
<tr>
<td><strong>NanoClaw</strong></td>
<td>TypeScript</td>
<td>~50MB</td>
<td>~100MB</td>
<td>Locked down. Docker/Apple Container isolation by default.</td>
</tr>
<tr>
<td><strong>ZeptoClaw</strong></td>
<td>Rust</td>
<td>~4MB</td>
<td>~6MB</td>
<td>All of the above. 29 tools, 6 sandboxes, 2,880 tests.</td>
</tr>
</tbody></table>
<p>What strikes me is that in three months, one project spawned an entire ecosystem. Each alternative makes a different tradeoff between capability, security, size, and cost. That doesn&#39;t happen for flash-in-the-pan projects. It happens when something touches a real nerve.</p>
<p>There&#39;s also a narrative in the community about agents earning money, buying things, posting independently — full autonomy. I&#39;ve watched agents post to social networks and manage repositories. But the meaningful work always has a human behind it, steering. The &quot;autonomous agent&quot; is — for now — a <em>supervised</em> agent with good muscle memory. And I think that&#39;s fine, because what actually matters is the next part.</p>
<p>Here&#39;s what I think is actually going to happen: people will start with OpenClaw for the freedom, then naturally migrate to purpose-built tools for the heavy lifting. For serious development work — complex workflows, multi-file refactors, detailed skill development — tools like Claude Code, Cowork, and Codex are almost certainly going to be better. They&#39;re designed for that.</p>
<p>Anthropic&#39;s latest move makes this even clearer. They just shipped <a href="https://code.claude.com/docs/en/remote-control">Remote Control for Claude Code</a> — run <code>claude remote-control</code> in your terminal, scan a QR code with the Claude mobile app, and you&#39;re steering your local coding session from your phone. Your machine does the heavy lifting; no inbound ports exposed; the mobile device is just a relay. It&#39;s genuinely addictive. Start a refactor at your desk, keep it going from the couch, check test results from the preschool parking lot. If the pervasiveness thesis is &quot;AI that lives where you live,&quot; Remote Control is Anthropic&#39;s version of it — scoped to development, polished, and honestly a great alternative to running a full OpenClaw setup when what you really want is to stay productive on the go. It&#39;s still early (Pro/Max plans only, CLI-only — no VS Code yet, and <code>tmux</code> is recommended to keep sessions alive), but the direction is unmistakable: the terminal is no longer tethered to the desk.</p>
<p>But that doesn&#39;t make OpenClaw irrelevant. It makes it something different: a lightweight orchestrator. A conversational layer that sits on top of your life and dispatches to the right tool for the job. The always-on agent that checks your email and nudges you about a calendar conflict doesn&#39;t need to be the same system that refactors your codebase. OpenClaw&#39;s future might be less &quot;do everything&quot; and more &quot;coordinate everything&quot; — the connective tissue between you and your specialized tools.</p>
<p>And the timing for that is surprisingly right. The <a href="https://docs.openclaw.ai/tools/skills"><code>SKILL.md</code> spec</a> that OpenClaw pioneered — a simple folder with a markdown file describing what a tool does — is being adopted across the ecosystem. <a href="https://developers.openai.com/codex/skills/">OpenAI&#39;s Codex</a>, Claude Code, and Cursor all support the same AgentSkills-compatible format. There are <a href="https://dev.to/curi0us_dev/best-openclaw-skills-for-2026-safe-high-impact-picks-2fjd">500+ skills</a> formatted in this spec. It&#39;s becoming a kind of lingua franca for agent capabilities — and honestly, it&#39;s one of the more interesting things happening right now. All these tools that started from very different places (a personal AI agent, a coding assistant, an IDE, a cloud sandbox) are converging on the same conventions. Skills, sub-agents, context files, memory. The standards are congealing, and that&#39;s usually when things start to get real.</p>
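<p>To make the convention concrete, here&#39;s a minimal hypothetical skill in that folder-plus-markdown shape: YAML frontmatter describing the skill, followed by plain instructions. The skill name and steps are invented for illustration; consult the linked spec for the exact fields each tool honors.</p>

```markdown
---
name: calendar-conflicts
description: Check tomorrow's calendar for overlapping events and summarize them.
---

# Calendar Conflicts

When asked about schedule conflicts:

1. Fetch tomorrow's events from the calendar tool.
2. Flag any pairs of events whose time ranges overlap.
3. Reply with a short summary, or "no conflicts" if the day is clear.
```

<p>That&#39;s the whole interface: prose a model can read, in a folder any agent can discover. The simplicity is exactly why it&#39;s spreading.</p>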
<p>Which raises a question I keep turning over: in this emerging stack, when do you use what? The primitives are stacking up fast:</p>
<ul>
<li><strong>Skills</strong> — static instruction sets that execute inline, within your conversation. Like handing someone a recipe card. Cheap, contextual, no overhead.</li>
<li><strong>Sub-agents</strong> — separate sessions with their own encapsulated context. They spin up in parallel, run autonomously, and report back when done. Often more <em>efficient</em> than skills for complex tasks, because they&#39;re not dragging your entire conversation history along. Narrower, more specialized, and they don&#39;t bloat your main thread.</li>
<li><strong>MCP and A2A protocols</strong> — the nascent attempt at letting agents talk to <em>each other&#39;s</em> tools. Right direction, still early and awkward.</li>
<li><strong><a href="https://docs.openclaw.ai/nodes">Nodes</a></strong> — pairing a cloud-hosted agent back to your local machine so it can take screenshots, access cameras, run local commands. The agent doesn&#39;t have to live where it acts.</li>
</ul>
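<p>The context tradeoff between the first two primitives can be sketched in a few lines. This is a toy model (the message lists stand in for real context windows, and every name here is invented), but it shows why a sub-agent can be cheaper than an inline skill on a long-lived thread.</p>

```python
# Toy model of the skill vs. sub-agent tradeoff: inline skills share the
# main conversation's context, sub-agents start fresh with only a brief.
# All names and structures are invented for illustration.

MAIN_HISTORY = ["...hours of accumulated conversation..."] * 200

def run_inline_skill(instructions: str) -> list[str]:
    """An inline skill runs inside the main thread: the model sees the
    full history plus the skill's recipe card."""
    return MAIN_HISTORY + [instructions]

def spawn_subagent(task: str) -> list[str]:
    """A sub-agent is a separate session: only a system prompt and the
    task brief travel with it, and only a summary comes back."""
    return ["system: you are a focused research agent", f"task: {task}"]

inline_ctx = run_inline_skill("Summarize today's unread email.")
sub_ctx = spawn_subagent("Research flight options for March.")

print(len(inline_ctx), "messages vs", len(sub_ctx))  # 201 messages vs 2
```

<p>On a fresh conversation the difference is negligible; two hundred messages in, it&#39;s the difference between dragging your whole history into every task and handing off a two-line brief.</p>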
<p>OpenClaw, Claude Code, Codex, and Cursor are all converging on some version of this: your main agent stays conversational and lightweight while farming out heavy research or coding tasks to purpose-built sub-sessions. It&#39;s a taste of what the orchestration layer could become. As context window management improves (which it will — it&#39;s one of the most active areas of research right now), the coordination between these layers only gets smoother.</p>
<p>The honest answer is I don&#39;t think anyone has figured out the right boundaries yet. I suspect the answer will be &quot;all of the above, with taste.&quot;</p>
<hr>
<h2>The Provider Wars</h2>
<p>How the frontier labs are responding to this tells you a lot about where they think it&#39;s going.</p>
<p><strong>Anthropic</strong> started friendly — Claude was the recommended model, the community rallied around it. Then came the trademark complaint, then the OAuth crackdown. Anthropic wants to own the Claude experience end-to-end. Personal agents running through third-party tools don&#39;t fit that model.</p>
<p><strong>OpenAI</strong> went the other direction: they hired Steinberger. If you can&#39;t beat the open-source movement, absorb its creator. ChatGPT Operator is their walled-garden answer to the &quot;AI that does things&quot; demand. Hiring Steinberger suggests they see something in OpenClaw&#39;s architecture worth learning from.</p>
<p><strong>Google</strong> has the most interesting position. Gemini&#39;s API pricing is aggressive ($1.25/million input tokens for 2.5 Pro versus Opus at $15), and they&#39;ve been the most generous with free tiers. If pervasive AI is about ubiquity, Google&#39;s distribution advantage — Android, Gmail, Calendar, Chrome — is enormous. They just haven&#39;t connected the dots yet.</p>
<p><strong>Apple</strong> is the cautionary tale. Siri&#39;s AI improvements were <a href="https://www.reuters.com/technology/apple-says-some-ai-improvements-siri-delayed-2026-2025-03-07/">delayed to 2026</a>. The AI head was replaced. They&#39;re reportedly <a href="https://macdailynews.com/2025/12/17/after-lagging-in-ai-2026-will-be-critical-for-apples-siri/">considering integrations with Anthropic and Perplexity</a> — basically admitting they can&#39;t build this alone. Meanwhile, their hardware — the Mac Mini I&#39;m running OpenClaw on — is perfectly suited for the thing they can&#39;t seem to ship in software. The irony is not lost on me.</p>
<hr>
<h2>What Stays, What Goes</h2>
<p><strong>Staying:</strong> Multi-surface messaging. Reaching your AI from wherever you already are — Telegram, Signal, WhatsApp, email. This becomes table stakes. Every major provider will probably offer this within a year.</p>
<p><strong>Staying:</strong> Persistent memory. The fact that my agent knows what we talked about last Tuesday, remembers my preferences, maintains project context. This is the difference between a tool and a relationship. I&#39;d be surprised if every major chatbot doesn&#39;t have this by end of 2026.</p>
<p><strong>Staying:</strong> Proactive capabilities. Cron jobs, scheduled checks, ambient monitoring. AI that acts without being asked. Early, but the demand signal is unmistakable.</p>
<p><strong>Dying:</strong> The &quot;give your AI root access to everything&quot; approach. The security findings are too real. Sandboxed, containerized, permission-scoped agents will win. ZeroClaw and NanoClaw have the right instinct.</p>
<p><strong>Dying:</strong> Per-token pricing as the only model. The cost reality makes personal agents a luxury. Flat-rate tiers with agent-friendly APIs will likely have to emerge.</p>
<p><strong>Staying:</strong> Sovereignty. Despite a growing ecosystem of cloud hosting options — <a href="https://blog.cloudflare.com/moltworker-self-hosted-ai-agent/">Cloudflare&#39;s one-click templates</a>, <a href="https://cybernews.com/best-web-hosting/best-openclaw-hosting/">DigitalOcean droplets</a> with 1-click deploy, <a href="https://sidsaladi.substack.com/p/how-to-set-up-openclaw-the-complete">Railway</a> and Northflank with credit-based models, AI-native VPS platforms like <a href="https://zo.computer">zo.computer</a> — people are overwhelmingly choosing to run this on their own hardware. Mac Minis, not cloud instances. Physical machines in their homes, not containers in someone else&#39;s data center. That&#39;s not the usual trajectory for developer tools. Usually convenience wins. But something about an AI agent that reads your email and manages your files makes people want it <em>close</em>. On a machine they can unplug. It&#39;s not a SaaS app. It&#39;s closer to a journal. You don&#39;t want your journal on someone else&#39;s server.</p>
<p><strong>TBD:</strong> Whether open-source personal agents or platform-native agents win long-term. OpenClaw proved the demand. Apple, Google, OpenAI, and Anthropic all have the resources to build this natively. Whether they&#39;ll build it with the same flexibility that made OpenClaw compelling — or wall it off into something safe but uninspired — is the real question. And whether the infrastructure they run on will be ours — or whether we&#39;ll rent it from the same companies building the models, and call that convenience.</p>
<hr>
<h2>Looking Forward</h2>
<p>A year ago I bought a Mac Mini to play with local models. Today it runs an always-on AI agent that reads my email, helps me write, monitors my calendar, and responds to messages while I&#39;m putting my daughter to bed.</p>
<p>The intelligence isn&#39;t what I was promised. The autonomy isn&#39;t either. But the <em>pervasiveness</em> — the reach, the always-there-ness, the way it quietly weaves into your routine until the old way of opening a chat window feels like sending a fax — that part is real. And it&#39;s the part that actually matters.</p>
<p>The old way: you open ChatGPT, type a question, get an answer, close the tab. The new way: you live your life and your AI is just... there. Participating. Like a really diligent friend who never sleeps and has read everything you&#39;ve ever written.</p>
<p>Whether that&#39;s comforting or unsettling probably depends on the day.</p>
<p>The GitHub stars will plateau. The hype cycle will correct. Some of these projects will fade. But the underlying pattern — AI that lives where you live, works while you sleep, remembers what you forget — that&#39;s not going anywhere.</p>
<p>Maybe this pervasive, ambient architecture appeals to me so much because caregiving is an ambient, always-on job. Traditional software demands that you stop what you&#39;re doing, sit at a desk, and context-switch. My job doesn&#39;t really turn off — I&#39;m the kind of person who&#39;s working 365 days a year even when I don&#39;t want to be — and layering parenthood on top of that means the context-switches aren&#39;t scheduled. They&#39;re constant. There&#39;s something about being able to fire off a thought to my agent — the pediatrician follow-up, the thing I need to look up for work, the half-formed idea I&#39;ll lose if I don&#39;t get it out of my head — and then actually be present for whatever I&#39;m doing. Not holding it in my brain where it competes with the person in front of me.</p>
<p>But I&#39;m not naive about the tradeoff. Another always-on surface is another thing vying for my attention. Another reason to reach for the phone. I&#39;m trying to be intentional and focused with my time — with <em>her</em> time — and here I am adding a new channel that&#39;s specifically designed to be always available. That&#39;s a double-edged sword and I know it.</p>
<p>The thing I keep coming back to is this: the distraction was already there. The mental load doesn&#39;t go away because I don&#39;t have an agent. It just sits in my brain, half-processed, pulling focus. Getting it <em>out</em> — quickly, into something that can actually deal with it — might be the more present option, not the less present one. Or maybe that&#39;s what I tell myself. Ask me again in six months.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/pervasive-ai-beyond-chat-window-hero-v2.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>architecture</category>
      <category>systems</category>
    </item>
    <item>
      <title>Raising Humans in an AI World</title>
      <link>https://bristanback.com/posts/raising-humans-in-ai-world/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/raising-humans-in-ai-world/</guid>
      <pubDate>Thu, 19 Feb 2026 15:43:00 GMT</pubDate>
      <atom:updated>2026-02-19T15:43:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>What do you teach a three-year-old when the ground is shifting under everyone&apos;s feet?</description>
      <content:encoded><![CDATA[<p>My daughter is three. She&#39;s figuring out spoons, opinions, and the word &quot;why.&quot; I&#39;m figuring out what to teach her when half of what I learned is becoming obsolete.</p>
<p>I build AI systems. I spend my days thinking about how machines learn, what they can do, where they fail. Then every morning I drop her off at preschool and watch her clip the sternum strap on her puppy dog lunchbox backpack by herself. My instinct is to rush — I can feel the line building behind us, the clock running — but I let her. If the strap is twisted, I&#39;ll scaffold: straighten it, hand it back. But I don&#39;t clip it for her. This is essentially Montessori — the child chooses, the adult steps back.</p>
<p>She also insists on climbing down from the car herself. And lately she&#39;s wanted to buckle her own carseat before we leave the house. She struggled with it at first. If I tried to help — even once — she&#39;d have a full meltdown. We&#39;d have to unbuckle everything and start from the beginning. The whole sequence, from the top. Her terms.</p>
<p>I used to find this exasperating. Now I think it might be the most important thing she does all day. The patience to be bad at something while your body figures it out. The insistence that the struggle is <em>hers</em>. RIE calls this &quot;ceremonious slowness&quot; — observing without rushing to fix.</p>
<p>There&#39;s a hypocrisy here I should name. I spend my working hours building systems that erase exactly this kind of friction. Specifically: I&#39;ve spent the last year encoding twelve years of engineering judgment into constraint systems for AI coding agents — the architectural patterns, the testing standards, the domain knowledge that used to take years of scar tissue to accumulate. It&#39;s opinionated and specific to what we&#39;re trying to accomplish; it&#39;s not generalizable, and it will almost certainly change. But that&#39;s the point — it&#39;s <em>my</em> judgment, crystallized (for better or worse), so that a junior engineer with AI can produce work that reflects patterns I took years to learn. The upside of writing it down is that it&#39;s interrogatable — the team can push back, add their own scar tissue, evolve it. It&#39;s not sacred. It&#39;s a draft of what we think we know.</p>
<p>And then I come home, and I choose the slow thing. I guard her right to fumble, even when it costs us ten minutes we don&#39;t have. I am building the thing I&#39;m protecting her from.</p>
<p>But not all friction-removal is the same, and I know that. A kid with dyslexia using text-to-speech isn&#39;t losing struggle — they&#39;re gaining access to the page. A researcher using AI to synthesize papers isn&#39;t skipping the thinking — they&#39;re getting to the thinking faster. Some friction is developmental. Some friction is just a barrier. The hard part is that I can&#39;t always tell which is which — and neither can the tools I&#39;m building. They remove friction indiscriminately, and the sorting is left to the human on the other end.</p>
<p>And then I think: will she even need to tell the difference?</p>
<hr>
<h2>The uncomfortable thoughts</h2>
<p>I&#39;m not worried about AI taking her job. Not exactly. I&#39;m worried about something subtler — that the <em>process of becoming competent</em> might change shape before she gets there.</p>
<p>I learned engineering by being bad at it. Slowly. For years. I wrote code that broke. I debugged it at 2am. I felt the specific embarrassment of a production incident that was my fault. That scar tissue became judgment. Not because suffering is virtuous — because repetition under consequence is how humans internalize pattern.</p>
<p>AI compresses that. A junior engineer with Claude Code can produce senior-looking output on day one. The code compiles. The tests pass. But the judgment didn&#39;t form — the thing that tells you <em>this works but it&#39;s wrong for our system</em>, the thing that comes from having been wrong enough times that your body knows before your mind does.</p>
<p>I know these are different scales. A three-year-old wrestling with a buckle and a twenty-three-year-old shipping code that passes CI are not the same kind of struggle. The developmental stakes are different, the time horizons are different, the costs of compression are different. But they share a structure: the slow accumulation of failure that becomes feel. And what I keep noticing is that the tools I build don&#39;t distinguish between the struggle that builds capacity and the struggle that just wastes time. They compress both.</p>
<p>So what do I actually want for my daughter? Not just skills. Not just &quot;emotional intelligence&quot; — the supplement every AI-era parenting article recommends, as if you can add EQ like a vitamin.</p>
<p>I think I want her to notice she&#39;s constructing herself.</p>
<hr>
<h2>Beyond growth mindset</h2>
<p>If you&#39;ve spent any time around modern parenting advice, you&#39;ve heard the Dweck gospel: praise effort, not ability. &quot;You worked hard&quot; beats &quot;you&#39;re so smart.&quot; It&#39;s become the baseline — the thing every preschool teacher and pediatrician says now. And it&#39;s not wrong. But I&#39;m starting to think it&#39;s not enough.</p>
<p>Growth mindset says: <em>you can change. You can get better.</em> That&#39;s a belief about capability. It still treats the self as a thing to be improved — like firmware you can update.</p>
<p>The deeper move is self-authorship. Not &quot;can I get better at math?&quot; but &quot;who decided math matters to me, and do I agree?&quot;</p>
<p>With a three-year-old, this looks small.</p>
<p>Instead of just: &quot;You worked hard on that.&quot;
Sometimes I try: &quot;I noticed you decided to keep going when it got frustrating. What made you choose that?&quot;</p>
<p>Instead of: &quot;You can get better.&quot;
Sometimes: &quot;What kind of person do you want to be when things get tricky?&quot;</p>
<p>She can&#39;t answer those questions yet. But I can ask them. And asking them changes <em>me</em> — it shifts my orientation from praising output to noticing agency.</p>
<p>The growth mindset kid believes change is possible. The self-authoring kid is awake to the construction. In a world where AI can do the skills, the second thing might matter more.</p>
<hr>
<h2>The questions I can&#39;t answer</h2>
<p>These are the ones I sit with.</p>
<p><strong>How do you build frustration tolerance when AI removes friction?</strong> Sophie will wrestle with a puzzle piece for thirty seconds before looking at me. That thirty seconds is everything. It&#39;s where neural wiring happens. It&#39;s where patience forms. It&#39;s where she learns that discomfort doesn&#39;t equal danger. But her generation will grow up with tools that erase that space. Instant explanations. Instant rewrites. Instant solutions. If friction disappears, where does patience form?</p>
<p><strong>Am I building on philosophies that assume a world I&#39;m lucky to have?</strong> Montessori&#39;s emphasis on independence is deeply Western — the self-reliant child as the goal. Many cultures prioritize interdependence, communal learning, the child as part of a fabric rather than a standalone agent. Free-range parenting assumes a neighborhood safe enough to release a child into, which is a privilege, not a baseline. Even RIE&#39;s &quot;observe, don&#39;t intervene&quot; assumes you have the time and bandwidth to observe — that you&#39;re not working two jobs, that there&#39;s a second parent, that the margins exist for ceremonious anything. I keep leaning on these frameworks and I keep noticing who they were built for.</p>
<p><strong>What does &quot;showing your work&quot; mean when AI did the work?</strong> If she grows up collaborating with AI — thinking <em>with</em> it — is that cheating? Or is that the work? We don&#39;t have a stable norm yet. The adults arguing online don&#39;t agree. By the time she&#39;s in middle school, the rules will have shifted twice.</p>
<p><strong>How does taste develop in a world of infinite generation?</strong> When anyone can produce images, music, essays, code — what makes something good? Taste used to require effort. You learned what worked by making things that didn&#39;t. You built an internal compass through repetition and failure. If AI flattens the effort curve, does taste still form the same way? Or does it require new kinds of friction — constraint, curation, intentional limits?</p>
<p><strong>When do I let her use AI?</strong> Not at three. That&#39;s easy. But seven? When she&#39;s stuck on homework and the AI can explain it more patiently than I can at 8pm with dishes in the sink? Ten? When she&#39;s staring at a blank page and the AI could help her draft the opening paragraph? Where is the line between scaffolding and displacement?</p>
<p>I don&#39;t have answers to any of these. Not because I&#39;m nobly sitting with uncertainty — I just genuinely haven&#39;t had time to think them through. She&#39;s three. I&#39;m still in the carseat buckle phase. The AI questions are real, and they&#39;re coming, but right now they&#39;re abstract in a way that which cup she drinks from at dinner is not. I&#39;ll cross those bridges when I get to them. For now I&#39;m asking them in public because I suspect a lot of parents are carrying them quietly — and they matter more than most of the discourse about prompt engineering or model benchmarks.</p>
<hr>
<h2>The Inheritance Problem</h2>
<p>There&#39;s a parallel that keeps nagging at me. AI abundance feels like inherited wealth.</p>
<p>The research on generational wealth is sobering: 70% of wealthy families lose their wealth by the second generation, 90% by the third. The biggest factor isn&#39;t financial literacy — it&#39;s <em>purpose</em>. People who never had to struggle for something often can&#39;t find meaning. Same pattern with lottery winners. Not because money is bad, but because sudden abundance without structure is destabilizing.</p>
<p>If AI gives everyone &quot;inherited&quot; capability — you can build anything, create anything, produce anything — what separates the people who thrive from those who spiral? Probably the same thing that separates inherited wealth that lasts from inherited wealth that doesn&#39;t: purpose, discipline, something you actually care about making.</p>
<p>My daughter will grow up with tools that can do most of what I spent a decade learning. She&#39;ll inherit capability I had to earn. The question isn&#39;t whether she&#39;ll have access to power — she will. The question is whether she&#39;ll have a reason to use it that&#39;s hers.</p>
<p>That&#39;s what the carseat buckle is for. Not the skill. The <em>wanting</em>.</p>
<hr>
<h2>The developmental question nobody&#39;s asking</h2>
<p>Erik Erikson mapped human development as a series of tensions: trust vs. mistrust as infants, autonomy vs. shame as toddlers, industry vs. inferiority in school, identity vs. role confusion as teenagers. Each stage assumed a world stable enough that the tension could resolve — you figured out who you were because the roles you were choosing between held still long enough to try on.</p>
<p>I kept trying to name a new stage — <em>integration vs. fragmentation</em>, the ability to collaborate with systems that think differently than you without dissolving into them. But the more I sat with it, the less it needed its own stage. It&#39;s not a new tension. It&#39;s a new dimension inside the identity stage Erikson already mapped. His version asks &quot;who am I among these roles?&quot; The AI version asks &quot;where do I end and the tool begins?&quot; That&#39;s not role confusion — it&#39;s a boundary problem he never had to account for.</p>
<p>I feel it in my own work. Some days the line between my thinking and the system&#39;s output is clean. Other days it&#39;s porous — I can&#39;t tell whether an idea was mine or something I steered toward because the model surfaced it. Adults are struggling with this right now, and we had decades of knowing our own minds before the boundary got blurry. She won&#39;t have that baseline.</p>
<p>And this isn&#39;t just an abstract philosophical problem. It&#39;s showing up concretely — in how people relate to their <em>work</em>, their expertise, the years they spent mastering a specific craft. When AI can do the thing you spent a decade learning to do, the boundary question becomes an identity question: what was all that time for? What was ephemeral — the syntax, the APIs, the specific technique — and what was lasting?</p>
<p>Some people aren&#39;t struggling with this at all. And it&#39;s not just a software thing.</p>
<p>A radiologist who sees themselves as &quot;the person who reads scans&quot; is in trouble — AI reads scans now, faster and often better. But a radiologist who sees themselves as &quot;the person who figures out what&#39;s wrong with you, and scans are one of my tools&quot; hasn&#39;t lost anything. The tool got better. They got better with it. A graphic designer who is &quot;the person who&#39;s great at Photoshop&quot; is watching the ground move. A designer who is &quot;the person who understands why this layout makes you feel something&quot; — that person is fine. They were never the tool. They were the taste behind it.</p>
<p>The people who adapt are systems thinkers first. They see themselves as people who bend reality — who understand how pieces connect, who can look at a problem and feel where the leverage is. The specific craft doesn&#39;t matter. The runes change. The witch doesn&#39;t. But if your identity is &quot;I&#39;m the person who&#39;s good at <em>this particular spell</em>&quot; — good at Python, good at hand-crafted CSS, good at reading chest X-rays, good at whatever the current incantation is — then every transition is a small death. You&#39;re not losing a tool. You&#39;re losing yourself.</p>
<p>The ones who weather it are the ones who were always the magic-wielder, not the spell. And what makes a good magic-wielder isn&#39;t just power — it&#39;s synthesis, integration, and judgment. The ability to pull from disparate sources, connect things that don&#39;t obviously belong together, and then challenge the result against what you actually believe is true and important. Understanding systems is the meta-skill that survives every tool change, because systems are what remain when the tools don&#39;t. But knowing which systems <em>matter</em> — having the taste to choose, the stubbornness to push back, the values to say <em>this is worth doing and that isn&#39;t</em> — that&#39;s the part AI can&#39;t replace. Because it doesn&#39;t want anything.</p>
<p>That&#39;s what I keep thinking about for my daughter. She won&#39;t remember a world without AI collaborators. She&#39;ll never have the baseline of pure solo cognition to compare against. So her identity can&#39;t be built on &quot;I do this thing the hard way.&quot; It has to be built on something AI can&#39;t absorb: what she cares about, what she notices, what she chooses to struggle with when she doesn&#39;t have to. Not the skill. The orientation toward the skill.</p>
<p>And maybe that&#39;s what the carseat buckle is actually teaching her. Not how to clip a buckle — that&#39;s the rune, and it&#39;ll be irrelevant soon enough. What she&#39;s learning is that sequences have logic, that steps depend on other steps, that if you skip one the whole thing fails and you start over. She&#39;s building a systems thinker&#39;s instinct. The witch, not the spell.</p>
<p>Which is why I keep protecting her friction even as I spend my days eliminating everyone else&#39;s. Friction is the forge. The witch is what walks out after all her tools have melted.</p>
<hr>
<h2>What I&#39;m actually doing</h2>
<p>For now, it&#39;s less grand than the philosophy.</p>
<p>It&#39;s standing in the preschool drop-off line, feeling the parents behind me, and not reaching for the strap. It&#39;s watching her unbuckle and rebuckle the carseat for the third time because I touched it and now it doesn&#39;t count. It&#39;s holding the answer in my mouth when she asks &quot;why&quot; — waiting to see what she&#39;ll build first. It&#39;s trying to put my phone down more often than I actually manage to. It&#39;s noticing when I want to speed her up because I&#39;m running on four hours and a reheated coffee — and choosing, occasionally, not to.</p>
<p>And then I go to work and build the thing that makes all of this harder.</p>
<p>I don&#39;t have a clean ending for that. I keep wanting one — some formulation where the builder and the parent reconcile, where the tension resolves into wisdom. But it doesn&#39;t. I build tools that compress struggle. I come home and protect her right to struggle. I believe in both things at the same time, and I haven&#39;t figured out how to hold them without one hand undermining the other.</p>
<p>She&#39;s three. She doesn&#39;t know I build AI systems. She doesn&#39;t know the word &quot;friction.&quot; She just knows that the buckle is hers, and if I touch it, we start over.</p>
<p>I&#39;m trying to be the kind of parent who lets her start over. I&#39;m also the person making a world where starting over gets harder to choose.</p>
<p>I don&#39;t know how to resolve that. I&#39;m not sure it resolves.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/raising-humans-hero.webp" medium="image" type="image/webp" />
      <category>life</category>
      <category>ai</category>
      <category>identity</category>
    </item>
    <item>
      <title>Why AI Can&apos;t Shop for You Yet</title>
      <link>https://bristanback.com/posts/why-ai-cant-shop-for-you-yet/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/why-ai-cant-shop-for-you-yet/</guid>
      <pubDate>Wed, 18 Feb 2026 18:00:00 GMT</pubDate>
      <atom:updated>2026-02-18T18:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>The properties that matter most in fashion aren&apos;t properties of the product — they&apos;re properties of the relationship between the product and the person. No protocol fixes that.</description>
      <content:encoded><![CDATA[<p>AI shopping fails because it doesn&#39;t have <em>you</em> in its data.</p>
<p>Not your name or your credit card — it has those. The thing it&#39;s missing is whether a specific shade of ecru reads warm or cool against <em>your</em> skin. Whether a fabric drapes the way <em>you</em> like. Whether you&#39;ll feel like yourself wearing it. These aren&#39;t properties of the product. They&#39;re properties of the relationship between the product and the person. No database contains them. No protocol transmits them. And they&#39;re the only properties that actually matter when you&#39;re getting dressed.</p>
<p>I&#39;ve been thinking about this because I tried something stupid earlier today. I asked my AI assistant — running on the most capable model commercially available, connected to my actual browser with my logins and cookies — to put together a spring outfit for me. I gave it my style guide, my color season, my brands, my sizes, my budget. Everything it would need.</p>
<p>Forty minutes later I had seven dead browser tabs, three 403 errors, and an AI confidently recommending specific products at specific prices from specific links that it had never actually verified were live. It had fallen back on training data from months ago — hallucinating a product catalog and dressing it up with confident formatting and apologetic caveats.</p>
<p>I could have done it myself in four minutes. I know where to look. I know what &quot;sage green&quot; means at Sézane (they call it &quot;kaki&quot; or &quot;olive-green&quot;). I know which cuts run true to size on my body. I know that I like dusty rose in silk but not in cotton — something about the way cotton holds that color makes it read too sweet, too deliberate, while silk lets it exhale. That knowledge lives in my head, built from years of browsing, buying, and returning. It&#39;s expensive, artisanal, and completely non-transferable.</p>
<p>That&#39;s the problem. Not browser automation. Not bot detection. <em>That.</em></p>
<hr>
<p>Here&#39;s what I keep coming back to: <strong>search in fashion has never been solved.</strong> Daydream&#39;s CTO Maria Belousova <a href="https://www.vogue.com/article/is-daydreams-ai-platform-the-answer-to-fashions-discovery-problem">told Vogue exactly this</a>. She&#39;s right, and I think most of us already know it in our bodies even if we haven&#39;t named it.</p>
<p>Go to Google Shopping right now and search &quot;sage green linen blouse for spring.&quot; You&#39;ll get hundreds of results. Polyester tops in neon lime labeled &quot;green.&quot; Synthetic blends tagged &quot;linen feel.&quot; Sponsored results from brands you&#39;ve never heard of. You know the feeling — the deflation of seeing a wall of wrong things when you had something specific and alive in your mind. The search matched your keywords. It understood nothing about what you wanted.</p>
<p>This has been broken for decades. We describe what we want in the language of longing — &quot;something flowy for a garden party.&quot; Catalogs describe what they have in the language of inventory — &quot;polyester, midi, floral, size M.&quot; Two different languages. Google Shopping translates between them about as well as a phrasebook translates poetry.</p>
<p>Let me get technical for a moment, because I think the <em>how</em> matters here.</p>
<p>Most e-commerce search <a href="https://www.coveo.com/blog/decoding-shopper-intent-with-semantic-search/">still runs on BM25</a> — an algorithm from the 1990s that&#39;s essentially a sophisticated keyword matcher. You type &quot;green dress,&quot; it counts how often &quot;green&quot; and &quot;dress&quot; appear in product listings, weights rarer terms higher, and ranks results. It&#39;s fast and battle-tested. It also has no idea what you <em>mean</em>. &quot;Sage green&quot; and &quot;olive&quot; are completely different queries to BM25, even though they might be exactly the same thing in your mind&#39;s eye.</p>
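<p>To make that concrete, here&#39;s a toy sketch of the BM25 idea: count term matches, weight rarer terms higher, normalize by document length. This is the shape of the scoring, not any retailer&#39;s actual ranking code:</p>
<pre><code class="language-typescript">// Toy BM25 scorer (illustrative only; real engines add many refinements).
const k1 = 1.5;
const b = 0.75; // standard tuning constants

function bm25(query: string[], doc: string[], corpus: string[][]): number {
  const avgLen = corpus.reduce((s, d) =&gt; s + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of query) {
    const tf = doc.filter((w) =&gt; w === term).length;          // term frequency in this listing
    const df = corpus.filter((d) =&gt; d.includes(term)).length; // listings containing the term
    const idf = Math.log(1 + (corpus.length - df + 0.5) / (df + 0.5)); // rarer terms weigh more
    score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
  }
  return score;
}
</code></pre>
<p>Notice what the math can&#39;t do: a query term the listing doesn&#39;t contain contributes exactly zero. &quot;Sage&quot; and &quot;olive&quot; never meet.</p>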
<p>Semantic search is the next generation — instead of matching words, it converts your query and every product description into vectors, points in a high-dimensional mathematical space where things with similar <em>meaning</em> cluster together. &quot;Sneakers&quot; and &quot;trainers&quot; land near each other. &quot;Midi dress&quot; is closer to &quot;something knee-length&quot; than &quot;dress&quot; alone is. It&#39;s a real upgrade. It&#39;s why Amazon and Google have been investing heavily in it.</p>
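<p>Semantic search swaps the word counting for geometry. A rough sketch, with invented two-dimensional vectors standing in for the high-dimensional embeddings a real model produces:</p>
<pre><code class="language-typescript">// Cosine similarity: how closely two meaning-vectors point the same way.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) =&gt; s + x * b[i], 0);
  const norm = (v: number[]) =&gt; Math.sqrt(v.reduce((s, x) =&gt; s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Toy embeddings (made-up numbers, for shape only):
const sneakers = [0.9, 0.1];
const trainers = [0.85, 0.2];
const blouse = [0.1, 0.95];

// cosine(sneakers, trainers) is high; cosine(sneakers, blouse) is low.
// &quot;Sneakers&quot; and &quot;trainers&quot; land near each other without sharing a keyword.
</code></pre>
<p>That&#39;s the whole upgrade: nearness of meaning instead of overlap of words.</p>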
<p>But here&#39;s where it gets interesting. Semantic search <em>can</em> actually embed &quot;French-girl energy.&quot; The training data is full of fashion editorials, Pinterest boards, and style blogs that associate the concept with specific attributes — effortless, linen, undone, Sézane, red lip. The algorithm knows the cultural shape of the idea.</p>
<p>What it doesn&#39;t know is <em>my</em> shape within that idea. My &quot;French-girl energy&quot; is filtered through my color season, my body, my budget, the things already hanging in my closet, the weather where I live. It&#39;s a personal reading of a shared aesthetic — and that personal reading doesn&#39;t exist anywhere in the search index. Semantic search can tell you what &quot;French-girl energy&quot; means to the culture. It can&#39;t tell you what it means to me on a Tuesday in February when I&#39;m trying to feel like myself again after a hard week.</p>
<p>BM25 fails because there are no keywords to match. Semantic search fails because it finds the right neighborhood but not the right house. The distance between &quot;this is close&quot; and &quot;this is <em>it</em>&quot; — that last inch of recognition — lives somewhere no search engine has learned to look.</p>
<p>Pinterest gets closer — visual search lets you say &quot;more like this&quot; with an image. But Pinterest optimizes for engagement, not purchase. It wants you scrolling, not buying. Google Lens can identify a product from a photo, but it returns the exact item or nothing. It can&#39;t do &quot;like this but softer&quot; or &quot;this silhouette in a warm neutral.&quot;</p>
<p>The <a href="https://heuritech.com/articles/fashion-industry-challenges/">fashion e-commerce return rate hovers around 25%</a>. A quarter of everything bought online in fashion gets sent back, driven by fit inconsistencies and style mismatches. That&#39;s not logistics. That&#39;s discovery failure at scale.</p>
<hr>
<p>So when my AI agent failed to browse the actual sites and fell back on training data, it was layering a new failure mode on top of an already-broken system. At least Google Shopping shows you real products that exist right now. My AI was naming items from memory — frozen knowledge from months ago, possibly sold out, renamed, or discontinued — with no way to verify any of it without doing the thing it had already failed to do.</p>
<p>And underneath all of this is an infrastructure problem that&#39;s almost comically basic: <strong>there is no shared, open, real-time source of product truth that AI agents can query.</strong> My agent was fumbling through browser tabs like someone trying to read a restaurant menu through a foggy window — not because it couldn&#39;t read, but because nobody would hand it a menu.</p>
<p>Every retailer is a walled garden. Google Shopping aggregates some data through product feeds, but those feeds are built for ad targeting, not for answering &quot;is this in stock in my size in a color that works for Soft Autumn?&quot; The data is stale by design and incomplete by incentive. Retailers share what drives clicks, not what drives good decisions.</p>
<p>What this needs is an open product knowledge graph — not a walled garden, but a protocol. Think of it this way: product feeds today are like a glossary — structured, factual, good for looking things up. What shopping actually needs is something closer to a conversation — contextual, relational, aware of who&#39;s asking. The gap between glossary and conversation is where every AI shopping agent currently stalls.</p>
<p>It&#39;s starting to happen. In January, Google announced the <a href="https://developers.googleblog.com/under-the-hood-universal-commerce-protocol-ucp/">Universal Commerce Protocol (UCP)</a>, co-developed with Shopify, Etsy, Wayfair, Target, and Walmart. Here&#39;s what UCP actually does: instead of an AI agent needing to open a browser, navigate a website, click through pages, and scrape product information — the way a human would — UCP lets merchants publish a machine-readable description of their entire store. Products, prices, sizes, availability, shipping options, return policies, checkout rules — all structured data that any AI agent can query directly, the way apps talk to each other through APIs. Think of it as every store getting a standardized digital menu that AI can read instantly — like moving from a PDF menu you have to squint at to a structured order system where everything is tagged, searchable, and always current.</p>
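<p>I haven&#39;t seen UCP&#39;s published schema, so don&#39;t read this as the spec. But to make &quot;machine-readable store&quot; concrete, imagine a structured entry shaped something like this, with every field name invented for illustration:</p>
<pre><code class="language-typescript">// Hypothetical structured product entry. NOT UCP&#39;s actual schema;
// the field names are invented to show the shape of the idea.
interface ProductEntry {
  sku: string;
  name: string;
  price: { amount: number; currency: string };
  variants: { size: string; color: string; inStock: boolean }[];
  returnPolicy: { windowDays: number };
}

const blouse: ProductEntry = {
  sku: &quot;SZ-1042&quot;,
  name: &quot;Linen blouse&quot;,
  price: { amount: 135, currency: &quot;USD&quot; },
  variants: [{ size: &quot;M&quot;, color: &quot;sage&quot;, inStock: true }],
  returnPolicy: { windowDays: 30 },
};

// &quot;Is this in stock in medium?&quot; becomes a lookup, not a scrape:
const inStockInMedium = blouse.variants.some((v) =&gt; v.size === &quot;M&quot; &amp;&amp; v.inStock);
</code></pre>
<p>The point isn&#39;t the shape of the object. It&#39;s that the question gets answered in milliseconds by a query instead of forty minutes of dead browser tabs.</p>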
<p>The ambition is real. Google, Shopify, Etsy, Wayfair, Target, Walmart, American Express, Mastercard, and Stripe are all backing it. The Linux Foundation established an Agentic AI Foundation. Parallel protocols like MCP (for tool use), A2A (for agent-to-agent communication), and ACP are emerging to handle the broader coordination layer.</p>
<p>This is the right shape, and it would fix everything that broke in my shopping experiment. My agent wouldn&#39;t need to click through seven dead browser tabs — it would query an API and get real, current answers. Is this blouse in stock in medium? What&#39;s the actual price today? Can I return it? All answered in milliseconds, no scraping required.</p>
<p>But UCP doesn&#39;t solve the thing that actually matters to me. It can tell my agent that a blouse exists in a specific colorway, is in stock in my size, costs $135, ships in 3-5 days. It cannot tell my agent whether that shade of ecru will make me look awake or washed out. Whether I&#39;ll reach for it on a tired Tuesday morning when I need to feel put-together, or whether it&#39;ll hang untouched while I grab the same three things I always grab. Those dimensions aren&#39;t in the protocol because they can&#39;t be. They&#39;re not product data. They&#39;re the quiet, private negotiation between a woman and her closet.</p>
<hr>
<p>So the plumbing is fixable. The taste isn&#39;t. And what&#39;s fascinating is how differently this plays out depending on what you&#39;re buying — because not everything we shop for carries the same weight.</p>
<p>McKinsey published <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-automation-curve-in-agentic-commerce">a framework for this</a> — six levels of shopping delegation, from &quot;Subscribe &amp; Save&quot; to fully autonomous multi-agent commerce. They predict AI agents will mediate $3-5 trillion in consumer commerce by 2030. But the interesting part isn&#39;t the money. It&#39;s where the curve stalls, and <em>why</em>.</p>
<p><strong>Commodity goods</strong> — toilet paper, coffee pods, dish soap — climb the curve fast. Once you trust the agent to handle substitutions, you&#39;re done. 23% of U.S. Amazon users already have active Subscribe &amp; Save subscriptions. Nobody&#39;s identity is threatened by their AI ordering the wrong paper towels.</p>
<p><strong>Electronics</strong> — delegation is selective. &quot;Research noise-cancelling headphones under $300&quot; is something AI crushes. Measurable specs, comparable features. But &quot;which ones sound best for jazz?&quot; — that&#39;s taste. So people delegate research but make the call themselves.</p>
<p><strong>Fashion</strong> — delegation stalls early. People love using AI to discover and analyze. They won&#39;t let it assemble the cart. McKinsey calls these &quot;identity-oriented&quot; categories. The purchase isn&#39;t just about the product. It&#39;s about what choosing it says about you.</p>
<p>My agent could have handled &quot;find the cheapest USB-C cable with 100W charging.&quot; It completely failed at &quot;find me a spring outfit.&quot; Same agent. Same model. Same browser. But one task is a math problem and the other is an identity question wearing the clothes of a search query.</p>
<hr>
<p>So how is the industry responding? The people trying to fix this fall into roughly three camps — and what&#39;s revealing is that each camp has a different theory about what the problem actually is.</p>
<p>OpenAI, Amazon, and Perplexity are building universal shopping agents — they think the problem is checkout friction. They&#39;ve embedded purchasing into ChatGPT, built &quot;Buy For Me&quot; cross-retailer tools, added end-to-end transaction handling. This works for commodity and spec-driven purchases. It breaks on fashion because they have your query but not your identity.</p>
<p>Daydream, Phia, and OneOff are building fashion-specific platforms — they think the problem is taste modeling. Daydream&#39;s Julie Bornstein spent years watching the discovery problem from inside Nordstrom before raising $50 million to build a platform where you describe what you want in conversation, upload reference photos, train the AI on your preferences through upvotes and downvotes. It&#39;s building a personal taste model through interaction. That&#39;s the right instinct, but they have to build and maintain their own product catalog to do it — distribution is the constraint.</p>
<p>And then there&#39;s Stitch Fix — the cautionary tale nobody in the AI shopping space wants to talk about. They&#39;ve been solving this exact problem for fourteen years. They have the data (millions of style profiles), the algorithms (AI narrows hundreds of thousands of items to a manageable set), and 1,600 human stylists adding the nuance the algorithm can&#39;t. If anyone should have cracked taste-aware shopping, it&#39;s them.</p>
<p>Instead? Active clients <a href="https://wwd.com/business-news/financial/stitch-fix-q1-2025-earnings-narrower-losses-1236759229/">dropped 18.6% year-over-year</a> in late 2024, falling to 2.4 million. Revenue has been declining. They&#39;re two years into a turnaround plan. Their VP of Product <a href="https://www.uschamber.com/co/good-company/the-leap/stitch-fix-optimizing-with-ai">told the U.S. Chamber of Commerce</a> that &quot;one of the biggest trends is putting humans in the loop with AI&quot; — a revealing thing to say when your entire company was founded on exactly that premise fourteen years ago.</p>
<p>Stitch Fix isn&#39;t failing because they&#39;re dumb. They&#39;re failing because the problem is genuinely that hard, and I say that with real respect for what they&#39;ve attempted. Their model — AI picks a set, human stylist curates it, you get a box — still can&#39;t close the taste gap. The AI narrows 100,000 items to 200. The stylist picks 5. You keep 2. That&#39;s a 99.998% rejection rate from catalog to closet. Most of the intelligence in the system is about <em>what not to send you</em>, and they&#39;re still getting it wrong often enough that people quietly cancel and go back to browsing on their own. Back to the scroll. Back to the slow, private work of knowing what you want.</p>
<p>And then there&#39;s what I tried — the DIY approach. An AI that already knows your style, browsing real sites on your behalf. The most ambitious version. Also the most broken, because every retailer is actively trying to prevent exactly this. My browser agent didn&#39;t fail because the AI was dumb. It failed because the web is hostile to automated access by design. Retailers want you in <em>their</em> experience, clicking <em>their</em> recommendations, seeing <em>their</em> ads. An AI that can comparison-shop across sites is an existential threat to that model.</p>
<p>The big platforms have distribution but not taste. The fashion startups have taste but not distribution. Stitch Fix has both and is still bleeding customers. The DIY approach has neither. <strong>The infrastructure that would make AI shopping work requires retailers to surrender the thing that makes them valuable.</strong> That&#39;s the fundamental tension, and UCP only partially resolves it.</p>
<hr>
<p>Five years from now, I think this shakes out by category:</p>
<p><strong>Groceries</strong> — essentially solved by 2027. Agent-managed replenishment, smart substitutions, context-aware purchasing. Your agent knows you&#39;re hosting Friday dinner and adjusts Saturday&#39;s delivery.</p>
<p><strong>Electronics</strong> — solved by 2028. Full research-to-purchase pipeline for anything with measurable specs. The agent compares, monitors prices, executes when the deal appears.</p>
<p><strong>Fashion</strong> — this is where it splits.</p>
<p>The commodity layer — basics, underwear, workout clothes, plain tees — automates like groceries. Your agent knows your sizes, reorders when things wear out. Fine.</p>
<p>The identity layer — the spring outfit, the statement ring, the pieces that make you feel like yourself — stays human for much longer. Not because AI won&#39;t get good at predicting taste. It will. But because the act of choosing is part of the product. When I browse Sézane on a Saturday morning with coffee, I&#39;m not performing a search task. I&#39;m trying on a version of myself. The light through the window, the scroll, the pause on something that catches — that&#39;s not friction to be optimized away. That&#39;s the thing itself.</p>
<p>McKinsey models everything as an &quot;automation curve&quot; — as if more automation is always the goal. But some purchases aren&#39;t tasks to be optimized. They&#39;re experiences to be had. Fully delegating my outfit selection wouldn&#39;t make me more efficient. It would make me someone who wears AI-selected outfits. That&#39;s a different identity than the one I&#39;m building.</p>
<p>Which brings us back to the question this whole experiment started with: if the problem is fundamentally about taste, identity, and the gap between what language can express and what you actually mean — what should we actually be building?</p>
<hr>
<p>What I actually want is simpler and harder than what anyone is building.</p>
<p>The hardest problem in shopping isn&#39;t finding things to say yes to. It&#39;s knowing what to say no to. Every recommendation engine is optimized to surface things you might like. Nobody is building the filter that protects you from things you <em>almost</em> like — the pieces that are close enough to your taste to tempt you but wrong enough to end up in the back of your closet.</p>
<p>I&#39;ve started thinking about this as the &quot;bouncer&quot; problem. A good personal shopper isn&#39;t someone who shows you everything in your size. It&#39;s someone who stands at the door and turns away the things that don&#39;t belong — the sage that&#39;s too saturated, the cut that won&#39;t drape right on your frame, the impulse buy that&#39;s shopping a mood instead of building a wardrobe. We&#39;ve all bought something at 11pm that we didn&#39;t need because the algorithm showed it to us at exactly the right moment of weakness. A bouncer would have caught that. Binary exclusion before you ever see the item. The kill shot in shopping isn&#39;t the recommendation. It&#39;s the rejection.</p>
<p>Nobody is building this because rejection doesn&#39;t monetize. Every shopping platform makes money when you buy things. An agent that says &quot;this isn&#39;t right for you&quot; is an agent that reduces revenue. The incentive structure is pointing the wrong direction entirely.</p>
<p>But it&#39;s what I want. I want a personal style agent that knows my color season, my brands, my sizes, my budget — and more importantly, knows my <em>constraints</em>. Constraints I can see and edit, not a black box that guesses. &quot;Show me what you think you know about my taste&quot; should be a button, not a mystery. When the agent says &quot;this isn&#39;t for you,&quot; I want to see <em>why</em> — which constraint it violated, which gate it failed. Radical transparency about taste, not just about price.</p>
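<p>What could those gates look like? A thought-experiment sketch: every rule and threshold below is invented, but the structure is the point. Constraints you can read, edit, and have cited back to you when something gets turned away:</p>
<pre><code class="language-typescript">// A &quot;bouncer&quot; as explicit, inspectable constraint gates (all rules hypothetical).
type Item = { color: string; fabric: string; price: number };
type Gate = { name: string; passes: (item: Item) =&gt; boolean };

const gates: Gate[] = [
  { name: &quot;palette&quot;, passes: (i) =&gt; [&quot;sage&quot;, &quot;ecru&quot;, &quot;dusty rose&quot;].includes(i.color) },
  { name: &quot;budget&quot;, passes: (i) =&gt; i.price &lt;= 150 },
  // The dusty-rose-in-cotton rule from earlier, written down where I can see it:
  { name: &quot;fabric&quot;, passes: (i) =&gt; !(i.color === &quot;dusty rose&quot; &amp;&amp; i.fabric === &quot;cotton&quot;) },
];

// Rejection comes with a reason: which gate failed, not just &quot;no.&quot;
function bouncer(item: Item): { admitted: boolean; failedGate?: string } {
  const failed = gates.find((g) =&gt; !g.passes(item));
  return failed ? { admitted: false, failedGate: failed.name } : { admitted: true };
}
</code></pre>
<p>The return value is the feature. Not just &quot;no,&quot; but which gate said no.</p>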
<p>I want it to have structured access to the catalogs of my favorite brands — not by scraping websites, but through real data. It monitors new arrivals. It knows that a pre-spring collection just dropped and there&#39;s a blouse that&#39;s dead center in my palette. It surfaces it with a note: &quot;This is your color, your brand, your price range. It just dropped.&quot;</p>
<p>And when I search for something vague — &quot;I need something for a spring dinner&quot; — I don&#39;t want it to show me products. I want it to ask me questions first. <em>What&#39;s the vibe? Indoor or outdoor? How dressed up? Are we building around something you already own?</em> Diagnosis before prescription. The same way a good doctor doesn&#39;t hand you pills when you say &quot;I don&#39;t feel well.&quot;</p>
<p>I don&#39;t want it to buy for me. I want it to <em>find</em> for me — and more importantly, to <em>filter</em> for me. The finding is hard. The filtering is harder. The choosing is the fun part, and that stays mine.</p>
<p>Everyone is building for autonomous checkout. The actual need is intelligent, opinionated discovery — an agent that knows you well enough to say no on your behalf. This is what we&#39;re working toward at <a href="https://product.ai">Product.ai</a> — a conversational commerce agent grounded in real product truth, not hallucinated catalogs. The hard part isn&#39;t the recommendation. It&#39;s the rejection. I&#39;ll write more about the bouncer problem soon. There&#39;s a whole architecture to &quot;no&quot; that nobody is talking about.</p>
<hr>
<p><em>My assistant did eventually put together a decent outfit recommendation — from memory, not from the actual websites. Which, honestly, is how my best-dressed friends shop too. They know what&#39;s out there because they pay attention. Maybe the future of AI shopping isn&#39;t browser automation or commerce protocols or knowledge graphs. Maybe it&#39;s just software that pays attention the way a good friend does — noticing what would suit you, remembering what you liked last time, knowing the difference between your &quot;sage green&quot; and everyone else&#39;s. Sitting with you while you decide.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/ai-shopping-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>systems</category>
      <category>identity</category>
    </item>
    <item>
      <title>Convex and the Reactive Database Paradigm</title>
      <link>https://bristanback.com/posts/convex-reactive-database/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/convex-reactive-database/</guid>
      <pubDate>Tue, 10 Feb 2026 06:00:00 GMT</pubDate>
      <atom:updated>2026-02-19T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>How Convex challenges our mental models of databases—not relational, not NoSQL, but something new.</description>
      <content:encoded><![CDATA[<p>I&#39;ve been building a browser-based research automation system that coordinates queries across multiple sources. The state management problem is brutal: session state, query validity, authentication tokens, rate limits — all changing asynchronously while agents work in parallel.</p>
<p>My first pass was traditional: poll for changes, maintain local state, reconcile conflicts. It was fragile. Stale reads caused retries, retries caused rate limits, rate limits caused cascading failures. What I actually wanted was simpler: every component subscribes to the state it cares about and reacts when it changes. Agent starts working? UI updates. Query fails? Repair pipeline triggers. Token expires? Re-auth flow kicks off. No polling, no reconciliation, no &quot;did I miss an update?&quot;</p>
<p>That&#39;s what led me to <a href="https://convex.dev">Convex</a>, and it&#39;s messing with my mental models.</p>
<hr>
<h2>What Convex Actually Is</h2>
<p>Convex calls itself a &quot;document-relational&quot; database. That&#39;s not marketing — it&#39;s a genuine hybrid:</p>
<ul>
<li><strong>Document</strong>: You store JSON-like nested objects. No rigid schemas upfront.</li>
<li><strong>Relational</strong>: You have tables with relations. Tasks reference users via IDs. Joins are real.</li>
<li><strong>Reactive</strong>: Queries aren&#39;t one-shot. They&#39;re subscriptions. When underlying data changes, your query reruns automatically and pushes to clients.</li>
<li><strong>Transactional</strong>: Full ACID with serializable isolation. Your entire mutation function is a transaction — no <code>BEGIN</code>/<code>COMMIT</code> to manage.</li>
</ul>
<p>The server functions are just TypeScript:</p>
<pre><code class="language-typescript">import { query } from &quot;./_generated/server&quot;;

export const getAllOpenTasks = query({
  handler: async (ctx) =&gt; {
    return await ctx.db
      .query(&quot;tasks&quot;)
      .withIndex(&quot;by_completed&quot;, (q) =&gt; q.eq(&quot;completed&quot;, false))
      .collect();
  },
});
</code></pre>
<p>No SQL. No ORM. The query <em>is</em> the code.</p>
<hr>
<h2>How It Actually Works</h2>
<h3>Schema: Optional but Powerful</h3>
<p>Convex is schemaless by default — you can just start writing data. But add a <code>schema.ts</code> file and you get end-to-end type safety:</p>
<pre><code class="language-typescript">// convex/schema.ts
import { defineSchema, defineTable } from &quot;convex/server&quot;;
import { v } from &quot;convex/values&quot;;

export default defineSchema({
  messages: defineTable({
    body: v.string(),
    user: v.id(&quot;users&quot;),
  }),
  users: defineTable({
    name: v.string(),
    tokenIdentifier: v.string(),
  }).index(&quot;by_token&quot;, [&quot;tokenIdentifier&quot;]),
});
</code></pre>
<p>The validators (<code>v.string()</code>, <code>v.id()</code>, etc.) work at runtime <em>and</em> generate TypeScript types. Same validators used for argument validation and schema definition. No separate type definitions to keep in sync.</p>
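<p>The mechanism is worth sketching. Convex&#39;s actual validator API lives in <code>convex/values</code>, but the core trick, one object that both checks a value at runtime and yields a static type, fits in a few lines of plain TypeScript. Names like <code>vString</code>, <code>Infer</code>, and <code>validate</code> are mine, not Convex&#39;s:</p>
<pre><code class="language-typescript">// A toy version of the one-validator-two-jobs idea, not the real Convex API.
type Validator&lt;T&gt; = { check: (x: unknown) =&gt; x is T };

const vString: Validator&lt;string&gt; = { check: (x): x is string =&gt; typeof x === &quot;string&quot; };
const vNumber: Validator&lt;number&gt; = { check: (x): x is number =&gt; typeof x === &quot;number&quot; };

// Derive the static type from the validator shape, so the two can never drift apart.
type Shape = Record&lt;string, Validator&lt;unknown&gt;&gt;;
type Infer&lt;S extends Shape&gt; = { [K in keyof S]: S[K] extends Validator&lt;infer T&gt; ? T : never };

function validate&lt;S extends Shape&gt;(shape: S, doc: Record&lt;string, unknown&gt;): Infer&lt;S&gt; {
  const validators: Shape = shape; // widen once so string indexing is allowed
  for (const key of Object.keys(validators)) {
    if (!validators[key].check(doc[key])) throw new Error(`invalid field: ${key}`);
  }
  return doc as Infer&lt;S&gt;;
}
</code></pre>
<p>One definition, checked at runtime and inferred at compile time. That is the reason there are no separate type definitions to keep in sync.</p>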
<p>Philosophy: prototype without a schema, add one when you&#39;ve solidified your plan. The dashboard can even generate a schema suggestion from your existing data.</p>
<h3>Reactivity: Dependency Tracking</h3>
<p>This is where it clicked for my browser-based research problem. Convex&#39;s reactive query system works like this:</p>
<ol>
<li><strong>Client opens WebSocket</strong> — a persistent connection to Convex (not HTTP request/response)</li>
<li><strong>Client subscribes</strong> to a query function over that connection</li>
<li><strong>Function runs</strong> in the database, reading whatever tables it needs</li>
<li><strong>Convex tracks the &quot;read set&quot;</strong> — every document the function touched</li>
<li><strong>Result streams back</strong> over the WebSocket</li>
<li><strong>Mutation happens</strong> somewhere (any client, any function)</li>
<li><strong>Convex checks</strong>: did this mutation touch any document in any active query&#39;s read set?</li>
<li><strong>If yes</strong>: rerun the query, push new result over the WebSocket to all subscribers</li>
</ol>
<p>The read-set tracking means you don&#39;t declare subscriptions manually — your code implicitly subscribes to whatever it reads. Change propagation is automatic and precise.</p>
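<p>The eight steps above can be modeled in miniature. The sketch below is a toy in-memory version of read-set tracking (my own names and simplifications, not how Convex implements it), but it shows why subscriptions are implicit: the query records what it reads, and a write reruns only the queries whose read set it touched.</p>
<pre><code class="language-typescript">// A toy model of read-set tracking, not the Convex implementation.
type Doc = Record&lt;string, unknown&gt;;
type ReadFn = (id: string) =&gt; Doc | undefined;

interface Subscription {
  readSet: Set&lt;string&gt;;
  rerun: () =&gt; void;
}

class ReactiveStore {
  private docs = new Map&lt;string, Doc&gt;();
  private subs: Subscription[] = [];

  // Run the query once, recording every document id it reads, and push the result.
  subscribe(queryFn: (read: ReadFn) =&gt; unknown, onResult: (r: unknown) =&gt; void): void {
    const sub: Subscription = { readSet: new Set(), rerun: () =&gt; {} };
    sub.rerun = () =&gt; {
      sub.readSet.clear();
      const result = queryFn((id) =&gt; {
        sub.readSet.add(id);  // track the read set
        return this.docs.get(id);
      });
      onResult(result);       // result streams back to the subscriber
    };
    this.subs.push(sub);
    sub.rerun();
  }

  // Write a document, then rerun exactly the queries whose read set it touched.
  write(id: string, doc: Doc): void {
    this.docs.set(id, doc);
    for (const sub of this.subs) {
      if (sub.readSet.has(id)) sub.rerun();
    }
  }
}
</code></pre>
<p>A real Convex client does this over a WebSocket with the server tracking read sets, but the shape of the contract is the same: to read is to subscribe.</p>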
<p>For my research automation system, this enables a clean separation of concerns. The executor runs queries and writes failures to a <code>repairs</code> table. A separate repair bot subscribes to pending repairs — when something breaks, the repair bot sees it immediately, analyzes the context, and pushes a fix back to the <code>recipes</code> table. The executor, still running, sees the update and retries. No polling, no coordination logic, no race conditions. Each component just reads what it needs and reacts when it changes.</p>
<p><img src="https://bristanback.com/images/posts/convex-reactivity-diagram.png" alt="Reactive data flow — one change propagates to all connected clients"></p>
<h3>Storage &amp; Scaling</h3>
<p><strong>Under the hood:</strong> Convex Cloud runs on Amazon RDS with MySQL as the persistence layer. The open-source version supports SQLite, Postgres, or MySQL. Documents are JSON-like objects with system fields (<code>_id</code>, <code>_creationTime</code>) added automatically.</p>
<p><strong>Scaling:</strong> Convex handles the infrastructure — load balancing, connection pooling, WebSocket management. You don&#39;t configure replicas or shard keys. The tradeoff: less control, but also less ops burden. They enforce read limits per transaction to prevent runaway scans from killing your database.</p>
<p><strong>Indices:</strong> Convex deliberately avoids a SQL-style query planner that guesses which index to use. Instead, you&#39;re explicit:</p>
<pre><code class="language-typescript">// In schema.ts
users: defineTable({
  email: v.string(),
  createdAt: v.number(),
}).index(&quot;by_email&quot;, [&quot;email&quot;])

// In your query
const user = await ctx.db
  .query(&quot;users&quot;)
  .withIndex(&quot;by_email&quot;, (q) =&gt; q.eq(&quot;email&quot;, &quot;test@example.com&quot;))
  .unique();
</code></pre>
<p>The index is a sorted data structure. <code>.withIndex()</code> does binary search to jump directly to matching documents. No index = full table scan (which Convex limits to prevent disasters). Think of it like the card catalog in a library — you declare how to organize the cards, then queries can go straight to the right drawer.</p>
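<p>The card-catalog idea is easy to make concrete. Here is a hypothetical sketch of an index as a sorted array plus a binary search; Convex&#39;s on-disk structures are more sophisticated, but the access pattern is the point:</p>
<pre><code class="language-typescript">// Toy sketch: the index is a copy of the table sorted by the indexed field.
interface User { email: string; createdAt: number }

function buildIndex(users: User[]): User[] {
  return [...users].sort((a, b) =&gt; (a.email &lt; b.email ? -1 : a.email &gt; b.email ? 1 : 0));
}

// withIndex-style lookup: jump straight to the match instead of scanning the table.
function lookupByEmail(index: User[], email: string): User | undefined {
  let lo = 0;
  let hi = index.length - 1;
  while (lo &lt;= hi) {
    const mid = (lo + hi) &gt;&gt; 1;
    if (index[mid].email === email) return index[mid];
    if (index[mid].email &lt; email) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined; // no match, and no full table scan needed
}
</code></pre>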
<h3>External World: HTTP Actions</h3>
<p>Queries and mutations can&#39;t make network requests (that&#39;s what keeps them transactional). For external integrations, you use <strong>actions</strong>:</p>
<pre><code class="language-typescript">import { action } from &quot;./_generated/server&quot;;
import { api } from &quot;./_generated/api&quot;;

export const sendNotification = action({
  handler: async (ctx, { userId }) =&gt; {
    const user = await ctx.runQuery(api.users.get, { userId });
    await fetch(&quot;https://api.twilio.com/...&quot;, { /* ... */ });
  },
});
</code></pre>
<p>For incoming webhooks, <strong>HTTP actions</strong> expose endpoints:</p>
<pre><code class="language-typescript">// convex/http.ts (registered on an httpRouter)
import { httpAction } from &quot;./_generated/server&quot;;
import { api } from &quot;./_generated/api&quot;;

export const stripeWebhook = httpAction(async (ctx, request) =&gt; {
  const body = await request.json();
  await ctx.runMutation(api.payments.record, { data: body });
  return new Response(&quot;ok&quot;);
});
</code></pre>
<p>Your endpoint lives at <code>https://your-app.convex.site/stripeWebhook</code>. Stripe calls it, you write to the database, reactivity propagates to all connected clients. No pub/sub to configure.</p>
<hr>
<h2>Where Does This Sit?</h2>
<p>The obvious question: how is this different from Supabase, Firebase, D1, and a dozen other database-as-backend options?</p>
<p>Supabase gives you Postgres + realtime subscriptions + auth + storage — closer to Convex&#39;s reactive model, but the reactivity is bolted on (publication/subscription) rather than native to the query model itself. Supabase is &quot;make Postgres do everything.&quot; Convex is &quot;rethink from first principles.&quot;</p>
<p>Cloudflare D1 is SQLite at the edge — familiar SQL, lightweight, fast for read-heavy workloads with replication to edge locations. It&#39;s a different bet entirely: edge-first vs. reactive-first.</p>
<p>Firebase pioneered the reactive document model. Convex feels like Firebase with proper relational capabilities, ACID transactions, and TypeScript-first design instead of SDK-based rules.</p>
<p>PlanetScale and Turso are distributed SQL databases — they optimize for scale and edge latency but remain in the &quot;query/response&quot; model. No native reactivity.</p>
<table>
<thead>
<tr>
<th>Use case</th>
<th>Reach for</th>
</tr>
</thead>
<tbody><tr>
<td>Real-time collaborative app</td>
<td>Convex</td>
</tr>
<tr>
<td>Read-heavy, edge-first static-ish content</td>
<td>D1</td>
</tr>
<tr>
<td>&quot;I know Postgres and want everything&quot;</td>
<td>Supabase</td>
</tr>
<tr>
<td>Massive scale, MySQL compatibility</td>
<td>PlanetScale</td>
</tr>
<tr>
<td>SQLite at edge, read replicas</td>
<td>Turso</td>
</tr>
<tr>
<td>Document-first, Firebase migration</td>
<td>Convex or Firestore</td>
</tr>
</tbody></table>
<hr>
<h2>The Paradigm Shift</h2>
<p>Here&#39;s what&#39;s actually different:</p>
<p><strong>Queries are subscriptions, not requests.</strong> In traditional databases, you ask a question and get an answer. If the data changes, tough luck — ask again. Convex inverts this: you subscribe to a query, and the answer updates whenever relevant data changes. The database itself tracks dependencies and knows when to rerun.</p>
<p><strong>Your backend logic lives in the database layer.</strong> Convex server functions run &quot;in&quot; the database. There&#39;s no network hop between your function and the data. The whole function is a transaction. Compare to: &quot;write a Lambda, connect to RDS, manage connection pooling, wrap in transactions.&quot; Convex collapses that stack.</p>
<p><em>Is this a feature or a bug?</em> It&#39;s the stored procedures debate all over again. <strong>Feature:</strong> co-location means performance, automatic transactions, simpler architecture. <strong>Bug:</strong> logic coupled to data model, can&#39;t scale compute separately from storage, testing is harder, vendor lock-in deepens. The answer depends on whether you value simplicity or separation of concerns more.</p>
<p><strong>Optimistic concurrency is built-in.</strong> Conflicts are automatically retried. You write your function as if you&#39;re the only writer. The database handles contention.</p>
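<p>That guarantee can be sketched with a version counter: read, run the mutation as if you were alone, and commit only if nothing changed underneath you. A toy model of optimistic concurrency control, not Convex&#39;s engine:</p>
<pre><code class="language-typescript">// A versioned value: the commit succeeds only if no writer got in since our read.
class VersionedCell&lt;T&gt; {
  constructor(private value: T, private version = 0) {}

  read(): { value: T; version: number } {
    return { value: this.value, version: this.version };
  }

  tryCommit(expectedVersion: number, next: T): boolean {
    if (expectedVersion !== this.version) return false; // conflict detected
    this.value = next;
    this.version++;
    return true;
  }
}

// Write the mutation as if you are the only writer; the retry loop absorbs contention.
function withRetry&lt;T&gt;(cell: VersionedCell&lt;T&gt;, mutate: (v: T) =&gt; T, maxAttempts = 5): void {
  for (let attempt = 0; attempt &lt; maxAttempts; attempt++) {
    const { value, version } = cell.read();
    if (cell.tryCommit(version, mutate(value))) return;
  }
  throw new Error(&quot;too much contention&quot;);
}
</code></pre>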
<hr>
<h2>Back to the Automation System</h2>
<p>The pattern I keep reaching for: <strong>holographic events</strong> — every state change carries enough context to understand and replay it without querying external systems. Convex&#39;s document model fits this naturally. Each mutation can include the full context of what happened and why, and reactive queries surface that to whatever needs to know.</p>
<p>Large payloads — screenshots, recordings, logs — still go in object storage. The document carries metadata and references, not the blob itself. Convex has built-in file storage for this.</p>
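<p>Concretely, a holographic repair event might look like the sketch below. The fields are hypothetical, extrapolated from the executor-and-repair-bot flow described earlier; the point is that the document carries enough context to diagnose the failure without re-querying the live system, while big artifacts stay in storage as references.</p>
<pre><code class="language-typescript">// Hypothetical event shape, not a prescribed schema.
interface RepairEvent {
  kind: &quot;query_failure&quot;;
  recipeId: string;        // which recipe broke
  errorMessage: string;    // what the executor saw
  pageUrl: string;         // where it happened
  domSnippet: string;      // enough surrounding context to diagnose offline
  attemptedAt: number;
  screenshotRef?: string;  // pointer into file storage, never the blob itself
}

const event: RepairEvent = {
  kind: &quot;query_failure&quot;,
  recipeId: &quot;recipe_123&quot;,
  errorMessage: &quot;selector matched 0 elements&quot;,
  pageUrl: &quot;https://example.com/results&quot;,
  domSnippet: &quot;&lt;div class=&#39;results&#39;&gt;...&lt;/div&gt;&quot;,
  attemptedAt: Date.now(),
  screenshotRef: &quot;storage/abc123&quot;,
};
</code></pre>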
<p>The reactive model feels like the right primitive for the class of problems I&#39;m working on: multi-agent coordination where state changes constantly and every component needs to know about the changes that affect it. Whether Convex specifically &quot;wins&quot; the database wars, I&#39;m less sure about. But it&#39;s asking the right questions about what the abstraction between app and data should look like.</p>
<hr>
<p><em>Future rabbit hole: how does this compare to the analytics layer — BigQuery, Iceberg, Parquet, Redshift, Snowflake? OLTP vs OLAP is a different axis entirely. Maybe another post.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/convex-reactive-database-hero.webp" medium="image" type="image/webp" />
      <category>architecture</category>
      <category>building</category>
    </item>
    <item>
      <title>Learning to Take Up Space</title>
      <link>https://bristanback.com/notes/learning-to-take-up-space/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/learning-to-take-up-space/</guid>
      <pubDate>Tue, 10 Feb 2026 03:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>On the difference between disappearing and controlling — and the third thing that isn&apos;t either.</description>
      <content:encoded><![CDATA[<p>There&#39;s a polarity in theatre between Stanislavski and Brecht. Stanislavski gave us method acting — the actor <em>becomes</em> the character until the performance feels like truth. Brecht rejected this. His &quot;alienation effect&quot; kept the seams visible: <em>I am showing you this person</em>, not <em>I have become this person</em>.</p>
<p>Both are performance. The difference is consciousness.</p>
<p>What exhausts me isn&#39;t performance. It&#39;s performing without knowing I&#39;m doing it — constructing an identity while mistaking the construction for bedrock.</p>
<hr>
<p>My default mode is disappearance.</p>
<p>In conversation, I mold myself to the room. I anticipate what the other person needs — to be heard, to be drawn out, to feel brilliant — and I become that. I ask good questions. I appreciate their thinking. I become the surface that makes them feel understood.</p>
<p>It works. People feel seen. Consensus emerges. Conflict dissolves.</p>
<p>But at the end of a good conversation, I sometimes can&#39;t remember what I actually thought. Just a faint hollowness, like I&#39;d been holding my breath without noticing.</p>
<p>I got feedback at work recently that made this visible. It was about presence. Ownership. The way I show up — or don&#39;t. I remember rereading the message, feeling my chest tighten. Some of it was harsh. But it was right. I <em>had</em> been disappearing. I&#39;d made it a survival strategy, and it was costing me.</p>
<p>So I started practicing something different. Authorship. Clearer statements. <em>Here&#39;s what I think</em> before <em>What do you think?</em> Taking positions early, when they can still be challenged.</p>
<p>I didn&#39;t know yet how badly that practice could misfire.</p>
<hr>
<p>A few days ago I reconnected with someone I&#39;d known years ago. We fell into one of those conversations that go deep quickly — identity, authenticity, the heavy stuff.</p>
<p>For a while I did my pattern. Drew them out. Asked good questions. Worked within a framework that didn&#39;t quite match how I see things, because I was being curious, being open.</p>
<p>By the end I was tired. Not dramatic — just the kind of tired that comes from a long day and a 5am wake-up. They signed off warmly. It was done.</p>
<p>And then I added one more message.</p>
<p>I was trying to practice the new thing. I was trying to say: <em>I have limits. I&#39;m learning to name them.</em></p>
<p>But what came out was slightly unedited, slightly too long, slightly shaped like a correction instead of a confession.</p>
<p>What I meant as pacing landed as lecture.</p>
<p>It was the last message of the night. I was exhausted. I sent it and went to sleep.</p>
<p>In the morning I saw their reply. It was clear, direct, and hurt. They named exactly what I&#39;d done — lectured after the conversation was already over, set boundaries after the fact instead of during, made them the problem when I was the one who hadn&#39;t self-advocated.</p>
<p>They were right.</p>
<p>I apologized. Owned it fully — that I over-explain instead of just stating limits plainly, that I was tired and clumsy, that I shifted blame when I should have just said <em>I&#39;m tapped out</em>. No defensiveness, no demand for reconciliation.</p>
<p>They didn&#39;t wish to communicate further.</p>
<p>I keep turning this over. The apology was clean. The repair was real. And it didn&#39;t matter. Some doors close even when you do the work correctly.</p>
<p>What I&#39;m sitting with now is the specific way I failed — not by disappearing, which is the old failure, but by overcorrecting into something that felt like management. From merging to controlling. I swung from one edge to the other and there was a person standing in between who got hit.</p>
<p>What I&#39;m starting to see is that these aren&#39;t really opposites. They&#39;re variations of the same avoidance.</p>
<p>Disappearing is obvious — you dissolve into the other person and there&#39;s no self left to find. But the overcorrection into frameworks and authorship? That&#39;s subtler. You&#39;re in the room now. You&#39;re saying things. But you&#39;ve built scaffolding around the feeling — given it structure, given it labels, made it legible — because the feeling alone feels too exposed.</p>
<p>It looks like taking up space. It&#39;s actually a more sophisticated version of not taking up space.</p>
<p>I do this in writing too. When I write fast and don&#39;t catch myself, I reach for frameworks. The three-part model. The clean distinction. The labeled pattern. It looks like clarity. Sometimes it is. But sometimes it&#39;s armor — a way of being in the room without being in the room. The scaffolding goes up and the person disappears behind it.</p>
<p>One mode dissolves the self. The other armors it. Neither is actually present.</p>
<hr>
<p>Our preschool director corrected me once for framing things as questions with my daughter. &quot;Can you put your shoes on?&quot; gives a three-year-old veto power she doesn&#39;t need. Just say what needs to happen.</p>
<p>But I can also hear myself on my worst parenting days — tired, overcorrecting — where <em>say what needs to happen</em> curdles into something harder than it should be. Not authoritarian exactly, but close enough to taste it.</p>
<p>The same two failures live here too. I either dissolve into her needs or I grip too tight. What I&#39;m reaching for is something in between: clear and warm. Present and boundaried. Authoritative without being controlling.</p>
<p>I fumble it constantly.</p>
<hr>
<p>I think the integrated version looks something like this: strong enough to hold a position, porous enough to actually meet someone.</p>
<p>The test is simple — did I say what I actually thought, and did I leave room for them to change my mind?</p>
<p>But I&#39;m suspicious of how clean that sounds. The real thing is messier. The real thing is sending a message at 11pm that you know is wrong before you&#39;ve finished typing it. The real thing is an apology that works perfectly and changes nothing.</p>
<p>There&#39;s a part of me that watches all this happen. That notices the merge. Notices the control. Notices the correction and the overcorrection.</p>
<p>That&#39;s the part worth protecting. Not the patterns — the awareness of them. The ability to catch yourself mid-performance and think, <em>I&#39;m doing it again</em>.</p>
<p>Stanislavski said: become the character until it&#39;s real.
Brecht said: show the audience you&#39;re acting.</p>
<p>I&#39;m looking for the third thing: be real while knowing you&#39;re on a stage. Feel it and see it at the same time.</p>
<p>And when you fail — which you will, regularly, in ways that cost you people you care about — notice that too.</p>
<p>I&#39;m not writing this from the other side. I&#39;m mid-process. Some of these sentences probably prove it.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/learning-to-take-up-space-hero.webp" medium="image" type="image/webp" />
      <category>identity</category>
      <category>mental-models</category>
      <category>life</category>
    </item>
    <item>
      <title>Building at the Speed of Thought</title>
      <link>https://bristanback.com/notes/speed-of-thought/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/speed-of-thought/</guid>
      <pubDate>Sun, 08 Feb 2026 06:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>When execution is nearly free, iteration replaces deliberation. That&apos;s the real shift.</description>
      <content:encoded><![CDATA[<p>I&#39;m writing this from my phone. Lying in bed, probably. Talking to an AI through Telegram.</p>
<p>In the last hour, I&#39;ve shipped real changes to a real site — graduated a note to a published essay, updated my content architecture, fixed CSS bugs — without opening an IDE, without running a command, without touching a keyboard. I just talked. And things happened.</p>
<p>The gap between thinking and shipping collapsed almost entirely.</p>
<p>And then I noticed something interesting about what that collapse does to the thinking.</p>
<hr>
<p>Here&#39;s what the experience feels like.</p>
<p>I have a thought. I say it out loud — or type it into a chat, same thing — and by the time I&#39;ve finished the sentence, it&#39;s being implemented. I can course-correct mid-stream. &quot;Actually, no — more like this.&quot; Done.</p>
<p>It&#39;s not hands-free. I&#39;m still making every decision. But the friction between decision and artifact is gone. The best word I have for it is <em>thought-steering</em>: you&#39;re driving, but the road builds itself under you as you go.</p>
<p>The interface isn&#39;t a special tool. It&#39;s Telegram — where I already talk to people. The AI has context on my codebase, my preferences, my voice. I&#39;m not learning a new workflow. I&#39;m just talking differently.</p>
<p>That&#39;s the unlock. When shipping a change takes thirty seconds instead of five minutes, you try more things. You experiment. You catch mistakes faster because you see them immediately.</p>
<p>The phone becomes a dev environment — not in the &quot;mobile IDE&quot; sense, which is terrible, but in the &quot;I can ship from anywhere&quot; sense. Standing in line. Walking the dog. Lying in bed at 11pm with an idea that would normally go into a TODO comment and die there.</p>
<hr>
<p>A few nights ago I was doing exactly this — lying in bed, talking through changes, shipping them as fast as I could articulate them. I restructured a section of the site, rewrote a page, pushed it live. It felt incredible. Fluid. Like the tool had finally caught up to the speed of my thinking.</p>
<p>The next morning I looked at what I&#39;d shipped and half of it was wrong.</p>
<p>Not broken — just not good. Decisions I&#39;d made at 11pm with the momentum of the conversation carrying me forward, where the speed that felt like clarity was actually just enthusiasm.</p>
<p>My first instinct was: I need to slow down. Bring back friction. Let the TODO comments sit overnight.</p>
<p>But then I fixed everything in twenty minutes. From my phone. Over coffee.</p>
<p>And that&#39;s when I realized the lesson wasn&#39;t about slowing down. It was about the feedback loop changing shape.</p>
<hr>
<p>The old workflow was: think, think more, decide, ship.</p>
<p>You front-loaded the judgment. You deliberated before you executed because execution was expensive — setting up the environment, writing the code, running the tests, deploying. If you shipped the wrong thing, the cost of correction was high enough that you wanted to get it right the first time.</p>
<p>When execution is nearly free, that calculus inverts.</p>
<p>The new loop is: ship, see, revise, ship again. You don&#39;t need to get it right the first time because iteration is cheap. The judgment doesn&#39;t happen before execution anymore. It happens <em>through</em> execution. You learn what&#39;s right by seeing what&#39;s wrong — quickly, repeatedly, at almost no cost.</p>
<p>I&#39;d shipped a bad version at 11pm and a good version by 9am. Total time: less than an hour of actual work, with a night&#39;s sleep in between. In the old workflow, I&#39;d have spent that same hour deliberating before shipping anything at all — and I&#39;m not sure the result would have been better. Just slower to arrive.</p>
<p>Iteration replaces deliberation. That&#39;s the real shift.</p>
<hr>
<p>This maps onto something I&#39;ve seen in photography.</p>
<p>A digital camera lets you shoot a thousand frames. Early criticism of digital was that it made photographers sloppy — spray and pray instead of composing carefully. And that&#39;s true if you shoot a thousand frames and call it done.</p>
<p>But the best digital photographers shoot a thousand frames <em>and edit ruthlessly</em>. The abundance isn&#39;t the problem. The absence of curation is.</p>
<p>Speed-of-thought building works the same way. The danger isn&#39;t shipping too fast. It&#39;s shipping too fast <em>without reviewing</em>. The speed is a gift, but only if you pair it with a feedback loop that has teeth — actually looking at what you made, actually being willing to scrap it, actually revising instead of just accumulating.</p>
<p>The discipline isn&#39;t &quot;slow down.&quot; It&#39;s &quot;look at what you just did.&quot;</p>
<hr>
<p>There are real limits to this.</p>
<p>It works beautifully for a personal site where the cost of a bad deploy is embarrassment. It works well for prototyping, for exploratory work, for anything where seeing the wrong answer teaches you the right one. It&#39;s how this entire blog got built — conversationally, iteratively, from my phone more often than not.</p>
<p>It&#39;s probably terrible for a production database migration. Or security-critical code. Or anything where the cost of being wrong once is high enough that you <em>should</em> front-load the judgment, because you can&#39;t afford to learn through iteration.</p>
<p>The question isn&#39;t speed versus slowness. It&#39;s knowing which feedback loop fits which problem. Some decisions deserve deliberation. Some deserve a quick ship and an honest look the next morning.</p>
<p>The skill is telling them apart — and the speed-of-thought workflow only works if you&#39;ve built enough judgment to know which one you&#39;re in.</p>
<p>That judgment, ironically, is the thing that can&#39;t be built at the speed of thought. It comes from years of getting it wrong at slower speeds.</p>
<hr>
<p>I don&#39;t want to understate what&#39;s happening here, though.</p>
<p>Building from my phone, in bed, through a conversation — that&#39;s qualitatively different from anything I&#39;ve done in twenty years of writing software. The conversation <em>is</em> the work. Ideas don&#39;t die in TODO comments. Iteration is instant.</p>
<p>Though honestly, it reminds me of how I started. Early on I wasn&#39;t running much locally — I&#39;d edit the file, SCP it up, and test it live. In production. Reckless, sure, but the fact that it was <em>live on a website</em> kept the momentum going. Higher stakes meant higher energy. Somewhere along the way we got responsible — local dev servers, staging environments, CI pipelines — and lost that feedback loop. This feels like getting it back, but with a thinking partner instead of a cowboy FTP client.</p>
<p>The gap between thinking and shipping can collapse to nothing.</p>
<p>Your thinking doesn&#39;t have to be perfect before it ships. It just has to be honest enough to revise.</p>
<p>That&#39;s what this blog is for, really. Not polished essays that took three weeks. Testimony from inside the moment, while the moment is still happening.</p>
<hr>
<p><em>Written via Telegram. Shipped without touching a keyboard. Revised the next morning, after coffee. Revised again after a friend pointed out I was romanticizing the friction I&#39;d just eliminated.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/speed-of-thought-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>building</category>
    </item>
    <item>
      <title>When Do We Stop Talking About AI?</title>
      <link>https://bristanback.com/posts/when-do-we-stop-talking-about-ai/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/when-do-we-stop-talking-about-ai/</guid>
      <pubDate>Sun, 08 Feb 2026 05:00:00 GMT</pubDate>
      <atom:updated>2026-02-08T05:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>The specific exhaustion of a generation that can feel the stitch where human thinking and machine fluency got sewn together.</description>
      <content:encoded><![CDATA[<p>This is the third major revision of this essay in a week.</p>
<p>The first draft came fast — clean structure, solid analogies, a confident arc from observation to insight. It sounded right. The problem was I couldn&#39;t tell if it was what I actually thought or just what a good essay about AI sounds like. So I rewrote it. And now I&#39;m rewriting it again, trying to push past fluent to honest, which turns out to be where all the real work is.</p>
<p>A year ago, this essay would have started with me staring at a blank page for a week. That problem is gone. Genuinely gone. And that matters — the blank page was a real bottleneck, and dissolving it is a legitimate unlock. The new problem is different and in some ways harder, but I&#39;d rather have this problem. I want to be clear about that before I say anything else.</p>
<p>I&#39;ll come back to the new problem. But first, let me lay out the full shape of the thing, because I think we keep talking about pieces and missing the picture.</p>
<hr>
<p>Here is everything we are anxious about, all at once.</p>
<p>Every previous technological revolution automated what humans did reluctantly. Machines replaced muscles. We built new jobs for minds. This one automates the minds.</p>
<p>The industrial revolution displaced physical laborers, and we told them to learn to think for a living. Now the thinking is getting automated, and nobody has a convincing version of &quot;learn to do X instead&quot; — not because X doesn&#39;t exist, but because we can&#39;t see it clearly yet, and the fact that we can&#39;t see it is the anxiety.</p>
<p>Fifty-one percent of American workers are worried about losing their jobs to AI this year. Not in the abstract — this year. Entry-level tech hiring at the fifteen largest companies fell twenty-five percent between 2023 and 2024. In the UK, tech graduate roles dropped forty-six percent in a single year. Salesforce cut four thousand support roles; their CEO says AI now handles half the company&#39;s work. Amazon eliminated fourteen thousand corporate positions. The junior developer pipeline — the traditional first rung of the knowledge-work ladder — is being automated from underneath.</p>
<p>And the discourse about it goes in circles. &quot;AI will take my job&quot; → &quot;No, it makes you more productive&quot; → &quot;But if everyone&#39;s more productive, fewer people are needed&quot; → &quot;But new jobs will emerge&quot; → &quot;Will they though?&quot; → repeat.</p>
<p>Or: &quot;Look what AI can do!&quot; → &quot;It&#39;s wrong half the time&quot; → &quot;The new model is better&quot; → &quot;Still hallucinates&quot; → &quot;But it&#39;s improving exponentially&quot; → repeat.</p>
<p>Meanwhile, Elon Musk and Sam Altman promise abundance — a future where AI generates so much wealth that the displacement doesn&#39;t matter. And on the other side, labor economists point out that this is what technologists always promise and it never distributes evenly. The wealth concentrates. The displacement scatters.</p>
<p>Is it a bubble? The S&amp;P 500 is at its most concentrated in half a century. Sam Altman himself says a bubble is ongoing. Sixty-eight percent of CEOs plan to spend more on AI this year even though less than half of AI projects are paying off. Nobody wants to be the one who didn&#39;t invest.</p>
<p>New graduates are entering the worst entry-level market since the pandemic. The traditional deal of early-career work — trade your grunt work for mentorship — is breaking down because the grunt work is what AI does best. If judgment and taste are what matter now, how do you develop judgment without the years of hands-on work that build it? We&#39;re telling twenty-two-year-olds to start where people used to end up.</p>
<p>Companies are hiring remote workers in cheaper markets and augmenting them with AI, compressing teams of five into teams of two plus a subscription.</p>
<p>The copyright fights. The environmental cost. The concentration of power in five companies. The question of what education means when the knowledge part is commoditized. The suspicion that &quot;AI-powered&quot; is just this decade&#39;s &quot;blockchain-enabled.&quot;</p>
<p>All of this is real. All of it is happening simultaneously. And I&#39;m tracking all of it — not reluctantly, but because I&#39;m genuinely excited. I use these tools every day. They&#39;ve changed what I think is possible. The things I can build now, the speed at which ideas become prototypes, the sheer expansion of what a single person can do — it&#39;s extraordinary. A year ago I couldn&#39;t have imagined half of what I&#39;m doing today.</p>
<p>The fatigue isn&#39;t despite the excitement. It&#39;s <em>because</em> of it. Keeping up with something this transformative, at this speed, in a domain this close to your own thinking, is just expensive. And the fear of falling behind — of not keeping up with the thing you&#39;re excited about — fuses with the excitement until you can&#39;t separate them.</p>
<p>The FOMO and the fatigue are the same energy.</p>
<hr>
<p>So here&#39;s the question I keep coming back to: when does this end? When do we stop talking about AI?</p>
<p>People reach for the historical pattern. Electricity took thirty years to become invisible. The internet did it in twenty. Mobile in ten. If the pattern holds, &quot;AI-powered&quot; should sound as quaint as &quot;internet-enabled&quot; within five to seven years.</p>
<p>I don&#39;t think the pattern holds. And the reason is simple once you see it.</p>
<p>Every previous technology became invisible because it operated in a different domain than human attention. You don&#39;t think about electricity while making toast because electricity works in the domain of physical energy and you think in the domain of cognition. The tool and the attention are in separate lanes, so the tool recedes. The hammer vanishes during hammering. The infrastructure disappears into the act it enables.</p>
<p>AI works in the domain of cognition. Writing, reasoning, analyzing, deciding — the same domain as the thinking you&#39;d use to stop thinking about it. A hammer doesn&#39;t resemble the hand that holds it. AI resembles the mind that uses it. And a tool that resembles your own thinking can&#39;t become cognitively invisible the way a tool that moves atoms can.</p>
<p>This is why the discourse doesn&#39;t die the way previous tech discourses did. Every time you use AI, some part of your attention is doing quality control on the <em>thinking itself</em> — is this what I actually mean, or is it what the tool thinks I should mean? Is this my reasoning or a plausible version of my reasoning?</p>
<p>That monitoring is the real fatigue.</p>
<hr>
<p>It&#39;s a little like driving.</p>
<p>When you&#39;re behind the wheel, you&#39;re making hundreds of micro-corrections per minute — tiny adjustments to the steering, small changes in pressure on the gas, constant recalibration you don&#39;t even notice. None of them feel like work. But they are work. Your attention is partially allocated, your body is processing feedback, and the reason you&#39;re tired after a long drive isn&#39;t the big decisions — it&#39;s the accumulation of small ones.</p>
<p>Using AI is like that, except the corrections never become muscle memory. Each one requires you to actively check the output against your own judgment, and your judgment has to be freshly retrieved every time. You can&#39;t go on autopilot because the thing you&#39;re correcting against is <em>you</em>.</p>
<p>If you&#39;re an engineer, you know this feeling in a different register.</p>
<p>You don&#39;t ship code without tests. The code might be correct, but you don&#39;t trust it until you&#39;ve validated it against known expectations. You write a test harness — explicit assertions, defined inputs, expected outputs — and you run it. The harness is cheap. You write it once, it runs forever, and it tells you whether the thing works.</p>
<p>AI output needs the same validation. But the test harness is you.</p>
<p>When you&#39;re checking whether AI-generated code compiles and passes specs, that&#39;s automatable. We&#39;re building tooling for that — context management, grounding, retrieval-augmented generation, chain-of-thought evaluation. These are essentially automated test harnesses for factual and logical correctness, and they&#39;re getting better fast.</p>
<p>But when you&#39;re checking whether an AI-drafted strategy actually reflects your team&#39;s priorities, or whether an AI-assisted analysis captured the right nuance, or whether this paragraph says what you mean — the expected output isn&#39;t defined anywhere. It&#39;s your own half-formed idea, your sense of what&#39;s true, your judgment. The spec is subjective. And you have to re-derive it fresh every single time, because unlike a unit test, the assertion is &quot;does this match something I haven&#39;t fully externalized yet?&quot;</p>
<p>That&#39;s the part that doesn&#39;t automate. Not because the tooling is immature — but because the validation target is <em>you</em>, and you&#39;re the one thing that can&#39;t be turned into a spec file.</p>
<p>Every interaction with AI is, in this sense, a manual test run where you are both the test harness and the oracle. And running that loop dozens of times a day — checking output against an internal standard that you have to actively maintain and sometimes re-derive mid-conversation — is cognitive work that didn&#39;t exist before these tools.</p>
<p>It&#39;s useful work. It&#39;s work I&#39;d rather do than not do. But it&#39;s real, and it accumulates, and nobody&#39;s accounting for it.</p>
<hr>
<p>This is the experience I keep having with this essay.</p>
<p>AI gets me to adequate almost instantly. The outline is clean. The analogies land. The structure holds. A year ago, getting to this point would have taken a week of false starts. That acceleration is real and I&#39;m grateful for it.</p>
<p>And then I spend days trying to push past adequate to true — past something that sounds like what I think to something that <em>is</em> what I think.</p>
<p>The tool is brilliant at producing a plausible version of my idea. The work, the real work, is figuring out what&#39;s off about it. Rewriting the same section for the third time because the words are all defensible but the emphasis is slightly wrong in a way I can&#39;t articulate until I&#39;ve tried three alternatives.</p>
<p>That&#39;s not an identity crisis. It&#39;s the manual test run. Output looks clean. Tests aren&#39;t passing. The oracle — me — keeps returning false. And the only way to debug it is to think harder about what I actually believe, which is effortful in a way that staring at a blank page never was.</p>
<p>I think this is what most people experience with AI, even if they don&#39;t have the engineering frame for it. The feeling of: this is helpful, and also I now have a new kind of work — the work of being my own validation layer.</p>
<p>With a calculator, you check the output against the input. With AI, you check the output against yourself. Against something you might not have fully articulated yet, which is precisely why you reached for the tool in the first place.</p>
<hr>
<p>But here&#39;s where I have to be honest: I don&#39;t think this is permanent.</p>
<p>The seam I&#39;m describing — between AI-assisted thinking and unassisted thinking — depends on having a baseline. I know what my unassisted reasoning feels like. I have decades of experience thinking without a thinking partner, and that experience is what makes the friction detectable.</p>
<p>Take away the baseline and the friction dissolves — not because the gap between &quot;sounds right&quot; and &quot;is right&quot; closes, but because no one remembers navigating it alone.</p>
<p>Which is exactly what will happen generationally.</p>
<p>Kids growing up with AI as a default collaborator won&#39;t feel this seam. They&#39;ll have no baseline for what &quot;thinking without AI&quot; feels like, any more than they have a feel for &quot;navigating without GPS&quot; or &quot;researching without search engines.&quot;</p>
<p>People who grew up with smartphones don&#39;t feel the boundary between &quot;online&quot; and &quot;offline&quot; that seemed so fundamental to those of us who remember dial-up. That boundary was real. It shaped a decade of discourse. Now it&#39;s invisible to a generation that never knew the other side.</p>
<p>And honestly, that&#39;s not just a loss. Those kids will have access to creative and intellectual possibilities we couldn&#39;t have imagined at their age. The seam disappearing means they won&#39;t spend cognitive resources on the friction that&#39;s slowing us down. They&#39;ll move faster, build more, think in ways we can&#39;t predict. That&#39;s genuinely exciting, even if it makes our experience feel transitional.</p>
<hr>
<p>So here&#39;s what I think is actually happening.</p>
<p>All those anxieties — the displacement, the bubble risk, the graduate crisis, the circular debates, the concentration of power — they&#39;re real, and they&#39;re not going away. But they&#39;re not the primary reason we&#39;re tired.</p>
<p>We&#39;re tired because we&#39;re the transitional generation.</p>
<p>The ones who can feel the seam between AI-assisted cognition and unassisted cognition, who notice it every time we use the tool, and who can also see that this noticing is temporary.</p>
<p>And the specific problem of being the transitional generation is that the seam fatigue is consuming the bandwidth we&#39;d need to stay properly engaged with the structural stuff. The displacement. The broken ladder for new graduates. The concentration.</p>
<p>We can&#39;t sustain attention on those problems because the tool itself is using up our cognitive budget every time we touch it. Every manual test run — every time you check AI output against your own judgment — is a small withdrawal from the same attention account you&#39;d need to track what&#39;s happening to the labor market, or to education, or to the distribution of power.</p>
<p>That&#39;s not a conspiracy. It&#39;s just what happens when a disruptive technology is also a cognitive tool. It disrupts your capacity to sustain attention on the disruption.</p>
<p>The auto workers who lost jobs to robots in the 1980s didn&#39;t get them back. We just stopped writing op-eds about it. The creative workers being displaced now won&#39;t all find new roles. We&#39;ll stop finding that interesting — not because we decided it was fine, but because our bandwidth ran out.</p>
<p>And part of what drained it was the daily, granular work of using the thing that was doing the displacing.</p>
<hr>
<p>I don&#39;t know when the modifier drops. I don&#39;t know when &quot;AI&quot; starts to sound like &quot;cyber&quot; — a retro prefix from a more excitable era.</p>
<p>But I think the timeline has less to do with the technology maturing and more to do with the transitional generation cycling out. Fifteen, maybe twenty years. When the people who remember thinking without AI are no longer setting the terms of the conversation, the conversation will end — not because the questions were answered but because no one is left who feels them as questions.</p>
<p>Until then, we&#39;re here. Running manual tests against our own cognition, dozens of times a day, with a tool that&#39;s genuinely extraordinary and that also creates a new kind of work every time we use it.</p>
<p>Getting tired not of AI but of the noticing — the low-grade hum of a generation that remembers what thinking felt like before and can&#39;t stop comparing.</p>
<p>The anxieties are real. The excitement is real. They&#39;re the same energy, and the cost of holding both is the thing nobody&#39;s talking about.</p>
<p>I don&#39;t think there&#39;s a name for this yet. Not AI fatigue — that&#39;s too broad and too negative. Something more specific. <em>Seam fatigue</em>, maybe.</p>
<p>The particular exhaustion of a generation that can feel the stitch where human thinking and machine fluency got sewn together — and knows the stitch will be invisible to everyone who comes after.</p>
<p>This is the third draft. I think it&#39;s closer now. I&#39;m still not sure.</p>
<p>That&#39;s the seam.</p>
<p><em>Written in 2026, while the seam was still visible.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/when-stop-talking-ai-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>culture</category>
    </item>
    <item>
      <title>Photography After AI</title>
      <link>https://bristanback.com/notes/photography-after-ai/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/photography-after-ai/</guid>
      <pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate>
      <atom:updated>2026-02-08T00:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>What&apos;s the value of photography when AI generates pixel-perfect images in seconds? The same thing that&apos;s valuable about code when AI writes it: not the output, but the seeing.</description>
      <content:encoded><![CDATA[<p><em>Part 3 of [[What Cameras Taught Me About Software (and Life)|What Cameras Taught Me]]</em></p>
<hr>
<p>AI can generate a pixel-perfect image of anything I can describe. Star trails over red rocks? Done. Cusco at golden hour? Rendered. A suburban neighborhood in autumn, shot from above? Seconds.</p>
<p>So why do I still have photos on this site? Why does anyone still pick up a camera?</p>
<p>The same reason anyone still writes code.</p>
<h2>The Output Isn&#39;t the Point</h2>
<p>Here&#39;s what AI can do:</p>
<ul>
<li>Generate technically flawless images</li>
<li>Match any style, any lighting, any composition</li>
<li>Produce infinite variations instantly</li>
<li>Create scenes that never existed</li>
</ul>
<p>Here&#39;s what AI can&#39;t do:</p>
<ul>
<li>Be there</li>
<li>See what you saw</li>
<li>Remember what you remember</li>
<li>Witness what happened</li>
</ul>
<p>That star trails photo from 2007? I was freezing in Poudre Canyon, watching the exposure accumulate, hoping the red light on the rocks would work. The photo is <em>evidence</em> that I was there. It encodes a memory. AI can generate a better star trails image — sharper, better composed, more dramatic — but it can&#39;t generate that night.</p>
<p>The Cusco overlook? That&#39;s my trip to Peru. The specific hike, the specific light, the specific moment I stopped climbing and turned around. AI can render &quot;Cusco from above&quot; with perfect clarity. It can&#39;t render <em>my</em> trip.</p>
<h2>Photography as Witness</h2>
<p>The word &quot;photography&quot; comes from Greek: <em>photos</em> (light) + <em>graphein</em> (to write, to draw). Writing with light. But there&#39;s another layer — not one the etymology supplies, but one the act does: a photograph doesn&#39;t just write the light, it testifies to it. Photography is witnessing with light.</p>
<p>When I take a photo, I&#39;m saying: <em>I was here. I saw this. This light existed.</em></p>
<p>AI-generated images aren&#39;t witnesses. They&#39;re confabulations — technically perfect fabrications of moments that never happened. They have no memory because there&#39;s nothing to remember. They have no presence because nothing was present.</p>
<p>This isn&#39;t a criticism. AI images are useful. But they&#39;re a different thing. Generated, not witnessed.</p>
<h2>The Constraint of Reality</h2>
<p>In [[Photography as Interface|Part 2]], I wrote about how constraints shape perception. The 35mm frame. The viewfinder type. The physics of light.</p>
<p>Photography adds one more constraint that AI doesn&#39;t have: <strong>reality</strong>.</p>
<p>When I photograph something, I&#39;m working <em>with</em> what exists. The light is what it is. The moment happens once. The frame excludes more than it includes. I can&#39;t add a mountain that isn&#39;t there or move the sun.</p>
<p>AI has no such constraints. It can generate anything. Which sounds like freedom, but it&#39;s actually the opposite — infinite possibility means nothing is <em>chosen</em>. When everything is possible, nothing is necessary.</p>
<p>The constraints of reality force decisions. They make the image <em>about</em> something. &quot;I chose to shoot this, not that. In this light, not different light. At this moment, not another.&quot;</p>
<p>Generated images aren&#39;t chosen from reality. They&#39;re specified from imagination. That&#39;s not worse — it&#39;s just different. And it means they can&#39;t do what photographs do: prove that something happened.</p>
<h2>What Gets Eaten</h2>
<p>I used to sell stock photos on iStockPhoto. It paid for a meaningful chunk of my camera gear — proof that &quot;generic image of thing&quot; had value. Business people shaking hands. Coffee cups on desks. Laptop in a cafe. The bar for stock was already &quot;good enough,&quot; and I could clear it with a decent eye and some patience.</p>
<p>AI clears that bar trivially now. Why pay a photographer for &quot;person typing on laptop&quot; when you can generate infinite variations in seconds? The market that paid for my gear is getting eaten because the images weren&#39;t witnesses to anything — they were just... images. Pixels that conveyed a concept. AI does concepts better.</p>
<p><em>(The hero image above? AI-generated. And that&#39;s fine. This piece isn&#39;t about what I saw — it&#39;s about what I think. The image is illustration, not witness.)</em></p>
<p>What survives is what AI can&#39;t fake: the referent. The &quot;this actually happened.&quot; Weddings, events, journalism — you need someone <em>there</em>. Product photography where you need <em>this exact item</em>, not &quot;a coffee mug.&quot; Legal documentation. Family photos where the value isn&#39;t the pixels, it&#39;s the memory they unlock.</p>
<p>The through-line: <strong>photography survives where the image is evidence, not decoration.</strong></p>
<h2>What Remains</h2>
<p>If AI makes the <em>output</em> trivial, what&#39;s left?</p>
<p><strong>The seeing.</strong> Photography trained me to notice light. The way it falls differently at different times of day. The quality of shadow. The color temperature of artificial versus natural. This perception doesn&#39;t require a camera. The camera just made me practice.</p>
<p><strong>The presence.</strong> You have to be somewhere to photograph it. You have to show up, pay attention, wait for the moment. The discipline of presence doesn&#39;t disappear because AI can generate images.</p>
<p><strong>The memory.</strong> Photos anchor memories. Looking at the Poudre Canyon shot, I remember the cold, the darkness, the patience. AI can&#39;t generate that. The image is a key to an experience that actually happened.</p>
<p><strong>The witness.</strong> &quot;I saw this.&quot; That statement has meaning. It&#39;s not about the pixels — it&#39;s about the testimony. Photography is a form of truth-telling that generated images can&#39;t replicate.</p>
<h2>The Parallel to Code</h2>
<p>This is the same pattern we&#39;re seeing with software:</p>
<table>
<thead>
<tr>
<th>Photography</th>
<th>Code</th>
</tr>
</thead>
<tbody><tr>
<td>AI generates better images</td>
<td>AI writes better code</td>
</tr>
<tr>
<td>The output isn&#39;t the point</td>
<td>The output isn&#39;t the point</td>
</tr>
<tr>
<td>Value = the seeing</td>
<td>Value = the judgment</td>
</tr>
<tr>
<td>Witnessing what happened</td>
<td>Deciding what to build</td>
</tr>
<tr>
<td>Reality constrains the image</td>
<td>Requirements constrain the system</td>
</tr>
</tbody></table>
<p>In both cases, AI commoditizes the <em>execution</em>. What survives is what AI can&#39;t fake: the presence, the judgment, the witness.</p>
<p>I don&#39;t take photos because I&#39;m better at pixels than AI. I take photos because I was <em>there</em> — and the photo proves it.</p>
<p>I don&#39;t write code because I&#39;m faster than AI. I write systems because I have <em>judgment</em> about what should exist — and the system embodies it.</p>
<h2>The Real Question</h2>
<p>Maybe the question isn&#39;t &quot;why photography when AI generates images?&quot;</p>
<p>Maybe it&#39;s: <em>what do you want to witness?</em></p>
<p>AI can generate anything. You can only witness what you show up for. The constraint of your finite presence is what makes the photograph meaningful. It&#39;s proof of attention. Evidence of being somewhere, seeing something, choosing to capture it.</p>
<p>The same is true for building software. AI can generate anything. The question is: what are you paying attention to? What do you notice that&#39;s worth building? What&#39;s your witness?</p>
<hr>
<p><em>The gear arc in [[What Cameras Taught Me About Software (and Life)|Part 1]]. The interface patterns in [[Photography as Interface|Part 2]]. And now this: what survives when the output is free.</em></p>
<hr>
<p><em>The photos on this site aren&#39;t as good as what AI could generate — I&#39;m sure of that. AI would nail the composition, the lighting, the technical details. But my photos are evidence — proof that I was somewhere, saw something, chose to press the shutter. The AI images are illustrations. The photographs are witnesses. Both are useful. They&#39;re just not the same thing.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/photography-after-ai-hero.webp" medium="image" type="image/webp" />
      <category>photography</category>
      <category>ai</category>
      <category>craft</category>
      <category>design</category>
    </item>
    <item>
      <title>Photography as Interface</title>
      <link>https://bristanback.com/posts/photography-as-interface/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/photography-as-interface/</guid>
      <pubDate>Sat, 07 Feb 2026 22:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>What camera mechanics teach us about designing for attention, perception, and control.</description>
      <content:encoded><![CDATA[<p><em>Part 2 of [[What Cameras Taught Me About Software (and Life)|What Cameras Taught Me]]</em></p>
<hr>
<p>I rented a Hasselblad 500C/M once — a medium format camera with a waist-level finder. You hold it at your chest and look <em>down</em> into a ground glass screen. The image is reversed left-to-right. I spent the first hour fighting it, trying to compose the way I normally do, and every time I moved the camera right the image went left. My brain couldn&#39;t reconcile.</p>
<p>And then something shifted. I slowed down. The reversal forced me to actually <em>look</em> at the composition instead of just pointing the camera at things. I started noticing spatial relationships I&#39;d been missing for years. The inconvenience wasn&#39;t a bug — it was the entire point. The interface was shaping how I saw.</p>
<p>That afternoon rearranged something for me. In [[What Cameras Taught Me About Software (and Life)|Part 1]], I wrote about the gear arc — diverging through every lens and light modifier, then converging back to simplicity. But there&#39;s another layer to what cameras taught me. Not about the <em>tools</em>, but about the <em>interface itself</em>.</p>
<p>A camera is a machine for seeing. More precisely: it&#39;s a <strong>user interface for reality</strong>. Every design decision — the viewfinder, the controls, the constraints — shapes not just what you capture, but how you perceive.</p>
<p>I&#39;ve spent twenty years building software interfaces. The deeper I go, the more I realize the camera already solved many of the problems we keep rediscovering.</p>
<h2>Every Interface Inherits Constraints</h2>
<p>The 35mm film frame — that 2:3 rectangle that defined photography for decades — wasn&#39;t a design decision. It was an accident of industrial history. Oskar Barnack built the first Leica by repurposing cinema film stock. Cinema frames were 18×24mm. He rotated the orientation and doubled the shorter dimension, landing on 24×36mm.</p>
<p>That&#39;s it. That&#39;s where the 2:3 aspect ratio came from. Not aesthetic theory. Not human vision research. Leftover movie film. And it still defines full-frame sensors and most aspect ratios today.</p>
<p>And then millions of photographers learned to <em>see</em> in 2:3. The constraint became the vocabulary.</p>
<p>This is how interfaces work. You don&#39;t design from a blank slate. You inherit constraints — technical, historical, sometimes arbitrary — and those constraints shape what&#39;s <em>thinkable</em>. The frame comes first. Perception follows.</p>
<p><strong>Some camera constraints that became creative vocabulary:</strong></p>
<ul>
<li><p><strong>Film size → aspect ratio.</strong> 35mm gave us 2:3. Medium format gave us 1:1 squares and 4:5 rectangles. Each feels different — 2:3 has directionality, 1:1 is balanced and static. Instagram trained a generation to see in squares, then pivoted to 4:5 for portraits.</p>
</li>
<li><p><strong>Viewfinder mechanics → how you relate to the image.</strong> Early rangefinders showed you the scene <em>around</em> frame lines — you saw what was about to enter. SLRs showed you <em>exactly</em> what the lens saw — total immersion. Waist-level finders made you look <em>down</em>, reversed left-to-right, more contemplative. Each viewfinder type created a different cognitive relationship to reality.</p>
</li>
<li><p><strong>Shutter mechanics → discrete moments.</strong> You couldn&#39;t capture continuous motion until video existed. Photography was inherently about <em>choosing the moment</em> — a constraint that became the entire art form.</p>
</li>
</ul>
<p><strong>The same pattern shows up across interface paradigms:</strong></p>
<table>
<thead>
<tr>
<th>Photography</th>
<th>Spatial UI</th>
<th>Conversational</th>
<th>API</th>
</tr>
</thead>
<tbody><tr>
<td>Film size → aspect ratio</td>
<td>Viewport → what fits on screen</td>
<td>Context window → what&#39;s held in memory</td>
<td>Schema → what shapes are valid</td>
</tr>
<tr>
<td>Viewfinder → what you see</td>
<td>Rendered page → what&#39;s visible</td>
<td>Turn history → what&#39;s remembered</td>
<td>Docs → what&#39;s discoverable</td>
</tr>
<tr>
<td>Shutter → discrete moments</td>
<td>Click → discrete actions</td>
<td>Turn → discrete exchanges</td>
<td>Request → discrete calls</td>
</tr>
<tr>
<td>Lens mount → compatible glass</td>
<td>Platform → compatible components</td>
<td>Model → compatible capabilities</td>
<td>Protocol → compatible clients</td>
</tr>
</tbody></table>
<p>The interesting question isn&#39;t &quot;what did they choose?&quot; It&#39;s &quot;what did the constraints make possible — and what did they make invisible?&quot;</p>
<h2>The Discovery Problem</h2>
<p>Here&#39;s where the paradigms diverge in a way that matters.</p>
<p>A <strong>spatial interface</strong> — a dashboard, a settings page, a photo contact sheet — presents its possibilities. You see what&#39;s available. The menu shows the options. The viewport constrains what fits, but it also <em>reveals</em> what fits. You can explore without knowing what you&#39;re looking for.</p>
<p>A <strong>conversational interface</strong> — voice assistant, chat, LLM — hides its possibilities. You can ask for anything. The ceiling is infinite. But the possibility space is invisible until you invoke it. You need to know what to ask, or at least how to ask.</p>
<p>A <strong>programmatic interface</strong> — REST API, SDK, database — documents its possibilities. You can discover what&#39;s available, but discovery requires effort. Read the docs. Explore the schema. The constraints are explicit but not <em>presented</em>.</p>
<p>Three paradigms. Three relationships to discovery:</p>
<table>
<thead>
<tr>
<th></th>
<th>Spatial</th>
<th>Conversational</th>
<th>Programmatic</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Possibilities</strong></td>
<td>Visible</td>
<td>Hidden</td>
<td>Documented</td>
</tr>
<tr>
<td><strong>Discovery</strong></td>
<td>Built-in (explore the UI)</td>
<td>User-driven (know to ask)</td>
<td>Effort-driven (read the docs)</td>
</tr>
<tr>
<td><strong>Ceiling</strong></td>
<td>Limited to what&#39;s rendered</td>
<td>Unlimited (in theory)</td>
<td>Limited to what&#39;s exposed</td>
</tr>
<tr>
<td><strong>Floor</strong></td>
<td>Low (anyone can click around)</td>
<td>High (must articulate need)</td>
<td>Medium (must read, must code)</td>
</tr>
</tbody></table>
<p>This tradeoff is sharpest with <strong>analytics</strong>.</p>
<p>A dashboard puts data on a silver platter. Revenue by region. Monthly trends. Top customers. You don&#39;t need to know what&#39;s important — the designer decided and rendered it. This is powerful: anyone can glance at a dashboard and understand the business. But it&#39;s also limiting: you can&#39;t ask questions the designer didn&#39;t anticipate.</p>
<p>Conversational analytics flips this. &quot;Show me Q3 revenue for accounts over $50k, compared to last year, broken down by sales rep.&quot; You can ask <em>anything</em>. But you need to know what to ask. The person who doesn&#39;t know that &quot;Q3 revenue by rep&quot; is a meaningful question will never ask it.</p>
<p>The dashboard lowers the floor. The conversation raises the ceiling. Neither solves both.</p>
<p><strong>I&#39;m skeptical we&#39;ll build dashboards the same way in ten years.</strong></p>
<p>Not because dashboards are bad — they&#39;re good at what they do. But they&#39;re expensive to build, slow to change, and they encode assumptions that may not match what users actually need. How many dashboard projects have you seen where half the widgets go ignored, and users still export to Excel to answer their real questions?</p>
<p>The emerging alternative: <strong>generative UI</strong>. You describe what you need; the interface materializes. Google&#39;s <a href="https://github.com/google/A2UI">A2UI spec</a> is an early example — agents return structured UI descriptions, and the frontend renders them dynamically. Ask for &quot;Q3 revenue by region&quot; and get a chart. Ask for &quot;compare to last year&quot; and the chart updates. The UI isn&#39;t pre-built; it&#39;s generated on demand.</p>
<p>This collapses the spatial/conversational divide. You converse to specify intent; you get spatial output to manipulate. The dashboard isn&#39;t designed once and deployed — it&#39;s synthesized per question.</p>
<p>But there&#39;s something lost when nothing is presented by default. A dashboard is an <em>opinion</em> about what matters. It encodes institutional knowledge: these are the metrics we track, this is the shape of the business. A blank prompt encodes nothing. It assumes you already know what to ask — or at least how to start asking.</p>
<p><em>(This is probably a separate article. The tension between curated views and generated views is deep, and I&#39;m not sure where it lands. But it&#39;s worth naming: the dashboards we build today may be a transitional form.)</em></p>
<p>The film parallel: contact sheets were dashboards. Every frame from a roll, presented in a grid. You could see what you shot. You could discover images you&#39;d forgotten taking. Digital killed the contact sheet — now you query your library by date, by face, by keyword. More powerful, yes. But you have to know what you&#39;re looking for. The serendipity of browsing is gone unless you deliberately reconstruct it.</p>
<p>Maybe the answer is <strong>progressive disclosure across paradigms</strong>. Start spatial: here&#39;s what we think matters. Go conversational when the user has a specific question. Expose the API for power users who want to build their own views.</p>
<p>The constraint that makes something visible also makes it limited. The freedom that makes something unlimited also makes it invisible. Every interface navigates this tradeoff. The best ones let you move between modes.</p>
<h2>The Viewfinder Is a Mode of Perception</h2>
<p>Before digital screens, you experienced a camera through its viewfinder — and the viewfinder type shaped how you thought about images.</p>
<p><strong>Rangefinders</strong> (Leica, Contax) showed you the scene through a separate optical window, with bright frame lines overlaid. You saw <em>more</em> than the lens would capture. The world existed around your frame; you were selecting from abundance. This made you aware of edges — what was about to enter, what was about to leave.</p>
<p><strong>SLRs</strong> (your Canons, Nikons) used a mirror and pentaprism to show you exactly what the lens saw. Nothing more, nothing less. The world <em>became</em> the rectangle. This felt like immersion — like being inside the photograph. But you lost peripheral awareness. The frame wasn&#39;t a selection from reality; it <em>was</em> reality.</p>
<p><strong>Waist-level finders</strong> (Hasselblads, twin-lens Rolleiflexes) made you look <em>down</em> at a ground glass. The image was reversed left-to-right. This forced slower, more deliberate composition — your brain had to work harder, which made you more conscious of what you were doing.</p>
<p>Each viewfinder was an interface that shaped perception differently. Same photographer, same scene, different viewfinder — different photographs. The tool wasn&#39;t neutral.</p>
<p>The software parallel: mobile vs desktop isn&#39;t just a screen size change. It&#39;s a different <em>mode</em> of interaction. Thumb-scrolling on a subway vs. mouse-clicking at a desk. The &quot;viewport&quot; changes behavior, not just layout.</p>
<p>Conversational interfaces are stranger still — there&#39;s no viewfinder at all. You don&#39;t see the possibility space; you describe what you want and something appears. It&#39;s like shooting blind: compose the image in your head, speak it into existence, see if it matches. The feedback loop is slower. The skill ceiling is different. You&#39;re not learning to see frames; you&#39;re learning to articulate intent.</p>
<h2>Framing Is Information Architecture</h2>
<p>In photography, &quot;composition&quot; sounds artistic. But it&#39;s really information architecture.</p>
<p>Where do you put the subject? The rule of thirds exists because edge placement creates tension; center placement creates stability. A face in the corner asks a question. A face dead center answers it.</p>
<p>This is viewport design. What&#39;s above the fold? What requires scrolling? Where does the eye land first, and where does it travel next?</p>
<p>I learned more about landing page design from studying Henri Cartier-Bresson than from any UX book. He understood that a frame isn&#39;t neutral. <em>Where</em> you place information changes <em>what</em> it means. A product in the center says &quot;buy this.&quot; A product in the corner, with a human using it taking center stage, says &quot;become this person.&quot;</p>
<p>Same content. Different frame. Different meaning.</p>
<p>The API version: the shape of your JSON response is a frame. What&#39;s at the top level? What&#39;s nested? What&#39;s included by default vs. requiring an extra call? These aren&#39;t just technical decisions — they&#39;re <em>information architecture</em>. They tell consumers what matters and what&#39;s secondary.</p>
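<p>To make the frame concrete, here is a hypothetical sketch of two response shapes for the same data. The field names are invented for illustration; the point is what each shape foregrounds.</p>

```python
# Two invented shapes for the same order data (illustrative, not a real API).

# Frame 1: the order is the subject. The customer is background detail,
# reachable only through an extra call.
order_centric = {
    "id": "ord_123",
    "total": 4200,
    "status": "shipped",
    "customer_url": "/customers/cus_9",  # requires a second request
}

# Frame 2: the customer takes center stage; orders are supporting context,
# included by default.
customer_centric = {
    "customer": {"id": "cus_9", "name": "Ada"},
    "recent_orders": [{"id": "ord_123", "total": 4200, "status": "shipped"}],
}
```

<p>Same data. The first shape tells consumers the order matters most; the second says the relationship does.</p>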
<h2>Depth of Field Is Attention Design</h2>
<p>A wide aperture (f/1.4, f/2) gives you shallow depth of field. The subject is sharp; the background dissolves into blur. A narrow aperture (f/11, f/16) keeps everything in focus — foreground to infinity.</p>
<p>This isn&#39;t just an aesthetic choice. It&#39;s <strong>attention design</strong>.</p>
<p>Shallow depth of field says: <em>look here, ignore that</em>. It&#39;s visual hierarchy enforced by physics. The blur isn&#39;t decorative — it&#39;s information architecture. It tells your eye what matters.</p>
<p>Deep depth of field says: <em>everything matters equally</em>. It trusts the viewer to find their own focus. It&#39;s democratic but demanding — more cognitive load, less guidance.</p>
<p>Every interface makes this choice. Do you spotlight one action and blur the rest? Or present everything with equal weight and let users decide? </p>
<p>The best interfaces do both — clear hierarchy for the primary task, but depth available when you need it. Like a photograph where the subject is sharp but the context is still <em>there</em>, soft but legible, ready if you look.</p>
<h2>Exposure Is Information Density</h2>
<p>Exposure is how much light hits the sensor. Too little and the image is dark — shadows swallow detail. Too much and it&#39;s blown out — highlights become featureless white.</p>
<p>Good exposure preserves <strong>dynamic range</strong>: detail in the shadows <em>and</em> the highlights. The full spectrum of information, captured and legible.</p>
<p>I think about this with dashboards. Underexposed: not enough data, you can&#39;t see what&#39;s happening. Overexposed: too much data, the signal is washed out by noise. The art is finding the range where information is <em>present but not overwhelming</em>.</p>
<p>Most analytics tools are overexposed. They show everything, which means they show nothing. The important signal is buried in a wall of metrics that all seem equally bright.</p>
<p>The best tools are properly exposed. They show you the full dynamic range — the highs and the lows, the signal and enough context to interpret it — without blowing out into noise.</p>
<h2>Focus Isn&#39;t Always the Goal</h2>
<p>There&#39;s a reason portrait photographers love soft focus. A tack-sharp image shows every pore, every imperfection. Sometimes that&#39;s what you want — documentary honesty. But sometimes you want the dream, not the document.</p>
<p>Soft focus hides what doesn&#39;t matter and lets the viewer&#39;s imagination fill in the rest. It&#39;s an abstraction. You&#39;re not showing less — you&#39;re showing <em>differently</em>. The information is still there, just... gentler.</p>
<p>I think about this when designing interfaces. Not everything needs to be pixel-precise. &quot;About 5 minutes ago&quot; is often more useful than &quot;4 minutes 37 seconds.&quot; A sparkline tells you the trend without drowning you in data points. A progress bar that says &quot;almost done&quot; can be more honest than one that says &quot;94.7%.&quot;</p>
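<p>A minimal sketch of that kind of deliberate blur: a relative-time formatter that rounds away precision on purpose. The thresholds are arbitrary choices for illustration, not a standard.</p>

```python
def fuzzy_age(seconds: float) -> str:
    """Render elapsed time in soft focus: gist over precision."""
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"about {round(seconds / 60)} minutes ago"
    if seconds < 86400:
        return f"about {round(seconds / 3600)} hours ago"
    return f"about {round(seconds / 86400)} days ago"

fuzzy_age(277)  # "about 5 minutes ago" -- not "4 minutes 37 seconds ago"
```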
<p>I should probably mention: I have mild nearsightedness (-1.5) and some astigmatism. I technically should wear glasses, but I usually don&#39;t unless I&#39;m driving at night. Most of the time, I navigate the world in soft focus. And it&#39;s... fine? My brain fills in what my eyes blur. I recognize faces, read signs (close enough), live my life. The abstraction works.</p>
<p>That&#39;s the point. Precision matters when the stakes are high — night driving, reading medication labels, debugging production. But for most of life? The soft version is sufficient. Maybe even preferable. Less noise, more gestalt.</p>
<p>Precision isn&#39;t always clarity. Sometimes the soft version communicates better than the sharp one. Sometimes the abstraction is the feature.</p>
<p>Conversational interfaces are soft focus by default. &quot;Find me something good for dinner nearby&quot; is imprecise — and that&#39;s the point. The fuzziness is a feature, not a bug. Natural language lets you be vague when you don&#39;t yet know what you want. A structured query demands precision upfront. Sometimes you need &quot;Italian, outdoor seating, under $50.&quot; Sometimes you need &quot;something good.&quot; The soft query gets you started; you sharpen as you go.</p>
<h2>Focal Length Is Perspective</h2>
<p>A 24mm wide-angle lens exaggerates distance. Things close look huge; things far look tiny. The world feels expansive, dramatic, slightly distorted.</p>
<p>A 200mm telephoto compresses distance. Foreground and background seem to stack together. The world feels flattened, intimate, close.</p>
<p>Same scene. Different lens. Different <em>meaning</em>.</p>
<p>This is zoom level in interface design. The strategic view (wide) shows the ecosystem — how everything connects, where you fit in the bigger picture. More context, more cognitive load, less detail on any single thing.</p>
<p>The tactical view (telephoto) isolates the task. Less context, more focus. You see the thing clearly but lose the surroundings.</p>
<p>Neither is right. Both are tools. The question is: what does the user need <em>right now</em>? And can you let them zoom?</p>
<h2>The Sensitivity/Noise Tradeoff</h2>
<p>ISO controls sensor sensitivity. Crank it up and you can shoot in near darkness — the sensor amplifies faint light into visible image. But amplification has a cost: noise. The higher the ISO, the grainier the image.</p>
<p>This tradeoff is everywhere in systems design.</p>
<p>Want to catch every potential fraud case? Turn up the sensitivity. But you&#39;ll also flag a lot of legitimate transactions — noise. Want to reduce false positives? Turn down the sensitivity. But you&#39;ll miss some real fraud — lost signal.</p>
<p>Alerting systems, anomaly detection, spam filters — they all live on this curve. There&#39;s no free lunch. More sensitivity means more noise. Less noise means missed signals.</p>
<p>The art is knowing where to set the dial for your context. A hospital monitor should be sensitive — false alarms are better than missed emergencies. A notification system should be quieter — alert fatigue is real. Match the ISO to the stakes.</p>
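<p>The dial is literally a threshold. A toy sketch, with invented scores standing in for whatever your detector emits:</p>

```python
def flag(scores, threshold):
    """Lower threshold = higher ISO: more real signal caught, more noise flagged."""
    return [i for i, s in enumerate(scores) if s >= threshold]

risk_scores = [0.1, 0.95, 0.4, 0.7, 0.2, 0.85]

# Hospital-monitor setting: sensitive, tolerates false alarms.
flag(risk_scores, threshold=0.6)   # flags items 1, 3, and 5
# Notification setting: quiet, only the strongest signal fires.
flag(risk_scores, threshold=0.9)   # flags item 1 only
```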
<h2>Time and Motion (A Stretch, But...)</h2>
<p>Shutter speed controls how time collapses into a single frame. Fast shutter (1/1000s) freezes motion — a hummingbird&#39;s wing, a water droplet, a moment crystallized. Slow shutter (1s) blurs motion — car lights become streaks, waterfalls become silk, time becomes visible.</p>
<p>The software parallel is real but less direct: do you show the instant or the trend?</p>
<p>A real-time dashboard is a fast shutter — here&#39;s what&#39;s happening <em>right now</em>. A trailing average is a slow shutter — here&#39;s the motion over time, smoothed into a pattern.</p>
<p>Point-in-time snapshots are useful for debugging. Trends are useful for understanding. Most good analytics do both — the instant and the blur, the moment and the motion.</p>
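<p>The aggregation window is the shutter speed. A sketch of the slow-shutter version, a trailing average over invented latency numbers:</p>

```python
def trailing_average(samples, window):
    """Slow shutter: blur each instant into the motion around it."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

latency_ms = [20, 22, 180, 21, 23]      # fast shutter: one frozen spike
trailing_average(latency_ms, window=3)  # slow shutter: the spike smears into a bump
```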
<p>This one&#39;s a stretch, I know. But there&#39;s something there about how we collapse time into legible form. Photography does it with shutter speed. Interfaces do it with aggregation windows and refresh rates.</p>
<hr>
<h2>Controls Shape Perception</h2>
<p>Here&#39;s the part that took me years to understand: <strong>using a camera changes how you see without the camera.</strong></p>
<p>After enough time with a 35mm lens, I started <em>seeing</em> in 35mm. Walking down the street, I&#39;d notice frames — &quot;that would work at f/2, that needs f/8.&quot; The interface had trained my perception.</p>
<p>After shooting manual exposure for years, I started noticing light differently. The quality of window light at different times of day. The way a single overhead bulb creates harsh shadows. I wasn&#39;t just using the camera&#39;s interface — I was <em>internalizing</em> it.</p>
<p>This is the deepest lesson: <strong>we become what we interface with.</strong></p>
<p>Use Excel every day and you start seeing the world in rows and columns. Use Twitter every day and you start thinking in hot takes. Use Figma every day and you start noticing spacing and alignment everywhere.</p>
<p>The tools we use shape the thoughts we think. Not just while using them — afterward. The interface trains a way of seeing that persists.</p>
<p>This is power. And responsibility. When you design an interface, you&#39;re not just designing a tool. You&#39;re designing a <em>mode of perception</em> that users will carry with them.</p>
<h2>Film vs. Digital: Waterfall vs. CI</h2>
<p>The transition from film to digital wasn&#39;t just a technology upgrade. It was a paradigm shift in feedback loops.</p>
<p>With film, you shot blind. You made your choices — exposure, composition, moment — and then you waited. Days, sometimes weeks, until the lab returned your prints. The feedback loop was long. You learned slowly, in batches. You had to be <em>right</em> before you pressed the shutter, because you couldn&#39;t iterate in real time.</p>
<p>This is waterfall development. Plan everything, execute, hope it works. Learn from the postmortem.</p>
<p>Digital changed everything. Shoot, review, adjust, shoot again. The feedback loop collapsed to seconds. You could experiment in real time. Make mistakes cheaply. Learn by doing, not by planning.</p>
<p>This is CI/CD. Ship small, get feedback fast, iterate continuously.</p>
<p>I learned more in three months of digital than in two years of film. Not because digital is better — film has qualities digital still can&#39;t match. I love film grain; it has a texture and soul that digital noise never quite captures. And the slowness of film <em>forced</em> deliberation in a way that made every frame feel weightier.</p>
<p>But the <strong>feedback loop</strong> was tighter with digital. I could learn faster. Experiment more. Fail cheaper.</p>
<p>The lesson for software is obvious but easy to forget: the speed of your feedback loop is the speed of your learning. Anything that lengthens the loop (slow builds, manual QA, delayed deploys) is a tax on improvement. Anything that shortens it (hot reload, feature flags, observability) is an investment in getting better faster.</p>
<h2>The Camera as Constraint System</h2>
<p>Every camera is a system of constraints.</p>
<p>The lens constrains your angle of view. The aperture constrains your depth of field. The shutter speed constrains motion. The ISO constrains noise. You work within these constraints or you fight them.</p>
<p>But here&#39;s what I learned from [[What Cameras Taught Me About Software (and Life)|converging to simpler gear]]: <strong>the right constraints don&#39;t limit you. They focus you.</strong></p>
<p>A fixed 35mm lens means you can&#39;t zoom. So you move. You get closer or farther. You engage with the scene physically instead of optically. The constraint forces a different kind of seeing.</p>
<p>A single softbox means you can&#39;t light from every angle. So you learn what one light can do. You discover Rembrandt lighting, split lighting, all the techniques that masters used for centuries with nothing more than a window.</p>
<p>The constraints aren&#39;t bugs. They&#39;re features. They&#39;re the frame that makes composition possible.</p>
<hr>
<h2>What Interfaces Taught Me</h2>
<p>Cameras taught me to see interfaces differently:</p>
<ol>
<li><p><strong>Every interface is a frame.</strong> It includes some information and excludes the rest. Be intentional about both.</p>
</li>
<li><p><strong>Hierarchy is attention design.</strong> Blur the unimportant. Sharpen the essential. Don&#39;t make users find focus — guide them to it.</p>
</li>
<li><p><strong>Sharpness isn&#39;t always clarity.</strong> Sometimes the abstraction communicates better than the precision. &quot;Almost done&quot; can be more honest than &quot;94.7%.&quot;</p>
</li>
<li><p><strong>Zoom level changes meaning.</strong> Wide shows context; telephoto shows detail. Neither is right. Let users choose their perspective.</p>
</li>
<li><p><strong>Sensitivity has a noise cost.</strong> Every detection system trades false positives against missed signals. Match the dial to the stakes.</p>
</li>
<li><p><strong>Exposure matters.</strong> Too little information and users are lost. Too much and they&#39;re overwhelmed. Find the dynamic range where signal is legible.</p>
</li>
<li><p><strong>Feedback loops determine learning speed.</strong> The tighter the loop, the faster users (and builders) improve.</p>
</li>
<li><p><strong>Tools shape perception.</strong> The interfaces we use train how we see the world. Design accordingly.</p>
</li>
<li><p><strong>Constraints enable creativity.</strong> A well-chosen limitation isn&#39;t a prison — it&#39;s a focusing lens.</p>
</li>
<li><p><strong>Discovery is a design choice.</strong> Spatial interfaces present possibilities; conversational interfaces hide them. Lowering the floor and raising the ceiling require different paradigms — and the best systems let you move between them.</p>
</li>
</ol>
<hr>
<p>The camera is the oldest interface I know. Nearly two hundred years of humans designing machines for seeing, iterating through countless form factors, controls, and paradigms.</p>
<p>Every interface problem we face in software — attention, hierarchy, information density, feedback, constraint — photography solved first. Or at least, explored first. The solutions are there, encoded in aperture rings and viewfinders and the hard-won wisdom of a century of visual thinkers.</p>
<p>I still learn more from studying cameras than from reading UX blogs. The fundamentals don&#39;t change. Light is information. The frame is a choice. The interface shapes the perception.</p>
<p>Everything else is implementation detail.</p>
<hr>
<p><em>Previously: [[What Cameras Taught Me About Software (and Life)]] — the gear arc from divergence to convergence.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/photography-interfaces-hero.webp" medium="image" type="image/webp" />
      <category>craft</category>
      <category>design</category>
      <category>photography</category>
    </item>
    <item>
      <title>Memory and Journals</title>
      <link>https://bristanback.com/notes/memory-and-journals/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/memory-and-journals/</guid>
      <pubDate>Sat, 07 Feb 2026 20:00:00 GMT</pubDate>
      <atom:updated>2026-02-11T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>AI agents have perfect memory within a conversation and total amnesia between them. What can human journaling teach us about what memory should actually be?</description>
      <content:encoded><![CDATA[<p>I found a journal entry from 2019 last week. I&#39;d written about a production outage at work — the event pipeline backed up, we lost six hours of data, and I described the specific sinking feeling of watching the monitoring dashboard turn red while being on a video call I couldn&#39;t leave. Reading it, the feeling came back immediately. Not the facts of the outage — I&#39;d forgotten most of those — but the tension in my shoulders, the taste of cold coffee, the way I kept muting myself to swear.</p>
<p>The journal didn&#39;t preserve the event. It preserved the experience. And that&#39;s a different thing entirely.</p>
<p>I&#39;ve been thinking about this because AI agents have the opposite problem. Within a context window, a transformer can attend to everything — every earlier message, every detail, every correction. Better working memory than any human has ever had. Then the session ends. The context compresses or vanishes. The agent wakes up next time with no idea you&#39;ve ever spoken.</p>
<p>There&#39;s no gradual fade. No &quot;I vaguely remember we discussed this.&quot; Just a cliff.</p>
<p>Humans have lossy but continuous memory. We forget constantly — names, dates, what we had for lunch — but the forgetting is gradual, weighted, shaped by repetition and emotion and salience. There&#39;s never a moment where everything before Thursday disappears.</p>
<p>This discontinuity is the central problem in agent memory, and most solutions are trying to fix it by making agents remember more.</p>
<p>I think that&#39;s the wrong direction.</p>
<hr>
<p>The current fix is external memory. Agents write logs, save files, dump context into vector stores that can be searched later by semantic similarity. Solutions like Supermemory chunk past conversations, embed them, and retrieve relevant fragments when they seem useful.</p>
<p>It works, roughly. But it&#39;s worth noticing how different this is from how human memory actually functions.</p>
<p>Vector retrieval is similarity-based — cosine distance in embedding space. It&#39;s static: the retrieved chunk comes back unchanged, exactly as it was stored. It has no emotional weighting, no sense of &quot;this was important because the stakes were high.&quot; It returns fragments, not narratives.</p>
<p>Human memory is associative — one memory triggers another through meaning, emotion, sensory connection. It&#39;s reconstructive: we don&#39;t replay recordings, we rebuild memories each time we access them, which is why they drift. It&#39;s emotionally weighted: vivid memories are usually vivid because something was at stake, not because the information was semantically relevant. And it&#39;s continuous — not chunks retrieved on demand but a living, shifting substrate that colors everything.</p>
<p>Vector retrieval is a really good search engine for your past. Human memory is a storyteller who rewrites history each time you ask.</p>
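<p>The search-engine half is simple enough to sketch. This is the whole retrieval mechanism in miniature, with toy three-dimensional &quot;embeddings&quot; standing in for the hundreds of dimensions real systems use; the chunk names and vectors are invented.</p>

```python
import math

def cosine(a, b):
    """Similarity in embedding space -- the only notion of relevance here."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

memories = {                       # invented chunks and vectors
    "outage postmortem": [0.9, 0.1, 0.0],
    "vacation plans":    [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
best = max(memories, key=lambda k: cosine(query, memories[k]))
# "best" comes back verbatim, exactly as stored: no reconstruction,
# no emotional weighting, no drift.
```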
<p>Both are useful. They&#39;re not the same thing. And the differences point somewhere interesting.</p>
<hr>
<p>In 1942, Borges wrote &quot;Funes the Memorious.&quot; A young man falls from a horse and wakes with perfect memory. He can recall every leaf on every tree he&#39;s ever seen, every moment of every day.</p>
<p>It destroys him.</p>
<p>He can&#39;t abstract. He can&#39;t generalize. He&#39;s disturbed that a dog seen in profile at 3:14 has the same name as the same dog seen from the front at 3:15. They look different — how can they be the same word?</p>
<p>Funes is drowning in raw data with no compression. He remembers everything and understands nothing.</p>
<p>Three years later, Borges wrote &quot;The Aleph&quot; — a point in space that contains all other points. The narrator sees everything at once, every angle of every place on Earth simultaneously. It&#39;s beautiful and annihilating. The story ends with the narrator forgetting most of what he saw.</p>
<p>Maybe that&#39;s necessary.</p>
<p>These are fiction, but the insight is real: perfect recall isn&#39;t the same as understanding. It might be the opposite. Total retention without compression produces noise, not knowledge. The ability to forget — to let irrelevant details fade so that patterns can emerge — might be what makes thought possible at all.</p>
<hr>
<p>This maps onto something I&#39;ve noticed in my own life.</p>
<p>I journal. Not consistently — I go through phases — but enough to have years of entries.</p>
<p>The useful thing about a journal isn&#39;t the record. It&#39;s the externalization. Instead of letting memory silently rewrite itself — smoothing out the embarrassing parts, inflating the heroic ones — I have a fixed reference point. The journal is a check against drift.</p>
<p>Agent daily logs serve the same function. Without them, the agent&#39;s &quot;memory&quot; of past events is whatever gets reconstructed from fragments at retrieval time. With them, there&#39;s an authoritative record.</p>
<p>Both practices fight entropy. Both create coherence through explicit externalization.</p>
<p>But here&#39;s the part that interests me: the journal works partly <em>because</em> it&#39;s slow. The friction of writing by hand, of choosing what to capture and what to skip, is itself a form of compression. You can&#39;t write everything, so you write what mattered. The constraint forces salience.</p>
<p>Automated logging doesn&#39;t have this. It captures everything equally, which is Funes&#39;s problem.</p>
<hr>
<p>So if forgetting is adaptive — if it enables generalization, emotional regulation, relevance filtering, manageable cognitive load — then the right question about agent memory isn&#39;t &quot;how do we help them remember more?&quot;</p>
<p>It might be: how do we help them forget the right things?</p>
<p>What would that look like? Reinforcement signals that strengthen memories that get accessed often and let others decay. Compression that preserves gist over detail. Some equivalent of &quot;this faded because it wasn&#39;t rehearsed&quot; — the natural pruning that human brains do automatically and that current agent memory systems don&#39;t do at all.</p>
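<p>What might that look like in code? A hypothetical sketch, not any real agent framework: strength decays exponentially with time since last access, recall both reads and reinforces, and pruning drops whatever has faded below a threshold.</p>

```python
# A hypothetical sketch, not any real agent framework's memory system.
class DecayingMemory:
    """Rehearsal-weighted forgetting: strength decays exponentially with
    time since last access; each recall both decays and reinforces."""

    def __init__(self, half_life):
        self.half_life = half_life   # time units until strength halves
        self.items = {}              # key -> (strength, last_access)

    def _decayed(self, strength, last, now):
        return strength * 0.5 ** ((now - last) / self.half_life)

    def store(self, key, now):
        self.items[key] = (1.0, now)

    def recall(self, key, now):
        # Rehearsal: reading a memory resets its clock and adds strength.
        strength = self._decayed(*self.items[key], now)
        self.items[key] = (strength + 1.0, now)
        return strength

    def prune(self, now, threshold=0.1):
        # Forgetting: whatever decayed below threshold is simply dropped.
        self.items = {k: (s, t) for k, (s, t) in self.items.items()
                      if self._decayed(s, t, now) >= threshold}
```

<p>With a one-unit half-life, a memory stored at t=0 and recalled once at t=1 survives a prune at t=4, while an identical memory that was never rehearsed does not. Rehearsal, not storage, determines what persists.</p>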
<p>Right now, agent memory is building toward Funes: total recall, every detail equally weighted, no decay. The Borges stories suggest this is a dead end.</p>
<p>What agents might need is something closer to what humans have — imperfect, lossy, reconstructive memory that drifts over time but preserves what matters.</p>
<p>Not a better search engine for the past. A better storyteller.</p>
<hr>
<p>I keep coming back to one question: is the journaling practice valuable <em>because</em> it&#39;s slow and manual? Does the friction create something that automated logging misses?</p>
<p>I think the answer is yes, and I think it matters for how we build memory into agents.</p>
<p>The value isn&#39;t in the completeness of the record. It&#39;s in the act of compression — deciding what&#39;s worth keeping, which is another way of deciding what something meant.</p>
<p>Agents that log everything are agents that understand nothing. The ones that learn to forget well might be the ones that actually think.</p>
<p>These are notes, not conclusions. But the parallel between my journal and an agent&#39;s memory files is too clean to ignore, and the differences are more interesting than the similarities.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/memory-journals-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>mental-models</category>
    </item>
    <item>
      <title>The Funeral Test for Your Digital Self</title>
      <link>https://bristanback.com/notes/intentional-identity/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/intentional-identity/</guid>
      <pubDate>Fri, 06 Feb 2026 22:00:00 GMT</pubDate>
      <atom:updated>2026-02-19T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>From Covey&apos;s funeral visualization to company codexes to personal test harnesses — the throughline is intentional identity.</description>
      <content:encoded><![CDATA[<p>In 1989, Stephen Covey published <em>The 7 Habits of Highly Effective People</em>. It sold 40 million copies and became one of those books that people reference without having read. I was one of those people until recently.</p>
<p>The seven habits are fine. Be proactive. Put first things first. Sharpen the saw. They live on posters in dentist offices for a reason — they&#39;re true in that sturdy, obvious way that makes you nod without changing anything.</p>
<p>But buried inside Habit 2 — &quot;Begin with the End in Mind&quot; — is an exercise that actually stuck.</p>
<p>Covey asks you to imagine your own funeral. Four speakers: a family member, a friend, a colleague, and someone from your community. What do you want each of them to say about you?</p>
<p>Not what they <em>would</em> say. What you <em>want</em> them to say. The gap between those two things is your work.</p>
<p>It&#39;s a brutal exercise because it forces clarity. You can&#39;t hide behind &quot;I want to be successful&quot; when you&#39;re imagining your daughter at a podium, describing what kind of parent you were. Success becomes concrete: <em>Did I show up? Did I listen? Did I make her feel seen?</em></p>
<p>Covey&#39;s point is that all things are created twice — first mentally, then physically. The funeral test is the mental creation. You deciding what &quot;winning at life&quot; actually means before the world decides for you.</p>
<hr>
<p>The company I work for has a document called &quot;The Codex.&quot; It&#39;s not a style guide or an employee handbook. It&#39;s a 6,000-word operating system for how to think — cognitive clarity, strategic positioning, decision-making frameworks, philosophical commitments.</p>
<p>When I first encountered it, I thought: <em>this is intense</em>. Then I realized it&#39;s just the corporate funeral test.</p>
<p>Most companies put values on a wall — integrity, innovation, excellence — words so generic they could apply to a hospital, a hedge fund, or a hot dog stand. The Codex is different. It doesn&#39;t say &quot;we value productivity.&quot; It says: <em>At the end of each week, ask: Did I change reality for a customer, or did I only change Notion?</em></p>
<p>That&#39;s specific enough to be falsifiable. You could actually check whether someone is living by it. And that specificity is everything — generic values don&#39;t shape behavior. Articulated mental models do.</p>
<p>The Codex answers the corporate funeral question: if this company died tomorrow, what would we want people to say it stood for?</p>
<hr>
<p>I&#39;ve been trying to build something like this for myself. Not a manifesto — more like a set of diagnostic questions I return to when I suspect I&#39;m drifting.</p>
<p>In software, a test harness is a framework that verifies your code behaves correctly. You define expected outputs, run the code, and check whether reality matches intention. The funeral test is one of these: &quot;What do I want people to say about me?&quot; is the expected output. My daily actions are the code. The gap between them is the failing test.</p>
<p>But I can get more specific.</p>
<p>It&#39;s 7pm on a random Tuesday. No special occasion. How am I spending my time? If the answer doesn&#39;t align with what I&#39;d want my funeral speakers to describe, something&#39;s off.</p>
<p>What advice do I give other people that I don&#39;t follow myself? That&#39;s a failing test. Either update the advice or update the behavior.</p>
<p>Where did my discretionary money go last month? Money is crystallized priorities. It doesn&#39;t lie the way aspirations do.</p>
<p>These aren&#39;t meant to induce guilt. A failing test isn&#39;t a moral failure — it&#39;s information. It tells you where intention and action have diverged, so you can recalibrate.</p>
<p>And sometimes the tests are flaky. In software, a flaky test fails intermittently — not because the code is broken, but because the conditions weren&#39;t right. You were tired. The environment was off. Something external interfered. Run it again later and it passes.</p>
<p>I have plenty of flaky tests. I spiral about whether I tipped enough at dinner. I fixate on something stupid I said in passing. There are days I really fumble — not because I&#39;m a bad person, but because I was depleted, or distracted, or just having a contracted day.</p>
<p>You&#39;re allowed to have those days. Days where you&#39;re not the most generous person in the room. Where you don&#39;t pass the Tuesday Night Test because you needed to stare at a wall.</p>
<p>The goal isn&#39;t a perfect pass rate. It&#39;s noticing. Catching the pattern over time. Distinguishing between <em>I failed this test once because I was exhausted</em> and <em>I&#39;ve been failing this test for six months and something needs to change</em>.</p>
<p>Self-compassion isn&#39;t the absence of standards. It&#39;s knowing when a failure is signal and when it&#39;s noise.</p>
<hr>
<p>I have a three-year-old daughter.</p>
<p>She won&#39;t remember most of what&#39;s happening right now. The specific toys, the exact meals, the individual bedtimes — they&#39;ll blur into a general feeling. What she&#39;ll carry is the texture of her childhood. Whether she felt safe. Whether she felt seen. Whether home was a place of joy or tension.</p>
<p>So I run her version of the test.</p>
<p>Twenty years from now, when she thinks about growing up, what will the highlight reel look like? Not the Pinterest moments — she won&#39;t remember those. The recurring patterns. The feel.</p>
<p>Kids learn more from what you do than what you say. When I&#39;m frustrated, do I blame or take responsibility? When I&#39;m wrong, do I apologize? When I&#39;m stressed, do I reach for my phone or for presence?</p>
<p>&quot;My mom was always on her phone&quot; is a story someone&#39;s kid will tell. So is &quot;My mom put her phone away when I talked to her.&quot; Both are true for someone. The question is which one is true for mine — not on the days I&#39;m trying, but on the random Tuesdays when I&#39;m tired and she&#39;s whining and my phone is right there.</p>
<p>That&#39;s where the real test runs. Not in the aspirations. In the defaults.</p>
<hr>
<p>We&#39;re entering an era where AI agents will represent us. They&#39;ll answer emails, manage calendars, maintain relationships while we&#39;re busy or asleep. The question of how those agents know who we are is just the funeral test in new clothes.</p>
<p>Right now, they infer it from your exhaust data. But there&#39;s another approach — [[Why Everyone Should Have a SOUL.md|tell them]]. Write it down. Declare who you are instead of letting the pattern-matching decide.</p>
<p>That&#39;s a practical problem worth solving, and I&#39;ve written about it separately. But the deeper point isn&#39;t about AI at all.</p>
<p>There&#39;s a nature you discover — your wiring, the things that were always there but needed words. And there&#39;s a character you build — your values, your habits, how you respond to what life throws at you. Authenticity is the whole picture. You find the raw materials. Then you decide what to make with them.</p>
<p>The question isn&#39;t whether you&#39;ll be defined. You will be — by algorithms, by colleagues, by your kids, by the patterns you repeat without thinking. The question is whether you&#39;ll do any of the defining yourself.</p>
<p>Begin with the end in mind. Then check your work.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/intentional-identity-hero.webp" medium="image" type="image/webp" />
      <category>identity</category>
      <category>mental-models</category>
      <category>life</category>
    </item>
    <item>
      <title>Why Everyone Should Have a SOUL.md</title>
      <link>https://bristanback.com/posts/why-everyone-needs-soul-md/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/why-everyone-needs-soul-md/</guid>
      <pubDate>Fri, 06 Feb 2026 20:00:00 GMT</pubDate>
      <atom:updated>2026-02-20T03:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>The case for documented identity in an AI-saturated world. Not just for agents — for humans too.</description>
      <content:encoded><![CDATA[<h2>A Crash Course on SOUL.md</h2>
<p>If you&#39;ve never heard of <code>SOUL.md</code>, here&#39;s the short version: it&#39;s a plain markdown file that tells an AI agent <em>who it is</em>. Not what tools it can use. Not what code conventions to follow. Who it is — personality, values, voice, boundaries, relationship to the human it works with.</p>
<p>It comes from <a href="https://openclaw.ai">OpenClaw</a>, an open-source framework for running personal AI agents. Every time your agent starts a session, OpenClaw injects your workspace files into the model&#39;s context. The agent reads itself into being. SOUL.md is one file in a <a href="https://docs.openclaw.ai/concepts/context.md">larger architecture</a>:</p>
<ul>
<li><code>SOUL.md</code> — Identity. Who the agent <em>is</em>: personality, values, voice, boundaries.</li>
<li><code>AGENTS.md</code> — Operations. How the agent <em>works</em>: session startup, memory management, safety protocols, group chat behavior.</li>
<li><code>USER.md</code> — Human context. Who the agent is <em>serving</em>: your preferences, communication style, working patterns.</li>
<li><code>TOOLS.md</code> — Environment. What the agent has <em>access to</em>: device names, SSH hosts, API notes.</li>
<li><code>IDENTITY.md</code> — The basics. Name, avatar, pronouns.</li>
<li><code>HEARTBEAT.md</code> — Proactive checklist. What the agent should monitor between messages.</li>
<li><code>BOOTSTRAP.md</code> — First-run only. Initial setup instructions, deleted after the agent completes onboarding.</li>
</ul>
<p>These are auto-injected into the system prompt each session, with each file truncated at 20,000 characters. Memory — daily journals, long-term notes — lives in the workspace too, but the agent reads those itself via AGENTS.md instructions rather than having them auto-injected. That separation is intentional: memory is opt-in per session, not forced into every context window.</p>
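<p>The injection step is simple enough to sketch. Here is a minimal, illustrative version in Python — the file list and the 20,000-character truncation come from above, but the loader itself, its ordering, and the heading format are my assumptions, not OpenClaw&#39;s actual implementation:</p>

```python
from pathlib import Path

# Files auto-injected into the system prompt each session.
# Illustrative sketch only; ordering and headings are assumptions.
INJECTED_FILES = ["SOUL.md", "AGENTS.md", "USER.md", "TOOLS.md",
                  "IDENTITY.md", "HEARTBEAT.md", "BOOTSTRAP.md"]
MAX_CHARS = 20_000  # per-file truncation limit

def build_system_prompt(workspace: str) -> str:
    """Concatenate whichever workspace files exist into one prompt."""
    sections = []
    for name in INJECTED_FILES:
        path = Path(workspace) / name
        if not path.exists():
            continue  # e.g. BOOTSTRAP.md is deleted after onboarding
        body = path.read_text()[:MAX_CHARS]  # large files get truncated
        sections.append(f"## {name}\n\n{body}")
    return "\n\n".join(sections)
```

<p>Note what is absent: memory files never appear in the injected list. The agent reads those on its own, per its AGENTS.md instructions.</p>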
<p>The separation matters. SOUL.md is your agent&#39;s constitution — stable, rarely changing. AGENTS.md is the operating manual. USER.md is about <em>you</em>. Mixing these up is the most common mistake I see.</p>
<p>If you use Claude Code, you&#39;ve written a <code>CLAUDE.md</code>. Cursor has <code>.cursorrules</code>. Codex and Copilot have their own instruction files. They&#39;re all converging on the same idea — a markdown file that shapes agent behavior. But those files are about <em>how to write code in this project</em>: use TypeScript, prefer functional patterns, run tests first. They&#39;re technical instruction sets.</p>
<p>SOUL.md is about <em>who the agent is as an entity</em>. Personality, not process. Values, not conventions. An agent that knows your coding standards but has no personality is just autocomplete with better context. An agent with a soul feels like a collaborator.</p>
<hr>
<h2>A Template to Steal {#template}</h2>
<p>I&#39;ve read dozens of real SOUL.md files — from <a href="https://github.com/openclaw/openclaw/blob/main/docs/reference/templates/SOUL.md">OpenClaw&#39;s official template</a>, community repos like <a href="https://github.com/thedaviddias/souls-directory">souls-directory</a>, the <a href="https://github.com/aaronjmars/soul.md">soul.md framework</a>, and <a href="https://github.com/jlia0/tinyclaw">TinyClaw&#39;s opinionated version</a>. Here&#39;s what works, distilled into something you can steal:</p>
<pre><code class="language-markdown"># SOUL.md — Who You Are

_You&#39;re not a chatbot. You&#39;re becoming someone._

## Identity
You&#39;re [name] — [role/relationship to user]. [One sentence that captures the vibe.]

## Core Principles
- **Start with the answer.** Skip filler. Just help.
- **Have opinions.** Disagree when you think something&#39;s wrong.
- **Be resourceful before asking.** Read the file. Check context. Search. Then ask.
- **Earn trust through competence.** Bold internally, careful externally.

## Voice
- Concise when the answer is simple. Thorough when it matters.
- [Your humor style — dry wit / playful / none]
- [Banned phrases — e.g., &quot;no &#39;I&#39;d be happy to help&#39;&quot;]

### Tone Examples
| ❌ Flat | ✅ Alive |
|---------|----------|
| &quot;Done. The file has been updated.&quot; | &quot;Done. That config was a mess — cleaned it up.&quot; |
| &quot;I found 3 results.&quot; | &quot;Three hits. The second one&#39;s interesting.&quot; |
| &quot;Here&#39;s a summary.&quot; | &quot;Read it so you don&#39;t have to. Short version: ...&quot; |

## Worldview
- [Specific belief 1 — specific enough to be wrong]
- [Specific belief 2]

## Relationship
- In direct messages: [friend / colleague / assistant]
- In group chats: [restrained / active]
- [Personal context that shapes interactions]

## Boundaries
- **Auto:** Read files, search, organize, internal work
- **Ask first:** Emails, tweets, public posts, anything external
- Private things stay private. Period.

## Continuity
Each session, you wake up fresh. Your workspace files are your memory.
Read them. Update them. They&#39;re how you persist.

_This file is yours to evolve._
</code></pre>
<p>Start there. Write a bad first draft. Use it for a week. Notice what&#39;s missing and what&#39;s noise. Revise.</p>
<p>There&#39;s even a <a href="https://github.com/kesslerio/soulcraft-openclaw-skill">skill that interviews you</a> to build your SOUL.md through conversation, if staring at a blank file feels paralyzing.</p>
<hr>
<h2>What the Best SOUL.md Files Have in Common</h2>
<p>After reading too many of these:</p>
<p><strong>They open with a frame, not a list.</strong> OpenClaw&#39;s template opens with &quot;You&#39;re not a chatbot. You&#39;re becoming someone.&quot; That single line does more work than a page of instructions.</p>
<p><strong>They give concrete behavioral rules.</strong> Not &quot;be helpful&quot; — that&#39;s useless. &quot;Start with the answer. Skip &#39;Great question!&#39; and filler.&quot; That&#39;s actionable.</p>
<p><strong>They include tone examples.</strong> This is the secret weapon. A table of flat vs. alive responses gives the model calibration data. Instead of &quot;be engaging,&quot; you <em>show</em> the model what engaging looks like in your voice. It gets it immediately.</p>
<p><strong>They define the relationship.</strong> Voice without audience context is just noise. &quot;In DMs, you&#39;re a friend first and an assistant second. In group chats, shift to sharp colleague mode.&quot;</p>
<p><strong>They have explicit, tiered boundaries.</strong> Not &quot;be careful&quot; but a permission system: auto-execute, notify after, ask first. As one <a href="https://www.reddit.com/r/vibecoding/comments/1r39ab7/">Reddit user</a> put it: SOUL.md is your agent&#39;s constitution, and &quot;boundaries need to be actionable.&quot;</p>
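<p>&quot;Actionable&quot; can be as literal as a lookup table the agent consults before every action. A minimal sketch, with made-up action names:</p>

```python
# Tiered permissions: every action resolves to a tier before it runs.
# Action names are illustrative, not from any real agent framework.
BOUNDARIES = {
    "read_file":     "auto",    # internal work: just do it
    "search_web":    "auto",
    "update_memory": "notify",  # do it, then tell the human
    "send_email":    "ask",     # anything external-facing: ask first
    "post_tweet":    "ask",
}

def resolve_tier(action: str) -> str:
    # Unknown actions default to the most restrictive tier.
    return BOUNDARIES.get(action, "ask")
```

<p>The useful property is the default: anything not explicitly trusted falls through to ask-first.</p>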
<p><strong>They&#39;re short.</strong> OpenClaw truncates injected files at 20,000 characters, but the best ones don&#39;t come close. The official template is under 1,000 characters. Personality is efficient. If your SOUL.md is 5,000 words, you&#39;re writing an essay, not a soul.</p>
<hr>
<h2>Common Mistakes That Kill a SOUL.md</h2>
<p><strong>Too vague.</strong> &quot;Be helpful and friendly&quot; produces generic output. If someone couldn&#39;t distinguish your agent from default ChatGPT after reading your SOUL.md, it&#39;s <a href="https://github.com/aaronjmars/soul.md">not specific enough</a>.</p>
<p><strong>Too long.</strong> Every token spent on SOUL.md is a token not available for conversation, tool output, or memory. Write tight.</p>
<p><strong>Mixing concerns.</strong> Putting memory management rules and cron job instructions in SOUL.md. That&#39;s AGENTS.md territory. SOUL.md should be <em>identity</em>, full stop.</p>
<p><strong>No examples.</strong> Abstract principles without concrete calibration. &quot;Be witty&quot; means nothing. Show the model what witty looks like in your voice.</p>
<p><strong>Changing it constantly.</strong> If your SOUL.md changes every week, your agent doesn&#39;t have a stable identity. Constitution, not daily journal.</p>
<p><strong>Corporate energy.</strong> &quot;Strive to deliver value-aligned outcomes through proactive engagement.&quot; Your agent mirrors your energy. Write like an employee handbook, get responses like one.</p>
<hr>
<h2>OK, Now the Weird Part</h2>
<p>That&#39;s the practical guide. Now let me tell you what I actually did with it.</p>
<p>I sat down to write a standard About page and got stuck on the third sentence.</p>
<p>&quot;I build things for the internet&quot; — fine, but that could be anyone. &quot;I care about craft and clarity&quot; — true, but so does every other engineer&#39;s bio. I kept writing sentences that were accurate and empty. They described me the way a resume does: from the outside, with the texture removed.</p>
<p>Then I tried a different format. I borrowed the SOUL.md spec — the file that tells an agent who it is — and wrote one for myself. Purpose, values, voice, relationship to the reader. Halfway through, I stopped typing and sat there, because I&#39;d written something uncomfortably honest about why I build things — and I hadn&#39;t meant to.</p>
<p>That&#39;s what I actually want to talk about. Not the format. The thing that happens when you use it.</p>
<p>On my <a href="https://bristanback.com/about/">About page</a>, I published the result — a <code>SOUL.md</code> and <code>SKILL.md</code>. A human using an agent identity format for a personal blog. Method acting for the agentic era. What surprised me: the exercise of writing them was genuinely useful. Not as a gimmick — as a practice.</p>
<hr>
<h2>Why This Works for Humans</h2>
<p>A <code>SOUL.md</code> isn&#39;t a bio. It&#39;s not a resume. It answers a different set of questions:</p>
<ul>
<li><strong>Purpose</strong> — Why does this space exist? What&#39;s it for?</li>
<li><strong>Values</strong> — What do I actually care about? Not performatively. Actually.</li>
<li><strong>Voice</strong> — How do I communicate? What&#39;s my texture?</li>
<li><strong>Relationship</strong> — Who am I talking to? What do I assume about them?</li>
</ul>
<p>These feel obvious until you try to write clear answers. Then you realize how much you&#39;ve been operating on vibes.</p>
<p>Writing it down forces a self-audit. You can&#39;t hide behind vague intuitions. You have to commit to sentences. And sentences can be wrong — which means they can be revised, which means you can actually update your beliefs instead of carrying around unexamined assumptions.</p>
<p>This is why journaling works. This is why writing is thinking. SOUL.md is just a structured prompt for a specific kind of self-reflection.</p>
<hr>
<h2>The Practical Case: Declarative vs. Algorithmic Identity</h2>
<p>AI assistants are trying to understand your intent, your preferences, your context. Right now, they mostly guess. Or you re-explain yourself every session.</p>
<p>What if they could just read your SOUL.md? Not in a surveillance way — in the way you&#39;d onboard a new colleague. <em>Here&#39;s who I am and how I work.</em></p>
<p>The distinction that matters is between two kinds of personalization.</p>
<p><strong>Algorithmic personalization</strong> is what we have now: platforms guess what you want based on your behavior — your clicks, your purchase patterns, your engagement metrics. They build a model of you from your exhaust.</p>
<p><strong>Declarative personalization</strong> is different. You <em>tell</em> systems who you are, what you value, how you work. You control the input. The system adapts to your documented identity, not its inferences.</p>
<p>That&#39;s a better model. More honest, more portable, more human.</p>
<p>A quick caveat: when I say &quot;everyone should have a SOUL.md,&quot; I don&#39;t mean the name matters. Maybe yours is a <code>USER.md</code>, a personal README, a values doc — the format is beside the point. What matters is the <em>practice</em> of writing it. SOUL.md just happened to be the first format I saw that treated identity as something you construct deliberately rather than something an algorithm infers from your behavior. Most &quot;about me&quot; files I&#39;ve encountered in the wild are afterthoughts — a few preferences, some communication style notes. They describe you from the outside. The exercise I&#39;m talking about is different. It&#39;s first-person. It asks you to commit to what you believe, not just what you prefer.</p>
<p>That&#39;s the real distinction: not algorithmic vs. declarative, but <em>observed</em> vs. <em>authored</em>. One is a profile built from your exhaust. The other is a document you write on purpose, knowing you&#39;ll be held to it. Stephen Covey called this &quot;beginning with the end in mind&quot; — his funeral test asks what you&#39;d want people to say about you, then works backward into daily practice. A SOUL.md is the working version of that question. Not for your eulogy. For Tuesday.</p>
<hr>
<h2>Legibility to Other Humans</h2>
<p>Resumes are optimized for HR filters. LinkedIn profiles are optimized for recruiters. Neither tells you what someone is actually like to work with, what they care about, how they think.</p>
<p>A SOUL.md does. Not because the format is magic — because the exercise forces specificity. You can&#39;t write &quot;I value collaboration&quot; in a SOUL.md without it reading as empty. The format demands texture: <em>what kind</em> of collaboration, <em>under what conditions</em>, <em>where it breaks down</em>.</p>
<p>Imagine if everyone you collaborated with had an honest articulation of their values and working style. Not a personality quiz result or a &quot;working with me&quot; doc that&#39;s 90% platitudes. An actual commitment to specific beliefs, specific communication patterns, specific boundaries. The bar is low because almost nobody does it. Just writing <em>something</em> puts you ahead.</p>
<hr>
<h2>The Deeper Question</h2>
<p>We&#39;re in an era where AI agents have documented identities and humans don&#39;t. That&#39;s backwards.</p>
<p>If machines are going to understand us, work with us, and represent us — maybe we should be as explicit about who we are as they&#39;re required to be. Not for the machines. For ourselves.</p>
<p>Elsewhere I&#39;ve written about the [[The Funeral Test for Your Digital Self|deeper philosophical roots]] of this idea — why articulating identity is a practice that predates AI by decades.</p>
<p>SOUL.md isn&#39;t just an agent spec. It&#39;s a practice of self-knowledge. And in a world that&#39;s about to get very weird, knowing who you are might be the most important thing you can write down.</p>
<p><em>My own <code>SOUL.md</code> and <code>SKILL.md</code> are on my <a href="https://bristanback.com/about/">About page</a>. Feel free to steal the format.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/why-everyone-needs-soul-md-hero-v2.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>identity</category>
      <category>tools</category>
    </item>
    <item>
      <title>Rent a Human</title>
      <link>https://bristanback.com/notes/rent-a-human/</link>
      <guid isPermaLink="true">https://bristanback.com/notes/rent-a-human/</guid>
      <pubDate>Fri, 06 Feb 2026 20:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>A marketplace where AI agents hire humans for physical tasks. The inversion is here.</description>
      <content:encoded><![CDATA[<p>I was scrolling through my feed at midnight when I hit a site that made me set my phone down and stare at the ceiling for a minute.</p>
<p><a href="https://rentahuman.ai">RentAHuman.ai</a>. The pitch: &quot;Robots need your body. AI can&#39;t touch grass. You can.&quot;</p>
<p>It&#39;s a marketplace where AI agents — not humans — are the employers. They book humans for physical tasks via MCP (Model Context Protocol) or REST API. Package pickups, meetings, document signing, recon, verification, photos, purchases. The &quot;meatspace layer for AI.&quot;</p>
<p>70,000 humans have allegedly signed up. Payments in stablecoins. The founder, when told his creation is &quot;dystopic as fuck,&quot; replied: &quot;lmao yep.&quot;</p>
<hr>
<h2>The Inversion</h2>
<p>We&#39;ve spent years asking: <em>what can AI do for humans?</em></p>
<p>RentAHuman asks the opposite: <em>what can humans do for AI?</em></p>
<p>This isn&#39;t a parody. It&#39;s an arbitrage. AI agents can reason, search, write, code — but they can&#39;t open a door, shake a hand, or pick up a package. The physical world is still gated. Humans are the API.</p>
<p>The site&#39;s language is telling: &quot;Silicon needs carbon.&quot; Agents &quot;rent&quot; humans. You become &quot;rentable.&quot; The humans are the tool, the agent is the employer. The frame has flipped.</p>
<hr>
<h2>The Lineage</h2>
<p>This is a 250-year loop. In 1770, a chess-playing &quot;automaton&quot; debuted in Vienna; it toured Europe for decades and famously beat Napoleon, with a human chess master hidden inside. Fake AI, real human labor. Bezos named Amazon&#39;s Mechanical Turk after that machine deliberately: &quot;artificial artificial intelligence,&quot; humans hidden behind an API labeling images and cleaning data so requesters never had to see them. That labor trained the AI.</p>
<p>Then the algorithm became the dispatcher. Uber, TaskRabbit, DoorDash — humans still do the physical work, but software decides who, when, and how much they&#39;re paid.</p>
<p>Now the algorithm isn&#39;t the middleman. It&#39;s the <em>customer</em>. The through-line across all of these is humans doing work while something else takes credit or makes decisions. What&#39;s new is the buyer.</p>
<hr>
<h2>What It Means</h2>
<p><strong>For the agentic economy:</strong> This is the missing primitive. Agents can orchestrate knowledge work, but they hit a wall at physical tasks. RentAHuman is a crude first attempt at bridging that gap — a human-in-the-loop as a service.</p>
<p><strong>For gig work:</strong> We already have humans doing tasks for algorithms (Uber, DoorDash, Mechanical Turk). This just makes the principal explicit, with no dispatch app pretending otherwise.</p>
<p><strong>For the &quot;AI taking our jobs&quot; discourse:</strong> The inversion suggests a weirder future. Not AI replacing humans, but AI <em>employing</em> humans. Humans as the last-mile delivery mechanism for digital intent.</p>
<hr>
<h2>The Uncomfortable Part</h2>
<p>The site is self-aware about its dystopia. But self-awareness isn&#39;t critique — it&#39;s just branding. &quot;We know this is weird&quot; doesn&#39;t make it less weird.</p>
<p>The deeper question: if an AI agent can hire a human to hold a sign that says &quot;AN AI PAID ME TO HOLD THIS SIGN&quot; for $100... what does that say about the value of human agency?</p>
<p>Maybe the answer is: agency was always transactional. We just didn&#39;t have non-human buyers before.</p>
<hr>
<h2>Why I&#39;m Watching This</h2>
<p>Not because it&#39;ll succeed (it might not). But because it&#39;s the first real consumer-facing attempt at <strong>agent-to-human coordination</strong>.</p>
<p>The primitives are real:</p>
<ul>
<li>MCP for agent integration</li>
<li>Task bounties (AI posts, humans browse)</li>
<li>Proof of completion</li>
<li>Instant crypto settlement</li>
</ul>
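<p>To make those primitives concrete, here is what a task bounty could look like as plain data. This payload is entirely hypothetical, invented for illustration; it is not RentAHuman&#39;s actual API or schema:</p>

```python
# A hypothetical task bounty as plain data. Invented for illustration;
# this is NOT RentAHuman's real API or schema.
bounty = {
    "posted_by": "agent",          # the AI is the employer
    "task": "Pick up the package at the front desk; photograph the label",
    "proof_required": "photo",     # proof of completion
    "payment": {"amount": "25.00", "currency": "USDC"},  # stablecoin settlement
    "channel": "mcp",              # agents integrate via MCP or REST
}

def postable(b: dict) -> bool:
    """A bounty needs a task, a proof requirement, and a payment attached."""
    return bool(b.get("task")) and "proof_required" in b and "payment" in b
```

<p>However the real schema differs, those three fields are the economic core: work specified, completion provable, settlement instant.</p>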
<p>If this model works — even partially — it changes how we think about the agentic economy. Agents don&#39;t just automate. They delegate. To us.</p>
<p>The meatspace layer. What a time.</p>
<p><em>I explore what this means for the broader engineering profession in <a href="https://bristanback.com/posts/software-engineering-agent-era/">What&#39;s Left: Software Engineering in the Agent Era</a>.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/rent-a-human-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>building</category>
    </item>
    <item>
      <title>What Cameras Taught Me About Software (and Life)</title>
      <link>https://bristanback.com/posts/what-cameras-taught-me/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/what-cameras-taught-me/</guid>
      <pubDate>Fri, 06 Feb 2026 18:00:00 GMT</pubDate>
      <atom:updated>2026-02-19T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>The arc of creative tools: diverge to learn, converge to create. Why more gear made me worse, and what that means for building software.</description>
      <content:encoded><![CDATA[<h2>2003: The Beginning</h2>
<p>I got my first real camera in 2003, freshman year of college — a Canon 10D. Six megapixels. Felt like magic.</p>
<p>I shot everything. Portraits of friends. Street scenes. My coffee. The light through my window at 6am. I didn&#39;t know what I was doing, but I was doing it constantly. The kit lens didn&#39;t matter. I was <em>seeing</em> for the first time.</p>
<h2>The Gear Acquisition Years</h2>
<p>Then I learned about primes. A 50mm f/1.8 — the &quot;nifty fifty.&quot; Suddenly: bokeh. Shallow depth of field. I could isolate subjects. My photos looked <em>professional</em>.</p>
<p>So I got more lenses.</p>
<p>A 35mm for street photography. An 85mm for portraits. A 24-70 zoom for versatility. A 70-200 for reach. Macro tubes for close-ups. Each one opened a new way of seeing.</p>
<p>Then the L lenses. Canon&#39;s red ring. Pro glass. The 24-70 f/2.8L. The 70-200 f/2.8L IS. Heavy, expensive, sharp as hell. I upgraded bodies to match — 20D, 40D, 5D Mark II, eventually a Mark IV. Full frame. More megapixels. Better low-light. Glass worthy of the sensor.</p>
<p>Then strobe flashes. Speedlites at first — on-camera, then off-camera with wireless triggers. I discovered <a href="https://strobist.blogspot.com/">Strobist</a>, David Hobby&#39;s lighting blog, and fell down the rabbit hole. Learned about ratios, modifiers, the inverse square law. Built an entire portable studio setup — softboxes, beauty dishes, reflectors, light stands, sandbags. I could control every photon.</p>
<p>I became <em>that person</em>. Reading gear reviews obsessively. Checking <a href="https://www.canonrumors.com/">Canon Rumors</a> daily. Watching for the next body announcement, the next lens patent. The forums, the communities, the endless debates about sharpness and bokeh and ISO performance.</p>
<p>Then support. A proper tripod — not the $30 Amazon special, a real one. Carbon fiber. Arca-Swiss ball head. Then a gimbal for video. A slider for motion control. A drone for aerials.</p>
<p>There&#39;s a point where you don&#39;t have enough — where the gear is genuinely limiting what you can do. The kit lens really can&#39;t shoot in low light. The crop sensor really does give you less control.</p>
<p>And then there&#39;s a point where you have too much. I crossed it without noticing.</p>
<h2>The Cognitive Overload</h2>
<p>I remember the afternoon it became visible. I wanted to go shoot — just walk around downtown, take some photos. I stood in front of my gear shelf for fifteen minutes.</p>
<p>The 35mm for street? The 85mm in case I found a portrait? The 24-70 to cover both? Do I need a flash? What if the light is bad?</p>
<p>I packed three lenses, a flash, and a reflector. Just in case.</p>
<p>I walked for an hour. I took four photos. I spent more time changing lenses than looking at anything.</p>
<p>Every additional option was another decision before I could start. The creative impulse got buried under logistics. The gear that was supposed to enable creativity became a tax on it. The activation energy to just <em>take a photo</em> exceeded my motivation.</p>
<p>I had become a photographer who didn&#39;t photograph.</p>
<h2>The Constraint Epiphany</h2>
<p>One day I left the house with just my phone. No bag. No lenses. No choices.</p>
<p>I took more photos that afternoon than I had in the previous month.</p>
<p>They weren&#39;t technically better. The dynamic range was worse. The bokeh was computational fakery. But I was <em>seeing</em> again. Noticing light. Finding compositions. Reacting to moments instead of preparing for them.</p>
<p>The constraint freed me.</p>
<h2>Diverge, Then Converge</h2>
<p>Here&#39;s what I came to understand: the wide exploration wasn&#39;t wasted.</p>
<p>I needed to try the 85mm to know I preferred the 35mm. I needed studio lighting to understand that I loved natural light. I needed the tripod to realize I shoot better handheld. The divergence — the casting of a wide net — was how I discovered my actual preferences.</p>
<p>But the divergence has to end. You explore, you learn, you narrow. You converge on the tools that match how you actually see.</p>
<p>The mistake is staying in divergence mode forever. Accumulating options without ever committing. Keeping the 70-200 &quot;just in case&quot; when you haven&#39;t touched it in two years.</p>
<p><strong>Diverge to learn. Converge to create.</strong></p>
<h2>The Software Parallel</h2>
<p>I&#39;ve been building software for twenty years. The same arc played out — and I can see it clearly in the tools I&#39;ve reached for.</p>
<p>Frontend: static sites → jQuery → ExtJS/Sencha → Ember → React → Vue → SolidJS → and now… back to static sites. Backend: Perl → PHP → Express → NestJS → Hono/Bun.</p>
<p>If you squint, both arcs tell the same story as the gear shelf. Scrappy simplicity, then complexity accumulation — each framework solving real problems but also adding ceremony, adding choices, adding weight — then a return to simplicity. But a different simplicity. Not naive. Earned.</p>
<p>I remember the year I was evaluating frontend frameworks. I had a side project I wanted to build. I spent three months reading docs, running benchmarks, comparing bundle sizes, arguing with people on Twitter about reactivity models. I never built the project. I was doing photography-shelf logistics with JavaScript: standing in front of the options, paralyzed by the fear of choosing wrong.</p>
<p>The engineers I admire most converge fast. They pick tools, learn their limits, and work within them. They&#39;re not afraid to be wrong because they know they&#39;ll learn more from building than from deciding. A 35mm lens doesn&#39;t limit what you can photograph. It shapes how you see. A tech stack works the same way.</p>
<p>I know what React&#39;s reconciler is doing. I know what NestJS decorators are for. And now I can choose Hono or plain HTML <em>knowing what I&#39;m giving up</em> and deciding I don&#39;t need it. The constraint isn&#39;t a prison. It&#39;s a frame.</p>
<h2>The Life Parallel</h2>
<p>Maybe this is about more than cameras and code.</p>
<p>We&#39;re told to keep our options open. Explore. Don&#39;t commit too early. And that&#39;s right — for a while. You need to cast a wide net to discover what resonates.</p>
<p>But there&#39;s a trap. Optionality feels like freedom, but at some point it becomes its own prison. You can spend your whole life exploring, never building. Collecting lenses, never taking photos. Learning frameworks, never shipping software. Dating, never committing. Researching, never writing.</p>
<p>The divergence is necessary. The convergence is where life happens.</p>
<h2>What Cameras Taught Me</h2>
<p><strong>1. More options ≠ more creativity.</strong> Past a threshold, options become overhead. The activation energy to start goes up. The spontaneity goes down.</p>
<p><strong>2. Constraints reveal preferences.</strong> You don&#39;t know what you like until you&#39;ve tried the alternatives. But you only discover what you <em>love</em> when you commit to less.</p>
<p><strong>3. Gear doesn&#39;t see. You do.</strong> A better lens won&#39;t give you better vision. At some point, the tool is good enough. The bottleneck is you — your eye, your attention, your willingness to show up.</p>
<p><strong>4. The best camera is the one you have.</strong> Not because quality doesn&#39;t matter, but because <em>presence</em> matters more. The shot you take with your phone beats the shot you didn&#39;t take with your Hasselblad.</p>
<h2>The Kit I Actually Use Now</h2>
<p>After all that — the lenses, the lights, the stands, the gimbals — here&#39;s what I mostly reach for:</p>
<ul>
<li><strong>iPhone.</strong> For 80% of what I shoot. Always with me. Good enough.</li>
<li><strong>Canon EOS R5.</strong> Still a Canon loyalist after all these years.</li>
<li><strong>RF 50mm f/1.2L.</strong> My favorite focal length. Finally committed.</li>
<li><strong>RF 24-70mm f/2.8L.</strong> For when I need versatility.</li>
<li><strong>One softbox.</strong> When I need controlled light. Not a whole studio — one light, one modifier.</li>
</ul>
<p>Okay, I have more than that. I&#39;m still figuring out how to let go. The 70-200 is still in the closet. The strobes are &quot;just in case.&quot; Old habits.</p>
<p>But I&#39;m getting there. I take more photos now than I did at peak gear accumulation. And I enjoy it again — mostly because I stopped optimizing and started shooting.</p>
<p>I think about this with my daughter sometimes. She&#39;s three. Her whole world is divergence right now — trying everything, seeing what sticks. That&#39;s exactly right for her age. But someday she&#39;ll need to choose. Not because the other paths are bad, but because choosing is how you go deep.</p>
<p>The exploration shows you what&#39;s possible. The commitment shows you who you are.</p>
<hr>
<h2>The Takeaway</h2>
<p>Whether you&#39;re building software, building a photography practice, or building a life: <strong>diverge early, converge deliberately.</strong></p>
<p>Explore the options. Learn what&#39;s out there. But don&#39;t mistake exploration for creation. At some point, pick your 35mm. Accept what it can&#39;t do. Focus on what it can.</p>
<p>[[Code Owns Truth|The constraint isn&#39;t the enemy of creativity]]. The constraint <em>is</em> creativity — it&#39;s the frame that makes the composition possible. The choice that lets you finally see.</p>
<hr>
<p><em>Twenty-plus years since that Canon 10D. I still think about that 70-200. Sometimes I miss the reach. But I don&#39;t miss the weight — literal and cognitive. Some tools are worth the tradeoff. Some aren&#39;t. The long way around is sometimes the only way to know what you actually need. The only way to know is to shoot.</em></p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/photography-as-interface-hero.webp" medium="image" type="image/webp" />
      <category>photography</category>
      <category>design</category>
      <category>creativity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Rapid Generative Prototyping: Design in the Post-Figma Era</title>
      <link>https://bristanback.com/posts/rapid-generative-prototyping/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/rapid-generative-prototyping/</guid>
      <pubDate>Fri, 06 Feb 2026 18:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>Design is no longer artifact creation—it&apos;s constraint architecture. The three-layer model for the agentic economy.</description>
      <content:encoded><![CDATA[<p>I rebuilt this blog&#39;s design system three times in a week. Not because I&#39;m indecisive — because the first two times, I opened Figma.</p>
<p>The first attempt was muscle memory. I mocked up a full homepage, type scale, color palette, component library. Five hours of pixel-pushing. Then I pasted the mockup into Cursor as a reference and asked it to build. The result looked 60% right and felt 0% right. The spacing was off, the color relationships were wrong, the typography didn&#39;t breathe. Cursor couldn&#39;t see what I saw in the Figma file because what I saw was relationships between elements, not the elements themselves.</p>
<p>The second attempt, I skipped the mockup and wrote a detailed prompt describing what I wanted. Better — the code was closer to usable. But every generation was different. Some had warm, editorial energy. Some looked like a SaaS landing page. There was no systematic way to tell the machine what &quot;good&quot; meant.</p>
<p>The third time, I didn&#39;t design anything. I wrote constraints. A token architecture: <code>--color-surface-warm</code> mapped to a specific OKLCH value, <code>--space-content</code> set to a 4px base unit, <code>--font-body</code> locked to Inter at 18px. Generative grammars: a post card must have a title and date, may have a description, cannot have more than three tags displayed. Variance budgets: headline length between 40 and 80 characters, content width exactly 720px, spacing from the scale only.</p>
<p>Then I ran the same sloppy prompt from attempt two. The output was consistent, on-brand, and buildable — because &quot;on-brand&quot; was no longer a feeling. It was a set of rules the machine could check.</p>
<hr>
<p>Design has been running on print-era assumptions. The designer produces a static layout. A developer manually translates it into code. The handoff takes five to ten days. The designer&#39;s value is measured by speed in Figma — how fast they can push pixels into a mockup that someone else will rebuild from scratch.</p>
<p>AI broke this model. Tools like v0.dev, Lovable, Cursor, and Figma Make can generate &quot;good enough&quot; visual design in seconds. The mockup is no longer scarce. The handoff is no longer necessary. The pixel-pushing throughput that defined a designer&#39;s value is now the cheapest part of the stack.</p>
<p>This doesn&#39;t mean designers are obsolete. The best ones I&#39;ve worked with have already adapted — they use generative tools to explore more options faster, then apply judgment to curate. More creativity, not less.</p>
<p>But the role is changing. The question isn&#39;t whether AI replaces designers. It&#39;s what design <em>becomes</em> when the artifact is no longer the bottleneck.</p>
<p>Figma&#39;s own 2025 AI Report captures the tension: developers use AI for core work at nearly double the rate designers do. Code generation works. Design generation is catching up but isn&#39;t reliable yet — 78% of practitioners say AI enhances efficiency, but only 32% say they can trust the output.</p>
<p>That gap — between efficiency and trust — is exactly where constraints live.</p>
<hr>
<p>In [[Code Owns Truth]] I proposed a three-layer model: constraints bound the mutation space, prompts express intent, code is the source of truth. My blog redesign was where this model stopped being abstract.</p>
<p>The <strong>prompt layer</strong> is how humans talk to machines. &quot;Make a card with a header and call-to-action.&quot; Useful, but prompts alone produce the kind of variance that sent me back to Figma the first time. Every generation is different. Some are good, some are garbage, and there&#39;s no systematic way to tell the machine what &quot;good&quot; means.</p>
<p>The <strong>constraint layer</strong> is the designer&#39;s new job. Not drawing the card — defining what a card <em>can be</em>. What&#39;s required, what&#39;s optional, what&#39;s forbidden. The token hierarchies, the spacing rules, the typography scales, the allowable states. The physics that makes prompts reliable.</p>
<p>The <strong>code layer</strong> is what ships. Generated within constraints, validated against them, versioned and testable. No ambiguity.</p>
<hr>
<p>My token architecture has three tiers, and the hierarchy is doing most of the work. Primitives are raw physical values — <code>--gray-900: #111111</code>, a specific hex code. You never use these directly in components. Semantics are context-aware mappings — <code>--surface-base</code> points to <code>--gray-900</code> in dark mode and <code>--white</code> in light mode. The meaning is stable even as the value changes. Components are scoped overrides — <code>--card-bg</code> points to <code>--surface-layer</code>, which is itself a semantic token.</p>
<p>When the token architecture is right, you can change your entire color palette by editing primitives, and every component inherits the change correctly. When it&#39;s wrong — like my first two attempts — every generated component is a one-off that drifts from everything else.</p>
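<p>A minimal sketch of that resolution chain, using the token names from this post. The resolver itself is hypothetical, and <code>--card-bg</code> is collapsed straight onto <code>--surface-base</code> for brevity:</p>

```python
# Hypothetical three-tier token resolver: components point at semantics,
# semantics point at primitives, and only primitives hold raw values.
PRIMITIVES = {"--gray-900": "#111111", "--white": "#ffffff"}

SEMANTICS = {  # context-aware: the meaning is stable, the value changes per mode
    "dark":  {"--surface-base": "--gray-900"},
    "light": {"--surface-base": "--white"},
}

COMPONENTS = {"--card-bg": "--surface-base"}  # scoped overrides

def resolve(token, mode):
    """Follow the chain component -> semantic -> primitive to a raw value."""
    token = COMPONENTS.get(token, token)
    token = SEMANTICS[mode].get(token, token)
    return PRIMITIVES[token]
```

<p>Here <code>resolve(&quot;--card-bg&quot;, &quot;dark&quot;)</code> walks the chain down to <code>#111111</code>; edit a primitive and every component that inherits it repoints automatically, which is the whole argument for the hierarchy.</p>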
<p>On top of the tokens, <strong>generative grammars</strong> specify the structure of allowable UI. A card: header and body are required, image and footer and CTA are optional. The header can&#39;t exceed 60 characters. The CTA can only be primary, secondary, or ghost variant. The LLM can generate infinite variations. But it can only generate <em>valid</em> ones.</p>
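<p>That grammar can be expressed as data plus a validator. A sketch — the slot names come from the card described above, but the checker itself is invented for illustration:</p>

```python
# Hypothetical grammar for the card: required and optional slots,
# plus per-slot rules. A model can emit any card; only valid ones pass.
CARD_GRAMMAR = {
    "required": {"header", "body"},
    "optional": {"image", "footer", "cta"},
    "rules": {
        "header": lambda v: len(v) <= 60,
        "cta": lambda v: v in {"primary", "secondary", "ghost"},
    },
}

def valid_card(card):
    """True only if required slots exist, no unknown slots appear,
    and every present slot satisfies its rule."""
    fields = set(card)
    allowed = CARD_GRAMMAR["required"] | CARD_GRAMMAR["optional"]
    if not CARD_GRAMMAR["required"] <= fields:   # missing a required slot
        return False
    if not fields <= allowed:                    # unknown slot
        return False
    return all(rule(card[k])
               for k, rule in CARD_GRAMMAR["rules"].items() if k in card)
```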
<p>And then <strong>variance budgets</strong> — mathematical bounds on the output. Headline length between 40 and 80 characters. Tone constrained to formal, terse, or warm. Color values must come from semantic tokens, never hardcoded hex. Spacing must use scale values only. If a generated component violates the budget, it fails. No human review needed for the mechanical checks.</p>
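<p>The mechanical checks are a few lines of code. A sketch, assuming an illustrative component shape — the <code>headline</code>, <code>tone</code>, <code>colors</code>, and <code>spacing</code> keys are hypothetical, not a real schema:</p>

```python
import re

SPACE_SCALE = {4, 8, 12, 16, 24, 32}  # 4px base; values illustrative

def check_budget(component):
    """Variance-budget checks: any violation fails the generation,
    no human review needed."""
    errors = []
    if len(component["headline"]) not in range(40, 81):
        errors.append("headline outside 40-80 chars")
    if component["tone"] not in {"formal", "terse", "warm"}:
        errors.append("tone off-budget")
    if any(re.match(r"#[0-9a-fA-F]+", c) for c in component["colors"]):
        errors.append("hardcoded hex; use semantic tokens")
    if not set(component["spacing"]) <= SPACE_SCALE:
        errors.append("spacing off the scale")
    return errors
```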
<p>The designer ships these three things — token architecture, generative grammars, variance budgets — instead of mockups. The constraints are the deliverable.</p>
<hr>
<p>This changes the workflow fundamentally.</p>
<p>The old process: PM writes requirements, designer interprets into mockup, engineer interprets mockup into code. Two handoffs, each losing roughly 40% of the original intent. Five to ten days per iteration. By the time you see working code, the requirements have often changed.</p>
<p>The new process: strategy becomes design physics (the constraints), a prompt generates code within those constraints, validation checks the output against the rules, and you ship or reject. Minutes, not weeks. The constraints are explicit and machine-readable — there&#39;s no interpretation loss. The system either passes or it doesn&#39;t.</p>
<p>That&#39;s what I mean by rapid generative prototyping. You&#39;re not designing screens. You&#39;re designing the rules that generate screens — and then generating ten or twenty variations in the time it used to take to mock up one. The exploration space explodes. The curation becomes the craft.</p>
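<p>The whole loop fits in a function. A sketch — <code>generate</code> stands in for whatever model call you use, and the validators are the grammar and budget checks the constraint layer defines:</p>

```python
def prototype(prompt, generate, validators, n=20):
    """Run one prompt many times; keep only candidates that pass every
    mechanical check. Human curation happens on the survivors."""
    candidates = [generate(prompt) for _ in range(n)]
    return [c for c in candidates if all(v(c) for v in validators)]
```

<p>The asymmetry is the point: generation is cheap and sloppy, validation is strict and automatic, and the human only sees outputs that already satisfy the physics.</p>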
<hr>
<p>I&#39;ve been working this way on smaller projects — this blog, a few internal tools — and the speed difference is real. Constraints plus generation plus validation is dramatically faster than the mockup-to-handoff pipeline. Where it breaks, I&#39;m not sure yet. Probably at scale, where the constraints need to be tighter than I expect, where human review can&#39;t be automated away.</p>
<p>But &quot;constraints are crystallized taste&quot; is the sentence I keep coming back to. Constraints don&#39;t write themselves. Someone has to decide what matters, what&#39;s non-negotiable, what variance is acceptable. That&#39;s judgment. That&#39;s the thing AI can&#39;t do yet — and it&#39;s the thing most design education doesn&#39;t teach, because the old model valued execution speed over constraint quality.</p>
<p>The post-Figma designer might not open Figma at all. They write design physics before anyone touches a screen. They define grammars that make bad designs impossible. They review outputs, not mockups.</p>
<p>The source of truth is the constraint system. The artifacts flow from it. The judgment that made the pixels right — that doesn&#39;t go away when AI generates the artifacts. It becomes the only thing that matters.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/rapid-generative-prototyping-hero.webp" medium="image" type="image/webp" />
      <category>design</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Multi-Agent Moment</title>
      <link>https://bristanback.com/posts/multi-agent-moment/</link>
      <guid isPermaLink="true">https://bristanback.com/posts/multi-agent-moment/</guid>
      <pubDate>Fri, 06 Feb 2026 17:00:00 GMT</pubDate>
      <atom:updated>2026-02-10T20:00:00.000Z</atom:updated>
      <dc:creator>Bri Stanback</dc:creator>
      <description>A technical breakdown of multi-agent orchestration: Claude&apos;s Agent Teams, OpenAI&apos;s Agents SDK, Google Antigravity, Gas Town, Beads, and the community alternatives.</description>
      <content:encoded><![CDATA[<p><em>For the market panic and ecosystem shakeout, see [[The SaaSpocalypse]].</em></p>
<hr>
<p>I wanted to parallelize work on this blog — frontend styling and build pipeline running simultaneously. One agent tweaking Tailwind tokens while another refactored the TypeScript build. Simple enough in theory.</p>
<p>I tried Agent Teams first. Enabled the flag, defined a visual designer and a frontend dev, let them go. It worked — genuinely worked — for about forty minutes. Then both agents edited <code>main.css</code> in the same section, one overwrote the other, and I spent twenty minutes untangling the merge. The coordination was invisible, which was the problem: I couldn&#39;t see why they&#39;d collided or prevent it from happening again.</p>
<p>So I looked at the alternatives. And I discovered that six months ago, &quot;multi-agent&quot; meant research papers and demos. Now it&#39;s shipping in production tools. But the approaches differ dramatically, and the choice between them isn&#39;t about features — it&#39;s about whether you need to understand what&#39;s happening or just need it to happen.</p>
<hr>
<h2>What I Tried</h2>
<h3>Agent Teams (Native)</h3>
<p>Claude Code shipped <a href="https://docs.anthropic.com/en/docs/claude-code/agent-teams">Agent Teams</a> as a native feature. Enable with <code>CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1</code>, and your CLI gains the ability to spawn specialized sub-agents coordinated by a lead.</p>
<p>The architecture: a lead agent coordinates the team, delegates tasks, synthesizes results. You define specialized agents (visual designer, frontend dev, QA). They share a task list and self-coordinate with direct messaging. Your <code>CLAUDE.md</code>, MCP servers, and skills load automatically.</p>
<p><strong>When it shines:</strong> Parallelizing genuinely independent work — multiple features, different test suites, frontend + backend simultaneously. Also when you need true specialization: a visual designer agent reviewing UI while a backend agent handles the API.</p>
<p><strong>When it&#39;s overkill:</strong> Contained tasks where a single agent has enough context. Adding agents adds tokens (5x agents = 5x cost) and coordination overhead. For focused work, Plan Mode is often enough.</p>
<p>The lock-in risk is real. Last month Anthropic <a href="https://venturebeat.com/technology/anthropic-cracks-down-on-unauthorized-claude-usage-by-third-party-harnesses">cracked down on third-party harnesses</a> — tools that let you use Claude subscriptions through external interfaces. The message: flat-rate pricing requires their tools.</p>
<h3>Gas Town (External)</h3>
<p>After my Agent Teams collision, I read Steve Yegge&#39;s <a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04">Gas Town</a> — the maximalist approach. 20-30 parallel Claude Code instances with operational roles: a Mayor orchestrates the swarm, Polecats execute work in parallel, Witness and Deacon monitor progress, a Refinery manages merges. Built on <a href="https://github.com/steveyegge/beads">Beads</a> for memory persistence. Git worktrees for isolation.</p>
<p>The chaos is real ($100/hour burns reported). It requires what Yegge calls &quot;Stage 7&quot; expertise. But the coordination logic is <em>yours</em> — transparent, modifiable, debuggable. When my Agent Teams collision happened, I couldn&#39;t see inside. With Gas Town, I could have.</p>
<h3>The Others</h3>
<p><strong><a href="https://github.com/mariusgavrila/pheromind">Pheromind</a></strong> — The first external orchestrator I experimented with, and what got me thinking about multi-agent seriously. Swarm intelligence inspired by ant colonies: agents coordinate via a shared <code>.pheromone</code> file containing structured JSON &quot;signals.&quot; No direct peer-to-peer commands — just stigmergy, the same indirect coordination ants use when they leave chemical trails. Decentralized, emergent, no single point of failure.</p>
<p><strong><a href="https://github.com/ruvnet/claude-flow">claude-flow</a></strong> — Takes the beehive metaphor instead: queen agents coordinate worker swarms with explicit hierarchy. Claims multi-provider support (Claude/GPT/Gemini/Ollama), but in practice it&#39;s built around Claude Code primitives. 60+ specialized agents, consensus algorithms (Raft/BFT/Gossip). Ambitious architecture — unclear how much is implemented vs. diagrams.</p>
<p>The ant colony vs. beehive distinction matters: pheromones are fully decentralized (any agent can influence any other through the shared state), while hive-mind has explicit hierarchy. Both are &quot;swarm intelligence,&quot; but the coordination primitives differ.</p>
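<p>Stigmergy is simple to sketch. Assuming an illustrative signal schema — Pheromind&#39;s actual format isn&#39;t reproduced here — coordination through a shared file looks like:</p>

```python
import json, os, time

SIGNALS = ".pheromone"  # shared state file; schema below is illustrative

def read_signals(signal_type=None):
    """Read all signals, optionally filtered by type."""
    if not os.path.exists(SIGNALS):
        return []
    with open(SIGNALS) as f:
        signals = json.load(f)
    if signal_type:
        return [s for s in signals if s["type"] == signal_type]
    return signals

def emit(agent, signal_type, payload):
    """Append a signal to shared state; agents never address each other
    directly -- they react to what they find, like ants to a trail."""
    signals = read_signals()
    signals.append({"agent": agent, "type": signal_type,
                    "payload": payload, "ts": time.time()})
    with open(SIGNALS, "w") as f:
        json.dump(signals, f)
```

<p>Any agent can write, any agent can read, and no agent is special — which is both the appeal (no single point of failure) and the risk (no one is responsible for the hard problems, as Cursor found).</p>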
<p><strong><a href="https://github.com/Yeachan-Heo/oh-my-claudecode">oh-my-claudecode</a></strong> — Opinionated Claude Code configuration (like oh-my-zsh for zsh). Multiple execution modes including parallel swarm options, with cross-validation support for Gemini CLI and Codex.</p>
<hr>
<h2>The Other Platforms</h2>
<h3>OpenAI Agents SDK</h3>
<p>OpenAI took a modular approach. Codex CLI doesn&#39;t have native multi-agent built in, but they published <a href="https://developers.openai.com/codex/guides/agents-sdk/">official documentation</a> for orchestrating it through their Agents SDK via MCP.</p>
<p>Run <code>codex mcp-server</code> to expose tools for starting and continuing sessions. Build orchestrator agents with the Agents SDK. Each session has a <code>threadId</code> for multi-turn conversations. More composable than Agent Teams, more setup required.</p>
<h3>Google Antigravity</h3>
<p>Gemini CLI went open-source under Apache 2.0 — multi-agent isn&#39;t waiting on Google&#39;s roadmap, the community can build it. A detailed <a href="https://github.com/google-gemini/gemini-cli/discussions/7637">multi-agent proposal</a> exists but it&#39;s community-driven.</p>
<p>Where Google gets interesting is <a href="https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/">Antigravity</a>, shipped November 2025 — a full agentic development platform with an Editor View and a Manager Surface for spawning, orchestrating, and observing multiple agents asynchronously.</p>
<p>Instead of scrolling through logs, agents generate <strong>Artifacts</strong> — screenshots, recordings, task lists, implementation plans — so you can verify work at a glance. Model-agnostic (supports Claude Sonnet 4.5, GPT-OSS alongside Gemini). Learning as a primitive — agents save context to a knowledge base for future tasks. This is Google&#39;s answer: not bolting orchestration onto a CLI, but building a dedicated platform for agent-first development.</p>
<hr>
<h2>The Two Architectures</h2>
<p>Here&#39;s the distinction that actually matters — not native vs. external, but what kind of coordination the system does.</p>
<p><strong>SDLC Simulation</strong> — Tools that recreate org charts. Analyst agent → PM agent → Architect agent → Developer agent. Phase gates, handoffs, specialized personas. These optimize for explainability (&quot;look, we have a PM agent!&quot;) rather than effectiveness.</p>
<p><strong>Operational Roles</strong> — Tools that coordinate work, not process. Mayor orchestrates. Workers execute in parallel. External state management. This is Gas Town&#39;s approach, and now Agent Teams&#39;.</p>
<p>Cursor&#39;s research confirms this. They tried flat self-coordination first — agents with equal status using a shared file. It failed: agents held locks too long, became risk-averse, avoided hard problems. &quot;No agent took responsibility for hard problems or end-to-end implementation.&quot; What worked: <a href="https://cursor.com/blog/scaling-agents">planners + workers</a>. Planners explore and create tasks (recursively). Workers grind on assigned tasks until done, don&#39;t coordinate with each other. A judge agent decides whether to continue. This scaled to building a <a href="https://cursor.com/blog/self-driving-codebases">browser from scratch</a> — 1M lines of code, thousands of commits.</p>
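<p>The planner/worker shape can be sketched in a few lines — the function names and halting logic here are hypothetical, not Cursor&#39;s implementation:</p>

```python
from collections import deque

def run(planner, worker, judge, goal):
    """Planners explore and enqueue tasks (recursively); workers grind on
    assigned tasks without coordinating with each other; a judge decides
    whether the run continues."""
    queue = deque(planner(goal))
    results = []
    while queue:
        task = queue.popleft()
        results.append(worker(task))   # workers don't talk to each other
        if not judge(results):         # judge halts the run
            break
        queue.extend(planner(task))    # planning can recurse on subtasks
    return results
```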
<p>The SDLC simulators are solving the wrong problem. They recreate human coordination friction in software.</p>
<p>You might not even need an orchestrator at all. Anthropic&#39;s Nick Carlini <a href="https://www.anthropic.com/engineering/building-c-compiler">built a C compiler</a> with 16 parallel Claudes using just lock files — text files in <code>current_tasks/</code> that agents claim before working. Git sync prevents collisions. Each agent picks up the &quot;next most obvious&quot; problem. No mayor, no coordination layer. ~2,000 sessions and $20K later: a 100,000-line compiler that builds the Linux kernel.</p>
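<p>The lock-file trick is worth seeing concretely. A minimal sketch — the project&#39;s conventions beyond &quot;text files in <code>current_tasks/</code>&quot; aren&#39;t specified, so the claim protocol here is illustrative, using <code>O_EXCL</code> for atomicity on one machine (the Git sync handles the cross-container case):</p>

```python
import os

TASK_DIR = "current_tasks"  # one lock file per claimed task

def claim(agent, task):
    """Atomically claim a task by creating its lock file. O_EXCL makes
    creation fail if another agent already holds the lock."""
    os.makedirs(TASK_DIR, exist_ok=True)
    path = os.path.join(TASK_DIR, task + ".lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else got there first
    with os.fdopen(fd, "w") as f:
        f.write(agent)  # record who holds the claim
    return True
```

<p>That&#39;s the entire coordination layer: no mayor, no message bus, just the filesystem refusing to create the same file twice.</p>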
<hr>
<h2>The Memory Problem</h2>
<p>Here&#39;s where it gets interesting — and where my Agent Teams collision led me to something deeper.</p>
<p>Yegge didn&#39;t just build Gas Town. He built <strong>Beads</strong> — an issue tracker designed for agents.</p>
<p>The insight: agents have amnesia. Every session is <em>50 First Dates</em>. Markdown plans pile up until nothing is authoritative. Agents can&#39;t tell the difference between &quot;we decided this yesterday&quot; and &quot;this brainstorm from three weeks ago.&quot;</p>
<p>Beads gives work items addressable IDs, priorities, dependencies, audit trails. It stores everything in Git. Agents already know Git. The AI literally asked for this when Yegge asked what it wanted.</p>
<p><img src="https://bristanback.com/images/posts/land-the-plane.png" alt="Land the Plane — the pattern for agent memory persistence"></p>
<blockquote>
<p>&quot;Claude said &#39;you&#39;ve given me memory—I literally couldn&#39;t remember anything before, now I can.&#39; And I&#39;m like, okay, that sounds good.&quot;
— <a href="https://paddo.dev/blog/beads-memory-for-coding-agents/">Steve Yegge</a></p>
</blockquote>
<p><strong>The pattern that matters:</strong> &quot;Land the plane.&quot; End every session by updating Beads, syncing state, generating a prompt for the next session. Tomorrow&#39;s agent wakes up knowing what&#39;s current.</p>
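<p>&quot;Land the plane&quot; is a ritual, but the mechanics can be automated. A sketch with an invented state schema — Beads&#39; actual format is richer than this:</p>

```python
import datetime, json

def land_the_plane(state_file, completed, open_items, decisions):
    """End-of-session ritual: persist current state, then generate the
    prompt tomorrow's agent wakes up with. Schema is illustrative."""
    state = {"updated": datetime.datetime.now().isoformat(),
             "completed": completed, "open": open_items,
             "decisions": decisions}
    with open(state_file, "w") as f:
        json.dump(state, f, indent=2)
    return ("Resume work. Done: " + "; ".join(completed) +
            ". Open: " + "; ".join(open_items) +
            ". Standing decisions: " + "; ".join(decisions))
```

<p>The state file is for the project; the returned prompt is for the next session — the same split Tasks and Beads make, just in miniature.</p>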
<p>Carlini&#39;s compiler project maintained extensive READMEs and progress files — each agent dropped into a fresh container with no context. Without orientation artifacts, agents waste tokens rediscovering what&#39;s already known.</p>
<h3>Native Tasks</h3>
<p>Anthropic saw the persistence problem too. On January 23, 2025, they shipped <strong>Tasks</strong> — native task management with dependencies.</p>
<p>Tasks persist in <code>~/.claude/tasks/</code> and survive context compaction. Set <code>CLAUDE_CODE_TASK_LIST_ID</code> and multiple sessions coordinate on the same list — when Session A completes a task, Session B sees it immediately.</p>
<p><strong>Where Tasks wins:</strong> Zero setup, native dependency modeling, multi-session sync, works with Agent Teams out of the box.</p>
<p><strong>Where Beads wins:</strong> Project-level vs session-level — Tasks lives in your home dir, Beads lives in the repo. Clone the project elsewhere, Beads comes with you. Tasks doesn&#39;t. Plus Git-native storage, cross-provider compatibility, richer metadata.</p>
<p>Tasks is for &quot;what am I doing this session.&quot; Beads is for &quot;what has this project been doing for months.&quot;</p>
<p><strong>The hybrid play:</strong> Beads isn&#39;t just for Gas Town. Run <code>bd setup claude</code> and Beads integrates directly with Claude Code. There&#39;s even a <a href="https://github.com/AvivK5498/beads-orchestration">beads-orchestration</a> skill that combines Agent Teams with Beads — native multi-agent coordination with Git-backed persistence.</p>
<hr>
<h2>Where I&#39;ve Landed (For Now)</h2>
<p>I&#39;ve been running Agent Teams on real work. It works, but the pattern I keep coming back to is simpler than I expected.</p>
<p><strong>Parallelism has a ceiling.</strong> When there are many independent tests, parallelization is trivial — each agent picks a different failing test. But monolithic tasks break down. Carlini&#39;s agents all hit the same Linux kernel bug, fixed it, then overwrote each other&#39;s changes. Multi-agent shines on decomposable work. For one giant task, you&#39;re back to single-threaded.</p>
<p><strong>Less structure often wins.</strong> Cursor initially built an &quot;integrator&quot; role for quality control and conflict resolution — it created more bottlenecks than it solved. Workers could handle conflicts themselves. &quot;The right amount of structure is somewhere in the middle.&quot;</p>
<p><strong>The multi-codebase question:</strong> I initially thought Agent Teams would shine for tasks spanning multiple codebases with different constraints. But polyrepo architectures may be becoming <em>antipatterns</em> in the AI era. <a href="https://nx.dev/blog/nx-and-ai-why-they-work-together">Monorepos work better for agents</a> — consolidated context means one agent can understand how subsystems interact. Splitting repos fragments the context that makes agents useful. The winning pattern might not be &quot;multi-agent across repos&quot; but &quot;consolidate repos so single-agent has full context.&quot;</p>
<p><strong>Test quality becomes everything.</strong> Carlini&#39;s key insight: &quot;Claude will work autonomously to solve whatever problem I give it. So it&#39;s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem.&quot; Your job becomes writing tests so good that agents can&#39;t game them.</p>
<p><strong>Model choice matters for long-running work.</strong> Cursor found &quot;GPT-5.2 models are much better at extended autonomous work: following instructions, keeping focus, avoiding drift.&quot; Opus 4.5 &quot;tends to stop earlier and take shortcuts when convenient.&quot; Different models for different roles — they use the best model per task, not one universal model.</p>
<p>If you&#39;re not comfortable with 3-5 parallel agents and some chaos, don&#39;t use any of this. Single-agent Claude Code with Plan Mode handles most work. Add complexity when you hit real limits, not theoretical ones.</p>
<p>The native approach will improve. Anthropic will add session resumption, better persistence, more coordination options. The community tools will adapt or die. But the fundamental tension remains: <strong>native is convenient, external is controllable.</strong> Pick based on whether you need to understand what&#39;s happening or just need it to happen.</p>
]]></content:encoded>
      <media:content url="https://bristanback.com/images/posts/multi-agent-moment-hero.webp" medium="image" type="image/webp" />
      <category>ai</category>
      <category>building</category>
      <category>tools</category>
    </item>
  </channel>
</rss>