
Code Owns Truth

By Bri Stanback · 4 min read

I spent an afternoon last month rewriting the same prompt four times. Each version was more precise, more carefully structured, more detailed about what I wanted. Each one produced code that looked right and broke in a different way.

The fifth time, I didn't touch the prompt. I wrote a test. Then two more tests. Then I ran the original sloppy prompt against them — and the output was correct, because now "correct" meant something.

That afternoon rearranged how I think about building with AI.


We obsess over prompt engineering when we should obsess over constraint engineering.

A prompt expresses intent. It's the why, the what, the how — but it's inherently fuzzy. The model interprets it, and the output mutates within whatever space you've given it. Then the session ends, the context window closes, and the prompt is gone. It was temporary, lossy, context-dependent.

Constraints are different. Constraints survive.

A test suite survives. A type system survives. A feature list, a linter config, a CI pipeline — these persist across sessions, across context windows, across models. They bound the mutation space. They define what "correct" looks like before anyone starts building.

The prompt is a request. The constraints are the physics. The code is what ships.

┌─────────────────────────────────────────────────────┐
│  CONSTRAINTS   →  bound the mutation space          │
│  (tests, types, specs, taste, judgment)             │
├─────────────────────────────────────────────────────┤
│  PROMPTS       →  express intent, inherently fuzzy  │
│  (requests, direction, context)                     │
├─────────────────────────────────────────────────────┤
│  CODE          →  source of truth                   │
│  (deterministic, versioned, what ships)             │
└─────────────────────────────────────────────────────┘

"Prompt engineering" as a discipline has always felt slightly wrong to me, and I think this is why. You're not engineering the prompt. You're engineering the constraints that shape what prompts can produce. The prompt is the thing you say to the contractor. The constraints are the building code. One is a conversation. The other is law.


But not all constraints are the same, and the distinction matters for where you spend your time.

Mechanical constraints are the ones machines can verify. Test suites, type systems, linters, CI/CD, feature lists. You run them, they pass or fail, no judgment required. Anyone can set them up. AI can write them.
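A minimal sketch of what "machines can verify" means in practice. The check names and rules below are invented for illustration; they stand in for real tools like linters and type checkers, which do this far more rigorously:

```python
# A mechanical constraint is anything a machine can answer yes/no about.
# These toy checks are hypothetical stand-ins for a linter and a type checker.

def check_no_todos(source: str) -> bool:
    """Linter-style check: fail if unfinished markers remain."""
    return "TODO" not in source

def check_has_type_hints(source: str) -> bool:
    """Crude type-discipline check: every function carries a return annotation."""
    return all("->" in line
               for line in source.splitlines()
               if line.lstrip().startswith("def "))

def run_checks(source: str) -> bool:
    """Run every check; the result is binary, no judgment required."""
    checks = [check_no_todos, check_has_type_hints]
    return all(check(source) for check in checks)

snippet = "def add(a: int, b: int) -> int:\n    return a + b\n"
print(run_checks(snippet))  # True: both checks pass, no human in the loop
```

The point is not the checks themselves but their shape: each one runs without a person present, and each answer survives into the next session.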

Human constraints are harder. Taste — what's elegant versus ugly, what belongs versus what's noise. Axioms — the foundational assumptions you build on. Judgment — the wisdom from experience that tells you when to follow the rules and when to break them.

Both survive context windows. Both persist across sessions. But the mechanical constraints are table stakes. The human constraints — taste, judgment, the sense of what a thing should be — those are the differentiator. Those are what make one system thoughtful and another system merely functional.

This is why I keep coming back to judgment as the scarce resource. Not prompting skill. Not execution speed. The ability to set the right constraints — to know what the physics of your system should be before anything gets built.


The pattern underneath all of this is older than AI.

Test-driven development works the same way: you write a failing test (constraint), you write code to pass it (mutation), you verify (the test passes). The spec precedes the implementation. The spec survives context changes. The artifact gets checked against something objective.
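The cycle above fits in a few lines of Python. The `slugify` function and its test are hypothetical examples, not from the post; what matters is the ordering, with the constraint written before the code it constrains:

```python
# TDD in miniature: the test (constraint) exists before the code (mutation).
# slugify is an invented example function.

def test_slugify() -> None:
    assert slugify("Code Owns Truth") == "code-owns-truth"
    assert slugify("  spaces  ") == "spaces"

# The implementation is written only to satisfy the constraint above.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

test_slugify()  # verification: passes silently, or fails loudly
```

The test outlives any one implementation of `slugify`. Rewrite the function by hand or by prompt, and the same constraint still decides whether the result counts as correct.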

Anthropic's approach to long-running agents follows the same structure. Their harness engineering model uses an initializer agent to create a feature list — sometimes 200+ requirements, all marked as incomplete. A coding agent works through them one at a time. A progress file persists across context windows so nothing gets lost between sessions. End-to-end testing verifies the result.

The feature list and progress file are constraints. The coding prompt is intent. The code is truth.

Phil Schmid frames it well: the harness is the operating system, the model is the CPU, context is RAM, the agent is the application. The harness implements what he calls "context engineering" — strategies that survive the model's context limitations. And his key insight lands the point: "The ability to improve a system is proportional to how easily you can verify its output."

If you can verify it, you can improve it. If your constraints are weak — if "done" means "looks done" — your outputs will drift and you won't know until it's too late. That afternoon I spent rewriting prompts? I was optimizing the wrong layer. The constraints are what compound.


This blog is an attempt to practice the human constraint layer.

Taste in what gets published, judgment in how ideas connect, axioms about what matters. The mechanical side — the site builds, the deploys, the formatting — is table stakes. The human side is the work.

Whether it succeeds is a different question. But that's the intent.

To be the physics, not just the prompt.

Tagged

  • ai
  • architecture
  • building
Last updated: February 10, 2026
On the trail: Designing for Agents