Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 3, 2026
Most AI agents never reach production
The demo is dazzling. Then the agent never ships. Survey after survey in 2025–26 finds the same cliff: almost everyone has an agent pilot, almost no one has it in production. The reason isn't the model — it's the unglamorous engineering the demo let you skip. Here's what the small minority who actually ship do differently.
- agents
- eval
- methodology
May 10, 2026
Evals or it didn't ship
Why I refuse to ship an agent without a held-out evaluation set, what makes one useful, and the failure mode I keep seeing when teams skip it.
- agents
- eval
- methodology