Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 7, 2026
For long-running agents, cost-per-task is the only benchmark
NVIDIA's new Nemotron 3 Ultra isn't pitched on being the smartest model. It's pitched on being cheap to run for hours, built for agents that plan, call tools, and reason across hundreds of turns. That framing is the real story. When an agent runs long, the number that matters stops being the benchmark score or the per-token price and becomes dollars-per-finished-task. Two models at the same token price can differ 2x in cost on a real job. Here's why the leaderboard is the wrong thing to shop on once your agent runs for more than a moment.
- ai-native
- business
- eval
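A rough sketch of the dollars-per-finished-task arithmetic from the note above. Everything here is an illustrative assumption: the `RunStats` fields, the blended token price, and the sample numbers are made up to show the shape of the calculation, not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical aggregate numbers for one model on one task suite."""
    price_per_million_tokens: float  # USD, blended input + output (assumed)
    tokens_per_attempt: float        # average tokens burned per attempt
    success_rate: float              # fraction of attempts that finish the task


def cost_per_finished_task(stats: RunStats) -> float:
    """Expected dollars spent per successfully finished task."""
    if not 0 < stats.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = (
        stats.tokens_per_attempt * stats.price_per_million_tokens / 1_000_000
    )
    # Failed attempts still cost money, so spread their cost over the successes.
    return cost_per_attempt / stats.success_rate


# Same token price, different token usage and completion rate (made-up values).
model_a = RunStats(price_per_million_tokens=2.0, tokens_per_attempt=400_000, success_rate=0.8)
model_b = RunStats(price_per_million_tokens=2.0, tokens_per_attempt=600_000, success_rate=0.6)

if __name__ == "__main__":
    for name, stats in (("A", model_a), ("B", model_b)):
        print(f"model {name}: ${cost_per_finished_task(stats):.2f} per finished task")
```

With these invented inputs the two models share a per-token price, yet the one that uses more tokens and finishes less often costs about twice as much per finished task. That gap is invisible if you only compare price sheets.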
June 7, 2026
You escaped model lock-in. They moved it to your context.
Keeping the model swappable was the win of the year — you can change providers with a one-line config now. So at Build 2026 Microsoft calmly relocated the lock-in to where you can't swap it: your organization's context. Work IQ, Fabric IQ, Foundry IQ — your company's memory, permissions, and meaning, living inside a vendor's interpretation of your business. As one analyst put it: you can swap the brain. You may not be able to swap the memory. Here's the new trap, and how to keep the thing that actually matters portable.
- architecture
- business
- ai-native
June 7, 2026
When your customer is a bot
Google's agents now book and buy on your behalf, Visa and Mastercard built rails for agents to pay, and a wave of 'agentic commerce' protocols launched with Shopify, Walmart, and Target. The quiet implication: the thing evaluating your product is increasingly software, not a person. AI agents don't browse — 87% of their requests hit product data, almost none touch your beautiful storefront. The web was built for human eyeballs, and the buyer just changed species. Here's what that means for anyone who sells, builds, or ships anything online.
- business
- ai-native
June 6, 2026
Supabase is worth $10.5B because agents need boring databases
Supabase just raised $500M at a $10.5 billion valuation — doubled in eight months — and the reason is almost funny: over 60% of the new databases on its platform are now created by an AI tool, not a human. The flashy part of the AI boom is the agents writing the code. The part that's quietly minting money is the boring, reliable place that code has to put its data. That's not a coincidence — it's the whole lesson about where durable value lives.
- business
- architecture
- ai-native
June 6, 2026
Microsoft can fire its model supplier. Can you?
At Build 2026 Microsoft shipped its own coding and reasoning models — trained from scratch, with what its AI chief called 'zero distillation' from OpenAI — straight into GitHub Copilot. The richest software company on earth just spent billions to stop depending on one supplier. That's the whole lesson for the rest of us, and it costs you nothing: never let the model be the part of your system you can't swap out.
- architecture
- ai-native
- business
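One way to keep the model swappable, sketched under assumptions: application code depends on a small interface, and the concrete provider is picked by a single config value. `ChatModel`, `EchoModel`, `UpperModel`, `PROVIDERS`, and `load_model` are all hypothetical names for illustration. The two providers are local stand-ins, not real vendor SDKs.

```python
from typing import Callable, Dict, Protocol


class ChatModel(Protocol):
    """The only surface the rest of the system is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for provider A (hypothetical, no network calls)."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Stand-in for provider B (hypothetical, no network calls)."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Registry mapping a config value to a provider factory.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "echo": EchoModel,
    "upper": UpperModel,
}


def load_model(config: Dict[str, str]) -> ChatModel:
    """Build the provider named in config; fail loudly on unknown names."""
    name = config["provider"]
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDERS[name]()


def summarize(model: ChatModel, text: str) -> str:
    """Application code: knows the interface, never the vendor."""
    return model.complete(f"Summarize: {text}")


if __name__ == "__main__":
    config = {"provider": "echo"}  # the one line you change to switch suppliers
    print(summarize(load_model(config), "quarterly report"))
```

The design choice is that `summarize` never imports a vendor module. Swapping suppliers means adding one adapter class and changing one config value, and nothing else in the call path moves.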
June 6, 2026
Vibe coding is over. The hard part was never the demo.
Google now teaches vibe coding to a million-plus people in a free five-day course. When the thing you were proud of becomes a weekend class, that skill just stopped being your edge. But here's the part the headlines miss: vibe coding was always good at the easy 80% — the demo — and useless at the 20% that decides whether software survives. The skill that's actually scarce now isn't generating code. It's the judgment to know whether the code you got is any good.
- careers
- methodology
- ai-native