Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 10, 2026
Your AI provider is going to have a bad day
Two reminders landed this week. Anthropic is retiring Claude Sonnet 4 and Opus 4 from the API on June 15 — if you pinned those versions, your calls just start returning errors, with no automatic failover. And this morning, Gemini went down. Same lesson from opposite directions: the model under your product is a third-party service that will, on a schedule you don't control, change, vanish, or break. The fix isn't strategy. It's the boring resilience engineering most AI products skip.
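One common form of that boring resilience engineering is an ordered fallback across providers. The sketch below is illustrative only: `complete_with_fallback`, `retired_model`, and `backup_model` are hypothetical names, and the provider callables stand in for whatever client code you actually use. It is not any vendor's API.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success.

    Returns (provider_name, response_text). The provider callables are
    hypothetical stand-ins for real client calls.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a retired model or an outage lands here
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical providers for illustration only.
def retired_model(prompt: str) -> str:
    raise RuntimeError("model retired")


def backup_model(prompt: str) -> str:
    return f"backup answer to: {prompt}"


if __name__ == "__main__":
    providers = [("primary", retired_model), ("backup", backup_model)]
    print(complete_with_fallback("hello", providers))
```

A real setup would likely add timeouts, retries with backoff, and alerting when the fallback path is taken, so that a silent switch to the backup does not hide a retirement you needed to act on.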
- architecture
- methodology
June 9, 2026
Disrupted or dead — did you sell the thing AI now gives away?
More than 220 startups that once hit billion-dollar valuations are now worth less than half their peak, and a former DoorDash leader put it bluntly: workflow SaaS will be 'disrupted or dead' within a decade. At the same time, ~80% of the new 'AI wrapper' startups are expected to fail. Two opposite kinds of company are dying for the exact same reason — they sold the thing AI now provides. The survival test is one honest question, and it's worth asking about whatever you're building.
- business
- careers
June 9, 2026
Anthropic writes 80% of its code with AI. You're not Anthropic.
More than 80% of the code Anthropic merged to production in May was written by Claude. That number is about to be quoted in every 'developers are obsolete' argument and every pressure-to-keep-up meeting. So read it carefully, because it argues the opposite of what people think. It's the company with the best model, using that model on itself, reviewed by some of the best engineers alive. The percentage isn't the lesson. The thing that made 80% safe to ship is.
- methodology
- careers
June 9, 2026
You're about to manage a workforce of agents
A platform launched this month that lets companies recruit, onboard, manage, and even pay AI agents — across every major model — under one passport and one audit trail. Its tagline is 'your next hire isn't human.' Strip the marketing and there's a real shift underneath: the job is moving from using an AI tool to managing a team of them. That's a different skill than prompting, most people aren't ready for it, and the mental model you reach for decides whether it works.
- agents
- methodology
June 9, 2026
Agents got smarter. They didn't get more reliable.
A new study ran 14 models through reliability tests and found something the benchmark race hides: two years of soaring capability produced only small reliability gains. Smarter isn't steadier. And the math is brutal: if each step succeeds 95% of the time and failures are independent, a task that needs 20 steps in a row finishes correctly only about 36% of the time (0.95^20 ≈ 0.36), roughly a third. We keep shopping for agents on intelligence when the thing that decides whether they work is something else entirely, something we barely even measure.
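The arithmetic behind that figure is easy to check. This is a minimal sketch that assumes every step succeeds independently with the same probability, which is a simplification; `chain_success` is a hypothetical helper name.

```python
def chain_success(step_reliability: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and equally reliable.
    """
    return step_reliability ** steps


if __name__ == "__main__":
    p = chain_success(0.95, 20)
    print(f"{p:.1%}")  # prints 35.8%
```

Under the same assumption, the compounding works in both directions: small gains in per-step reliability matter more than they look, and adding steps costs more than it looks.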
- eval
- agents
June 9, 2026
The day your agent can spend money
MetaMask just gave AI agents a wallet — letting a bot trade across DeFi on your behalf, faster than you could ever click. It's a real milestone, and it should make you a little nervous, because every shaky thing about agents stops being theoretical the moment one holds the keys. A wrong answer you can fix. An irreversible transfer to a stranger you cannot. The interesting part isn't that agents can spend now. It's the one design idea that makes it survivable.
- security
- agents