Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 7, 2026
For long-running agents, cost-per-task is the only benchmark
NVIDIA's new Nemotron 3 Ultra isn't pitched on being the smartest model. It's pitched on being cheap to run for hours: built for agents that plan, call tools, and reason across hundreds of turns. That framing is the real story. When an agent runs long, the number that matters stops being the benchmark score or the per-token price and becomes dollars-per-finished-task. Two models at the same token price can differ 2x in what a real job costs, because one burns far more tokens to finish it. Here's why the leaderboard is the wrong thing to shop on once your agent runs for more than a moment.
- ai-native
- business
- eval
June 7, 2026
Low-code agents wired straight into your live data
SAP's new Joule Studio builds a whole agent — workflow, specs, even the eval suite — from one sentence, grounded directly in your live business data. OutSystems does something similar. This is genuinely powerful: a business analyst can now stand up an agent on the production system without waiting in an engineering queue. It's also how you get an agent with a huge blast radius and nobody who can explain or stop it. The democratization is real. So is the danger, and most companies are not ready for the second half.
- architecture
- business
- security
June 7, 2026
65% of companies already had an agent security incident
Roughly two-thirds of organizations have already had a security incident involving an AI agent: not a rare disaster, a normal Tuesday. And the cause isn't a rogue, misaligned model doing something evil. It's a perfectly well-behaved agent accessing data it should never have been given in the first place. The agent breach of 2026 is boring: it's an over-permissioned identity doing exactly what it was allowed to. That's good news, because boring problems have boring fixes, if you treat the agent as what it is.
- security
- architecture
- business
June 7, 2026
“The AI did it” is the new way to dodge the blame
AI was named in roughly one in four US job cuts this spring, and even Sam Altman admits companies are blaming AI 'whether or not it really is about AI.' Analysts have a name for it: AI-washing. But the same move is quietly spreading into how we run agents — when something goes wrong, 'the agent decided' becomes the place responsibility goes to die. The machine can't hold accountability. A human always does. Here's why that matters more as you hand agents real decisions.
- business
- agents
- careers
June 7, 2026
America's toughest AI law just got rewritten before it started
Colorado's AI Act was supposed to be the big one — the first comprehensive US AI law, landing in 2026, with real duties to prevent algorithmic discrimination. Then a judge froze it, the legislature gutted it, and the whole thing got pushed to 2027 with its teeth pulled. If you scrambled to comply with the version that's now dead, you just learned the real lesson about building for AI regulation: don't build for the deadline. Build for the handful of obligations that survive every rewrite, because those are the ones that were just good engineering anyway.
- business
- methodology
June 7, 2026
You escaped model lock-in. They moved it to your context.
Keeping the model swappable was the win of the year — you can change providers with a one-line config now. So at Build 2026 Microsoft calmly relocated the lock-in to where you can't swap it: your organization's context. Work IQ, Fabric IQ, Foundry IQ — your company's memory, permissions, and meaning, living inside a vendor's interpretation of your business. As one analyst put it: you can swap the brain. You may not be able to swap the memory. Here's the new trap, and how to keep the thing that actually matters portable.
- architecture
- business
- ai-native