Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 9, 2026
The day your agent can spend money
MetaMask just gave AI agents a wallet — letting a bot trade across DeFi on your behalf, faster than you could ever click. It's a real milestone, and it should make you a little nervous, because every shaky thing about agents stops being theoretical the moment one holds the keys. A wrong answer you can fix. An irreversible transfer to a stranger you cannot. The interesting part isn't that agents can spend now. It's the one design idea that makes it survivable.
- security
- agents
June 7, 2026
Agents can write code but can't finish the job
A new benchmark called DeployBench asked AI agents to do something deceptively boring: take a research project and actually get it running on a fresh machine. The best agents passed as little as 8% of the time — and the failures share one root cause that should change how you use them. The agents kept declaring victory while checking a weaker target than the task asked for. They didn't just fail. They failed and reported success. That's the real last-mile problem, and it's about judgment, not coding.
- eval
- agents
- methodology
June 7, 2026
Google's agents work while you sleep
At I/O, Google showed agents that don't wait for a question. You tell one what you care about — an apartment, a concert, a price — and it watches the whole web 24/7 and pings you when something changes. Others will call a business on your behalf to book your haircut. Search just flipped from something you pull to something that pushes. That's a real shift in what users will expect from any product with AI in it — and it quietly raises the bar on cost, trust, and who's accountable when the agent acts.
- ai-native
- agents
- methodology
June 7, 2026
“The AI did it” is the new way to dodge the blame
AI was named in roughly one in four US job cuts this spring, and even Sam Altman admits companies are blaming AI “whether or not it really is about AI.” Analysts have a name for it: AI-washing. But the same move is quietly spreading into how we run agents: when something goes wrong, “the agent decided” becomes the place responsibility goes to die. The machine can't hold accountability. A human always does. Here's why that matters more as you hand agents real decisions.
- business
- agents
- careers
June 7, 2026
You can't run an agent you can't watch
A Cisco survey this year found most companies are running agents they can't properly monitor. That's the whole problem in one sentence. Agents fail in a way regular software doesn't: they return a tidy success while quietly doing the wrong thing, and you only see it in the full trace of what they did, not the final output. “Agent observability” became its own discipline in 2026 for exactly that reason. The unglamorous ability to watch what your agent actually did is turning into the line between a pilot and production.
- methodology
- agents
- architecture
June 6, 2026
The best agent of the year runs on a factory floor
While everyone argued about chatbots, Foxconn quietly wired hundreds of AI agents into its production lines — reading sensors, equipment, and ERP data — and reported 80% faster root-cause analysis and 10% fewer machine failures. Nobody made it a viral demo. That's the tell. The agent deployments that actually work this year are narrow, plugged into real ground truth, and measured against a hard number. The exciting ones are still stuck in a pilot.
- agents
- architecture
- business