Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 13, 2026
Your agent's plumbing is wide open
The first large-scale scan of remote MCP servers — the connectors that let AI agents reach your tools and data — found that roughly 40% expose their tools with no authentication at all. Censys counted 12,520 internet-reachable MCP services, most of them unauthenticated. A separate sweep of 40,000 server repos produced 67 new CVEs. The agent boom shipped a new layer of plumbing into production faster than anyone secured it, and right now a lot of it is unlocked. Here's the risk in plain terms and what to check today.
- security
- agents
June 12, 2026
Give your coding agent the error, then get out of the way
The biggest difference between a coding agent that's useful and one that's maddening usually isn't the model. It's whether you closed the loop. An agent that writes code and stops is guessing. An agent that runs the code, reads the actual error, and tries again until the tests pass is in a different league — fix rates jump past 90% in a couple of iterations. The agent can only fix what it can see, so the highest-leverage thing you can do is give it eyes. Here's exactly how.
- methodology
- agents
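The loop described in the teaser above (run the code, read the actual error, try again until the checks pass) can be sketched in a few lines. This is a minimal illustration, not the method from the full post: `closed_loop`, `run_checks`, and `propose_fix` are hypothetical names, and `propose_fix` stands in for whatever call your coding agent exposes.

```python
from typing import Callable, Optional, Tuple


def closed_loop(
    run_checks: Callable[[str], Optional[str]],
    propose_fix: Callable[[str, str], str],
    code: str,
    max_iters: int = 3,
) -> Tuple[str, bool, int]:
    """Run checks, hand the real error text back to the agent, and repeat.

    run_checks  -- hypothetical: returns None on success, else the error text
    propose_fix -- hypothetical: takes (code, error text), returns revised code
    Returns (final code, whether checks passed, number of fixes applied).
    """
    for fixes_applied in range(max_iters):
        error = run_checks(code)
        if error is None:
            return code, True, fixes_applied
        # The agent sees the actual failure output, not a paraphrase of it.
        code = propose_fix(code, error)
    # Budget exhausted: report honestly whether the last revision passes.
    return code, run_checks(code) is None, max_iters
```

The iteration cap is a design choice: it keeps an agent that cannot converge from looping forever, and the returned flag tells the caller when to step in.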
June 11, 2026
Agents that remember
The big agent unlock of 2026 isn't a smarter model — it's memory. Google's ReasoningBank lets an agent learn from its own successes and failures, store the reasoning, and get measurably better over time. That's the leap from a tool that resets every morning to a colleague who compounds. But memory has a second edge: it turns every mistake into a persistent one. A wrong fact, a poisoned instruction, or a belief that quietly went stale now survives across sessions and acts on you later. Memory isn't a feature you switch on. It's a corpus you have to govern.
- ai-native
- agents
June 10, 2026
Why your agent's pull request gets rejected
Researchers studied 33,000 pull requests written by AI coding agents, and about 29% never got merged. The interesting part is why: not mostly because the code was wrong, but because the PR was a bad collaboration artifact — too big, touching too many files, bundling unrelated changes, failing CI, and explaining itself poorly. Getting code accepted turns out to be a different skill than writing it, and it's exactly the skill agents don't have by default. Here's what that means for using them.
- methodology
- agents
June 9, 2026
You're about to manage a workforce of agents
A platform launched this month that lets companies recruit, onboard, manage, and even pay AI agents — across every major model — under one passport and one audit trail. Its tagline is 'your next hire isn't human.' Strip the marketing and there's a real shift underneath: the job is moving from using an AI tool to managing a team of them. That's a different skill than prompting, most people aren't ready for it, and the mental model you reach for decides whether it works.
- agents
- methodology
June 9, 2026
Agents got smarter. They didn't get more reliable.
A new study ran 14 models through reliability tests and found something the benchmark race hides: two years of soaring capability produced only small reliability gains. Smarter isn't steadier. And the math is brutal: even a 95%-reliable step, run 20 times in a row, finishes the whole task correctly only about 36% of the time (0.95^20 ≈ 0.36, if the steps fail independently). We keep shopping for agents on intelligence when the thing that decides whether they work is something else entirely, something we barely even measure.
- eval
- agents
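The compounding arithmetic in the teaser above is easy to check. A minimal sketch, assuming every step succeeds independently with the same probability (an assumption of this example, not a claim from the study); the function names are hypothetical.

```python
def chain_success_rate(step_reliability: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and equally reliable, so the
    probabilities simply multiply.
    """
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_reliability ** steps


def required_step_reliability(target: float, steps: int) -> float:
    """Per-step reliability needed to reach a target end-to-end success rate."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return target ** (1.0 / steps)
```

Under these assumptions, `chain_success_rate(0.95, 20)` comes out near 0.358, and `required_step_reliability(0.90, 20)` shows that a 20-step task needs roughly 99.5% per-step reliability to finish correctly nine times out of ten.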