Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
June 13, 2026
The biggest context window doesn't win
Every model launch brags about a bigger context window: a million tokens, two million, the whole codebase at once. But an analysis of enterprise deployments found that nearly 65% of agent failures came from context drift or memory loss during multi-step work, not from a window that was too small. The teams shipping reliable agents in 2026 aren't the ones with the biggest window. They're the ones who curate most carefully what the model actually sees. Here's the difference, and why more is often worse.
- agents
- methodology
June 13, 2026
Your agent works 57% of the time
A March 2026 report looked at 6,259 AI agents running in real production and found an aggregate success rate of 56.6%, barely better than a coin flip. The same research shows a 37% gap between how agents score on benchmarks and how they do in the real world. That gap is the whole story. The demo always works; the job is making the agent work the other 43% of the time. Here's why the number is so low, and what the teams above it actually do differently.
- agents
- methodology
June 13, 2026
2026 is the show-me-the-money year for AI
Global AI spending is forecast at $2.59 trillion this year, up 47%, while a widely cited MIT study found that 95% of enterprise generative-AI pilots delivered no measurable ROI. Those two numbers can't coexist forever. A Menlo Ventures partner called 2026 the 'show me the money' year, and companies are swapping open-ended budgets for spending caps, dashboards, and ROI gates. If you build with AI, the era of 'we're experimenting' as a free pass is ending. Here's what the reckoning actually changes, and how to be on the right side of it.
- business
June 13, 2026
The webpage can give your agent orders
When you give an AI agent a browser and let it read web pages, click buttons, and run commands, you've handed control of it to every page it visits. Researchers have shown agents hijacked by instructions hidden in website text, in pastebin links, even invisibly inside screenshots the agent looks at. It's called indirect prompt injection, and prompt injection is the number-one risk on OWASP's Top 10 for LLM applications. The agent can't tell your instructions from the page's. Here's why this is so hard to fix, and how to build so a hostile page can't run your agent.
- security
- agents
June 13, 2026
Write it down for the machine
There's now a plain-text file every serious coding agent reads before it touches your repo: AGENTS.md. As of early 2026 it's read natively by Claude Code, OpenAI's Codex CLI, Cursor, Aider, Devin, GitHub Copilot, Gemini CLI, Windsurf, and Amazon Q, which makes it the closest thing the industry has to a universal instruction format for agents. Writing one is the highest-leverage hour you can spend on AI coding right now, and almost nobody does it. Here's what goes in it and why it works.
- methodology
- ai-native
June 13, 2026
Your agent's plumbing is wide open
The first large-scale scan of remote MCP servers — the connectors that let AI agents reach your tools and data — found that roughly 40% expose their tools with no authentication at all. Censys counted 12,520 internet-reachable MCP services, most of them unauthenticated. A separate sweep of 40,000 server repos produced 67 new CVEs. The agent boom shipped a new layer of plumbing into production faster than anyone secured it, and right now a lot of it is unlocked. Here's the risk in plain terms and what to check today.
- security
- agents