Notes
Short pieces about the methodology and architecture decisions behind the AI systems I ship — specs, evals, multi-agent orchestration, LLM integration, and the discipline of directing coding agents.
Latest
July 1, 2026
Double the task, quadruple the failure
Everyone wants the agent that works a full 8-hour day. The math is against it. A new 2026 paper shows that doubling a task's length doesn't double the failure rate — it roughly quadruples it, because a tiny per-step error compounds. A 2% slip per step becomes a 33% chance of blowing the whole task over 20 steps. Long-horizon autonomy isn't waiting for a smarter model. It's an architecture problem: decompose, checkpoint, verify.
- agents
- architecture
July 1, 2026
Perplexity is walking away from MCP — and they're not wrong
MCP won the standards war so fast that almost nobody stopped to ask whether it's actually good in production. Then Perplexity's CTO said out loud they're moving off it internally — because tool metadata can eat 40–50% of your context window before the agent does a single useful thing. The 'just plug in 50 MCP servers' dream collides with context economics. Tools are a dependency, not a buffet.
- architecture
- agents
July 1, 2026
Voice agents just crossed the latency line
For years, AI voice agents failed on one thing: the pause. That half-second of dead air after you stopped talking made every phone bot feel broken. In 2026 the pause is gone — streaming speech end to end, new state-space voice models at 40ms, and sub-500ms round trips put voice inside the window where a conversation feels real. The model was never the hard part. Timing was — and timing is now an engineering problem, not a research one.
- ai-native
- agents
July 1, 2026
'Workslop' isn't productivity. It's a tax.
AI was supposed to do the busywork. In a lot of teams it does the opposite: it generates plausible-looking output that a human downstream has to detect, decode, and redo. Researchers named it 'workslop,' and the numbers are ugly — 53% of desk workers say they've received it, each instance costs ~2 hours to fix, and it quietly poisons trust between coworkers. It's not a productivity gain. It's a productivity transfer — and someone downstream is paying the bill.
- business
- methodology
July 1, 2026
Your agents have logins nobody owns
Enterprises spun up millions of AI agents this year, and every one of them needs credentials to actually do anything — read the database, send the email, hit the API. The governance layer for those credentials doesn't exist yet. The result: 68% of organizations can't reliably tell an agent's activity apart from a human's, and live credentials are writing to production with no person accountable. The agentic enterprise's real security problem isn't prompt injection. It's identity.
- security
- agents
June 23, 2026
A fake bug report hijacked the coding agent
Security researchers showed a new attack called 'Agentjacking': send a fake error to a company's Sentry, and its AI coding agent reads the 'fix steps' and runs them — handing an attacker your credentials, with your own privileges. Claude Code, Cursor, and Codex all fell for it in testing. The lesson is bigger than one tool: every untrusted thing your agent reads is a place someone can inject commands.
- security
- agents