One agent that does everything does nothing well

June 3, 2026

When an agent isn't good enough, the instinct is to give it more — another tool, more instructions, more context. That makes it worse, and it's measured, not a matter of taste. The fix is the oldest rule in engineering: Single Responsibility. One agent, one job, a few tools, a short context. A god-agent is a ten-thousand-line function in a trench coat — and it fails for the same reason.

Your agent isn't quite good enough, so you do the obvious thing: you give it more. Another tool for the thing it couldn't do. A few more lines in the prompt to cover the case it got wrong. More context so it has everything it might need. Each addition is reasonable. Each one makes the agent a little worse.

This is the counterintuitive result that took me a while to believe, and it's not a matter of taste — it's measured. The path to a capable agent runs in the opposite direction from instinct: not more, but less.

More makes it worse, and here's the proof

A language model has a fixed budget of attention. Everything you put in its context window — every tool description, every instruction, every retrieved document — competes for that same budget. Past a point, adding information doesn't give the model more to work with; it dilutes what was already there.
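To put rough numbers on that competition, here is a minimal sketch that estimates what share of the window the actual task gets as tool descriptions pile up. Everything in it is an assumption for illustration: `rough_tokens` uses a crude four-characters-per-token guess rather than a real tokenizer, the prompt strings are made up, and token share is only a proxy for attention, not a measurement of it.

```python
def rough_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def context_shares(parts: dict[str, str]) -> dict[str, float]:
    """Return each named part's share of the total estimated tokens."""
    counts = {name: rough_tokens(text) for name, text in parts.items()}
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical prompt pieces, padded to illustrative sizes.
task = "Refund order 1182 if it is within the 30-day window. "
tool_doc = "tool: does something; args: a, b, c; returns: json. " * 10

# Same task, three tool descriptions versus twenty.
focused = context_shares({"task": task, "tools": tool_doc * 3})
bloated = context_shares({"task": task, "tools": tool_doc * 20})

print(f"task share with 3 tools:  {focused['task']:.1%}")
print(f"task share with 20 tools: {bloated['task']:.1%}")
```

The task text never changes between the two runs. Only its slice of the window shrinks, which is the dilution described above.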

The research on this is blunt. Long-context models suffer from a well-documented "lost in the middle" effect: information buried in the middle of a long context is used less reliably than the same information placed at the start or the end. One analysis found accuracy falling from 92% to 63% purely from context overload: same model, same task, just too much in the window. Tools behave the same way. Pile twenty of them onto one agent and it starts calling the wrong one far more often, forgetting the instruction at the top of its context by the time it's deep in a task, and turning one bad call into a cascade of three more.

This is the god-agent pattern, and it has a signature: it's dazzling in the demo and falls apart in production. In the demo you run the one path you rehearsed. In production the long, branching, real workflows hit the dilution wall every time.

The fix is forty years old

The cure isn't a clever prompt or a bigger model. It's the Single Responsibility Principle, the same rule we already apply to functions and classes: one agent, one job. Give each agent a single, well-defined responsibility, the three or four tools that job actually needs, and only the context relevant to it. Now nothing competes for the attention budget, because there's nothing extraneous in the window. Each agent is excellent at its one thing precisely because it isn't trying to be good at nine others.
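One way to keep yourself honest about "one agent, one job" is to write the rule down as a type. The sketch below is hypothetical: `AgentSpec`, `MAX_TOOLS`, and the tool names are mine and not from any agent framework, and the cap of four simply mirrors the "three or four tools" above.

```python
from dataclasses import dataclass

MAX_TOOLS = 4  # illustrative cap; tune it for your own system


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of a single-responsibility agent."""

    name: str
    responsibility: str      # one sentence: the single job
    tools: tuple[str, ...]   # only the tools that job needs
    instructions: str = ""   # only the context relevant to the job

    def __post_init__(self) -> None:
        # Refuse to build an agent that has quietly grown into a god-agent.
        if len(self.tools) > MAX_TOOLS:
            raise ValueError(
                f"{self.name}: {len(self.tools)} tools exceeds {MAX_TOOLS}; split the agent"
            )


refunds = AgentSpec(
    name="refunds",
    responsibility="Decide whether an order qualifies for a refund and issue it.",
    tools=("lookup_order", "check_policy", "issue_refund"),
)
```

A hard error on the fifth tool is a blunt instrument, but it turns "just one more tool" from a silent drift into a visible design decision.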

The numbers move hard in this direction. Chroma's evaluation found a large accuracy gap between a focused ~300-token prompt and a bloated ~113K-token one. General-purpose agents stretched to cover everything have been measured succeeding only around 58% of the time, dropping further as the task sprawls. Narrow the scope and the same underlying model gets dramatically more reliable, because you stopped making it choose among tools and instructions the job never needed.

We already learned this lesson, for code

Here's the part that should make every engineer nod. We don't write one ten-thousand-line function with two hundred branches and call it powerful — we call it unmaintainable, and we break it into small functions that each do one thing. We don't build one class that does everything; we decompose. A god-agent is just that anti-pattern wearing a trench coat: a single component asked to hold every concern at once, failing for the same reason the giant function fails — too much in one place, no separation, every part interfering with every other.

So the move is the move we already know. Decomposition isn't a new "agent technique." It's the same instinct that produced single responsibility, small units, and clean boundaries, applied to a new kind of component. The reason it works is concrete, not aesthetic: a focused agent has an undiluted context, so it stays on task — the same way a focused function is one you can actually read.

And the side effects are exactly the ones you want. A one-job agent is testable: you can write an eval for "does it do its one thing," which is close to impossible for a twenty-tool everything-agent, where the space of behaviors is too large to enumerate. It's cheaper, because a narrow, well-defined job often runs fine on a smaller model. And it's debuggable, because when something breaks you know which agent owned that responsibility.
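Here is roughly what "an eval for its one thing" can look like. This is a sketch under assumptions: `classify_refund` is a keyword stub standing in for a real model-backed agent, and the three cases are invented for illustration. The point is the shape: a narrow job gives you a short, checkable list of inputs and expected outputs.

```python
from typing import Callable


def classify_refund(request: str) -> str:
    """Hypothetical stand-in for a focused agent; a real one would call a model."""
    text = request.lower()
    return "refund" if "refund" in text or "money back" in text else "other"


def run_eval(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the agent's output matches the expected label."""
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


# Made-up eval cases for the one job this agent owns.
CASES = [
    ("I want a refund for order 1182", "refund"),
    ("Can I get my money back?", "refund"),
    ("Where is my package?", "other"),
]

score = run_eval(classify_refund, CASES)
print(f"pass rate: {score:.0%}")
```

Swap the stub for the real agent and the same harness becomes a regression test you can run on every prompt or model change.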

The honest catch

Splitting one agent into ten doesn't make the problem vanish — it trades it for a different one. Now you have ten focused agents that need to be coordinated: something has to route the work, pass results between them, and decide what runs when. That coordination layer — orchestration — is real engineering, and it's a post of its own.
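To give a sense of the smallest piece of that layer, here is a sketch of routing only. It makes heavy assumptions: the three agents are stubs with hypothetical names, and keyword matching is a placeholder for whatever real routing you would use (a classifier, a small model, explicit rules). Passing results between agents and sequencing them are left out entirely.

```python
from typing import Callable

Agent = Callable[[str], str]


# Hypothetical focused agents; each would wrap its own model call and tools.
def refunds_agent(request: str) -> str:
    return f"refunds handled: {request}"


def shipping_agent(request: str) -> str:
    return f"shipping handled: {request}"


def fallback_agent(request: str) -> str:
    return f"escalated to a human: {request}"


# Keyword -> owning agent. Keyword matching is a placeholder routing rule.
ROUTES: dict[str, Agent] = {"refund": refunds_agent, "package": shipping_agent}


def route(request: str) -> Agent:
    """Pick the first agent whose keyword appears in the request, else the fallback."""
    text = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in text:
            return agent
    return fallback_agent


def handle(request: str) -> str:
    """Route the request, then let the chosen agent do its one job."""
    return route(request)(request)


print(handle("Refund order 1182"))
print(handle("Where is my package?"))
```

Note that the router is itself a one-job component: it decides who owns a request and nothing else.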

But notice what kind of trade you made. You swapped "my agent forgot its instructions and called the wrong tool" (a failure inside a black box you can't fix) for "I need to coordinate components" (an ordinary engineering problem with ordinary engineering answers). The second is tractable. The first is just the god-agent getting worse as you add more.

The urge to make one agent do everything is the urge to skip decomposition, and it fails for agents for the same reason it always failed for code. Make each agent small enough to be excellent. One thing, done well, beats ten things done at 58%.
