Give your coding agent the error, then get out of the way

June 12, 2026

The biggest difference between a coding agent that's useful and one that's maddening usually isn't the model. It's whether you closed the loop. An agent that writes code and stops is guessing. An agent that runs the code, reads the actual error, and tries again until the tests pass is in a different league: reported fix rates climb past 90% within a couple of iterations. The agent can only fix what it can see, so the highest-leverage thing you can do is give it eyes. Here's exactly how.

If you've used coding agents, you know the two very different experiences. Sometimes the agent feels like magic; other times it confidently hands you broken code and you spend more time fixing it than you saved. People assume the difference is the model — a smarter one would do better. Usually it isn't the model at all. It's whether the agent got to see what happened when its code ran.

This is the single highest-leverage habit in agentic coding, and it's almost embarrassingly simple: let the agent run the code, show it the real error, and let it try again. When you close that loop, the numbers move a lot. Research on agent error recovery finds that feeding the failing stack trace back into the agent's context and letting it iterate until tests pass gets fix rates into the 90s within a couple of tries — and one self-correction system resolved 88% of a whole class of runtime errors on its own. The agent didn't get smarter. It got to look at the failure.

The agent can only fix what it can see

Here's the whole idea in one line, from the people who build these workflows: if the agent can't see the error, it can't fix it — if it can't see the test output, it doesn't even know something's broken. An agent that writes code and stops is doing the equivalent of writing an essay with its eyes closed and never reading it back. Of course it's wrong half the time; it has no feedback.

A human developer doesn't work that way. You write something, run it, read the traceback, go "ah, null on line 42," and fix it. That run-read-fix loop is most of what programming actually is. An agent that only does the first step — write — is missing the part where the work gets correct. Give it the loop and it starts behaving like a developer instead of an autocomplete.

Good feedback vs. noise

One catch worth getting right: not all error feedback is equal. "Tests failed" tells the agent almost nothing. The specific, actionable version — "Expected 200, got 401" with the line number and stack trace — tells it exactly what to chase. Vague failures produce vague guesses; precise failures produce precise fixes. So when you wire up the loop, pass the real output: the full traceback, the line numbers, the exact assertion that failed. Don't summarize the error for the agent — the detail is the signal.
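
As a rough illustration of passing the real output rather than a paraphrase, the sketch below runs a command and builds the message an agent would see. The function names, the message format, and the `max_chars` cap are assumptions made for this example, not part of any particular agent tool.

```python
import subprocess
import sys


def run_and_capture(cmd: list[str], timeout: float = 60.0) -> tuple[bool, str]:
    """Run a command and return (passed, raw stdout + stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr


def build_feedback(cmd: list[str], passed: bool, raw: str, max_chars: int = 8000) -> str:
    """Build the text handed back to the agent (hypothetical format)."""
    if passed:
        return f"PASSED: {' '.join(cmd)}"
    # Keep the raw text rather than a summary. If it has to be trimmed, keep
    # the tail: Python tracebacks put the failing line and the exception last.
    tail = raw[-max_chars:]
    return f"FAILED: {' '.join(cmd)}\n--- raw output ---\n{tail}"


if __name__ == "__main__":
    # Stand-in for a real test run: a one-line script with a failing assertion.
    cmd = [sys.executable, "-c", "assert 200 == 401, 'Expected 200, got 401'"]
    passed, raw = run_and_capture(cmd)
    print(build_feedback(cmd, passed, raw))
```

The message keeps the traceback and the assertion text intact, so the agent sees "Expected 200, got 401" and the line it came from instead of a bare "tests failed".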

How to actually do it

This isn't theoretical, and you don't need anything fancy. The good agent tools (Claude Code, Codex, Cursor, and others) can already run tests, read the failures, and apply fixes in a single loop. Your job is to make sure they do:

  • Let it execute, not just write. Give the agent a way to run the code or the tests itself. An agent with no way to run anything is permanently guessing.
  • Make tests the goal, not a suggestion. "Make this pass" is a target the agent can verify against. Without a test, there's nothing for the loop to close on. That is why an agent's own "it works" is worthless, while a passing test is real evidence.
  • Feed it the raw failure. Full stack trace, line numbers, the failing assertion — not your paraphrase.
  • Let it iterate, but cap it. A few tries until green is the sweet spot; an unbounded loop just burns tokens. Two or three rounds fix most things.
  • Tell it to verify its own work. Put "run the tests and confirm they pass before you're done" in your project's standing instructions, so the agent stops declaring victory before checking.
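
A standing instruction along those lines might look like the fragment below. The wording, the `pytest -q` command, and the three-round limit are assumptions for illustration. The file it lives in depends on your tool; Claude Code, for example, reads a `CLAUDE.md` file in the project root.

```text
Before you report a task as done:
1. Run the test suite (pytest -q) yourself.
2. If anything fails, read the full traceback and fix the cause.
3. Repeat up to three times. If tests still fail, stop and show the last raw output.
Never say "it works" without a passing test run.
```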

That's the whole recipe: execute, show the real error, loop to green, stop.
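
The recipe can be sketched as a small capped loop. This is a minimal sketch, not how any specific tool implements it: `request_fix` is a hypothetical callback standing in for whatever sends the raw failure to your agent and applies its edit, and `Runner`, `LoopResult`, `make_test_runner`, and `loop_to_green` are names invented for the example.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable

# A runner returns (passed, raw output). Hypothetical type alias.
Runner = Callable[[], tuple[bool, str]]


@dataclass
class LoopResult:
    passed: bool       # True if the last test run was green
    rounds: int        # how many fixes were requested
    last_output: str   # raw output of the last test run


def make_test_runner(cmd: list[str], timeout: float = 300.0) -> Runner:
    """Wrap a test command so each call returns (passed, raw output)."""
    def run() -> tuple[bool, str]:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return proc.returncode == 0, proc.stdout + proc.stderr
    return run


def loop_to_green(
    run_tests: Runner,
    request_fix: Callable[[str], None],
    max_rounds: int = 3,
) -> LoopResult:
    """Execute, show the real error, loop to green, stop at the cap."""
    passed, output = run_tests()
    rounds = 0
    while not passed and rounds < max_rounds:
        request_fix(output)  # hand over the raw failure, not a paraphrase
        rounds += 1
        passed, output = run_tests()
    return LoopResult(passed, rounds, output)
```

A call might look like `loop_to_green(make_test_runner(["pytest", "-q"]), request_fix=my_agent_fix)`, where `my_agent_fix` is whatever function you write to pass the failure text to your agent. If the result comes back with `passed` set to false, the cap was hit and a human should look at `last_output`.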

The bottom line

The instinct, when an agent writes bad code, is to reach for a bigger model. Usually the cheaper, more effective fix is to stop making it work blind. A coding agent with no feedback is guessing; the same agent that can run its code and read the actual error is debugging, and debugging is where correct code comes from. The same lesson holds for reliability in general: the win isn't a smarter brain, it's a closed loop.

So before you upgrade anything, ask the simplest question: can my agent see what happens when its code runs? If not, that's your bug, not the model's. Give it the error, let it loop until the tests are green, and then get out of the way. It's the cheapest upgrade you'll ever make, and it's the one that turns a frustrating agent into a useful one.
