AGENTS · June 3, 2026

I stopped approving the agent's decisions. Now I watch them.

When I started building agents I approved every action — it felt responsible. It wasn't; by the thirtieth 'yes' I was rubber-stamping, which is worse than no checkpoint at all. Real oversight isn't touching every decision, it's setting the policy and watching the outcomes. Here's the shift from in-the-loop to on-the-loop, and the architecture that makes letting go actually safe.

When I built my first real agent, I made it ask permission for everything. Every action it wanted to take popped a prompt: about to call this tool — approve? about to write this file — approve? about to send this — approve? It felt like the responsible thing. I was the human in the loop. Nothing happened without my yes.

It took embarrassingly little time to discover that this was theater.

"Approve?" "Yes." "Approve?" "Yes."

By the thirtieth approval prompt in a session, I wasn't reading them anymore. I was clicking yes the way you click through a cookie banner — pattern-matching on "looks fine" and moving on. The approvals had become a reflex, and a reflex is not oversight. I had built a checkpoint and then automated myself into rubber-stamping it.

And a rubber-stamped checkpoint is worse than no checkpoint at all. With no checkpoint, everyone knows the agent is acting on its own and treats its output with appropriate suspicion. With a checkpoint I wasn't really reading, the agent's mistakes got laundered into my decisions — I approved them, so now they're mine, except I never actually looked. As one 2026 oversight guide puts it bluntly, a poorly engaged reviewer approving flawed outputs is worse than no checkpoint. I had become that reviewer.

Two kinds of oversight

There's a useful vocabulary for this, and it clarified my thinking once I had it. The industry distinguishes human-in-the-loop from human-on-the-loop:

In the loop: the agent pauses before each defined action, asks, and waits for an explicit yes before it proceeds. Nothing happens without a human decision per step.
On the loop: the agent acts on its own within set bounds, while a human watches the stream of what it's doing and can step in — during or after. You oversee the outcomes, not each keystroke.

I had defaulted to in-the-loop for everything, on the theory that more gates meant more safety. What I actually had was one gate I'd stopped manning.

The shift: from clicking yes to watching outcomes

So I moved most of the work to on-the-loop. The agent runs; I watch what it does through a log I actually read, aggregated outcomes, and exceptions that get flagged to me — not a yes/no prompt on every step.
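To make "watching outcomes" concrete, here is a minimal sketch of the kind of roll-up I mean: counts per action, plus the failures pulled out for a human to read. The `Event` record and its fields are hypothetical and not tied to any particular agent framework. Treating "failed" as the only exception signal is an assumption, and a real system would likely flag more than that.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One action the agent took (hypothetical log record)."""
    action: str      # e.g. "write_file"
    ok: bool         # did the action succeed?
    note: str = ""   # free-text detail for the reviewer


def summarize(events: list[Event]) -> dict:
    """Roll a run's events up into counts plus the exceptions worth reading."""
    counts = Counter(e.action for e in events)
    exceptions = [e for e in events if not e.ok]
    return {
        "total": len(events),
        "by_action": dict(counts),
        "exceptions": exceptions,
    }


events = [
    Event("summarize", ok=True),
    Event("write_file", ok=True),
    Event("write_file", ok=False, note="path outside workspace"),
]
report = summarize(events)
print(report["total"], report["by_action"])
for e in report["exceptions"]:
    print("FLAG:", e.action, "-", e.note)
```

The point of the shape is that the reviewer reads one summary and one flagged item instead of three prompts.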

The mental model that unlocked it: this is how you manage a competent engineer, not a suspect. You don't approve each line they write. You agree on the plan, you let them work, and you review the result and anything weird that came up. Per-line approval would insult them and exhaust you, and it wouldn't even produce better code — it would produce a bottleneck and a manager who skims. Oversight of a capable worker has always been on-the-loop. Agents are no different.

What makes letting go safe (this is the real work)

Here's the part that matters, and it's pure architecture, not trust. Moving to on-the-loop is only responsible if the system is built so that watching is enough. Three things make it so:

Thresholds in business terms, not model terms. Actions aren't equal, so they don't get equal treatment. The recommended rule is to define gates by consequence: a reversible, low-stakes step (draft this, summarize that, reformat) runs free; spending over a threshold, touching privileged systems, or anything irreversible still stops and asks. You spend your scarce attention only where a mistake is expensive, which means that when a prompt does appear, you actually read it, because it's rare and it matters.
Observable, interruptible, bounded. You can't watch what you can't see. The agent has to emit a trace you can read, be stoppable mid-run, and operate inside hard limits it cannot exceed. That's the same constraint-in-the-architecture idea pointed at autonomy: the safety isn't the agent's good behavior, it's the fence around it.
You earn the graduation with evidence. In-the-loop versus on-the-loop isn't a personality choice; it's a calibration you move along per task type as reliability is demonstrated. The careful version of this is to keep an agent recommending-but-not-executing until its judgment has been validated across many real decisions — measured, the same way an eval set tells you a thing is ready before users do. Trust is granted on data, not vibes, and only for the categories that earned it.
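The first two items can be sketched in a few lines: a gate that fires on consequence rather than on every step, inside a run that keeps a trace, can be stopped, and has a hard step limit. Everything named here (`Action`, `Policy`, `Run`, the 50-dollar threshold, the 20-step cap) is an illustrative assumption, not a real library or a recommended setting.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str
    cost_usd: float = 0.0
    reversible: bool = True
    privileged: bool = False


@dataclass
class Policy:
    spend_threshold_usd: float = 50.0  # assumed figure, set per business
    max_steps: int = 20                # hard bound the agent cannot exceed


def needs_human(action: Action, policy: Policy) -> bool:
    """Gate by consequence: irreversible, privileged, or expensive."""
    return (
        not action.reversible
        or action.privileged
        or action.cost_usd > policy.spend_threshold_usd
    )


@dataclass
class Run:
    policy: Policy
    trace: list[str] = field(default_factory=list)  # observable
    stopped: bool = False                           # interruptible
    steps: int = 0                                  # bounded

    def stop(self) -> None:
        self.stopped = True

    def attempt(
        self,
        action: Action,
        approve: Callable[[Action], bool] = lambda a: False,  # deny by default
    ) -> bool:
        if self.stopped or self.steps >= self.policy.max_steps:
            self.trace.append(f"refused {action.name}: run halted or out of budget")
            return False
        if needs_human(action, self.policy) and not approve(action):
            self.trace.append(f"blocked {action.name}: gate not approved")
            return False
        self.steps += 1
        self.trace.append(f"ran {action.name}")
        return True


run = Run(Policy())
run.attempt(Action("draft_reply"))                      # low stakes: runs free
run.attempt(Action("delete_branch", reversible=False))  # irreversible: gated
run.stop()                                              # human steps in mid-run
run.attempt(Action("summarize"))                        # refused after stop
print("\n".join(run.trace))
```

The checks live in `attempt`, outside anything the agent decides, which is the fence-not-behavior idea: the agent can propose whatever it likes and the limits still hold.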

Notice that none of that is "approve each action." It's design work you do once, up front, so that running the agent doesn't require a human reflex on every step.

The job changed, it didn't disappear

The thing I want to be clear about: going on-the-loop is not abdication. The oversight didn't go away — it moved up a level, from labor to design. Instead of spending attention per decision (which doesn't scale and decays into rubber-stamping), I spend it on the policy: which actions are reversible, which need a gate, what the agent must log, what an exception looks like, and what evidence graduates a task to more autonomy. That's the same move as reviewing the spec instead of every diff — reason about the system, not each instance.
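The last item on that list, the evidence that graduates a task, is the one most worth writing down as an explicit rule. A rough sketch, assuming the agent runs in recommend-only mode while a human makes the real call, and assuming agreement rate is the measure; the 200-decision minimum and the 98% bar are placeholder numbers, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Agreement tally for one task type while the agent only recommends."""
    agreed: int = 0  # times the human made the same call as the agent
    total: int = 0   # decisions the human has reviewed so far

    def observe(self, agent_choice: str, human_choice: str) -> None:
        self.total += 1
        if agent_choice == human_choice:
            self.agreed += 1


def mode_for(
    record: Record,
    min_decisions: int = 200,     # assumed sample size
    min_agreement: float = 0.98,  # assumed bar
) -> str:
    """Grant autonomy only with enough evidence and enough agreement."""
    if record.total < min_decisions:
        return "recommend_only"
    if record.agreed / record.total < min_agreement:
        return "recommend_only"
    return "on_the_loop"


refunds = Record()
for _ in range(199):
    refunds.observe("approve", "approve")
print(mode_for(refunds))  # recommend_only: not enough evidence yet
refunds.observe("approve", "approve")
print(mode_for(refunds))  # on_the_loop
```

Keeping one `Record` per task type is what makes the trust category-specific: a task that earned autonomy says nothing about one that hasn't.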

The goal of oversight was never to have my fingerprints on every decision. It was to be genuinely responsible for the outcomes. Approving everything felt like control and delivered a reflex. Watching, with real boundaries underneath, feels like letting go, and it is the first time I have actually been in control of what the agent can do.
