Who checks the checker?

June 8, 2026

Google built AI agents that help produce and review research papers, and a related framework keeps revising a paper until an AI reviewer approves it. It's efficient, and it's a trap. When the thing that generates the work and the thing that judges it share the same mind, the check is circular: they have the same blind spots, and models even prefer their own answers. "It passed because the AI said so" isn't verification; it's one intelligence nodding at itself. The fix is older than AI: the judge has to be independent of the maker.

Google recently introduced AI agents to help with academic work — one that generates publication-quality figures, and one that does peer review. Around the same time, a related research framework took the obvious next step: it uses a simulated AI reviewer to score a paper, then keeps revising the manuscript until that reviewer's score goes up, and accepts it when the AI approves. AI writes the paper; AI grades the paper; the loop closes.

It's a tidy, efficient idea, and it's the cleanest illustration of a mistake that's about to be everywhere. Because the moment you need to check AI's work — and you always do — the tempting move is to have another AI check it. That instinct is the trap, and it's worth understanding exactly why.

The check is circular when the checker shares the mind

Here's the core problem, stated plainly. If the thing that produces the work and the thing that judges the work are the same model — or two models from the same family, trained on the same data the same way — they share the same blind spots. A mistake the generator can't see, the judge can't see either, because they're looking with the same eyes. The verification feels rigorous and changes nothing, because both halves agree on exactly the things they're both wrong about.
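To make the shared-blind-spot argument concrete, here is a deliberately toy model, not a measurement of any real system. It treats each model's blind spots as a set of error types it cannot detect, and assumes a judge flags an error only if that error falls outside its own blind spots. The error categories and the `caught_by` helper are hypothetical, chosen only for illustration.

```python
# Toy model (hypothetical): a model's "blind spots" are the error types it cannot see.

def caught_by(judge_blind_spots: set[str], errors: set[str]) -> set[str]:
    """Return the errors a judge can flag: only those outside its blind spots."""
    return errors - judge_blind_spots

# The generator ships only errors it could not see, i.e. errors inside its blind spots.
generator_blind_spots = {"fabricated_citation", "off_by_one", "unit_mismatch"}
shipped_errors = {"fabricated_citation", "off_by_one"}

# Same-family judge: trained the same way, so it has the same blind spots.
same_family_judge = set(generator_blind_spots)
# Cross-lineage judge: overlapping but different blind spots.
other_lineage_judge = {"unit_mismatch", "stale_dependency"}

print(sorted(caught_by(same_family_judge, shipped_errors)))    # prints []
print(sorted(caught_by(other_lineage_judge, shipped_errors)))  # prints ['fabricated_citation', 'off_by_one']
```

Real blind spots are fuzzier than sets, but the shape holds: whatever the generator and the judge both miss is exactly what the check can never surface, and a same-family judge maximizes that overlap.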

And the failure isn't even neutral. Research on using LLMs as judges found a consistent self-preference bias: models rate their own outputs, and outputs from their own family, more highly, and the better a model is at recognizing its own writing, the stronger the bias. So an AI grading AI isn't just blind in the same places; it's actively tilted toward approving work that looks like its own. The closed loop doesn't converge on "correct." It converges on "what this kind of model likes."
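One simple way to check a judge for this tilt, assuming you have human preference labels on the same comparisons: on each prompt, pit the judge's own answer against another model's answer, then compare how often the judge picks its own answer with how often human raters pick it. The `Comparison` record and `self_preference_gap` function are hypothetical names for illustration, not any particular benchmark's method, and the four sample labels are made up.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    # One prompt, two answers: one written by the judge model itself, one by another model.
    judge_picked_own: bool   # did the judge prefer its own answer?
    humans_picked_own: bool  # did human raters prefer the judge's answer?

def self_preference_gap(comparisons: list[Comparison]) -> float:
    """Judge's self-win rate minus the humans' win rate for the same answers.

    Positive values suggest the judge favors its own writing beyond what
    human raters think the quality justifies.
    """
    if not comparisons:
        raise ValueError("need at least one comparison")
    n = len(comparisons)
    judge_rate = sum(c.judge_picked_own for c in comparisons) / n
    human_rate = sum(c.humans_picked_own for c in comparisons) / n
    return judge_rate - human_rate

# Illustrative, made-up labels for four prompts.
sample = [
    Comparison(judge_picked_own=True, humans_picked_own=True),
    Comparison(judge_picked_own=True, humans_picked_own=False),
    Comparison(judge_picked_own=True, humans_picked_own=False),
    Comparison(judge_picked_own=False, humans_picked_own=False),
]
print(self_preference_gap(sample))  # prints 0.5
```

The human labels are what make this meaningful: if the judge's answers really are better, humans pick them too and the gap stays near zero. Without an outside reference, the judge's enthusiasm for its own work is indistinguishable from quality.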

And you can't engineer your way out of it from the inside. As one analysis put it, when generation and evaluation happen in the same epistemic space, the justification is circular — you can't fix it by making the model smarter or calibrating the judge, because they're the same kind of thing judging itself. A better brain checking its own homework is still checking its own homework.

This is the same bug in a dozen disguises

Once you see the pattern, you notice it in almost everything I keep writing about. The agent that declares victory against a weaker target is grading itself. The model that fabricates a result and writes it up cleanly is its own unreliable reviewer. The sycophant that agrees with you is the social version. In every case the failure has the same shape: the thing trusted to verify is too close to the thing being verified to catch the error.

So "have the AI check it" doesn't solve the verification problem. It relocates it and hides it behind a green checkmark.

Independent doesn't mean human — it means different

I want to be fair, because the opposite overcorrection is also wrong. Using an AI as a judge isn't useless — done right, an LLM judge agrees with human reviewers around 85% of the time, better than two humans agree with each other. The problem was never "AI can't evaluate." It's that the evaluator can't be the generator, or its twin. The rule is independence, not humanity:

  • Don't grade a model with itself or its own family. If a model wrote it, judge it with a different model from a different lineage. Shared training is shared blind spots; cross-lineage checking catches what self-checking can't.
  • Anchor the judge to real ground truth, not vibes. The strongest check isn't another opinion — it's reality. Did the code actually run? Did the experiment reproduce? Did the number match the source? Wire verification to something outside the model's epistemic space, where being wrong has consequences it can't argue away.
  • Keep a human on the things that matter. Not to review everything — that doesn't scale — but to calibrate the automated judges against real outcomes, and to own the decisions where a circular error becomes a real one.
  • Never close the loop on anything important. A system that generates and approves its own work with no outside reference will drift confidently into nonsense and rate the nonsense highly. Always leave one door open to something the model didn't author.
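Taken together, these rules describe a gate that approves work only when a judge from a different lineage agrees and an external check passes. Here is a minimal sketch under stated assumptions: the `Verdict` record, the lineage strings, and the `judge` and `ground_truth_check` callables are hypothetical stand-ins (in practice the ground-truth check might be a test suite, a reproduction run, or a comparison against a source), and nothing here is a real library API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    approved: bool
    reason: str

def independent_gate(
    artifact: str,
    generator_lineage: str,
    judge_lineage: str,
    judge: Callable[[str], bool],
    ground_truth_check: Callable[[str], bool],
) -> Verdict:
    """Approve only if an external check AND a cross-lineage judge both pass (hypothetical sketch)."""
    # Rule 1: never let a model, or its own family, grade its own work.
    if judge_lineage == generator_lineage:
        return Verdict(False, "judge shares the generator's lineage")
    # Rule 2: anchor to something outside the models' epistemic space.
    if not ground_truth_check(artifact):
        return Verdict(False, "external check failed")
    # Only after reality has had its say does the independent judge's opinion count.
    if not judge(artifact):
        return Verdict(False, "independent judge rejected it")
    return Verdict(True, "passed external check and independent judge")

def always_yes(artifact: str) -> bool:
    return True  # a rubber-stamp judge that approves everything

def matches_source(artifact: str) -> bool:
    return int(artifact) == 42  # hypothetical: the source says the number is 42

print(independent_gate("42", "family-a", "family-a", always_yes, matches_source).reason)
# prints: judge shares the generator's lineage
print(independent_gate("41", "family-a", "family-b", always_yes, matches_source).reason)
# prints: external check failed
print(independent_gate("42", "family-a", "family-b", always_yes, matches_source).approved)
# prints: True
```

Running the external check before the judge is a deliberate choice: even a rubber-stamp judge like `always_yes` cannot approve an artifact that reality rejects, so the gate can always say no. Lineage is reduced to a string comparison here; real model families are messier, and deciding what counts as "different enough" is itself a judgment a human should own.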

The bottom line

"Who checks the checker" sounds like a riddle, but it's the most practical question in AI right now, because the default answer the industry is reaching for — "another AI, ideally the same one" — is the wrong one. A verifier that shares the generator's mind isn't a check; it's a mirror, and mirrors are very good at telling you what's already in front of them.

So when you need to trust AI's output, resist the easy loop. Make the judge independent — a different model, a deterministic test, the real world, a human — because the whole point of verification is to bring in a perspective the work didn't already contain. "The AI approved it" is only meaningful if the approving AI could actually have said no. Build so that it could.
