ARCHITECTURE · June 3, 2026

Grounding isn't a feature. It's a constraint.

An LLM is a guesser by design, so it will always be capable of making things up, and you can't prompt that away. The only reliable fix is architectural: put a deterministic source in charge of the facts and demote the model to a rephraser that may never author one. 'Add RAG' isn't that. Here's the difference, and why it's the line between an AI that sounds confident and one you can trust.

You ask an assistant a factual question. It answers in clean, confident prose — names, numbers, a citation. It reads like the truth. It is completely wrong. No error, no warning, no flicker of doubt. Just a fluent, well-structured, fabricated answer delivered with exactly the same confidence as a correct one.

Everyone who has used an LLM has met this moment. The instinct is to treat it as a bug — something a better model, or a cleverer prompt, will eventually fix. It won't. And understanding why is the whole reason grounding has to be an architectural decision, not a feature you sprinkle on at the end.

Hallucination is not a bug. It's the mechanism.

A language model doesn't store facts and look them up. It predicts the next plausible token, over and over, from patterns in its training data. When it knows the answer, the most plausible continuation happens to be true. When it doesn't, it doesn't stop — it produces the most plausible continuation anyway, which is fluent, confident, and made up.
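A toy sketch of that last point, in Python. The prompt, the vocabulary, and the scores are all invented for illustration and don't come from any real model. Softmax turns any set of scores into probabilities that sum to 1, so greedy decoding always returns some token. Nothing in the mechanism says "stop, I don't know" unless that happens to be the most plausible continuation.

```python
import math

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def next_token(logits: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Greedy decoding: return the most probable token, whatever it is."""
    probs = softmax(logits)
    return max(probs, key=probs.get), probs

# Hypothetical scores for the token after "The capital of Atlantis is".
logits = {"Poseidonia": 2.1, "Atlantica": 1.9, "unknown": 0.3}
token, probs = next_token(logits)
print(token)
```

Ask about a place that doesn't exist and the decoder still picks the highest-scoring name, with no signal in the output that the answer was invented.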

Andrej Karpathy put it more bluntly than I can (source):

In some sense, hallucination is all LLMs do. They are dream machines... It's only when the dreams go into deemed factually incorrect territory that we label it a "hallucination." Hallucination is not a bug, it is the LLM's greatest feature.

That same machinery is what lets a model write a poem, refactor your code, or explain a concept three different ways. You cannot remove the guessing without removing the thing that makes it useful. The dreaming is the engine.

And it doesn't go away with scale. The Stanford AI Index 2026 measured hallucination rates across 26 leading models and found them ranging from 22% to 94% depending on the task. The best models are far better than the worst, but none of them is zero, and none of them ever will be — because it isn't a defect to patch. It's the nature of a guesser.

So stop trying to fix the model. Fix the system around it.

The architectural move: don't trust the guesser with the facts

Here's the whole idea in one sentence: the model may phrase the truth, but it may never author it.

If hallucination is what the model does whenever it has to produce a fact, then the fix is to never put it in that position. Don't ask the model what's true. Compute or look up what's true with something deterministic — a database, a calculation, an API, a rules engine — and hand those facts to the model with one job: say them well. The instant the model is only ever rephrasing facts it was given, it has nothing left to invent.
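A minimal sketch of that division of labor, in Python. Everything here is hypothetical: `ORDERS` stands in for a database, `compute_facts` for the deterministic source, and `rephrase` for a model call, stubbed with a template so the example runs without any LLM. The point is the data flow: facts are computed first, and the only input the phrasing step ever receives is those facts.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a database: the deterministic source of truth.
ORDERS = {"A-1001": {"status": "shipped", "carrier": "DHL", "eta_days": 2}}

@dataclass(frozen=True)
class Facts:
    order_id: str
    status: str
    carrier: str
    eta_days: int

def compute_facts(order_id: str) -> Facts:
    """Deterministic lookup: same input, same output, no model involved."""
    row = ORDERS[order_id]
    return Facts(order_id, row["status"], row["carrier"], row["eta_days"])

def rephrase(facts: Facts) -> str:
    """Stand-in for the LLM. Its only input is `facts`; it has no other source."""
    # A real system would send these fields to a model with an instruction
    # such as "say this warmly"; a template keeps the sketch runnable.
    return (f"Good news: order {facts.order_id} has {facts.status} with "
            f"{facts.carrier} and should arrive in about {facts.eta_days} days.")

print(rephrase(compute_facts("A-1001")))
```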

That's grounding. Not "the model is smart enough to be right," but "the model is structurally not in charge of being right."

"Add RAG" is not grounding

This is where most teams think they've solved it and haven't. Retrieval-Augmented Generation — fetch some relevant documents, stuff them into the context, hope the model uses them — is the most common attempt at grounding. It helps. It is not the same thing, and treating it as a feature you bolt on is exactly the trap.

RAG anchors the dream; it doesn't forbid it. The model still decides what to say, still fills gaps when retrieval comes up short, and retrieval comes up short constantly. Industry analysis in 2026 found naive RAG pipelines fail at retrieval roughly 40% of the time, and that when a RAG system gives a wrong answer, the failure is in retrieval, not generation, 73% of the time. In those cases the model dutifully writes a confident, well-grounded-sounding answer on top of the wrong documents. It's no surprise, then, that 77% of data leaders say RAG alone isn't reliable enough for production.

The difference that matters isn't "do you retrieve." It's what the model is allowed to do when the facts aren't there. Bolt-on RAG lets it guess. Real grounding doesn't — the fact either comes from the source or it doesn't get stated. That's not a prompt instruction. It's a property of how the system is built.
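A minimal sketch of "the fact either comes from the source or it doesn't get stated", in Python. `SOURCE`, `lookup`, `answer`, and `fake_model` are hypothetical names for illustration; a missing key plays the role of a retrieval miss, and `fake_model` stands in for a real LLM call.

```python
from typing import Callable, Optional

# Hypothetical source of truth; a missing key models a retrieval miss.
SOURCE = {"refund_window_days": 30}

FALLBACK = "I don't have that information."

def lookup(key: str) -> Optional[int]:
    return SOURCE.get(key)

def answer(key: str, phrase: Callable[[str, int], str]) -> str:
    value = lookup(key)
    if value is None:
        # On a miss the model is never invoked, so there is no gap for it to fill.
        return FALLBACK
    return phrase(key, value)

def fake_model(key: str, value: int) -> str:
    # Stand-in for an LLM call that phrases exactly one supplied fact.
    return f"You have {value} days to request a refund."

print(answer("refund_window_days", fake_model))
print(answer("warranty_years", fake_model))
```

The design choice is that the miss is handled in ordinary code before the model is reached, rather than by a prompt asking the model to admit what it doesn't know.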

What it looks like when you actually enforce it

I build this for a living, so let me make it concrete. I run an astrology API, Astrolinkers, and a consumer product on top of it, Alwenna. Astrology is a perfect stress test for grounding: it's all specific positions, scores, and relationships, and a model is more than happy to invent a confident, plausible, completely fabricated reading.

So the model never gets to. Astrolinkers computes the chart deterministically: real positions, real synastry, real numbers, no LLM anywhere near the math. Only then does the language model enter, with a single job: take those computed facts and phrase them in a warm, plain voice. It is never asked what your chart says; it's told, and asked to say it well. Every claim Alwenna makes traces back to a real, computed value. If the engine didn't produce a fact, the model has nothing to say about it; it is never invited to make up something that sounds right.

That rule — the model rephrases, the engine decides — is wired into the design, not written in a prompt. The model isn't asked to behave; it's structurally unable to author a fact, because the facts arrive pre-computed and it sits downstream of them. That's the difference between hoping and enforcing.
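One way to make "the engine decides" checkable rather than assumed, sketched in Python. This is not Astrolinkers' or Alwenna's code: the engine output is a hard-coded dict with hypothetical field names (`compatibility_score`, `shared_aspects`), `phrase` stubs the model, and the `guarded` gate, which rejects any number the engine didn't produce, is an illustrative assumption rather than a description of the author's system.

```python
import re

def compute_chart() -> dict:
    # Hypothetical engine output; a real engine would calculate these values.
    return {"compatibility_score": 82, "shared_aspects": 5}

def phrase(facts: dict) -> str:
    # Stand-in for the language model, which only ever sees `facts`.
    return (f"You two share {facts['shared_aspects']} aspects, "
            f"for a compatibility score of {facts['compatibility_score']}.")

def numbers_in(text: str) -> set[str]:
    """Every integer or decimal literal that appears in the text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def guarded(facts: dict, draft: str) -> str:
    """Reject any draft that states a number the engine did not produce."""
    allowed = {str(v) for v in facts.values()}
    invented = numbers_in(draft) - allowed
    if invented:
        raise ValueError(f"draft contains unsourced numbers: {sorted(invented)}")
    return draft

facts = compute_chart()
print(guarded(facts, phrase(facts)))
```

The gate only catches numeric claims, so it illustrates the idea of a structural check downstream of the model rather than a complete one.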

Why "constraint," not "feature"

This is the distinction the title is about, and it's not pedantry.

A feature is something you add and can turn off. It lives in a prompt ("please only use the provided context"), it's optional, and it degrades silently — the day retrieval misses, the model quietly guesses and nobody notices until a customer does. A prompt is a polite request to a system whose entire nature is to produce plausible text. Plausible is not true.

A constraint is enforced by the shape of the system. The model is downstream of a source of truth, and there is no path by which it can state a fact the source didn't provide — not because we asked nicely, but because the architecture doesn't offer one. You don't hope it stays grounded; it can't leave the ground.

It's the same discipline as everywhere else in engineering. You don't prevent a class of bugs by asking developers to be careful; you make the bug unrepresentable — with a type, a boundary, an invariant the code can't violate. Grounding is that move applied to the one component that will confidently lie to you. Put the invariant in the architecture, where it holds, not in the prompt, where it's a suggestion.
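A sketch of that invariant as a boundary, in Python. All names (`SourcedFact`, `from_source`, `render`, `_SOURCE_TOKEN`) are hypothetical. Python can't make a value truly unrepresentable, so this relies on a runtime check plus type hints that a checker such as mypy would enforce, and the token is private only by convention; in a language with stronger encapsulation the same idea becomes a compile-time guarantee.

```python
from dataclasses import dataclass

_SOURCE_TOKEN = object()  # by convention, held only by the source-of-truth module

@dataclass(frozen=True)
class SourcedFact:
    name: str
    value: object
    _token: object

    def __post_init__(self) -> None:
        # Refuse construction by anything that doesn't hold the source's token.
        if self._token is not _SOURCE_TOKEN:
            raise TypeError("SourcedFact can only be created by the source of truth")

def from_source(name: str, value: object) -> SourcedFact:
    """The single boundary through which facts enter the system."""
    return SourcedFact(name, value, _SOURCE_TOKEN)

def render(facts: list[SourcedFact]) -> str:
    # The phrasing layer's signature admits only sourced facts.
    for f in facts:
        if not isinstance(f, SourcedFact):
            raise TypeError(f"not a sourced fact: {f!r}")
    return "; ".join(f"{f.name}: {f.value}" for f in facts)

print(render([from_source("balance_usd", 120)]))
```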

The payoff

Do this and something quietly flips. The LLM stops being the part of your system you have to second-guess and becomes the part you can rely on — because you've narrowed its job to the one thing it's genuinely excellent at (language) and taken away the one thing it's structurally bad at (knowing what's true).

That's the whole game. An LLM is most useful precisely when it is least trusted with the facts — when a deterministic source owns the truth and the model is left to do what it's brilliant at: make that truth sound human.

Grounding isn't a feature you add to make the model better. It's a constraint you enforce so the model can't be wrong in the way it most wants to be.
