Express course · No. 28
When a model doesn't know your domain, there are three ways to fix it: tell it in the prompt, hand it the facts at runtime (RAG), or actually retrain its weights (fine-tuning). They solve genuinely different problems — and most people reach for the most expensive one first. Here's what each really does, the one distinction that decides between them, and how to choose the cheapest thing that works.
Essence only · One picture per idea · Engineering over magic
All three techniques exist to solve one problem: a model is brilliant in general but ignorant of your specifics. Understanding exactly what it lacks is how you pick the right fix.
Its knowledge is frozen and general
A brilliant graduate who read a vast library years ago — but not your company's files, and nothing published since the day they graduated.
A model knows only what was in its training data, frozen at a cutoff date and drawn from the public internet. It has broad general knowledge and no awareness of your private data, your specific domain, or anything that happened after training. So out of the box it can't answer about your product, follow your house style, or use last week's numbers. Everything in this course is a way to close that gap between the model's general knowledge and your particular world.
Three different things you might need
A new hire might need a briefing document, a filing cabinet they can consult, or actual training to change how they work — three different gaps, three different fixes.
The gap comes in different shapes, and they don't have the same fix. Sometimes you need the model to know some facts for this task; sometimes to draw on a large, changing body of knowledge; sometimes to behave differently — a consistent format, tone, or skill. These map to the three techniques: telling it (prompting), giving it a knowledge source (RAG), or retraining it (fine-tuning). Naming which gap you have is the whole decision.
Match the technique to the gap
You don't send someone to a year of training when a one-page memo would do — you match the effort to what's actually missing.
The techniques differ enormously in cost, speed, and what they're good for, so reaching for the wrong one is expensive in both senses. The most common mistake is jumping straight to fine-tuning — the heaviest option — when prompting or RAG would solve the problem faster and cheaper. The skill isn't knowing how to fine-tune; it's diagnosing the gap precisely enough to pick the lightest technique that closes it.
A model's knowledge is frozen and general — it doesn't know your world. Prompting, RAG, and fine-tuning each close a different gap, so diagnosing the gap is how you choose.
The first and cheapest technique is simply putting what the model needs into the prompt. It's instant, free of training, and far more capable than people assume — so it's where you always start.
Put what it needs right in the context
Briefing a sharp temp worker before a task: here's the situation, the rules, an example of good — they're capable, they just needed the context, and now they can do it.
The simplest way to give a model what it lacks is to put it in the prompt — the instructions, the relevant facts, the format you want, a few examples. The model uses that in-context information immediately, no training required. Because models are strong at following clear instructions and examples, a surprising amount of "the model can't do this" is really "I didn't tell it clearly enough." Prompting is the first thing to exhaust, not the last.
Few-shot teaches behaviour, instantly
Showing three finished examples of exactly the output you want, then asking for a fourth — the pattern is learned on the spot, no schooling needed.
You can shape not just what the model knows but how it behaves, right in the prompt, by showing examples — few-shot prompting. Want a specific format, tone, or way of handling a task? Include two or three examples of it, and the model matches the pattern. This means a lot of what people assume requires fine-tuning — a consistent style, a particular output shape — can often be achieved with examples in the context, for free and instantly.
Its limits: the window and repetition
A briefing works for one meeting, but you can only hand over so many pages, and you have to re-hand them every single time — fine for a memo, clumsy for an encyclopedia.
Prompting has real limits. Everything must fit in the context window, so you can't paste in a huge knowledge base. And it's per-call: you send that context every time, paying for it on each request, and the model never retains it between calls. When the knowledge is too big for the window, changes constantly, or you'd be pasting the same large material endlessly, prompting alone stops being enough — and that's exactly where the next technique comes in.
Prompting puts what the model needs in the context — facts, format, examples — instantly and without training. It's powerful and cheap, limited only by the window and by repeating it each call.
When the knowledge is too big or too fresh for the prompt, you don't put it all in — you fetch the relevant piece at the moment you need it. That's retrieval-augmented generation.
Retrieve the relevant facts at runtime
An open-book exam: instead of memorising the whole library, you look up the few relevant pages at the moment of the question and answer from those.
RAG (retrieval-augmented generation) handles knowledge too large to fit in a prompt by fetching only the relevant part on demand. At question time, it searches your documents, pulls the most relevant chunks, and puts those into the context for the model to answer from. So the model works from your actual, current data without ever having memorised it — you give it the right page exactly when it needs it. (The RAG course goes deep; this is its place in the lineup.)
Best for large, changing, private knowledge
A reference library you keep updated: you don't reprint the librarian's brain when a fact changes — you just update the shelf, and the next lookup is current.
RAG shines exactly where prompting struggles: a knowledge base too big for the window, facts that change often, or private data the model was never trained on. Update a document and the next answer reflects it instantly — no retraining. This makes RAG the standard way to ground a model in your specific, current, possibly-confidential information. When the gap is "it needs to know things," especially things that move, RAG is usually the answer.
RAG adds knowledge, not new behaviour
Handing someone better reference books makes them better informed — but it doesn't change their writing style or teach them a new skill. Facts in, not habits.
The crucial thing about RAG: it changes what the model knows for this answer, not how it fundamentally behaves. The model's underlying skills, style, and reasoning are unchanged — you've just handed it better facts to work from. So RAG is the right tool for a knowledge gap and the wrong tool for a behaviour gap. If you need the model to consistently respond in a particular way, retrieving more documents won't get you there — which points at the third technique.
RAG fetches the relevant facts at runtime, grounding the model in knowledge too big, too fresh, or too private for the prompt. It adds what the model knows — not how it behaves.
The heaviest technique actually modifies the model itself. It's the only one that changes the model's ingrained behaviour — and the one people reach for too soon, for the wrong reasons.
Fine-tuning retrains the model on your examples
Not briefing a worker for one task, but sending them through training that changes how they work by default — the new way is baked in, not handed over each time.
Fine-tuning takes an existing model and trains it further on a set of your own examples, adjusting its internal weights so the new behaviour becomes part of the model itself. Unlike prompting and RAG, which add information at runtime and leave the model unchanged, fine-tuning actually changes the model. The result is a model that behaves your way by default, without needing the instructions or examples in every prompt — it learned them.
It teaches behaviour, format, and style
Training that turns a generalist into someone who reliably writes in your house voice or handles your specific task the same way every time — a learned habit, not a reminder.
What fine-tuning is genuinely good for is behaviour: a consistent tone or house style, a specific output format, a specialised task the model does over and over, or matching the way your domain phrases things. When you have many examples of "input like this should produce output like that," fine-tuning can bake that pattern in deeply and reliably, beyond what examples in a prompt achieve. It shapes how the model responds, learned into its weights.
The cost: data, effort, and upkeep
Sending someone to a real training program costs time, money, and a curriculum — and when the job changes, you have to retrain them all over again.
Fine-tuning is the expensive option. It needs a quality dataset of examples (often many), a training process, and expertise — and crucially, it's not a one-time cost: when things change, you retrain. LoRA and other "parameter-efficient" methods make it cheaper by adjusting only a small part of the model rather than all of it, which has lowered the barrier — but it's still far heavier than prompting or RAG. You take on fine-tuning's cost only when its specific benefit is worth it.
Fine-tuning retrains the model's weights so new behaviour is baked in — great for consistent style, format, and repeated tasks. It's the costly option: data, training, and ongoing upkeep, even with LoRA.
One distinction settles most of the confusion between fine-tuning and RAG, and avoids the single most common expensive mistake. Get this and the choice usually makes itself.
Fine-tuning teaches form; RAG teaches facts
You train a person in how to write a report (a learned skill), but you hand them the data to put in it (looked up each time) — the skill is taught, the facts are fetched.
Here is the rule that cuts through it: fine-tuning is for how the model responds; RAG is for what it knows. Fine-tuning teaches form — style, tone, format, the shape of a task — by baking it into the weights. RAG supplies facts — current, specific, private information — by fetching them at runtime. One changes behaviour; the other changes knowledge. Almost every "should I fine-tune or use RAG?" question dissolves once you ask whether the gap is form or facts.
The expensive mistake: fine-tuning to add knowledge
Sending someone to school to memorise a phone book that changes weekly — by the time they've learned it, it's already wrong, and you'll have to retrain for every update.
The most common costly error is fine-tuning to add facts the model should know. It mostly doesn't work well, it's expensive, and worst of all the facts are frozen the moment training ends — the moment your data changes, the fine-tuned model is out of date and you must retrain. Facts that change belong in RAG, where an update is instant. Fine-tuning to inject knowledge is doing the hard, brittle thing where the easy, current thing was right there.
They combine: fine-tune the form, RAG the facts
A specialist trained in how to do the job (fine-tuned), who also consults an always-current reference for the specifics (RAG) — the best of both, each doing what it's good at.
These aren't rivals; the strongest systems often use both. You can fine-tune a model for your domain's behaviour and style, and use RAG to feed it current facts at runtime — form from fine-tuning, facts from retrieval, each handling the gap it's actually suited to. Seeing them as complementary, not competing, is the mark of understanding the distinction: you're not choosing one technique, you're applying each to the part of the problem it fits.
Fine-tuning teaches form — style, format, behaviour; RAG teaches facts — current, specific knowledge. The classic costly mistake is fine-tuning to add facts, which freeze the moment training ends.
Put the three together and they form a ladder of increasing cost and power. The discipline is the same as everywhere in engineering: climb only as high as the problem forces you.
Start at the bottom: prompting
Before booking a training course, you try just explaining the task clearly — most of the time, that was all it needed.
The ladder runs from cheap and instant to expensive and slow: prompting, then RAG, then fine-tuning. Always start at the bottom. Try clear instructions and a few examples first; an enormous share of problems are solved right there, for free, in minutes. Only when prompting genuinely can't close the gap do you climb. Starting at the top — reaching for fine-tuning first — is the expensive mistake that defines beginners with a budget.
Climb only when the rung below can't reach
You move to a bigger tool only when the smaller one has actually failed at the job — not because the bigger one sounds more serious.
Climb to RAG when the knowledge is too big, too current, or too private for the prompt — a clear knowledge gap prompting can't fill. Climb to fine-tuning when you need consistent behaviour that prompting and examples can't reliably get, and you have the data and budget to teach it. Each rung is justified only by the rung below failing. The question at every step is: did the cheaper technique actually fall short here, or am I just assuming it would?
Most products never need the top rung
Most jobs get done with a good briefing and a reference book — full retraining is the exception, reserved for the rare case that genuinely demands it.
The honest reality: the large majority of LLM applications are well served by prompting and RAG, and never need fine-tuning at all. Fine-tuning is a real and powerful tool for the specific case of ingrained behaviour at scale — but it's the exception, not the default. Treating it as a last resort, reached only when the lighter rungs have demonstrably failed, will steer you right far more often than reaching for it because it sounds advanced.
The ladder is prompting, then RAG, then fine-tuning — rising cost and power. Start at the bottom and climb only when the rung below genuinely can't reach. Most products never need the top.
Choosing well comes down to diagnosing the gap honestly and measuring whether your fix worked — the same engineering discipline that governs everything else with models.
Diagnose the gap before picking the tool
A good doctor diagnoses before prescribing — they don't reach for surgery because it's dramatic, they find out what's actually wrong first.
The whole decision turns on naming the gap precisely: does the model lack facts (RAG), need clearer instruction (prompting), or need different behaviour baked in (fine-tuning)? Most wrong choices come from skipping this diagnosis and jumping to a technique. Ask what's actually missing, match it to the lightest tool that fits, and you'll avoid the expensive detours — especially the big one of fine-tuning for facts.
Measure whether it actually worked
You don't assume the training helped — you test the worker afterward and see if the output got better.
Whichever technique you use, verify it with evals: did the change actually improve the outputs, on real cases? A better prompt, a RAG pipeline, a fine-tune — each is a hypothesis you confirm by measuring, not by feeling. This is doubly important for fine-tuning, where the cost is high and a vague "it seems better" isn't enough to justify it. Measurement keeps you honest about whether the heavier technique earned its place over the lighter one.
- What's the gap — facts, clearer instruction, or different behaviour? - Have you tried prompting — clear instructions and a few examples — first? - Is it a knowledge gap — large, current, or private facts — pointing to RAG? - Is it a behaviour gap — consistent form, style, repeated task — pointing to fine-tuning? - For fine-tuning, do you have the example data and budget, including ongoing upkeep? - Are you measuring whether the chosen technique actually improved the outputs?
- training data / cutoff — what the model knows, frozen at a date. - prompting / in-context / few-shot — telling the model what it needs, right in the prompt. - RAG — retrieving relevant facts at runtime to ground the answer. - fine-tuning / weights / LoRA — retraining the model so behaviour is baked in. - form versus facts — fine-tuning changes how it responds; RAG changes what it knows. - the decision ladder — prompting, then RAG, then fine-tuning, by rising cost. - evals — measuring whether the technique actually improved the result.
- You diagnose the gap — facts, instruction, or behaviour — before picking a technique. - You start with prompting and climb only when it genuinely falls short. - You use RAG for knowledge and fine-tuning for behaviour, and never fine-tune to add facts. - You combine them when it helps — form from fine-tuning, facts from RAG. - You measure with evals, and treat fine-tuning as a justified last resort, not a default.
Diagnose the gap, then climb the ladder: prompting for instruction, RAG for facts, fine-tuning for behaviour. Use the lightest technique that works, combine them when it helps, and measure that it did.