Decision guide

RAG or fine-tuning?

Answer five questions about your problem and get an opinionated, reasoned recommendation — not a both-have-pros shrug.

Teams reach for fine-tuning far more often than they should. The honest default is: start with retrieval, and fine-tune only when you have a specific reason. These five questions find out whether you have one.

What are you actually trying to change about the model's output?
How often does that underlying information change?
Do the answers need to cite their sources?
How many high-quality training examples do you have?
How tight is your latency and cost budget at inference?
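
As a rough illustration, the questions above can be folded into a small decision function. This is a minimal sketch of one reading of this guide, not the logic the page itself runs: the names `Answers`, `Recommendation`, `recommend`, and the `MIN_EXAMPLES` threshold are all hypothetical, and the fifth question (latency and cost) is left out because the guide does not say how it is weighed.

```python
from dataclasses import dataclass
from enum import Enum


class Recommendation(Enum):
    RAG = "Start with RAG"
    FINE_TUNE = "Fine-tune"
    HYBRID = "RAG + light fine-tuning"


@dataclass(frozen=True)
class Answers:
    needs_new_knowledge: bool   # Q1: adding facts the model doesn't have
    needs_new_behaviour: bool   # Q1: changing style, format, or behaviour
    knowledge_changes: bool     # Q2: the underlying information is not static
    needs_citations: bool       # Q3: answers must cite their sources
    clean_examples: int         # Q4: number of high-quality training examples


# Assumption: the guide only says "thousands" of examples; 1000 is a placeholder.
MIN_EXAMPLES = 1000


def recommend(a: Answers) -> Recommendation:
    """Default to retrieval; fine-tune only when there is a specific reason."""
    has_data = a.clean_examples >= MIN_EXAMPLES
    retrieval_needed = (
        a.needs_new_knowledge or a.knowledge_changes or a.needs_citations
    )
    if a.needs_new_behaviour and has_data:
        # Behaviour change with enough data: add retrieval only if facts are involved.
        return Recommendation.HYBRID if retrieval_needed else Recommendation.FINE_TUNE
    # No behaviour gap, or not enough data to train on: start with retrieval.
    return Recommendation.RAG


if __name__ == "__main__":
    answers = Answers(
        needs_new_knowledge=True,
        needs_new_behaviour=False,
        knowledge_changes=True,
        needs_citations=True,
        clean_examples=0,
    )
    print(recommend(answers).value)
```

Retrieval is the fall-through case on purpose, which mirrors the guide's stance that fine-tuning needs a specific reason and enough data behind it.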

All options at a glance

Start with RAG

Retrieval is the right first move here. Build a solid retrieval layer, measure grounding, and only revisit fine-tuning if a concrete behaviour gap remains.

Choose it when

  • You're adding knowledge the model doesn't have
  • The information changes, or answers must cite sources
  • You don't yet have a large, clean training set

Trade-offs

  • Longer prompts mean higher per-call cost and latency
  • Retrieval quality becomes its own thing to design and measure
  • Won't, on its own, change deeply ingrained style or format

Fine-tune

This is the rare case fine-tuning is built for: you're changing behaviour, the knowledge is static, you have the data, and you don't need citations.

Choose it when

  • The goal is style/format/behaviour, not new facts
  • You have thousands of clean, representative examples
  • The knowledge is static and citations aren't required

Trade-offs

  • Every knowledge update means another training run
  • Baked-in facts can't be cited or traced
  • Needs real data discipline — garbage in, confidently-wrong out

RAG + light fine-tuning

You need both new knowledge and new behaviour. Let retrieval handle the facts and a light fine-tune handle the behaviour — in that order, so you can tell which is doing what.

Choose it when

  • You need new knowledge and changed behaviour together
  • You have enough data to fine-tune the behaviour

Trade-offs

  • Two systems to build, evaluate, and keep in sync
  • Hardest to debug — isolate retrieval before touching weights
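
One way to isolate retrieval before touching weights is to score the retriever on its own, with no model in the loop. The sketch below assumes you have a small set of queries labelled with the document ids that should come back; `hit_rate_at_k`, the `retrieve` callable, and the toy data are hypothetical stand-ins for whatever retrieval layer you actually run.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple


def hit_rate_at_k(
    retrieve: Callable[[str], Sequence[str]],
    labelled: Iterable[Tuple[str, Set[str]]],
    k: int = 5,
) -> float:
    """Fraction of queries whose top-k results contain at least one relevant id."""
    hits = 0
    total = 0
    for query, relevant_ids in labelled:
        total += 1
        top_k = list(retrieve(query))[:k]
        if any(doc_id in relevant_ids for doc_id in top_k):
            hits += 1
    # An empty evaluation set scores 0.0 rather than raising.
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy retriever: a fixed ranking per query (hypothetical data).
    toy_index = {
        "refund policy": ["doc-7", "doc-2"],
        "api rate limits": ["doc-4"],
    }
    labelled_queries = [
        ("refund policy", {"doc-2"}),
        ("api rate limits", {"doc-9"}),
    ]
    score = hit_rate_at_k(lambda q: toy_index.get(q, []), labelled_queries, k=2)
    print(f"hit rate@2: {score:.2f}")
```

If this number is low, the problem sits in retrieval and a fine-tune will not fix it; if it is high and answers are still wrong in style or format, that points at behaviour instead.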