Express course · No. 32
Computers compare text by exact characters; the meaning behind it is invisible to them. Embeddings fix that: they turn a piece of text into a list of numbers, a point in space, placed so that texts with similar meanings sit close together. Suddenly 'find me things like this' becomes 'find nearby points,' a problem computers solve very quickly. It's the quiet engine behind search, RAG, recommendations, and much more.
Essence only · One picture per idea · Meaning made computable
The whole field rests on one elegant idea: if you can place meaning in space, you can compute with it. Grasp that and embeddings stop being mysterious.
Computers see characters, not meaning
A machine that can tell two photographs are different pixel by pixel, but has no idea both show a dog — the surface differs, the meaning is invisible to it.
To a computer, text is just characters. It can check whether two strings are identical, but it has no native sense that "car" and "automobile" mean the same thing, or that "the bank approved my loan" and "my mortgage was accepted" are about the same event. Matching exact words misses meaning entirely. For software to work with what text means, rather than how it's spelled, you need a way to turn meaning into something a computer can measure.
An embedding turns text into a point in space
A map where every idea has a location, and ideas that mean similar things sit near each other — "dog" close to "puppy," both far from "tax return."
An embedding is a list of numbers — a vector — that represents the meaning of a piece of text, positioned in a high-dimensional space so that similar meanings land near each other. A model reads the text and produces its coordinates. Now "dog" and "puppy" sit close together, and both sit far from "spreadsheet." Meaning, which was invisible to the computer, has become a position — and positions are something software can compare, sort, and search.
Once meaning has coordinates, you can compute with it
Once every town is a dot on a map, "find the nearest town" becomes simple geometry — the map turned a vague question into a measurable one.
This is the move that makes everything else possible: with meaning expressed as coordinates, fuzzy questions about meaning become precise questions about geometry. "Which documents are about this topic?" becomes "which points are near this point?" "Is this similar to that?" becomes "how far apart are they?" The computer can't understand meaning, but it can measure distance perfectly — and embeddings let it answer questions about meaning by measuring distance instead. That translation is the whole trick.
Computers see characters, not meaning. An embedding turns a piece of text into a point in space, placed so similar meanings sit close — making meaning something software can measure.
With meaning placed in space, comparing two things by meaning becomes measuring how far apart their points are. This simple idea is the engine under everything embeddings do.
Close points mean similar things
On a map, two towns near each other are easy to travel between; two on opposite coasts are far. Nearness on the map mirrors nearness in reality.
Because embeddings place similar meanings near each other, distance between points measures difference in meaning: close together means similar, far apart means unrelated. To ask how alike two pieces of text are, you embed both and measure the gap between their vectors — a small gap signals they mean nearly the same thing. The geometry directly encodes the semantics, so a distance you can compute stands in for a similarity you couldn't.
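A minimal sketch of "similarity becomes distance." The three-number vectors are invented by hand for illustration; a real embedding model would produce them from the text, usually with hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional "embeddings". The values are invented for illustration;
# a real embedding model would compute them from the text.
vectors = {
    "dog": [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "tax return": [0.10, 0.20, 0.95],
}

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

near = euclidean_distance(vectors["dog"], vectors["puppy"])
far = euclidean_distance(vectors["dog"], vectors["tax return"])

print(f"dog vs puppy:      {near:.3f}")
print(f"dog vs tax return: {far:.3f}")
```

The small gap between "dog" and "puppy" and the large gap to "tax return" are what the geometry is meant to encode.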
Find by similarity: the nearest neighbours
A librarian who, handed one book, can instantly point you to the shelf of books most like it — not by title, but by what they're about.
The core operation is nearest-neighbour search: given one point, find the points closest to it. "Show me things like this" becomes "find this item's nearest neighbours in the space." Hand the system a question, a product, a document, and it returns the most semantically similar ones, ranked by closeness. This single operation — find what's nearest — is what powers semantic search, recommendations, and most of what comes later. Everything is some version of "find the near points."
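The operation itself can be sketched in a few lines. This version checks every stored point, which is fine for a handful of items; the item vectors and the query vector are hand-made stand-ins for what a model would return.

```python
import math

def distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbours(query, items, k=2):
    """Return the k (name, distance) pairs closest to query, nearest first."""
    scored = [(name, distance(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Invented toy vectors, not output from a real model.
items = {
    "dog": [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "kitten": [0.70, 0.90, 0.15],
    "spreadsheet": [0.10, 0.15, 0.90],
    "tax return": [0.10, 0.20, 0.95],
}
query = [0.88, 0.78, 0.12]  # hypothetical embedding of "a young dog"

top = nearest_neighbours(query, items, k=2)
for name, gap in top:
    print(f"{name}: {gap:.3f}")
```

Every use described later is some variation on `nearest_neighbours`: change what the items are, and you change the application.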
Direction matters more than raw distance
Two arrows pointing the same way are alike even if one is longer — what matters is the direction they point, not their length.
In practice, similarity is usually measured by the angle between vectors rather than the straight-line distance between them, a measure called cosine similarity. Two embeddings pointing in the same direction are treated as similar even if one is "longer," because direction carries the meaning while length tends to reflect incidental properties such as how long the text is. When vectors are scaled to length 1, as the outputs of many embedding models are, ranking by cosine similarity and ranking by straight-line distance give the same order. You don't need the math to use embeddings, but it's worth knowing that the standard measure of "how similar" is about direction in the space, not raw distance.
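For the curious, cosine similarity is the dot product of two vectors divided by the product of their lengths. A small sketch with made-up vectors shows that it ignores length and responds only to direction.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (undefined for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

short = [1.0, 2.0, 0.5]
longer = [3.0, 6.0, 1.5]   # same direction as `short`, three times the length
other = [2.0, -1.0, 0.0]   # perpendicular to `short`

same_direction = cosine_similarity(short, longer)
perpendicular = cosine_similarity(short, other)

print(f"short vs longer: {same_direction:.3f}")
print(f"short vs other:  {perpendicular:.3f}")
```

`short` and `longer` are far apart as points, yet their cosine similarity is 1 because they point the same way.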
Similar meanings sit close, so similarity becomes distance. The core operation is nearest-neighbour search — find the closest points — usually measured by direction, via cosine similarity.
Finding nearest neighbours is easy with a handful of points and hard with millions. The vector database is the engine built to do it fast at scale — the practical home of embeddings.
A store built for nearest-neighbour search
A warehouse organised so that, given any item, it can instantly bring you the hundred most similar — not by scanning every shelf, but by how it's arranged.
A vector database stores your embeddings and answers "find the nearest points to this" queries quickly, even over millions of vectors. It's purpose-built for the one operation embeddings need: nearest-neighbour search by similarity. You embed your data once, store the vectors, and the database becomes a searchable index of meaning. When people build search, RAG, or recommendations on embeddings, the vector database is where the vectors live and the searching happens.
Approximate search makes scale fast
To find the closest café you don't measure the distance to every café on Earth — you look in your neighbourhood first. Smart shortcuts beat checking everything.
Comparing a query against every stored vector becomes too slow once the collection is large, because the work grows with every vector you add. So vector databases use approximate nearest-neighbour (ANN) search: indexing that finds the closest points without checking them all, trading a small amount of accuracy for a large speed gain. This is why a vector search over millions of items can return in milliseconds. You rarely need the exact nearest neighbours, since for most applications "very close" is as good as "closest," and ANN is what makes embedding search practical at real scale.
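To make the shortcut concrete, here is a toy version of one family of ANN techniques: split the space into cells and search only the cell the query falls in. The class name `CellIndex` and the hand-picked anchors are assumptions for illustration; production indexes (for example HNSW graphs or IVF partitions) are considerably more sophisticated.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(point, candidates):
    """Index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: distance(point, candidates[i]))

class CellIndex:
    """Toy partition-based index: each vector is filed under its nearest anchor."""

    def __init__(self, anchors):
        self.anchors = anchors
        self.cells = [[] for _ in anchors]

    def add(self, name, vector):
        self.cells[nearest_index(vector, self.anchors)].append((name, vector))

    def search(self, query, k=1):
        # Only the query's own cell is scanned, not the whole collection.
        cell = self.cells[nearest_index(query, self.anchors)]
        ranked = sorted(cell, key=lambda item: distance(query, item[1]))
        return [name for name, _ in ranked[:k]], len(cell)

anchors = [[0.0, 0.0], [10.0, 10.0]]  # hand-picked for illustration
index = CellIndex(anchors)
points = {
    "a": [1.0, 1.0], "b": [2.0, 0.5], "c": [0.5, 2.5],
    "x": [9.0, 9.5], "y": [10.5, 8.0], "z": [8.0, 11.0],
}
for name, vec in points.items():
    index.add(name, vec)

names, scanned = index.search([9.2, 9.0], k=2)
print(names, f"scanned {scanned} of {len(points)} vectors")
```

The "approximate" part is visible in the design: a true neighbour sitting just across a cell boundary would be missed. Real systems reduce that risk by probing several cells, which is the accuracy-for-speed dial.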
Embed once, search many times
You catalogue the library once when books arrive, and after that every lookup is instant — the slow indexing happens up front, the fast searching forever after.
The shape of an embedding system is: embed your data once up front and store the vectors, then search them cheaply on every query. Computing embeddings has a cost, but you pay it when data is added, not on each search. This is why embeddings scale well for retrieval — the expensive step is one-time indexing, and the per-query cost is just a fast nearest-neighbour lookup. Knowing this split helps you reason about both the cost and the freshness of an embedding system.
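The split can be seen by counting model calls. `CountingEmbedder` below is a hypothetical stand-in for a real embedding model: its vectors only count vowels and carry no real meaning, so the search results are not meaningful. The point is where the calls happen.

```python
import math

class CountingEmbedder:
    """Stand-in for a real embedding model; counts how often it is called.

    The vectors only count vowels, so they carry no real meaning. The class
    exists so the sketch runs without a model.
    """

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        lowered = text.lower()
        return [float(lowered.count(ch)) for ch in "aeiou"]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

model = CountingEmbedder()

# Indexing: one embedding call per document, paid when the document is added.
documents = ["refund policy", "shipping times", "account deletion"]
index = [(doc, model.embed(doc)) for doc in documents]
calls_for_indexing = model.calls

# Querying: one embedding call for the query, then only distance comparisons.
def search(query):
    query_vector = model.embed(query)
    return min(index, key=lambda item: distance(query_vector, item[1]))[0]

queries = ["refunds", "delivery", "delete my account"]
for query in queries:
    search(query)
calls_for_queries = model.calls - calls_for_indexing

print(f"indexing calls: {calls_for_indexing}, query calls: {calls_for_queries}")
```

Stored documents are never re-embedded at query time; each search costs one embedding of the query plus the lookup. The freshness side of the trade is also visible: a new or edited document is invisible to search until it has been embedded and added to `index`.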
A vector database stores embeddings and finds nearest neighbours fast, using approximate (ANN) search to scale to millions. Embed your data once up front; search it cheaply on every query.
The most direct use of embeddings is search that understands meaning. It's worth seeing clearly, because it's both the most common application and the foundation of RAG.
Search by meaning, not matching words
A helper who finds what you want even when you describe it in your own words — "the thing that keeps drinks cold" gets you the fridge, no exact match needed.
Semantic search finds results by meaning rather than by matching keywords. You embed the user's query and search for the document chunks whose vectors are nearest to it. So a search for "how do I get my money back" finds a page titled "refund policy," because their meanings sit close — even though they share almost no words. Keyword search would miss that entirely; semantic search is built for it. It finds what you meant, not just what you literally typed.
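A small sketch of the contrast, using the "money back" example. The page vectors and the query vector are hand-assigned so that the refund page sits near the query; they illustrate the mechanism and are not evidence of how any real model behaves.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def keyword_overlap(query, title):
    """Number of words the query and the title share."""
    return len(set(query.lower().split()) & set(title.lower().split()))

# Hand-assigned toy vectors standing in for a real model's output.
pages = {
    "refund policy": [0.9, 0.1, 0.2],
    "shipping times": [0.1, 0.9, 0.2],
    "careers at our company": [0.1, 0.2, 0.9],
}
query_text = "how do I get my money back"
query_vector = [0.85, 0.15, 0.25]  # hypothetical embedding of query_text

keyword_hits = {title: keyword_overlap(query_text, title) for title in pages}
best = max(pages, key=lambda title: cosine_similarity(query_vector, pages[title]))

print("shared words per page:", keyword_hits)
print("nearest page by meaning:", best)
```

Keyword matching finds no shared words with any page, so it has nothing to rank. The vector comparison still has a clear nearest page.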
Embed the query the same way as the data
To compare two measurements you must use the same ruler — measure one in inches and the other in centimetres and the numbers don't line up.
For search to work, the query and the stored documents must be embedded by the same model, into the same space — otherwise their coordinates aren't comparable and "nearest" is meaningless. You embed your documents with a model, and at query time you embed the question with that same model, then find the nearest document vectors. This sounds obvious but it's a common bug: mixing embedding models, or changing the model without re-embedding everything, quietly breaks search because the points no longer share a space.
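One way to guard against this bug is to store the model's name next to the vectors and refuse anything that doesn't match. The sketch below is an assumption about how you might do that, not a feature of any particular vector database; `VectorIndex`, "model-a" and "model-b" are placeholder names.

```python
from dataclasses import dataclass, field

@dataclass
class VectorIndex:
    """Toy index that records which model produced its vectors."""
    model_name: str
    dimensions: int
    entries: list = field(default_factory=list)

    def _check(self, vector, model_name):
        if model_name != self.model_name:
            raise ValueError(
                f"index uses {self.model_name!r} but the vector came from {model_name!r}"
            )
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )

    def add(self, doc_id, vector, model_name):
        self._check(vector, model_name)
        self.entries.append((doc_id, vector))

    def check_query(self, vector, model_name):
        """Raise before searching if the query vector is from another space."""
        self._check(vector, model_name)

# "model-a" and "model-b" are placeholder names, not real models.
index = VectorIndex(model_name="model-a", dimensions=3)
index.add("refund policy", [0.9, 0.1, 0.1], model_name="model-a")

try:
    index.check_query([0.2, 0.7, 0.1], model_name="model-b")
    mismatch_caught = False
except ValueError as error:
    mismatch_caught = True
    print(f"refused: {error}")
```

The name check matters because two models can produce vectors of the same size that still live in unrelated spaces, so a dimension check alone would let the bug through.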
This is the retrieval under RAG
The open-book exam again: semantic search is how you find the right pages to look at before you answer — the lookup that makes grounded answers possible.
Semantic search is the retrieval step in retrieval-augmented generation (covered in the RAG course). When a RAG system answers from your documents, embeddings are how it finds the relevant chunks to put in the model's context. So the quality of a RAG system rests heavily on the quality of its embedding search: get the right chunks near the query and the answer is grounded; get the wrong ones and the model answers from junk. Embeddings are the engine that makes RAG find the right facts.
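A sketch of that retrieval step feeding a prompt. The chunk texts, their vectors, the question vector, and the prompt wording are all invented for illustration; the call to the language model itself is left out.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, retrieved):
    """Place the retrieved chunks in front of the question."""
    context = "\n".join(f"- {chunk['text']}" for chunk in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Invented chunks with hand-assigned toy vectors.
chunks = [
    {"text": "Refunds are issued within 14 days of a return.", "vector": [0.9, 0.1, 0.1]},
    {"text": "Standard shipping takes 3 to 5 business days.", "vector": [0.1, 0.9, 0.1]},
    {"text": "Returns must include the original receipt.", "vector": [0.8, 0.2, 0.2]},
]
question = "How long do refunds take?"
question_vector = [0.9, 0.15, 0.1]  # hypothetical embedding of the question

prompt = build_prompt(question, retrieve(question_vector, chunks, k=2))
print(prompt)
```

Whatever `retrieve` returns is all the model gets to see, which is why a poor embedding search caps the quality of the final answer.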
Semantic search finds by meaning, not keywords — embed the query the same way as the data and find the nearest vectors. It's the retrieval engine underneath RAG.
Search is the obvious use, but the same "meaning as coordinates" idea quietly powers a surprising range of other tasks. Seeing them shows why embeddings are such a foundational tool.
Clustering and deduplication: grouping by meaning
Sorting a pile of mixed documents into stacks by topic — putting the ones that mean similar things together, without reading every word.
Because similar meanings cluster in space, you can group data by meaning automatically: gather customer feedback into themes, organise articles by topic, find the natural categories in a pile of text. The same idea catches duplicates and near-duplicates — two support tickets describing the same issue in different words sit close together, so you can detect and merge them. Any task that's really "which of these mean the same thing?" is an embedding task.
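Near-duplicate detection can be sketched as "flag every pair above a similarity threshold." The ticket vectors are invented, and the threshold of 0.98 is an arbitrary assumption you would tune on your own data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(items, threshold=0.98):
    """Return pairs of ids whose vectors are at least `threshold` similar."""
    ids = list(items)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_similarity(items[first], items[second]) >= threshold:
                pairs.append((first, second))
    return pairs

# Hand-assigned toy vectors for three support tickets:
#   T1 "app crashes on login"
#   T2 "login screen makes the app close"
#   T3 "invoice shows wrong total"
tickets = {
    "T1": [0.90, 0.10, 0.05],
    "T2": [0.88, 0.12, 0.06],
    "T3": [0.05, 0.10, 0.95],
}

duplicates = find_near_duplicates(tickets)
print(duplicates)
```

This compares every pair, so the work grows quadratically with the number of items; at scale you would instead ask a nearest-neighbour index for each item's closest matches.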
Recommendation: more like what you liked
A shop assistant who, seeing what you bought, points you to other things people with similar taste enjoyed — similarity in, suggestions out.
Recommendation is nearest-neighbour search in disguise: embed items (products, articles, songs) so similar ones sit close, and "recommend more like this" becomes "find the nearest items to what the user liked." The same machinery that finds documents similar to a query finds products similar to a purchase. Whenever you see "you might also like," there's a good chance embeddings and a nearest-neighbour search are quietly behind it.
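A minimal sketch of "more like what you liked." Averaging the liked items into one "taste" vector is one simple choice among several, and the catalogue vectors are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(liked, catalogue, k=2):
    """Rank items the user hasn't liked yet by similarity to their taste."""
    taste = mean_vector([catalogue[name] for name in liked])
    candidates = [name for name in catalogue if name not in liked]
    candidates.sort(
        key=lambda name: cosine_similarity(taste, catalogue[name]),
        reverse=True,
    )
    return candidates[:k]

# Hand-assigned toy vectors, not output from a real model.
catalogue = {
    "trail running shoes": [0.9, 0.1, 0.1],
    "running socks": [0.8, 0.2, 0.1],
    "hydration vest": [0.7, 0.3, 0.2],
    "espresso machine": [0.1, 0.9, 0.2],
    "desk lamp": [0.1, 0.2, 0.9],
}

picks = recommend(["trail running shoes"], catalogue, k=2)
print(picks)
```

Apart from excluding items the user already has, this is the same ranking step used for document search.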
Classification and anomaly detection
A security guard who knows what normal looks like and notices the one person behaving unlike everyone else — the outlier stands apart in the space.
Embeddings also power classification and anomaly detection. For classification, a new item most likely belongs to the same category as the labelled items nearest to it in the space. For anomaly detection, something whose embedding sits far from everything normal is an outlier worth flagging. Spotting fraud, off-topic content, or unusual inputs each becomes a case of "find what's far from the cluster." The recurring pattern across all these uses is the same: turn things into points, then reason about nearness and distance. Master that, and embeddings become a tool for far more than search.
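Both ideas fit in a short sketch: classify by majority vote among the nearest labelled points, and flag an input as anomalous when even its nearest "normal" point is far away. The two-number vectors, the labels, and the threshold of 0.3 are assumptions chosen for illustration.

```python
import math
from collections import Counter

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(vector, labelled, k=3):
    """Majority label among the k labelled points nearest to vector."""
    nearest = sorted(labelled, key=lambda item: distance(vector, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def is_anomaly(vector, normal_vectors, threshold):
    """True when even the nearest normal point is farther than threshold."""
    return min(distance(vector, normal) for normal in normal_vectors) > threshold

# Hand-assigned toy vectors with invented labels.
labelled = [
    ([0.90, 0.10], "billing"), ([0.80, 0.20], "billing"), ([0.85, 0.15], "billing"),
    ([0.10, 0.90], "shipping"), ([0.20, 0.80], "shipping"), ([0.15, 0.85], "shipping"),
]
normal = [vector for vector, _ in labelled]

label = classify([0.75, 0.25], labelled, k=3)
typical = is_anomaly([0.82, 0.18], normal, threshold=0.3)
outlier = is_anomaly([-0.90, -0.80], normal, threshold=0.3)

print(label, typical, outlier)
```

Distance to the nearest normal point is only one possible anomaly score; distance to a cluster centre or to the k-th nearest neighbour are common alternatives.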
Embeddings power far more than search: clustering and deduplication group by meaning, recommendation finds similar items, and classification and anomaly detection reason about nearness and distance.
Embeddings are powerful but easy to misuse in ways that quietly break results. A few traps account for most embedding systems that disappoint.
The embedding model decides quality
A translator who only half-understands a language produces a flawed map of its meaning — every downstream comparison inherits that distortion.
Everything depends on the model that creates the embeddings. A good embedding model places meaning accurately, so similar things really do sit close; a weak or mismatched one produces a distorted space where the distances mislead you. The choice of embedding model is a quality decision, not a detail — and a model trained on general web text may embed your specialised domain (legal, medical, code) poorly, because it doesn't capture the distinctions that matter there. Pick the model deliberately, and check it on your data.
Similar is not the same as relevant
Two passages can be on the same subject yet answer different questions — "how to cancel" and "why people cancel" sit close, but only one is what was asked.
Embedding similarity captures topical closeness, which is usually a good proxy for relevance — but not always. Two texts can be near in meaning while only one actually answers the need. This is why raw embedding search is a strong first pass but not the final word: it finds what's about the right thing, which isn't quite the same as what's useful. Knowing that "nearest" means "most similar," not "most relevant," keeps you from over-trusting the top result and points you toward refinements like re-ranking.
Garbage in, garbage space
If you file documents under sloppy, inconsistent labels, even a perfect search of the cabinet returns a mess — the index is only as good as what went into it.
The quality of an embedding system is bounded by the quality of what you embed. Bad chunking (chunking is covered in the RAG course), noisy text, or embedding the wrong field produces a space where even perfect search returns poor results, because the distances are computed over junk. So much of the work in embedding systems isn't the search; it's preparing clean, well-structured, well-chunked text to embed in the first place. Fix the inputs before you blame the search, because the space inherits whatever you put into it.
Embeddings disappoint in known ways: a weak or mismatched model distorts the space, "similar" isn't always "relevant," and noisy inputs make even perfect search return junk. Choose the model and clean the inputs.
Using embeddings well comes down to choosing the right model, feeding it clean data, and measuring whether the space actually reflects the meaning you care about.
Choose the embedding model for your data
You hire a translator who actually knows the dialect you work in — a generalist might miss the very distinctions that matter most to you.
The most consequential choice is the embedding model, and the right one depends on your data and task. A general model is fine for general text; a specialised domain may need a model that understands it, or one fine-tuned on it, to place its meanings accurately. Consider the language, the domain, the vector size, and the cost. Don't just grab a default — pick the model that builds an accurate space for your meaning, because every comparison you'll ever make rides on it.
Measure whether the space reflects real meaning
You test a new map by checking that places you know are near each other actually come out near each other — if they don't, the map is wrong.
Don't assume an embedding setup works — check it. For a set of cases where you know what should be similar, verify that the search actually returns those as nearest. This is the embedding version of evals: does the space put the right things close, on your real data? Measuring this catches a bad model, a domain mismatch, or a chunking problem before it quietly degrades everything built on top. An embedding space you haven't tested is a guess; one you've measured is a tool you can trust.
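One common way to put a number on this is recall at k: for a set of test queries with known right answers, how often does the expected document appear in the top k results? The sketch below uses invented vectors and three hand-written cases; in practice the query vectors come from your embedding model and the expected answers from people who know the data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, docs, k):
    """Names of the k documents most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query_vector, docs[name]),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(cases, docs, k):
    """Fraction of test cases whose expected document appears in the top k."""
    hits = sum(1 for vector, expected in cases if expected in top_k(vector, docs, k))
    return hits / len(cases)

# Hand-assigned toy vectors, not output from a real model.
docs = {
    "refund policy": [0.9, 0.1, 0.1],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy notice": [0.1, 0.1, 0.9],
}
# Each case: (hypothetical query embedding, document a human says should match).
cases = [
    ([0.8, 0.2, 0.1], "refund policy"),
    ([0.2, 0.8, 0.2], "shipping times"),
    ([0.6, 0.5, 0.1], "shipping times"),  # this toy space ranks refunds first
]

score_at_1 = recall_at_k(cases, docs, k=1)
score_at_2 = recall_at_k(cases, docs, k=2)
print(f"recall@1 = {score_at_1:.2f}, recall@2 = {score_at_2:.2f}")
```

The third case is the useful one: it is a miss at k=1, which is exactly the signal that the model, the chunking, or the test expectation needs a closer look.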
- Is the task about meaning — similarity, grouping, retrieval — that embeddings actually fit?
- Is the embedding model right for your data, domain, and language?
- Are query and documents embedded by the same model, into the same space?
- Is there a vector database with approximate search for your scale?
- Is the text you embed clean and well-chunked, not noise?
- Have you measured that the space puts the right things close, on real cases?
- embedding / vector: a piece of text turned into coordinates representing its meaning.
- similarity / distance / cosine similarity: meaning closeness measured as nearness, usually by direction.
- nearest-neighbour search: finding the closest points to a given one; the core operation.
- vector database / approximate (ANN) search: the store and the fast, scalable search engine.
- semantic search: finding by meaning, not keywords; the retrieval under RAG.
- clustering / recommendation / classification / anomaly detection: the uses beyond search.
- embedding model: what creates the coordinates, and decides the quality of the whole space.
- You reach for embeddings when the task is about meaning, not exact matching.
- You chose the embedding model for your data and embed query and docs the same way.
- You use a vector database with approximate search at scale.
- You embed clean, well-chunked text, and treat "similar" as a strong but imperfect proxy for relevant.
- You measured that the space reflects real meaning, instead of assuming it.
Embeddings turn meaning into coordinates so similarity becomes distance. Use them well by choosing the right model, embedding consistently, feeding clean text, and measuring that the space really puts similar things close.