Express course · No. 15

A cache is a small, fast place to keep an answer you already worked out, so the next time you need it you read it instead of redoing the work. It's the single biggest speed trick in computing — and it comes with computing's most famous hard problem: knowing when the remembered answer has gone stale.

Essence only · One picture per idea · Learn the words

§ 01

A cache is one simple idea: keep a copy of an answer somewhere fast, so you don't pay to produce it twice. Get this and every cache you'll ever meet is the same trick in a different place.

A cache remembers the answer

A notepad on your desk next to a slow filing cabinet — once you've looked something up, you jot it down, so next time you read the note instead of crossing the room again.

A cache is a small, fast store that holds the result of expensive work — a database query, a computed page, a downloaded file — so the next request for it is cheap. The expensive thing happens once; the cheap read happens many times. That's the entire principle. Everything else is detail about where the notepad lives and when to trust what's on it.

Hit and miss are the two outcomes

You reach for the note. Either it's there and you read it instantly — or it's blank, and you have to walk to the cabinet after all.

When the answer is in the cache, that's a cache hit — fast and cheap. When it isn't, that's a cache miss — you do the slow work, and usually store the result so next time hits. These two words describe every cache interaction. The whole game of caching is turning misses into hits: making sure the answer is there when you reach for it.

It works because of repetition

A coffee shop memorising its regulars' orders — worth it only because the same people ask for the same thing again and again.

Caching pays off when the same answers are needed repeatedly — and almost everything in computing is repetitive. The same popular page, the same user's profile, the same lookup, asked thousands of times. Because access isn't random but clustered on a few hot items, a small cache of the popular ones can absorb a huge share of the work. No repetition, no reason to cache.
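To put a rough number on "a small cache absorbs a huge share", here is a minimal sketch. It assumes a made-up popularity curve in which the item at rank r is requested in proportion to 1/r. Real traffic will follow its own curve, and the function name `top_share` is just for illustration.

```python
def top_share(total_items: int, cached_items: int) -> float:
    """Share of requests landing on the `cached_items` most popular items.

    Assumption: the item at popularity rank r is requested in proportion to 1/r.
    """
    weights = [1.0 / rank for rank in range(1, total_items + 1)]
    return sum(weights[:cached_items]) / sum(weights)


share = top_share(total_items=1000, cached_items=10)
print(f"Caching the top 1% of items covers {share:.0%} of requests")
```

Under that assumption, holding just 10 of 1,000 items covers roughly 39% of requests. With evenly spread access the same cache would cover only 1%.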

A cache keeps a fast copy of an expensive answer. A hit is reading the note; a miss is the walk to the cabinet. The job is turning misses into hits.

§ 02

Caching isn't one thing in one place — it's a pattern repeated at every layer between a user and the data. The closer the copy sits to who needs it, the faster it is.

The closer the cache, the faster the hit

Supplies you keep on your desk, in the room, in the building, and in a warehouse across town — the nearer it is, the quicker you get it, but the less you can store.

There's a ladder of caches between a user and the truth, each closer and faster than the last: the browser holds files on your device, a CDN holds copies near your city, the application keeps hot data in memory, and the CPU has tiny caches nanoseconds away. Nearer caches are faster but smaller. A request tries the closest first and falls back outward on a miss.
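The "try the closest first, fall back outward" rule can be sketched in a few lines. This is an illustration only: each tier is a plain dictionary standing in for a browser cache, CDN, or in-memory store, and `tiered_get` and `load_from_source` are hypothetical names.

```python
from typing import Callable


def tiered_get(
    key: str,
    tiers: list[dict[str, str]],
    load_from_source: Callable[[str], str],
) -> tuple[str, str]:
    """Look in each tier, nearest first. Returns (value, where it was found)."""
    for index, tier in enumerate(tiers):
        if key in tier:
            value = tier[key]
            # Copy the value into the nearer tiers so the next lookup hits sooner.
            for nearer in tiers[:index]:
                nearer[key] = value
            return value, f"tier {index}"
    # Missed everywhere: do the slow work once, then fill every tier.
    value = load_from_source(key)
    for tier in tiers:
        tier[key] = value
    return value, "source"


browser: dict[str, str] = {}
cdn: dict[str, str] = {"logo.png": "image bytes"}

print(tiered_get("logo.png", [browser, cdn], lambda k: "from origin"))
print(tiered_get("logo.png", [browser, cdn], lambda k: "from origin"))
```

The first call finds the file in the second tier and copies it inward, so the second call is served from the nearest tier.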

In-memory stores like Redis are the workhorse

A clerk who keeps the most-asked files on the desk in arm's reach, instead of fetching each one from the basement archive every time.

Most apps put a dedicated in-memory cache — a tool like Redis or Memcached — between the application and the database. Memory is dramatically faster than disk, so keeping hot results there turns a slow database query into a near-instant read. When people say "add a cache" to speed up a backend, this is usually what they mean: a fast memory store in front of a slow source of truth.

The CDN caches the web near the user

Popular books stocked in local libraries everywhere, so no one has to mail away to a single central archive.

A CDN (content delivery network) is caching applied to geography: it keeps copies of your pages, images, and files on servers around the world, serving each user from a nearby one. This is why a global site loads quickly everywhere instead of only near its origin server. It's the same hit/miss idea as everywhere else — the copy is just spread across the map to beat distance.

Caching is a pattern repeated at every layer: browser, CDN, in-memory store, CPU. Closer is faster but smaller — and a miss falls outward to the next.

§ 03

A cache is only worth its complexity if it's actually catching most requests. One number — the hit ratio — tells you whether your cache is earning its keep.

Hit ratio is the score that matters

A goalkeeper is judged by the share of shots they stop. A keeper who saves 95% is transformative; one who saves 10% barely changes the game.

The hit ratio is the fraction of requests served from the cache rather than the slow source. A 95% hit ratio means only one request in twenty reaches the database — a massive reduction in load and latency. A 10% hit ratio means the cache is mostly overhead. This single number is how you judge whether a cache is working, and improving it is the main lever you tune.
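The arithmetic behind those two numbers is simple enough to write down. The sketch below assumes illustrative latencies of 1 ms for a cache read and 100 ms for the slow source, and it charges a miss only the source latency; both are simplifications, not measurements.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_latency_ms(ratio: float, cache_ms: float, source_ms: float) -> float:
    """Expected latency when a fraction `ratio` of requests hit the cache."""
    return ratio * cache_ms + (1 - ratio) * source_ms


for ratio in (0.10, 0.95):
    latency = average_latency_ms(ratio, cache_ms=1, source_ms=100)
    print(f"hit ratio {ratio:.0%}: about {latency:.2f} ms per request")
```

With these assumed figures, a 10% hit ratio averages about 90 ms per request and a 95% hit ratio about 6 ms.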

A cold cache has to warm up

A new shop with empty shelves serves no one fast at first — it stocks up as customers ask, until the popular items are always on hand.

Right after it starts, a cache is cold — empty, so every request misses and goes to the slow source. As real traffic flows, it fills with popular answers and becomes warm, and the hit ratio climbs. This is why performance can look bad right after a restart or deploy, then settle. Some systems pre-warm the cache deliberately, loading known-hot data before users arrive.
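A tiny simulation makes the warm-up visible. It is a sketch under simple assumptions: the cache is unbounded, nothing expires, and the request list is invented for the example. The optional `prewarmed` argument stands in for loading known-hot keys before traffic arrives.

```python
def hit_ratio_by_window(
    requests: list[str], window: int, prewarmed: tuple[str, ...] = ()
) -> list[float]:
    """Hit ratio for each consecutive window of requests."""
    cache: set[str] = set(prewarmed)
    ratios = []
    for start in range(0, len(requests), window):
        chunk = requests[start:start + window]
        hits = 0
        for key in chunk:
            if key in cache:
                hits += 1
            else:
                cache.add(key)  # a miss fills the cache for next time
        ratios.append(hits / len(chunk))
    return ratios


requests = ["home", "about", "pricing", "docs", "blog"] * 4

print(hit_ratio_by_window(requests, window=5))
# [0.0, 1.0, 1.0, 1.0]
print(hit_ratio_by_window(requests, window=5, prewarmed=("home", "about")))
# [0.4, 1.0, 1.0, 1.0]
```

The cold run misses everything in its first window and then settles. Pre-warming two keys lifts the first window without changing the steady state.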

A cache trades memory for speed

Renting a bigger desk so more files fit within arm's reach — it costs more space, but you walk to the cabinet far less.

Caching isn't free: you spend memory (which is limited and costs money) to save time. A bigger cache holds more and hits more often, but you can't cache everything — so the art is caching the right things: the hot, expensive, repeatedly-asked answers that give the most speed per byte stored. A cache is a deliberate trade, and the hit ratio tells you if the trade is paying.

The hit ratio is the cache's report card. You spend memory to buy speed — so cache the hot, expensive answers that earn the most hits per byte.

§ 04

A remembered answer can go out of date while you're still trusting it. Knowing when to stop trusting the copy is the central, genuinely hard problem of caching.

Stale data is the price of caching

That phone number you jotted on a sticky note is gold — right up until your friend moves and the note quietly becomes wrong.

The flip side of remembering an answer is that the real answer can change while your copy doesn't. Stale data is a cached value that no longer matches the truth — the old price, the deleted post, last hour's stock count. Every cache risks serving stale data, and deciding how much staleness you can tolerate is the first real design question of any cache.

TTL: let it expire on a timer

Milk stamped with a use-by date — after it, you don't trust it without checking, even if it still looks fine.

The simplest freshness tool is a TTL (time to live): each cached entry gets an expiry, and after it the entry is treated as gone, forcing a fresh fetch. A short TTL means fresher data but more misses; a long TTL means more hits but more staleness. Choosing the TTL is choosing your exact spot on the freshness-versus-speed trade — per kind of data, based on how fast it really changes.
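Here is one minimal way a TTL could work, as a sketch rather than a production cache. The class name `TTLCache` and its injectable `clock` are assumptions made for the example; the fake clock lets you watch an entry expire without waiting. Note that `get` returns `None` on a miss, so this version cannot tell a missing entry from a cached `None`.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Each entry is stored with an expiry time and ignored once it has passed."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # never cached: a miss
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as gone
            return None
        return value


now = [0.0]  # a fake clock we can move by hand
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

cache.set("price", 42)
print(cache.get("price"))  # 42, still fresh
now[0] = 61.0
print(cache.get("price"))  # None, expired, so the caller must refetch
```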

Why this is famously hard

Knowing the instant a fact changed somewhere else in the world, so you can tear up your note at exactly the right moment — easy to say, maddening to do.

There's a running joke that there are only two hard problems in computer science, and cache invalidation — knowing when a cached value must be thrown out — is one of them. It's hard because the cache often doesn't know the underlying data changed; the change happened somewhere else. The next section is the set of strategies people use to fight this exact problem.

Every cached answer can go stale. A TTL expires it on a timer — and knowing exactly when to invalidate is one of computing's genuinely hard problems.

§ 05

Two forces decide what's in a cache and whether it's trustworthy: how you update it when the truth changes, and how you make room when it's full. Both have named strategies worth knowing.

Cache-aside: the app fills the cache

You check your notepad first; if it's blank, you walk to the cabinet, read the file, and jot the answer down before carrying on.

The most common pattern is cache-aside (lazy loading): on a read, the app checks the cache; on a miss, it fetches from the source, stores the result, and returns it. The cache fills on demand with exactly what's actually requested. It's simple and popular — and its weak spot is staleness, since a value sits in the cache until something expires or invalidates it.
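The read path just described fits in a handful of lines. This sketch uses a plain dictionary as a stand-in for a real store such as Redis or Memcached. The names `get_with_cache_aside` and `slow_profile_lookup` are hypothetical, and nothing here expires, which is exactly the weak spot mentioned above.

```python
from typing import Callable

cache: dict[str, str] = {}  # stand-in for a real in-memory store


def get_with_cache_aside(key: str, load_from_source: Callable[[str], str]) -> str:
    if key in cache:               # hit: read the note
        return cache[key]
    value = load_from_source(key)  # miss: walk to the cabinet
    cache[key] = value             # jot it down for next time
    return value


source_calls: list[str] = []


def slow_profile_lookup(user_id: str) -> str:
    """Pretend database query; records each call so we can count them."""
    source_calls.append(user_id)
    return f"profile for {user_id}"


get_with_cache_aside("user:1", slow_profile_lookup)
get_with_cache_aside("user:1", slow_profile_lookup)
print(len(source_calls))  # 1, the second read was a hit
```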

Write-through and write-around: keeping writes consistent

When a fact changes, you can update the desk note at the same moment you update the master file — or skip the note and let it be re-fetched fresh next time.

When data is written, you choose how the cache keeps up. Write-through updates the cache and the source together, so the cache stays in step with every write that goes through it (at the cost of slower writes). Write-around writes only to the source and lets the cache miss and reload later, which avoids caching data nobody re-reads; any old copy already in the cache has to be invalidated or left to expire. These names describe where a write lands and when the cache learns about it, and which you pick depends on your read/write mix.
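The difference between the two write paths is easiest to see side by side. This is a sketch with two dictionaries standing in for the cache and the source of truth; `CachedStore` is a hypothetical name. One assumption to flag: this version of write-around also drops any old cached copy, so the next read is guaranteed to miss and reload.

```python
class CachedStore:
    def __init__(self) -> None:
        self.source: dict[str, int] = {}  # slow source of truth
        self.cache: dict[str, int] = {}   # fast copy

    def write_through(self, key: str, value: int) -> None:
        # Update both together, so reads through the cache see the new value.
        self.source[key] = value
        self.cache[key] = value

    def write_around(self, key: str, value: int) -> None:
        # Write only to the source. Assumption: also drop any old cached copy
        # so the next read misses and reloads the fresh value.
        self.source[key] = value
        self.cache.pop(key, None)

    def read(self, key: str) -> int:
        if key in self.cache:
            return self.cache[key]
        value = self.source[key]
        self.cache[key] = value
        return value


store = CachedStore()
store.source["price"] = 10
store.read("price")               # first read caches 10

store.write_through("price", 12)
print(store.cache["price"])       # 12, the cache was updated with the write

store.write_around("price", 15)
print("price" in store.cache)     # False, the cache was skipped
print(store.read("price"))        # 15, reloaded from the source on the miss
```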

Eviction: deciding what to drop when full

A small desk that's full — to add a new file you must put one away, so you remove the one you haven't touched in ages, not the one you use hourly.

A cache has limited space, so when it fills it must evict something. The most common policy is LRU (least recently used): drop whatever hasn't been touched in the longest time, on the bet that it won't be needed soon. Others exist — LFU drops the least frequently used. Eviction is how a small cache automatically keeps the hot items and sheds the cold ones, holding its hit ratio up without growing forever.
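LRU is short enough to sketch in full. This version leans on Python's `collections.OrderedDict`, which remembers insertion order and can move a key to the end. The class name `LRUCache` is for illustration, and a real cache would also need to think about thread safety and memory accounting.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._entries: OrderedDict[str, Any] = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # touching a key makes it most recent
        return self._entries[key]

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used


lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")          # "a" is now the most recently used
lru.set("c", 3)       # the desk is full, so "b" is put away

print(lru.get("b"))   # None
print(lru.get("a"))   # 1
```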

Invalidation decides when a cached value is wrong; eviction decides what to drop when space runs out. Cache-aside, write-through, LRU — names for those two jobs.

§ 06

Caching's power comes with sharp edges. The classic failures aren't exotic — they're predictable, they have names, and knowing them is half of avoiding them.

The stampede when a hot key expires

A popular item sells out, and the instant it does, a hundred customers all rush the counter to ask for it at once — overwhelming the one clerk who restocks.

When a popular cached entry expires, every request that wanted it misses at the same moment and all hit the slow source together: a cache stampede (or thundering herd). The database, shielded for hours, suddenly takes the full crowd at once and can buckle. The fixes are known (let one request refresh while others wait, or stagger expiries), but you have to anticipate it, because a stampede does the most damage exactly when traffic is highest.

Stale reads: fast, confident, and wrong

Reading a price off an old sticky note and quoting it confidently — the number is delivered instantly, and it's the wrong number.

A cache will happily serve a stale value fast and with full confidence. Usually that's fine — a slightly old view count harms no one. But for data where correctness is critical — a bank balance, inventory, permissions — a stale read can be a real bug. The skill is knowing which data tolerates staleness and which must always be fresh, and never caching the second kind carelessly.

Some things shouldn't be cached at all

You don't keep a photocopy of a document that's rewritten every minute, or of someone's private letter on a shared desk — the copy is wrong, or it shouldn't be there.

Not everything belongs in a cache. Data that changes constantly gives a poor hit ratio and frequent staleness. Sensitive or per-user data risks leaking if a shared cache hands one user's copy to another. And rarely-requested data just wastes space. Knowing what not to cache — fast-changing, sensitive, or unpopular data — is as important as knowing what to cache.

Caching's classic bites are named: stampedes when a hot key expires, stale reads served with false confidence, and caching things that never should be.

§ 07

Caching is a power tool: enormous speed for a little memory, paid for with the risk of serving the past. Used deliberately, it's one of the best trades in computing.

Cache the hot, expensive, and slow-changing

You memorise the regulars' usual orders and the directions you give ten times a day — not the one-off requests or the things that change by the minute.

The ideal cache target is frequently read, expensive to produce, and rarely changed — that combination yields the most speed for the least staleness risk. A popular page built from a slow query, recomputed identically all day, is perfect. A constantly-changing, rarely-read, sensitive value is the opposite. Most caching wins come from finding the handful of answers that fit the first description and remembering exactly those.

Decide your staleness budget up front

Agreeing in advance how out-of-date is acceptable — a weather widget can be minutes old; a bank balance cannot be seconds old.

Before caching anything, decide how much staleness it can tolerate, because that choice drives everything else — the TTL, whether you need active invalidation, whether it's safe to cache at all. Make it explicit per kind of data. "How wrong is it allowed to be, and for how long?" is the question that turns caching from a source of mysterious bugs into a controlled, deliberate trade.

Before you add a cache
  • Is the data hot, asked for repeatedly enough to give a real hit ratio?
  • Is it expensive to produce, so a hit actually saves meaningful work?
  • How stale can it be: what's the TTL, and do I need active invalidation?
  • What happens when a hot entry expires: could it cause a stampede?
  • Is it safe to cache, meaning no per-user or secret data in a shared cache?
  • How will I know it's working: am I measuring the hit ratio?

The words you now own
  • cache hit / miss: answer found in the cache, or not.
  • hit ratio: the share of requests served from cache; the score that matters.
  • cold / warm cache: empty and missing, versus filled and hitting.
  • TTL / stale: the expiry timer, and a copy that no longer matches the truth.
  • invalidation / eviction: throwing out wrong data, and dropping data to make room (LRU).
  • cache-aside / write-through: the app fills it lazily, or writes update it immediately.
  • stampede / thundering herd: the rush to the source when a hot key expires.

Signs you cache well
  • You can name the hit ratio of your important caches.
  • Each cached thing has a deliberate TTL matched to how fast it really changes.
  • Nothing per-user or secret sits in a shared cache by accident.
  • Your hot keys won't stampede the source all at once on expiry.
  • You cache the hot, expensive, slow-changing answers and skip the rest.

Caching trades a little memory for enormous speed, at the risk of serving the past. Done well, it's deliberate: the right answers, with a staleness budget you chose on purpose.

End of express course · 7 chapters · learn the words

Next comes practice: find the slowest repeated operation in something you've built — a query, an API call, a computed page — and put a small cache in front of it with an explicit TTL. Then watch the hit ratio and the latency change. The trade becomes real the moment you measure it. But hold one idea above the rest: caching is just remembering an expensive answer instead of redoing the work — and every hard part is really one question, asked honestly: when is the thing I remembered no longer true?