Express course · No. 24

Performance work goes wrong the same way every time: someone guesses what's slow, optimises the wrong thing, and adds complexity for nothing. The discipline is the opposite — measure to find the real bottleneck, fix that one thing, then measure again. Learn what actually makes software slow, and the words for it, and 'make it faster' becomes a method instead of a guess.

Essence only · One picture per idea · Learn the words

§ 01

The first rule of performance is the one everyone breaks: don't trust your hunch about what's slow. Almost all wasted optimisation comes from skipping this single step.

Your intuition about slowness is usually wrong

A doctor who prescribes surgery based on a guess, without ever running a test — confident, fast, and far too often operating on the wrong thing entirely.

Developers are famously bad at guessing where time goes. The part you're sure is slow is often fine; the real culprit is somewhere you never suspected — a tiny function called a million times, a hidden database call. Acting on a hunch means you optimise the wrong thing, add complexity, and the program stays slow. The first move is never "fix it" — it's find out where the time actually goes.

Profile to find the real cost

An itemised bill that shows exactly where the money went, line by line — so you stop guessing and see that one charge, not the ones you assumed, ate the budget.

A profiler is a tool that measures where your program actually spends its time, function by function. It turns "I think this is slow" into "this one function is 80% of the runtime." With that, you fix what matters instead of what you imagined. Whether it's a profiler, timing logs, or your observability metrics, the principle holds: let measurement, not intuition, point you at the problem.
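To make that concrete, here is a minimal sketch using Python's built-in cProfile and pstats modules. The functions handle_request, cheap_setup and slow_sum_of_squares are hypothetical stand-ins invented for illustration; the point is only that the report, not your hunch, names the expensive function.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    # Hypothetical hot spot: a pure-Python loop that does most of the work.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup() -> list[int]:
    # Hypothetical cheap step that a hunch might wrongly blame.
    return list(range(100))


def handle_request() -> int:
    cheap_setup()
    return slow_sum_of_squares(200_000)


def profile_report(func) -> str:
    """Run func under cProfile and return the report, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_report(handle_request)
print(report)
```

In the printed table, the cumtime column for slow_sum_of_squares should sit close to the total, while cheap_setup barely registers. Exact timings will vary by machine.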

Measure, fix, measure again

A scientist changes one variable, re-runs the experiment, and checks the result — never assuming a change helped, always confirming it did.

Performance is a loop: measure to find the slow part, change it, then measure again to confirm the change actually helped — and didn't just move the problem or make it worse. Without the second measurement you're guessing whether you improved anything, which is how "optimisations" quietly make things slower. Treat every fix as a hypothesis you verify, the same discipline as evals for AI or tests for code.
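A minimal sketch of the second measurement, using the standard timeit module. The two string-joining functions are hypothetical examples of a "before" and a candidate "after"; whether the candidate actually wins on your interpreter is exactly what the measurement is there to tell you.

```python
import timeit


def join_with_concat(words: list[str]) -> str:
    # "Before": builds the string piece by piece.
    result = ""
    for word in words:
        result += word
    return result


def join_with_join(words: list[str]) -> str:
    # "After": the candidate fix.
    return "".join(words)


def best_time(func, arg, repeats: int = 5, number: int = 20) -> float:
    """Best-of-N wall time in seconds for `number` calls; taking the min reduces noise."""
    return min(timeit.repeat(lambda: func(arg), repeat=repeats, number=number))


words = ["x"] * 10_000

# A fix that changes the answer is not a fix, so check correctness first.
assert join_with_concat(words) == join_with_join(words)

before = best_time(join_with_concat, words)
after = best_time(join_with_join, words)
print(f"before: {before:.4f}s  after: {after:.4f}s  ratio: {before / after:.1f}x")
```

If the ratio comes out near 1.0, or below it, the "optimisation" did not help and the simpler version should stay.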

Intuition about slowness is usually wrong. Profile to find where time really goes, fix that, and measure again to confirm — never optimise on a guess.

§ 02

Before optimising, know which kind of "fast" you actually want. Two different goals get confused constantly, and improving one can do nothing for the other.

Latency is the wait for one thing

How long a single coffee takes from order to cup — one customer's experience, measured start to finish.

Latency is how long one operation takes from start to finish — one request, one page load, one query. It's what an individual user feels: the wait. When someone says "the site is slow," they almost always mean latency — the delay between doing something and getting a response. Lowering latency is about making each single thing happen faster.

Throughput is how much you handle per second

How many coffees the whole café serves an hour — not how long one takes, but the total flow through the shop.

Throughput is how much work the system handles per unit of time — requests per second, jobs per minute. It's about total capacity, not individual wait. A system can have great throughput (serving thousands at once) while each user still waits a while (high latency), or low latency but limited capacity. They're different goals, and which one you're chasing changes what you fix.
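The two numbers can be computed from the same request log. This is a small sketch with made-up (start, end) timestamps; the function names are illustrative, not from any library.

```python
def mean_latency(requests: list[tuple[float, float]]) -> float:
    """Average of (end - start) over all requests, in seconds."""
    return sum(end - start for start, end in requests) / len(requests)


def throughput(requests: list[tuple[float, float]]) -> float:
    """Completed requests per second over the whole observed window."""
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    return len(requests) / window


# Hypothetical (start, end) timestamps in seconds.
one_at_a_time = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
overlapping = [(0.0, 2.0), (0.5, 2.5), (1.0, 3.0), (1.5, 3.5)]

for name, reqs in (("one at a time", one_at_a_time), ("overlapping", overlapping)):
    print(f"{name}: latency={mean_latency(reqs):.2f}s  throughput={throughput(reqs):.2f}/s")
```

In this toy log the overlapping system makes each user wait twice as long (2.00s against 1.00s) yet finishes more work per second (1.14/s against 1.00/s).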

They're different, and sometimes they trade off

A motorbike gets one person there fastest (low latency); a bus moves the most people per trip (high throughput). The best vehicle depends on which you need.

Latency and throughput are distinct goals, and optimising one can hurt the other. Batching (grouping work together) often raises throughput but adds latency, because each item waits for the batch to fill and run. So you have to decide which matters here: a user-facing page lives or dies on latency; a background data pipeline cares about throughput. Naming your real goal keeps you from optimising the number that doesn't matter to your users.
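The batching trade-off can be seen in a back-of-envelope model. The sketch below assumes each batch pays a fixed overhead once plus a per-item cost, and that items arrive at a steady gap; all four numbers are invented for illustration.

```python
def batch_throughput(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    """Items per second when each batch pays the fixed overhead once."""
    return batch_size / (overhead_s + batch_size * per_item_s)


def worst_item_latency(batch_size: int, overhead_s: float, per_item_s: float,
                       arrival_gap_s: float) -> float:
    """Latency of the first item in a batch: it waits for the batch to fill, then to run."""
    fill_wait = (batch_size - 1) * arrival_gap_s
    return fill_wait + overhead_s + batch_size * per_item_s


for size in (1, 10, 100):
    tput = batch_throughput(size, overhead_s=0.010, per_item_s=0.001)
    lat = worst_item_latency(size, overhead_s=0.010, per_item_s=0.001, arrival_gap_s=0.002)
    print(f"batch={size:>3}  throughput={tput:7.1f} items/s  worst latency={lat * 1000:6.1f} ms")
```

With these made-up numbers, going from a batch of 1 to a batch of 100 lifts throughput from about 91 to about 909 items per second, while the worst-case wait grows from 11 ms to 308 ms.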

Latency is the wait for one operation; throughput is how much you handle per second. They're distinct goals, and batching often trades latency for throughput.

§ 03

Performance isn't spread evenly. Almost always, one part dominates the time — and optimising anything else is wasted effort until you fix that one.

One slow part usually dominates

A chain is exactly as strong as its weakest link — reinforcing the strong links does nothing; only the weak one decides whether it holds.

In most slow systems, a single bottleneck accounts for the bulk of the time — one query, one slow service, one bad loop. Everything else is already fast enough. This is why guessing is so wasteful: optimise a part that's only 2% of the runtime and the best you can win is 2%. Find the part that's 80%, and fixing it transforms everything. The whole skill is locating the dominant cost, not improving the fast parts.
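The 2% versus 80% arithmetic is Amdahl's law, and it is short enough to check yourself. A minimal sketch, where fraction is the share of runtime you speed up and local_speedup is how much faster that share gets:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    becomes `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Make a 2% part ten times faster: the whole program barely changes.
print(f"{overall_speedup(0.02, 10):.2f}x")            # 1.02x

# Make that 2% part infinitely fast: still only about 2% better.
print(f"{overall_speedup(0.02, float('inf')):.2f}x")  # 1.02x

# Make the 80% part ten times faster: the whole program is over 3x faster.
print(f"{overall_speedup(0.80, 10):.2f}x")            # 3.57x
```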

The system is only as fast as its slowest step

A highway that's wide open until one lane-closure where every car crawls — the whole journey's speed is set by that one chokepoint, not the clear stretches.

A request that flows through ten steps in sequence takes as long as all ten added together, so one slow step can account for nearly the whole total. Speeding up the nine fast steps barely moves it; the slow step sets the pace. So you target the chokepoint specifically. And note that fixing one bottleneck often reveals the next: remove the slowest step and a new slowest step emerges. Performance work is iteratively finding and clearing whatever is currently the limiting step.
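Here is that "fix one, reveal the next" pattern as a tiny sketch. The step names and millisecond timings are hypothetical; in practice they would come from your profiler or tracing.

```python
def slowest_step(step_ms: dict[str, float]) -> str:
    """Name of the step with the largest measured time."""
    return max(step_ms, key=step_ms.get)


# Hypothetical per-step timings for one request, in milliseconds.
steps = {"auth": 5.0, "load_user": 12.0, "query_orders": 400.0,
         "render": 60.0, "serialize": 8.0}

total_before = sum(steps.values())   # 485.0
first = slowest_step(steps)          # "query_orders"

steps[first] = 20.0                  # pretend the fix brought it down to 20 ms

total_after = sum(steps.values())    # 105.0
second = slowest_step(steps)         # "render" is the new bottleneck

print(f"fixed {first}: {total_before:.0f} ms -> {total_after:.0f} ms, next up: {second}")
```

Note that trimming auth or serialize first would have saved at most 13 ms of the original 485.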

Optimising elsewhere is wasted effort

Polishing the trophy while the car has a flat tyre — effort spent where it changes nothing, while the thing that actually stops you sits ignored.

Time spent optimising anything but the bottleneck is, by definition, time that can't meaningfully help. It often makes things worse, adding complexity and bugs for an invisible gain. This is the discipline that measurement enforces: it stops you from lovingly optimising the part you understand and forces you onto the part that actually costs. Fix the bottleneck, ignore the rest — until the rest becomes the bottleneck.

One bottleneck usually dominates the time, and the system is only as fast as its slowest step. Fix that one thing; optimising anything else is wasted until it becomes the bottleneck.

§ 04

When you do find the bottleneck, the fix is usually one of a small number of classics. These few patterns account for the overwhelming majority of real-world speedups.

A better algorithm beats a faster machine

Two routes to the same city: a faster car on the long winding road still loses to a slower car on the straight shortcut. The route matters more than the engine.

The biggest wins often come from a better algorithm or data structure, not faster hardware or micro-tweaks. An O(n²) nested loop replaced with a single O(n) pass over a hash map, where each lookup is O(1) on average, can turn minutes into milliseconds, a change no faster machine could match. Before optimising the small stuff, ask whether the approach is wrong. A better shape, as the data-structures course shows, is the cheapest and largest speedup there is.
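A small sketch of that kind of swap, finding the ids two lists have in common. It uses a Python set, which is hash-based like a hash map; the function names and data are made up for illustration.

```python
def common_ids_quadratic(a: list[int], b: list[int]) -> list[int]:
    # `x in b` scans the whole list each time: O(len(a) * len(b)).
    return [x for x in a if x in b]


def common_ids_linear(a: list[int], b: list[int]) -> list[int]:
    # Building the set costs O(len(b)) once; each membership test is O(1) on average.
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(0, 2000, 2))  # multiples of 2
b = list(range(0, 2000, 3))  # multiples of 3

# Same answer, very different growth as the lists get longer.
assert common_ids_quadratic(a, b) == common_ids_linear(a, b)
print(common_ids_linear(a, b)[:5])  # [0, 6, 12, 18, 24]
```

At this size both finish quickly. The gap only becomes dramatic as the lists grow, which is why it is worth timing with realistic data.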

Stop making so many round trips

Making twenty separate trips to the shop for one item each, instead of one trip with a list — the travel, not the shopping, is eating your afternoon.

A huge amount of slowness is too many round trips — repeated calls to a database or service, each cheap alone but devastating in bulk. The classic is the N+1 query: fetching a list, then making one more query per item, turning one trip into hundreds. The fix is to fetch in bulk — ask for everything at once. Reducing the number of trips usually beats speeding up each one, because the round trip itself is the cost.
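The N+1 shape is easy to see when you count queries. This sketch uses Python's built-in sqlite3 with an in-memory database as a stand-in; there is no real network here, so treat each counted query as one round trip. The tables, rows and CountingDB helper are all invented for the example.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts every query as one round trip."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def run(self, sql: str, params: tuple = ()) -> list[tuple]:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO posts VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'),
                             (3, 3, 'Kernels'), (4, 1, 'Engines');
""")


def titles_n_plus_1(db: CountingDB) -> list[tuple]:
    posts = db.run("SELECT title, author_id FROM posts ORDER BY id")
    result = []
    for title, author_id in posts:
        # One extra query per post: this is the "+N".
        name = db.run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def titles_bulk(db: CountingDB) -> list[tuple]:
    # One JOIN asks for everything in a single trip.
    return db.run(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )


slow_db, fast_db = CountingDB(conn), CountingDB(conn)
slow = titles_n_plus_1(slow_db)
fast = titles_bulk(fast_db)
print(slow_db.queries, fast_db.queries)  # 5 1
```

Four posts cost five queries the first way and one the second. With four hundred posts the first version would make 401 trips and the second would still make one.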

Don't do work you can skip or reuse

Re-cooking a meal from scratch every time someone asks for it, instead of keeping a pot warm — the fastest work is the work you don't repeat.

Two more classics. Caching — remembering an expensive result so you don't recompute it — is one of the biggest levers there is (its own course). And doing less: compute things lazily, only when actually needed; avoid loading data you won't use; skip work whose result you'll throw away. The fastest operation is the one you never run. Often the best optimisation isn't doing the work faster — it's not doing it at all.
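A minimal caching sketch using functools.lru_cache from the standard library. The expensive_report function is a hypothetical stand-in for slow work, and call_log exists only so you can see how often the body really ran.

```python
from functools import lru_cache

call_log: list[int] = []


@lru_cache(maxsize=128)
def expensive_report(customer_id: int) -> str:
    # Stand-in for slow work; the log records each real computation.
    call_log.append(customer_id)
    return f"report for customer {customer_id}"


expensive_report(7)
expensive_report(7)  # served from the cache; the body does not run again
expensive_report(8)

print(len(call_log))                  # 2
print(expensive_report.cache_info())  # hits=1, misses=2
```

This is only safe when the same input always gives the same answer. If the underlying data can change, the cache will keep serving the old result until it is cleared.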

Most speedups are a few classics: a better algorithm, fewer round trips (kill the N+1), caching expensive results, and simply doing less work. The fastest work is the work you skip.

§ 05

How you measure performance can lie to you. The average is the most comforting and most misleading number, and learning to look past it is what separates real performance work from wishful thinking.

The average hides the slow cases

A room with nine comfortable people and one on fire has a pleasant average temperature — the mean erases the case that actually matters.

Reporting performance as an average is dangerously soothing: it blends the fast majority with the slow few into one happy number. But users don't experience the average. Each user experiences their own request, and the slow ones feel every millisecond. An average response time of 200ms can hide that one request in fifty takes five seconds while the rest finish in about 100ms. The mean tells you the system is fine while a real slice of your users suffer.

Percentiles show the real experience

A report that says not just the typical wait but "the slowest 1% of people waited this long" — naming the bad experiences instead of averaging them away.

Percentiles capture what the average hides. p50 (the median) is the typical case: half of requests are faster. p99 marks the slow tail: 99% of requests are faster, and the remaining 1% are at least this slow. It's a good stand-in for the bad experience real users regularly hit, though the very worst request can be slower still. Watching p99, not just the average, is how you see the users who are actually suffering. The tail is where the pain lives, and percentiles are how you make it visible.
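A short sketch that puts the mean, p50 and p99 side by side for the same made-up sample. It uses the nearest-rank definition of a percentile, which is one common convention among several; monitoring tools may interpolate and report slightly different values.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in ms: 95 fast, 3 slow, 2 terrible.
latencies_ms = [100] * 95 + [800] * 3 + [5000] * 2

print("mean:", statistics.mean(latencies_ms))  # 219
print("p50: ", percentile(latencies_ms, 50))   # 100
print("p99: ", percentile(latencies_ms, 99))   # 5000
```

The mean of 219 ms describes nobody: most requests took 100 ms, and the unlucky ones took five seconds.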

Tail latency is what users remember

One terrible meal at a restaurant outweighs a dozen fine ones in someone's memory — the worst experiences, not the average, shape how people judge you.

The slow requests — the tail — disproportionately shape how users perceive your product, because a bad experience sticks. And at scale, the tail hits more people than it seems: if every page makes many calls, the chance that at least one is slow grows fast, so almost every user eventually feels the tail. This is why serious teams set targets on p99, not averages: taming the worst cases is what makes a product feel reliably fast.
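The "grows fast" claim is simple probability. The sketch below assumes each call independently has a 1% chance of landing in the slow tail; real calls are rarely fully independent, so read the output as a rough guide.

```python
def chance_of_slow_page(calls_per_page: int, slow_call_rate: float = 0.01) -> float:
    """Probability that at least one of a page's calls lands in the slow tail,
    assuming calls are independent."""
    return 1.0 - (1.0 - slow_call_rate) ** calls_per_page


for calls in (1, 10, 100):
    print(f"{calls:>3} calls per page -> {chance_of_slow_page(calls):.1%} of pages hit the tail")
```

Under that assumption a page making 100 calls hits at least one p99-slow call about 63% of the time, so the "rare" tail becomes the common experience.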

Averages hide the slow cases that users actually feel. Watch percentiles — p50 for typical, p99 for the painful tail — because the worst experiences are what people remember.

§ 06

There's a failure mode opposite to ignoring performance: chasing it too early, everywhere, before you know it matters. The famous warning about it is famous for a reason.

Premature optimisation is the root of all evil

Reinforcing every wall of a house against an earthquake that may never come, before anyone has even moved in — enormous effort spent on a problem you don't yet have.

The famous line — "premature optimisation is the root of all evil" — warns against optimising before you know what, or whether, anything needs it. Optimising code that isn't slow, or isn't even run much, buys nothing and costs plenty: complexity, bugs, and time stolen from things that matter. Most code doesn't need to be fast; it needs to be correct and clear. Speed work is for the parts measurement proves are hot.

Clear and correct first, fast where it counts

You write the recipe so anyone can follow it, and only streamline the one step you make a hundred times a day — not every step, just the hot one.

The order is: make it correct, make it clear, and make it fast only where measurement shows it matters. Clear code is easier to optimise later precisely because you can understand it; tangled "fast" code is hard to fix when the real bottleneck appears elsewhere. Optimising for speed almost always costs readability, so you spend that cost only where it buys a real, measured win — and keep the rest simple.

But don't ignore performance by design

You don't earthquake-proof a shed, but you also don't build a skyscraper on sand — some choices are cheap to get right early and ruinous to fix late.

The warning isn't "never think about performance." Choosing an O(n²) algorithm where an O(n) one was just as easy, or a structure that won't scale, is a design mistake you'll pay for dearly later. The balance: don't micro-optimise early, but don't make architectural choices that are slow by design and expensive to undo. Get the big shape right cheaply up front; optimise the details only when measurement demands it.

Premature optimisation costs clarity for speed nobody needs. Make it correct and clear first, fast only where measurement proves it matters — but don't choose a slow-by-design shape you'll regret.

§ 07

Performance done well is a calm, repeatable method, not a heroic scramble. The whole practice fits in a short loop you run only when there's a real reason to.

The loop: measure, fix the bottleneck, repeat

A mechanic diagnoses with instruments, fixes the one failing part, then re-tests — never replacing random parts hoping the noise goes away.

The method is simple and disciplined: measure to find the bottleneck, fix that one thing, measure again to confirm and to reveal the next bottleneck, and stop when it's fast enough. Each pass targets the current dominant cost and ignores everything else. This loop keeps performance work honest and efficient — you always know you're working on the thing that actually matters, and you always know whether you helped.

Set a target, then stop

You insulate the house until it's warm enough, then stop — you don't keep adding insulation forever past the point anyone can feel.

Performance has no natural end — you can always make something marginally faster. So set a target ("pages under 300ms at p99") and stop when you hit it. Without a goal, optimisation becomes an endless time-sink with diminishing returns, and complexity creeps in for gains nobody notices. Knowing what "fast enough" means — for your users, your use case — is what lets you do exactly enough performance work and then go build something else.
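A target like that can be turned into a check you run after each fix. This is a minimal sketch with invented samples; TARGET_P99_MS mirrors the example target above, and the nearest-rank p99 is one convention among several.

```python
import math

TARGET_P99_MS = 300.0  # the example target: pages under 300ms at p99


def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of the samples."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(99 * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples_ms: list[float], target_ms: float = TARGET_P99_MS) -> bool:
    """True once the measured p99 is under the target, which is the signal to stop."""
    return p99(samples_ms) < target_ms


# Hypothetical measurements before and after a fix.
before = [120.0] * 97 + [450.0] * 3
after = [110.0] * 97 + [280.0] * 3

print(p99(before), meets_target(before))  # 450.0 False
print(p99(after), meets_target(after))    # 280.0 True
```

Once the check passes, further speedups are optional, and the time is probably better spent elsewhere.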

Before you optimise

- Have I measured where the time actually goes, or am I guessing?
- Latency or throughput: which kind of fast do I actually need?
- What's the bottleneck, the one part that dominates the time?
- Is it a classic: bad algorithm, too many round trips, missing cache, needless work?
- Am I watching p99, not just the average that hides the slow tail?
- Is there a target I'm optimising toward, so I know when to stop?

The words you now own

- profiler / measure-fix-measure: finding real cost, and the loop that confirms a fix.
- latency / throughput: the wait for one thing, versus volume per second.
- bottleneck: the one slow part that dominates the total time.
- algorithm / round trips / N+1 / caching: the classic sources of, and fixes for, slowness.
- average / percentile / p50 / p99 / tail latency: why the mean lies and the tail hurts.
- premature optimisation: chasing speed before measurement proves you need it.
- batching: grouping work to raise throughput, at the cost of latency.

Signs you optimise well

- You measure before touching anything, and measure again after.
- You fix the bottleneck and ignore the parts that are already fast enough.
- Your wins come from algorithms, fewer round trips, and caching, not micro-tweaks.
- You judge speed by p99, not a flattering average.
- You optimise toward a target and stop, keeping the rest clear and correct.

Performance is a calm loop: measure, fix the one bottleneck, measure again, stop at a target. Reach for it only where it's proven to matter, and keep everything else clear.