Express course · No. 21

A test is just code that checks your code, automatically. People think its job is to prove today's code works — but the real payoff comes later: tests are what let you change, refactor, and add features tomorrow without breaking what already worked. Learn the kinds and the vocabulary, and testing stops being a chore and becomes the thing that lets you move fast.

Essence only · One picture per idea · Learn the words

§ 01

Almost everyone misunderstands what tests are for. They look like a way to prove code works; their real value is something deeper and more practical — and seeing it changes how you write them.

A test is code that checks code

A second person who re-does your sums on a calculator and flags any that don't match — automatic, tireless, and the same every time.

A test is a small piece of code that runs your code with a known input and checks that the result is what you expect. "Given 2 and 3, does add return 5?" If yes, the test passes; if not, it fails and tells you. That's the entire mechanic. The power is that it's automatic and repeatable: you can run thousands of these checks in seconds, every time you change anything, without a human re-verifying by hand.
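In Python the whole mechanic fits in a few lines. This is a minimal sketch using a plain `assert`; the function `add` and the test name are illustrative, not taken from any real project.

```python
def add(a: int, b: int) -> int:
    """The code under test."""
    return a + b


def test_add_two_and_three() -> None:
    # Known input, expected output: "Given 2 and 3, does add return 5?"
    assert add(2, 3) == 5


if __name__ == "__main__":
    test_add_two_and_three()  # raises AssertionError if the check fails
    print("pass")
```

Test runners such as pytest find functions whose names start with `test_` and run them all for you, which is what makes running thousands of checks effortless.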

The real point is change without fear

A safety harness on a climber: it doesn't make the climb, but it lets you attempt a bold move because a slip won't be fatal. You climb harder because you're caught if you fall.

Here's the misunderstanding. Tests aren't mainly about proving today's code is correct — they're about tomorrow. When you change code, refactor it, or add a feature, a good test suite instantly tells you if you broke something that used to work. That safety net is what lets you change a large codebase confidently instead of being afraid to touch anything. Tests are how you keep moving fast without things silently breaking behind you.

They catch regressions automatically

A row of tripwires through a house: the moment something disturbs a room you weren't even looking at, an alarm goes off — you don't have to patrol every room yourself.

A regression is when a change breaks something that used to work — often somewhere far from what you touched. Without tests, regressions hide until a user finds them. With tests, the moment your change breaks an old behaviour, a test goes red and points at it, right then. This is the everyday payoff: you make a change, run the tests, and instantly know whether you also broke something elsewhere. The suite watches the whole house while you work in one room.
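Here is a small sketch of a regression being caught, assuming a hypothetical `slugify` helper. The cases record behaviour that already worked. A later "simpler" rewrite still handles the obvious case, but it quietly breaks an old one, and the check flags it the moment it runs.

```python
import re


def slugify(title: str) -> str:
    """Original version: lowercase, runs of non-alphanumerics become one dash."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def slugify_rewrite(title: str) -> str:
    """A later 'simpler' rewrite that only replaces spaces."""
    return title.lower().replace(" ", "-")


# Each case pins down behaviour that used to work.
CASES = [
    ("Hello World", "hello-world"),
    ("  Tests, Explained!  ", "tests-explained"),
]


def failures(fn) -> list:
    """Run every case against fn and describe each mismatch."""
    return [
        f"{fn.__name__}({given!r}) -> {fn(given)!r}, expected {want!r}"
        for given, want in CASES
        if fn(given) != want
    ]


print(failures(slugify))          # []
print(failures(slugify_rewrite))  # one mismatch: the punctuation case regressed
```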

A test checks code automatically. Its real value isn't a one-time proof — it's the freedom to change tomorrow without silently breaking what works today.

§ 02

The foundation of testing is the unit test: a fast, focused check of one small piece of code on its own. Most of your tests will be these, and for good reason.

Test one small piece in isolation

Checking a single Lego brick is the right shape and size before building with it — not assembling the whole castle to discover one piece was wrong.

A unit test checks one small unit — usually a single function — in isolation from the rest of the system. Give it an input, check its output, nothing else involved. Because the piece is small and standalone, when the test fails you know exactly what broke and where. Unit tests are the microscope of testing: narrow, precise, and pointed at one thing at a time.

Fast and many is the whole idea

A spell-checker that scans the whole document in a blink — so fast you run it constantly, not a slow review you do once a month.

Because each unit test is tiny and touches nothing external, it runs in milliseconds — so you can have thousands and run them all on every change, constantly. That speed is the point: a test suite you can run in seconds gets run all the time, catching breakage the instant it happens. A slow suite gets skipped, and a skipped suite protects nothing. Fast and plentiful is what makes unit tests the workhorse.

Arrange, act, assert

A simple experiment: set up the conditions, perform the one action, then check the result. Clear setup, one move, one verification.

A good unit test has three plain steps, often called arrange, act, assert: set up the inputs, call the thing you're testing once, then assert the result is what you expected. An assertion is the check itself — "the answer must equal 5." Keeping tests in this clean shape makes them readable and makes a failure obvious. A test that's hard to read is one nobody will trust or maintain.
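The arrange-act-assert shape looks like this with Python's built-in `unittest` module. `apply_discount` is a hypothetical pure function invented for this sketch; it touches no database or network, so the tests run in milliseconds and a failure points straight at it.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit under test: a pure function with no external dependencies."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off_100_is_90(self) -> None:
        # Arrange: set up the inputs.
        price, percent = 100.0, 10.0
        # Act: call the thing under test exactly once.
        result = apply_discount(price, percent)
        # Assert: check the result is what we expected.
        self.assertEqual(result, 90.0)

    def test_rejects_percent_over_100(self) -> None:
        # Arrange and act happen inside the assertion for an expected error.
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150.0)
```

Saved as, say, `test_discount.py`, it runs with `python -m unittest`, whose default discovery picks up files matching `test*.py`.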

A unit test checks one small piece in isolation, in milliseconds — so you run thousands constantly. Arrange the input, act once, assert the result.

§ 03

Unit tests check pieces alone, but software is pieces working together. Two broader kinds of test cover that — and the right balance between all three has a shape worth knowing.

Integration tests check pieces working together

After checking each Lego brick, you snap a few together to make sure they actually connect — the bricks were fine alone, but do they fit?

An integration test checks that several parts work correctly together — your code talking to the database, two services exchanging data. Units can each be perfect in isolation and still fail to connect: a mismatched format, a wrong assumption about the other side. Integration tests catch exactly those seams. They're slower than unit tests because they involve more real moving parts, but they verify the joins that unit tests, by design, skip over.
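A sketch of an integration test, assuming a hypothetical `UserRepository` that stores users in SQLite. Instead of faking the database, the tests use Python's built-in `sqlite3` with an in-memory database, so the real SQL and the real code meet at the seam. The second test checks something no fake would: that the database itself enforces the `UNIQUE` constraint. In a real project you would often point such tests at the same database engine you deploy on.

```python
import sqlite3
from typing import Optional


class UserRepository:
    """Hypothetical data-access layer: the piece that talks to the database."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
        )

    def add(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid

    def find_email(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def test_saved_user_can_be_read_back() -> None:
    # A real (in-memory) database, not a fake: this exercises the actual SQL.
    repo = UserRepository(sqlite3.connect(":memory:"))
    user_id = repo.add("ada@example.com")
    assert repo.find_email(user_id) == "ada@example.com"


def test_duplicate_email_is_rejected_by_the_database() -> None:
    repo = UserRepository(sqlite3.connect(":memory:"))
    repo.add("ada@example.com")
    try:
        repo.add("ada@example.com")
    except sqlite3.IntegrityError:
        return  # the UNIQUE constraint did its job
    raise AssertionError("expected a duplicate email to be rejected")
```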

End-to-end tests use the whole system like a user

A dress rehearsal of the entire play, start to finish, on the real stage — not checking lines or props alone, but the whole performance as the audience will see it.

An end-to-end test (e2e) drives the whole system the way a real user would — click the button, fill the form, check the right thing happened across the entire stack. It's the most realistic test and the most valuable confirmation that the product actually works. But it's also the slowest and most fragile: many parts must all be running, and a small UI change can break it. You use e2e tests for the few critical journeys, not for everything.

The test pyramid balances them

A pyramid stands because its base is wide and its top is narrow — flip it onto its point and it topples. The shape itself is the stability.

The test pyramid is the rule of thumb for balance: many fast unit tests at the base, fewer integration tests in the middle, and a handful of slow end-to-end tests at the top. This shape gives you broad, fast coverage cheaply, with just enough realistic checks on top. The anti-pattern — lots of slow e2e tests and few unit tests — is the "ice-cream cone," and it's slow, flaky, and painful. Aim for the pyramid, not the cone.

Unit tests check pieces alone, integration tests check them together, e2e tests check the whole system as a user. Keep the pyramid: many fast, few slow.

§ 04

To test one piece in isolation, you often have to stand in for the real things it depends on. That's what mocks and stubs are — and they're as easy to misuse as they are useful.

Replace a real dependency with a fake

A flight simulator instead of a real plane: fake instruments and views let you practise the one skill safely, without the cost and risk of actually flying.

To test a piece in isolation, you replace its real dependencies — a database, a payment service, an email sender — with fakes that stand in for them. This lets you test your code without the slow, costly, or unpredictable real thing. You don't want a unit test to actually charge a card or send an email; you swap in a fake that pretends to, so the test stays fast, reliable, and self-contained.
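A minimal sketch of swapping in a fake, assuming a hypothetical `register` function that depends on anything with a `send(to, body)` method. Because the dependency is passed in rather than created inside the function, a test can hand it a fake that records messages instead of sending real email.

```python
from typing import List, Protocol, Tuple


class EmailSender(Protocol):
    def send(self, to: str, body: str) -> None: ...


def register(email: str, sender: EmailSender) -> str:
    """Hypothetical code under test: registers a user and sends a welcome email."""
    if "@" not in email:
        raise ValueError("invalid email")
    sender.send(email, "Welcome aboard!")
    return email.lower()


class FakeEmailSender:
    """Stands in for the real sender: stores messages in a list, sends nothing."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def test_register_sends_welcome_email() -> None:
    fake = FakeEmailSender()
    assert register("Ada@Example.com", fake) == "ada@example.com"
    assert fake.sent == [("Ada@Example.com", "Welcome aboard!")]
```

In production the caller passes a real sender; the test never leaves the process, so it stays fast and can't accidentally email anyone.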

Stub gives canned answers; mock checks the call

A stub is a cardboard cutout that just says one line on cue; a mock is an actor who also reports back exactly what you said to them and how.

Two flavours of fake are often confused. A stub simply returns a fixed, canned answer ("pretend the database returns this user") so you can test how your code handles that input. A mock also verifies the interaction: "check that my code called sendEmail exactly once, with this address." A stub feeds your code; a mock watches how your code behaves toward it. Reach for a stub to supply data, a mock to assert a side effect happened.
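Python's standard `unittest.mock` can play both roles. In this sketch, `greet_user`, `db.get_user`, and `mailer.send_email` are hypothetical names: the stub only supplies a canned user, while the mock is checked afterwards for exactly how it was called.

```python
from unittest.mock import Mock


def greet_user(user_id: int, db, mailer) -> str:
    """Hypothetical code under test: look up a user, email them, return the greeting."""
    user = db.get_user(user_id)
    greeting = f"Hello, {user['name']}!"
    mailer.send_email(user["email"], greeting)
    return greeting


def test_greet_user() -> None:
    # Stub: a canned answer. We never assert anything about how it was called.
    db = Mock()
    db.get_user.return_value = {"name": "Ada", "email": "ada@example.com"}

    # Mock: we verify the interaction after the code has run.
    mailer = Mock()

    assert greet_user(7, db, mailer) == "Hello, Ada!"
    mailer.send_email.assert_called_once_with("ada@example.com", "Hello, Ada!")
```

Both fakes here are `Mock` objects; what makes one a stub and the other a mock is how the test uses it.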

Over-mocking makes tests that lie

Rehearsing a play entirely with cardboard stand-ins for every other actor — you can nail your lines and still have no idea if the real cast will actually work together.

Fakes are powerful but dangerous in excess. If you mock everything, your test passes against your assumptions about how the other parts behave — not how they actually do. The test goes green while the real integration is broken. So mock the slow, external, or unpredictable dependencies, but don't fake away the very thing you're trying to verify. A test built entirely on mocks can pass while the real system fails — the most dangerous kind of green.
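A sketch of that trap, with hypothetical names throughout. The real `fetch_price` returns the price in integer cents under a `price_cents` key, but the test's mock encodes a wrong assumption (a `price` key holding a dollar string). The mocked test goes green; the very same code fails the moment it meets the real function.

```python
from unittest.mock import Mock


def fetch_price(sku: str) -> dict:
    """Hypothetical real dependency: returns the price in integer cents."""
    return {"sku": sku, "price_cents": 1999}


def price_label(sku: str, fetch=fetch_price) -> str:
    """Code under test, written against a wrong assumption about fetch's output."""
    return f"${fetch(sku)['price']}"


def test_price_label_with_mock() -> None:
    # The mock mirrors our assumption, not reality, so this passes.
    fake_fetch = Mock(return_value={"sku": "A1", "price": "19.99"})
    assert price_label("A1", fetch=fake_fetch) == "$19.99"


def works_against_real() -> bool:
    """What a test against the real dependency would reveal."""
    try:
        price_label("A1")  # uses the real fetch_price
        return True
    except KeyError:
        return False  # the real response has no 'price' key
```

`test_price_label_with_mock()` passes while `works_against_real()` returns `False`: a green test over a broken integration. One integration test against the real dependency would have caught it.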

Mocks and stubs stand in for real dependencies so you can test in isolation. A stub supplies data; a mock verifies a call. Over-mock, and the test passes while reality breaks.

§ 05

Beyond the kinds of test, there's the craft of writing good ones — and one discipline, test-driven development, that flips the usual order and changes how you design code.

TDD: write the test first

Drawing the target before you shoot, not after — so you aim at a defined goal instead of painting a bullseye around wherever the arrow happened to land.

Test-driven development (TDD) reverses the usual order: you write the test before the code. The cycle is red, green, refactor — write a failing test for what you want (red), write just enough code to pass it (green), then clean up the code with the test guarding you (refactor). Writing the test first forces you to define exactly what success means before you build, and guarantees the code is testable, because it was born from a test.
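The cycle is easiest to see on something small. In this sketch `is_leap_year` is a hypothetical example, and the comments mark which part belongs to which step: the tests come first and fail (red), the function is then written to pass them (green), and it is tidied with the tests re-run after each change (refactor).

```python
# Step 1 (red): write the tests first. Run them now and they fail,
# because is_leap_year does not exist yet.
def test_leap_years() -> None:
    assert is_leap_year(2024)       # divisible by 4
    assert not is_leap_year(2023)   # not divisible by 4
    assert not is_leap_year(1900)   # a century year is not a leap year...
    assert is_leap_year(2000)       # ...unless it is divisible by 400


# Step 2 (green): write just enough code to make every test pass.
# Step 3 (refactor): tidy it into one expression, re-running the tests
# after each change to confirm they still pass.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


test_leap_years()  # silent when green; raises AssertionError when red
```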

Test behaviour, not implementation

Judging a chef by whether the dish tastes right, not by which hand they held the knife in — care about the outcome, not the private details of how they got there.

The most important habit for durable tests: check what the code does, not how it does it. Test that getDiscount returns the right price, not that it called three specific internal helpers in order. Tests tied to the outcome survive refactoring — you can rewrite the insides and they still pass. Tests tied to the implementation break every time you tidy the code, which trains people to distrust and delete them. Test the behaviour, and the test stays useful.
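A sketch contrasting the two styles. `PriceCalculator.get_discount` is a hypothetical stand-in for the `getDiscount` above, and its rule (members get 10% off) is invented for illustration. The first test checks only the returned price. The second uses `unittest.mock.patch.object` with `wraps` to spy on a private helper, so it breaks on a harmless refactor even though no price changes.

```python
from unittest.mock import patch


class PriceCalculator:
    def _member_rate(self, is_member: bool) -> float:
        """Internal helper: an implementation detail callers never see."""
        return 0.10 if is_member else 0.0

    def get_discount(self, price: float, is_member: bool) -> float:
        """Hypothetical rule: members get 10% off."""
        return round(price * (1 - self._member_rate(is_member)), 2)


def test_member_pays_ninety_percent() -> None:
    # Behaviour: checks only the observable result, so it survives refactors.
    calc = PriceCalculator()
    assert calc.get_discount(50.0, is_member=True) == 45.0
    assert calc.get_discount(50.0, is_member=False) == 50.0


def test_uses_member_rate_helper() -> None:
    # Implementation: spies on a private helper. Inline or rename _member_rate
    # and this test fails, even though every price stays the same.
    calc = PriceCalculator()
    with patch.object(calc, "_member_rate", wraps=calc._member_rate) as spy:
        calc.get_discount(50.0, is_member=True)
    spy.assert_called_once_with(True)
```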

Good tests are documentation

A well-written set of tests reads like a list of promises: "given this, it does that" — clearer than any comment about how the code is meant to behave.

A clear test suite doubles as living documentation. Each test states a promise — "given an empty cart, the total is zero" — and because the tests run, that documentation can never go out of date the way comments do; if behaviour changes, the test breaks. So write tests to be read: clear names, one behaviour each, obvious arrange-act-assert. A newcomer should be able to learn what the code does by reading its tests. That readability is worth as much as the checking.
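A sketch of a suite that reads like a list of promises, assuming a hypothetical `Cart` class. Each test name states one behaviour in plain words, so the list of names is a specification and a failing name is already a bug report.

```python
import unittest
from typing import List, Tuple


class Cart:
    """Hypothetical shopping cart used only for this illustration."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, float, int]] = []

    def add(self, name: str, price: float, qty: int = 1) -> None:
        self.items.append((name, price, qty))

    def total(self) -> float:
        return round(sum(price * qty for _, price, qty in self.items), 2)


class CartTotalPromises(unittest.TestCase):
    def test_given_an_empty_cart_the_total_is_zero(self) -> None:
        self.assertEqual(Cart().total(), 0)

    def test_given_one_item_the_total_is_its_price(self) -> None:
        cart = Cart()
        cart.add("pen", 1.50)
        self.assertEqual(cart.total(), 1.50)

    def test_quantity_multiplies_the_price(self) -> None:
        cart = Cart()
        cart.add("pen", 1.50, qty=3)
        self.assertEqual(cart.total(), 4.50)
```

Running `python -m unittest -v` prints each test name as it runs, which is exactly that list of promises.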

Write the test first (red-green-refactor), test behaviour not implementation, and make it readable — so tests survive refactors and double as documentation that can't go stale.

§ 06

Two numbers and one nuisance shape whether a test suite is actually trustworthy. Misread coverage or tolerate flakiness, and a green suite stops meaning anything.

Coverage measures what's tested, not how well

A map showing which streets a patrol drove down — useful to spot the neighbourhoods never visited, but driving past a house doesn't mean you checked the locks.

Coverage is the percentage of your code that runs during the tests. It's a useful guide for finding what's completely untested — the streets never patrolled. But high coverage doesn't mean good tests: code can be executed by a test that asserts nothing meaningful, scoring 100% while verifying little. Use coverage to find blind spots, not as proof of quality. A line being touched is not the same as a line being checked.
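A sketch of why "touched" is not "checked", using a hypothetical `shipping_cost` rule. Each test below executes every line of the function, so a line-coverage tool such as coverage.py would credit them equally; only the second would notice a wrong answer.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical rule: flat 5.00 up to 1 kg, then 2.00 per extra kg."""
    if weight_kg <= 1:
        return 5.00
    return 5.00 + 2.00 * (weight_kg - 1)


def test_hollow() -> None:
    # Runs both branches (full line coverage) but verifies nothing.
    shipping_cost(0.5)
    shipping_cost(3)


def test_meaningful() -> None:
    # Executes the same lines, but actually checks each result.
    assert shipping_cost(0.5) == 5.00
    assert shipping_cost(3) == 9.00
```

If someone broke the formula, `test_hollow` would stay green and the coverage number would not move.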

Don't let coverage become the goal

Paying workers per mile of road painted, and watching them paint long, useless lines in empty lots to hit the number — the metric went up, the roads got no safer.

When a number becomes a target, people optimise the number instead of the thing it measured — Goodhart's law again. Demand "100% coverage" and you get tests written to touch lines, not to catch bugs: hollow tests that assert nothing, gaming the metric. Coverage is a flashlight for finding gaps, not a trophy to maximise. Chase real confidence — do the tests actually catch breakage? — and let coverage be one hint among several, never the goal itself.

A flaky test is worse than no test

A fire alarm that goes off at random for no reason: within a week everyone ignores it completely, so when there's a real fire, no one moves.

A flaky test is one that passes and fails randomly without the code changing — usually from timing, ordering, or a hidden dependency. Flaky tests are poison: when a red result might just be noise, people stop trusting any red, and a real failure gets waved through as "probably just flaky." A test must be deterministic — same code, same result, every time. Fix flaky tests or delete them; a suite you don't trust protects nothing.
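One common hidden dependency is the real clock. This sketch, with a hypothetical greeting function, shows the usual cure: pass the time in, so the test controls it instead of depending on when it happens to run.

```python
from datetime import datetime


def greeting_flaky() -> str:
    """Reads the wall clock: a test expecting 'Good morning' only passes before noon."""
    return "Good morning" if datetime.now().hour < 12 else "Good afternoon"


def greeting(now: datetime) -> str:
    """Deterministic: the caller supplies the time."""
    return "Good morning" if now.hour < 12 else "Good afternoon"


def test_greeting_is_deterministic() -> None:
    # Fixed inputs: same result on every run, on any machine, at any hour.
    assert greeting(datetime(2024, 1, 1, 9, 0)) == "Good morning"
    assert greeting(datetime(2024, 1, 1, 15, 0)) == "Good afternoon"
```

Production code calls `greeting(datetime.now())`. The same idea works for randomness, for example by passing in a seeded `random.Random(42)` rather than using the global generator.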

Coverage shows what's tested, not how well — don't make it the goal. And a flaky test is worse than none: once red might mean noise, every red gets ignored.

§ 07

Testing is judgement, not ritual. The skill is testing the things that matter, in the right proportions, and wiring it into how you ship — so the suite earns trust instead of becoming busywork.

Test the risky parts, not everything

You triple-check the parachute and the brakes; you don't stress-test the cup holder. Effort goes where failure actually hurts.

Not all code deserves equal testing. Concentrate effort where a bug would be costly or likely: core business logic, money, security, the tricky edge cases, the parts that change often. Trivial, low-risk code can get less. Chasing tests for everything wastes effort and creates a suite so big and slow nobody runs it. Aim for tests that catch the failures you'd genuinely regret, not for a number — coverage of what matters beats coverage of everything.

Run the tests automatically, on every change

A turnstile that won't open unless your ticket is valid — the check is built into the gate, not left to whether someone remembers to look.

Tests only protect you if they actually run. Wire them into your pipeline so they execute automatically on every change, and block the merge if they fail — the CI gate from the deploy course. This is what turns tests from a thing you might run into a guarantee that nothing broken merges. "Tests or it didn't ship" is the standard worth holding: an untested change going to production is a bet you didn't have to make.

Before you trust a test suite

- Does it let you change with confidence — would it catch a regression tomorrow?
- Is it shaped like a pyramid — many unit, some integration, a few e2e?
- Do tests check behaviour, not internal implementation that breaks on refactor?
- Are mocks used for real external deps, not faking away the thing under test?
- Is it fast and deterministic — no flaky tests eroding trust?
- Does it run in CI on every change, blocking broken merges?

The words you now own

- test / pass / fail / assertion — a check on code, its outcomes, and the check itself.
- regression — a change breaking something that used to work.
- unit / integration / end-to-end (e2e) — one piece, pieces together, the whole system.
- test pyramid — many fast unit tests, fewer integration, a few e2e.
- mock / stub — a fake that verifies a call, a fake that supplies data.
- TDD / red-green-refactor — writing the test first, in a cycle.
- coverage / flaky / deterministic — what's tested, the random-failing trap, and the cure.

Signs you test well

- You change code confidently, trusting the suite to catch what you broke.
- Your tests form a pyramid — fast at the base, a few realistic ones on top.
- Tests check behaviour and survive refactors instead of breaking on every tidy-up.
- The suite is fast and deterministic, and you fix or delete flaky tests.
- Tests run in CI on every change, and you treat coverage as a hint, not a target.

Good testing is judgement: test the risky parts, keep the pyramid, check behaviour not implementation, and run it all automatically — so a green suite truly means safe to change.