Express course · No. 16
A program that does one thing at a time is simple and often too slow. So we make software juggle many tasks, or run them on many cores at once. That's where the speed comes from — and where a whole family of strange bugs comes from too, all rooted in one thing: two pieces of work touching the same data at the same moment.
Essence only · One picture per idea · Learn the words
Two words get used as if they mean the same thing, and they don't. One is about juggling; the other is about having many hands. Getting them straight is the foundation for everything else.
Concurrency is juggling; parallelism is many hands
One cook managing four dishes — chopping, then stirring, then checking the oven, switching fast — versus four cooks each making one dish at the same instant.
Concurrency is dealing with many tasks by switching between them — one worker making progress on several things, never truly at the same instant. Parallelism is many tasks literally running at the same instant, on multiple workers. One cook juggling four dishes is concurrent; four cooks is parallel. They often combine, but they're different ideas — concurrency is a way of structuring work; parallelism is a way of executing it.
One core can be concurrent; many cores are parallel
A single ticket clerk can serve a whole queue by handling each person's quick steps in turn; a row of clerks serves several people in the very same second.
A processor core does one thing at a time, but can switch between tasks so fast it appears to do many — that's concurrency on one core. True parallelism needs multiple cores actually running at once. Modern machines have many cores, so real software usually does both: several tasks in flight (concurrent), some genuinely running together (parallel). Knowing which you have explains what speedups are even possible.
Why bother at all
A waiter who took one table's full order, delivered it, and only then greeted the next table would leave a room full of people waiting on nothing.
Doing one thing at a time wastes enormous amounts of waiting. While one task waits — for a network reply, a disk, a user — the worker could be making progress on another. Concurrency reclaims that idle time; parallelism adds raw horsepower for heavy work. Together they're why an app stays responsive while loading, and why a big computation can use every core. The cost is the complexity the rest of this course is about.
Concurrency is juggling many tasks by switching; parallelism is running them at the same instant on many cores. Different ideas — one structures work, the other executes it.
To run things concurrently, you split work into separate lines of execution. There are two main kinds, and the difference between them is mostly about one thing: whether they share memory.
A process is an isolated program
Two separate kitchens in two separate buildings — each with its own ingredients, its own counters, unable to touch the other's supplies without a phone call between them.
A process is a running program with its own private memory, isolated from other processes. Two processes can't accidentally corrupt each other's data because they don't share any — to communicate, they have to pass messages deliberately. That isolation makes processes safe and robust, but heavier: starting one costs more, and talking between them is slower. Think of processes as separate, walled-off programs.
A thread shares memory with its siblings
Several cooks in one kitchen, sharing the same counters and ingredients — fast to coordinate, but they can reach for the same knife at the same moment.
A thread is a line of execution inside a process, and threads in the same process share the same memory. That makes them lightweight and fast to coordinate — but it's also the source of nearly every concurrency bug, because two threads can touch the same data at the same time. Shared memory is a power and a hazard: cheap communication, dangerous collisions. Most of this course is about that hazard.
Switching between tasks has a cost
A worker juggling jobs has to put down one, remember exactly where they were, and pick up another — and all that setting-down and picking-up takes real time.
When one core switches from one task to another, it performs a context switch: saving where it was and loading the next task's state. It's fast but not free, and doing it too often — thousands of tiny tasks thrashing — wastes time in pure overhead. This is why "just make more threads" isn't automatically faster: past a point, the juggling itself costs more than it saves.
Processes are isolated, with private memory and safe-but-heavy messaging. Threads share memory — lightweight and fast, and the root of most concurrency bugs.
Much of the time, a program isn't computing — it's waiting. How you handle that waiting, blocking or asynchronous, decides whether one worker can stay busy or sits idle.
A blocking call stops and waits
Standing at a counter while the clerk goes to the back to fetch your order, doing nothing until they return — you can't help anyone else while you stand there.
A blocking call is one where the worker stops and waits for the result before doing anything else — asking the database and just standing there until it answers. While blocked, that worker accomplishes nothing. For a single quick step it's fine; but if you block on slow things — network, disk, other services — you waste the very time you could have spent on other work.
Async lets one worker stay busy
Taking a number at a busy deli: instead of standing frozen, you start other errands and come back when your number is called.
Asynchronous (non-blocking) work flips it: you start a slow operation, and instead of waiting, you go do something else, picking the result up when it's ready. The keywords async and await in many languages express exactly this — "start this, and let me continue; come back when it finishes." One worker can have hundreds of slow operations in flight at once, staying busy instead of idle. This is how a single thread serves thousands of waiting connections.
The event loop juggles the waiting
A single, very organised receptionist taking each call, kicking off the slow lookups, and handling whoever's ready next — never blocked on any one of them.
Async usually runs on an event loop: one worker that starts tasks, and whenever a slow thing finishes, runs the small piece of code waiting on it, then moves on. It feels like many things happening at once, but it's one worker switching fast between ready tasks. This is the model behind environments like JavaScript and Node — astonishingly efficient at juggling waiting, on a single thread.
I/O-bound loves async; CPU-bound needs parallelism
A job that's mostly waiting for deliveries is helped by a better juggler; a job that's mostly heavy lifting needs more muscle, not better scheduling.
The right tool depends on what your work is waiting on. I/O-bound work — mostly waiting for network, disk, or other services — is perfect for async: one worker juggles all the waiting. CPU-bound work — heavy computation that keeps a core fully busy — gets no help from async (there's no idle time to reclaim) and instead needs parallelism across cores. Misjudging which you have is why some "speedups" do nothing.
Blocking stops and waits; async starts the slow thing and stays busy. Async wins for waiting (I/O-bound); real parallelism wins for heavy compute (CPU-bound).
Here is the heart of why concurrency is hard. The moment two things can touch the same data at the same time, you can get a bug that's intermittent, invisible in testing, and maddening to reproduce.
A race condition: two hands, one item
Two people reaching for the last cookie at the same instant — both see it's there, both grab, and what happens next depends on the exact split-second timing.
A race condition is when the result depends on the precise timing of concurrent operations touching shared data. Two threads both read a counter as 5, both add one, both write 6 — and a count is silently lost, because the right answer was 7. Nobody made a logic mistake; the steps just interleaved badly. This is the defining bug of concurrency, and it comes straight from shared state plus simultaneous access.
The horror is that it's intermittent
A rattle in a car that vanishes the moment you take it to the mechanic — real, intermittent, and impossible to show on demand.
Race conditions are nasty because they're non-deterministic: they depend on timing, so they appear once in a thousand runs, never in your tests, and only under real load in production. The same input can work or fail depending on microscopic scheduling. This is why concurrency bugs have a fearsome reputation — you can't reliably reproduce them, and "it works on my machine" is exactly what you'd expect even when it's broken.
The danger is the shared, mutable state
The collisions only happen over the things both people are allowed to grab and change — lock the shared cupboard, or give each their own, and the fights stop.
Notice the precise source: data that is both shared (more than one task can reach it) and mutable (it can be changed). Remove either property and the race vanishes — data only one task touches is safe, and data nobody changes is safe to share freely. This is the key insight the rest of the course builds on: every fix is really a way to control, restrict, or avoid shared mutable state.
A race condition is two tasks touching shared, mutable data at once, with the result hanging on timing. It's intermittent, invisible in tests — and the defining bug of concurrency.
The classic fix for a race is to make sure only one task touches the shared data at a time. It works — but the tool you use to enforce it brings a famous hazard of its own.
A lock gives one task exclusive access
A single key to a one-person bathroom: whoever holds it goes in; everyone else waits their turn outside. No two people inside at once, ever.
A lock (or mutex, for "mutual exclusion") ensures only one task enters a stretch of code at a time. Before touching the shared data, a task must acquire the lock; others wanting it wait until it's released. The protected stretch is the critical section. Done right, this turns a chaotic free-for-all over shared data into an orderly one-at-a-time queue, and the race condition disappears.
Deadlock: everyone waiting forever
Two people at a narrow doorway, each refusing to step back until the other does — both stuck, politely, permanently.
Locks bring their own classic failure: deadlock. Task A holds lock 1 and wants lock 2; task B holds lock 2 and wants lock 1. Each waits for the other to let go, and neither ever does — both frozen forever. It comes from acquiring multiple locks in different orders. The common cure is discipline: always acquire locks in the same agreed order, so the circular wait can't form.
Locks are correct but costly
One bathroom key keeps order, but if everyone constantly needs it, a queue forms and the whole office slows to the speed of that single door.
Locks fix races, but they have a price: while one task holds the lock, others wait, so heavy locking can erase the speed you wanted from concurrency in the first place. Over-locking creates contention — everyone queuing on the same lock. So you hold locks as briefly as possible, protect as little as possible, and prefer designs that need fewer locks. A correct program that's serialised on one lock isn't much better than a single-threaded one.
A lock gives one task at a time exclusive access, killing the race. But locks can deadlock, and over-locking serialises everything — correct, but slow.
Locks fight the symptoms of shared mutable state. The deeper move is to design so there's less of it to fight — and a few patterns make whole classes of concurrency bugs simply impossible.
Don't share memory — pass messages
Instead of many cooks fighting over one shared counter, each works at their own station and hands finished plates down a pass — no two ever reach for the same thing.
A powerful alternative to shared memory is message passing: tasks don't touch the same data; they send each other messages through a queue. Only one task owns a given piece of data, and others request changes by sending a message. There's a famous slogan for it: "don't communicate by sharing memory; share memory by communicating." Remove the shared access and you remove the race at the root, no locks required.
Immutable data can't be raced
Handing everyone their own photocopy instead of one shared original — they can all read at once, and no one can scribble on what another is reading.
If data never changes after it's created — if it's immutable — then any number of tasks can read it at the same time with zero risk, because the danger was never sharing, it was sharing and changing. Instead of mutating a value in place, you make a new one. Immutability removes one half of the "shared and mutable" condition, and with it a whole category of bugs. It's why functional styles lean on it so heavily.
Atomic operations are indivisible
A turnstile that counts each person as they pass in one indivisible click — there's no half-counted moment for two people to slip through.
Some operations are atomic: they happen as one indivisible step that can't be interrupted halfway, so two tasks can't catch each other mid-update. Languages and databases provide atomic counters, atomic swaps, and transactions for exactly the small shared updates that races love. When you genuinely must share a changing value, an atomic operation is often a cheaper, safer fix than a full lock — the update simply can't be split.
The best concurrency bug is the one made impossible: pass messages instead of sharing, make data immutable, or use atomic operations — and the race has nowhere to live.
Concurrency is power you should add on purpose, not by reflex. The skill is reaching for it only when the problem needs it, and choosing the simplest model that does the job.
Match the tool to the bottleneck
You don't hire more lifters for a job that's all waiting, or a better scheduler for a job that's all heavy lifting — you match the help to where the time actually goes.
Before adding concurrency, know your bottleneck. If the work is I/O-bound — mostly waiting — async on one thread often solves it simply. If it's CPU-bound — mostly computing — you need real parallelism across cores. If it's neither — fast and simple — adding concurrency just buys you bugs for no gain. The most common mistake is reaching for threads when plain, sequential code was already fast enough.
Keep the shared surface tiny
A workshop where each person mostly works alone, with one small, clearly-marked shared tool rack — the less that's shared, the less can go wrong.
Since every concurrency bug comes from shared mutable state, the master move is to minimise sharing. Keep most data owned by one task, make what you can immutable, and confine the genuinely shared, changing parts to a small, carefully-guarded surface. A design with a tiny shared surface is one you can actually reason about; one where everything is shared and mutable is a haunted house. Simplicity here is not optional — it's the whole defence.
- What's the bottleneck — I/O-bound (async), CPU-bound (parallel), or neither (don't)? - What data is shared and mutable, and could two tasks touch it at once? - Can I avoid sharing — message passing, immutability, or one atomic operation? - If I must lock, is the critical section tiny and the lock order consistent? - Could it deadlock, and have I ruled out the circular wait? - Is the added complexity worth it, or was sequential code already fast enough?
- concurrency / parallelism — juggling by switching, versus running at the same instant. - process / thread — isolated with private memory, versus sharing memory (and the risk). - blocking / async / await / event loop — stop-and-wait, versus start-and-continue. - I/O-bound / CPU-bound — waiting-heavy versus compute-heavy work. - race condition — timing-dependent bug over shared, mutable state. - lock / mutex / critical section / deadlock / contention — exclusive access and its hazards. - message passing / immutable / atomic — the patterns that remove the race at the root.
- You reach for async for waiting and parallelism for computing — not by reflex. - You can point to exactly what data is shared and mutable, and it's small. - You prefer message passing and immutability over scattering locks everywhere. - Your locks are short, ordered, and you've reasoned about deadlock. - You don't add concurrency where plain sequential code was already fast enough.
Concurrency is deliberate power: async for waiting, parallelism for computing, and a tiny shared surface guarded on purpose. Every bug traces to shared mutable state — so share as little as you can.