Express course · No. 18
When one part of a system needs another to do something, the simple way is to call it and wait for an answer. The powerful way is to drop a message in a queue and move on, letting the other side pick it up when it's ready. That one shift — from calling and waiting to messaging and continuing — is how systems stay fast, survive failure, and scale.
Essence only · One picture per idea · Learn the words
To see why queues exist, first feel the pain of the simple alternative: one service calling another and waiting for it. It works until the moment it really matters.
A direct call ties two services together
Phoning someone and staying on the line until they finish the task — you can't hang up, can't do anything else, and if they don't pick up, you're stuck.
The obvious way for service A to get service B to do something is a synchronous call: A asks, then waits, holding everything until B answers. This couples them tightly — A's speed is now hostage to B's speed, and A can't continue until B is done. For a quick reply that's fine. For slow or heavy work, A spends its time frozen, waiting on someone else.
If the callee is down, the caller breaks
Calling the warehouse to confirm an order, and because no one answers, you refuse to take any new orders at all — one closed door jams the whole shop.
The deeper problem is failure. If B is down or overloaded when A calls, A's request fails too — the failure propagates straight back up the chain. One slow or broken service can stall everything that depends on it, and a spike in traffic to B drags A down with it. Tight coupling means weakness spreads, and the system is only as available as its shakiest part.
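A minimal sketch of that propagation, assuming a hypothetical warehouse client that stands in for service B and is currently down. None of these names come from a real library; they only illustrate how B's failure becomes A's failure when A calls directly.

```python
class WarehouseUnavailable(Exception):
    """Raised by the hypothetical warehouse client when service B is down."""


def confirm_with_warehouse(order_id: str) -> str:
    # Stand-in for a network call to service B, which is down in this sketch.
    raise WarehouseUnavailable(f"no answer while confirming {order_id}")


def take_order(order_id: str) -> str:
    """Service A: calls B directly, so B's failure surfaces inside A."""
    confirmation = confirm_with_warehouse(order_id)  # raises; nothing below runs
    return f"order {order_id} accepted ({confirmation})"


try:
    outcome = take_order("order-42")
except WarehouseUnavailable as error:
    # A has no answer of its own to give: the failure travelled up the chain.
    outcome = f"order rejected: {error}"

print(outcome)
```

Service A did nothing wrong here, yet the customer still sees a rejected order.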
Some work shouldn't block the user
At a checkout, you pay and leave with your receipt — you don't stand at the till while they pack, ship, and email you. The slow parts happen after you walk away.
Lots of work doesn't need to finish before the user gets a response — sending the confirmation email, generating the invoice, resizing the photo. Making the user wait for all of it is slow and pointless. You want to accept the request, answer fast, and let the slow parts happen afterward. A direct call can't do that; it makes everyone wait for everything.
A direct call couples two services in time: the caller waits, and inherits the callee's slowness and failures. For slow or non-urgent work, that's a trap.
The fix is to stop calling and start messaging. A queue is the buffer in the middle that lets one side leave work and the other pick it up — decoupling them in time.
Drop the message and move on
Leaving a voicemail instead of waiting on hold: you say what you need, hang up, and get on with your day — they'll handle it when they can.
Instead of calling B and waiting, A writes a message describing the work and drops it in a queue — a line where messages wait to be processed. A is now free to continue immediately; it doesn't care when B gets to it. The message is a small package of "here's what needs doing," handed off rather than performed on the spot. That handoff is the whole idea.
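The handoff can be sketched with Python's standard `queue` module standing in for a real message queue. The function names and the message fields are assumptions made for illustration; a production system would use a broker rather than an in-process queue.

```python
import queue
import threading

# In-process stand-in for a real message queue (an assumption of this sketch).
work_queue: queue.Queue = queue.Queue()
sent_emails: list[str] = []


def handle_signup(email: str) -> str:
    """Service A: describe the work, enqueue it, and return at once."""
    work_queue.put({"task": "send_welcome_email", "to": email})
    return "accepted"  # A does not wait for the email to be sent


def email_worker() -> None:
    """Service B: pick up messages whenever it is ready."""
    while True:
        message = work_queue.get()
        if message is None:  # sentinel value: stop the worker
            break
        sent_emails.append(message["to"])  # stands in for the slow work


# A answers before any worker is even running.
response = handle_signup("ada@example.com")

# B starts later and works through whatever is waiting.
worker = threading.Thread(target=email_worker)
worker.start()
work_queue.put(None)
worker.join()
```

Note that `handle_signup` returns before the worker exists at all, which is the point: the sender only needs the queue to be there.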
The queue is a buffer in time
An in-tray on a desk: work piles up in it and gets handled steadily, so a sudden rush of paperwork doesn't overwhelm the person — it just makes the tray taller for a while.
The queue sits between the two sides as a buffer, holding messages until they're processed. This decouples them in time: the sender and receiver no longer have to be fast, available, or even running at the same moment. Work produced in a burst can be consumed steadily. The queue absorbs the mismatch between how fast work arrives and how fast it can be done.
Asynchronous means not waiting for the result
Mailing a letter rather than having a conversation — you send it and carry on, trusting it'll be read and acted on later, without you standing there.
This is asynchronous work: the sender doesn't wait for the result. It fires off the message and continues, and the outcome happens later, out of band. You give up the instant answer of a direct call, and in exchange you stop being blocked. For anything that doesn't need an immediate reply, that trade is the foundation of fast, resilient systems.
A queue is a buffer between sender and receiver. Drop a message and move on — the work is decoupled in time, done asynchronously when the other side is ready.
Three roles make a queue work, and naming them clears up most of the jargon. One side makes messages, one side handles them, and something in the middle holds them safely.
Producers make work; consumers do it
A kitchen: waiters pin up order tickets, and cooks take them down one by one to prepare. The two sides never speak directly — the rail of tickets is the whole interface.
The side that creates messages is the producer (or publisher); the side that processes them is the consumer (or worker). The producer drops a ticket; the consumer picks one up, does the work, and moves to the next. They don't know or wait for each other — they only know the queue. This clean split is what lets each side be built, scaled, and changed on its own.
The broker holds the messages safely
A post office between sender and recipient: it takes your letter, keeps it safe, and holds it until the recipient collects it — so nothing is lost if they're away.
In the middle sits the broker — software like RabbitMQ or Apache Kafka that receives messages, stores them reliably, and delivers them to consumers. It's the post office of the system. Its reliability is the point: even if every consumer is down, the broker keeps the messages safe until someone is ready to process them, so work is parked, not lost.
Add consumers to go faster
When the order tickets pile up, you put more cooks on the line — each takes the next ticket, and the backlog clears faster without changing how orders come in.
Because consumers just pull the next message, you can run many of them in parallel off one queue. Backlog growing? Add more workers, and they share the load automatically — each grabbing a different message. This is one of the great strengths of the pattern: you scale the slow part independently by adding consumers, without touching the producers or the queue itself.
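As a rough illustration, several worker threads can pull from one shared queue, each taking whichever message is next. This is an in-process sketch with hypothetical names (`run_workers`, `cook`); with a real broker the workers would be separate processes or machines.

```python
import queue
import threading


def run_workers(tickets: list[str], worker_count: int) -> dict[str, str]:
    """Process every ticket with `worker_count` parallel consumers.

    Returns a mapping of ticket -> name of the worker that handled it.
    """
    ticket_queue: queue.Queue = queue.Queue()
    for ticket in tickets:
        ticket_queue.put(ticket)

    handled_by: dict[str, str] = {}
    lock = threading.Lock()

    def cook(name: str) -> None:
        while True:
            try:
                ticket = ticket_queue.get_nowait()  # take the next ticket
            except queue.Empty:
                return  # backlog cleared, this worker is done
            with lock:
                handled_by[ticket] = name

    workers = [
        threading.Thread(target=cook, args=(f"cook-{i}",))
        for i in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return handled_by


result = run_workers([f"order-{n}" for n in range(20)], worker_count=4)
```

Changing `worker_count` is the only thing needed to add capacity; the code that fills the queue stays the same.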
Producers create messages, consumers process them, and a broker holds them safely in between. Need more throughput? Add consumers — they share the queue automatically.
There are two very different things a message can be: an order to do something, or an announcement that something happened. The second one quietly changes how you design whole systems.
A command tells one worker to do a task
A work order handed to one specific department: "resize this image." It names the job, expects exactly one team to do it, and that's the end of it.
A command message is a direct instruction aimed at one consumer: "send this email," "process this payment." It's the queue used as a to-do list: the producer knows what should happen and hands the task off to be done once, by whoever pulls it. This is the simplest use of a queue: offloading specific work to be done later by a single worker.
An event announces that something happened
A company-wide announcement on a notice board: "a new customer just signed up." It doesn't tell anyone what to do — whoever cares reacts in their own way.
An event is different: it states a fact about the past — "order placed," "user registered" — without saying who should do what. The producer just announces it and doesn't know or care who's listening. Maybe nobody reacts; maybe five services do. This flips the relationship: instead of commanding a specific worker, you broadcast that something is true and let interested parties decide.
Pub/sub: one event, many reactions
A newspaper printed once and delivered to every subscriber, each of whom reads it for their own reasons — sports fan, investor, crossword solver — from the same single issue.
This broadcast model is publish/subscribe (pub/sub): a producer publishes an event to a topic, and every interested consumer subscribes and gets its own copy. One "order placed" event can trigger the email service, the analytics service, and the shipping service at once — this is fan-out. The publisher doesn't even know they exist. New reactions can be added later without touching the publisher at all.
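Fan-out can be sketched with a small in-memory hub. The `Topics` class and the topic names are hypothetical; a real broker does the same job across the network and stores the events as well.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Topics:
    """Hypothetical in-memory pub/sub hub, for illustration only."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Hand each subscriber its own copy; return how many were notified."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(event))  # a copy, so subscribers cannot affect each other
        return len(handlers)


topics = Topics()
reactions: list[str] = []

# Three independent services subscribe; the publisher knows none of them.
topics.subscribe("order.placed", lambda e: reactions.append(f"email: receipt for {e['order_id']}"))
topics.subscribe("order.placed", lambda e: reactions.append(f"analytics: counted {e['order_id']}"))
topics.subscribe("order.placed", lambda e: reactions.append(f"shipping: pack {e['order_id']}"))

notified = topics.publish("order.placed", {"order_id": "o-1"})
unheard = topics.publish("user.registered", {"user_id": "u-1"})  # nobody listens, and that is fine
```

Adding a fourth reaction is one more `subscribe` call; the line that publishes the event does not change.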
Events decouple who knows whom
Adding a new subscriber to the newspaper changes nothing about how the paper is written — the publisher never needs to know who's reading.
This is the deep power of event-driven design: the producer of an event is fully decoupled from everyone who reacts to it. To add a new feature that responds to "order placed," you just subscribe a new consumer — no change to the order service. Systems built this way grow by adding listeners, not by editing the thing being listened to. That's how large systems evolve without everything depending on everything.
A command tells one consumer to do a task. An event announces a fact and lets anyone react — and pub/sub fans one event out to many, decoupling who knows whom.
Queues add moving parts, so they have to earn it. They do, by buying three things that are hard to get any other way: resilience, smooth handling of spikes, and independent scaling.
Resilience: a down consumer just delays work
If the kitchen closes for an hour, the order tickets simply wait on the rail — when it reopens, the cooks work through the backlog. No order is lost.
Because the broker holds messages, a consumer crashing doesn't lose work — the messages wait until it comes back, then get processed. Compare that to a direct call, where a down service means an outright failure. The queue turns "the service is down" from a hard error into a temporary delay. This resilience — work survives a failure and resumes — is often the single biggest reason teams reach for a queue.
Load leveling: absorb the spikes
A reservoir between a flash flood and a town: the surge pours into the reservoir, which releases water at a steady, safe rate downstream.
When a flood of requests arrives at once (a sale, a viral moment), a queue lets you level the load: the spike fills the queue, and consumers drain it at their own steady pace. The slow downstream system never sees the spike, only a manageable, even flow. If the backlog keeps growing, the system can push back on producers and make them slow down; that signal is called backpressure. Without a queue, the same spike would hammer the system directly and likely topple it.
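The arithmetic of load leveling can be shown with a tiny simulation. The tick-based model and the numbers are assumptions chosen for illustration: a burst of 10 messages arrives at once, and the consumers can handle 3 per tick.

```python
def level_load(arrivals_per_tick: list[int], capacity_per_tick: int) -> tuple[list[int], list[int]]:
    """Simulate a queue absorbing a burst.

    Returns (messages processed in each tick, backlog left after each tick).
    """
    if capacity_per_tick <= 0:
        raise ValueError("consumers must be able to process at least one message per tick")

    backlog = 0
    processed: list[int] = []
    depth: list[int] = []
    tick = 0
    # Keep ticking until all arrivals are in and the backlog is drained.
    while tick < len(arrivals_per_tick) or backlog > 0:
        if tick < len(arrivals_per_tick):
            backlog += arrivals_per_tick[tick]  # the spike lands in the queue
        done = min(capacity_per_tick, backlog)  # consumers never exceed their pace
        backlog -= done
        processed.append(done)
        depth.append(backlog)
        tick += 1
    return processed, depth


processed, depth = level_load([10, 0, 0, 0], capacity_per_tick=3)
print(processed)  # [3, 3, 3, 1]
print(depth)      # [7, 4, 1, 0]
```

The downstream side never handles more than 3 messages in a tick, even though 10 arrived together; the queue depth is what rises and falls instead.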
Scale and evolve each side on its own
You can hire more cooks, swap the kitchen's equipment, or add a new station — all without changing how the waiters take orders, because they only ever touch the ticket rail.
Decoupling means each side can change independently. Scale consumers up or down to match demand; rewrite a consumer in another language; add a brand-new consumer reacting to existing events — all without touching the producers. The queue is a stable seam between parts of the system, and stable seams are what let a large system grow and change in pieces instead of all at once.
Queues buy resilience (work survives a crash), load leveling (spikes get absorbed), and independent scaling — benefits a direct call simply can't give you.
Asynchronous messaging solves real problems and creates new ones. None are dealbreakers, but pretending they don't exist is how queue-based systems get subtle, maddening bugs.
Messages can arrive twice
The post office, to be safe, sometimes delivers a copy of a letter it isn't sure arrived — better twice than never, but now you might act on the same instruction twice.
Most brokers guarantee at-least-once delivery: they'd rather send a message twice than risk losing it, so a consumer can occasionally receive the same message more than once. If "charge the customer" runs twice, that's a real problem. The fix is idempotency — designing the work so that doing it twice has the same effect as doing it once (check "already charged?" first). Assume duplicates; make them harmless.
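One common way to get idempotency is to remember which message ids have already been handled. The sketch below assumes every message carries a unique `id` field and keeps the record in memory; both are assumptions, and a real consumer would store the record durably, alongside the charge itself.

```python
class PaymentConsumer:
    """Hypothetical consumer that charges at most once per message id."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()
        self.charges: list[tuple[str, int]] = []

    def handle(self, message: dict) -> bool:
        """Apply the charge once; return False if this was a duplicate."""
        message_id = message["id"]
        if message_id in self.processed_ids:
            return False  # already charged: the duplicate is harmless
        self.charges.append((message["customer"], message["amount_cents"]))
        self.processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
charge = {"id": "msg-1", "customer": "c-7", "amount_cents": 1999}

# The broker delivers the same message twice.
results = [consumer.handle(charge), consumer.handle(charge)]
```

The second delivery is recognised and skipped, so the customer is charged once no matter how many copies arrive.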
Order isn't guaranteed
Two letters mailed in sequence can arrive in either order — so if step two shows up before step one, the recipient is confused.
With many messages and many parallel consumers, messages don't necessarily get processed in the order they were sent. If "update address" and "delete account" arrive out of order, you get nonsense. Some brokers preserve order within a group of related messages (Kafka, for example, keeps order within a partition), at the cost of less parallelism; often it's cheaper to design work that doesn't depend on strict order. Either way, never assume ordering unless you've specifically arranged for it.
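One way to arrange ordering is to route every message for the same key (here, a user id) to the same partition, and have one consumer read each partition in sequence. This is a sketch of the idea only; the function names and the choice of CRC32 as the hash are assumptions, not any particular broker's algorithm.

```python
import zlib


def partition_for(key: str, partition_count: int) -> int:
    """Stable hash, so the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(messages: list[tuple[str, str]], partition_count: int) -> list[list[tuple[str, str]]]:
    """Append each (key, action) message to its key's partition, in arrival order."""
    partitions: list[list[tuple[str, str]]] = [[] for _ in range(partition_count)]
    for key, action in messages:
        partitions[partition_for(key, partition_count)].append((key, action))
    return partitions


messages = [
    ("user-1", "update address"),
    ("user-2", "update address"),
    ("user-1", "delete account"),
]
partitions = route(messages, partition_count=3)

# Everything for user-1 sits in one partition, in the order it was sent.
user_1_actions = [
    action
    for key, action in partitions[partition_for("user-1", 3)]
    if key == "user-1"
]
```

Order is kept per key, not globally: messages for different users may still be processed in any order relative to each other, which is the price paid for keeping parallelism.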
The result is eventually consistent
An announcement spreading through an office: for a few moments some people know and others don't, until word reaches everyone and they agree again.
Because reactions happen asynchronously, the system is eventually consistent: right after an event, different parts may briefly disagree. The order exists, for example, but the analytics haven't counted it yet. It usually catches up in moments, though a deep backlog can stretch the gap, and "instantly consistent everywhere" is gone. You design around this gap, showing users sensible in-between states, rather than assuming every part updates in the same instant.
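The gap is easy to see in a small sketch. Here an order service records orders and emits events, while a separate analytics consumer counts them only when it runs. All names are hypothetical and everything lives in one process for the sake of illustration.

```python
from collections import deque

orders: list[str] = []                 # the order service's own data
events: deque = deque()                # events waiting to be consumed
analytics = {"orders_counted": 0}      # a separate service's view of the world


def place_order(order_id: str) -> None:
    """Order service: record the order and announce it."""
    orders.append(order_id)
    events.append({"type": "order_placed", "order_id": order_id})


def run_analytics_consumer() -> None:
    """Analytics service: catch up on every event it has not seen yet."""
    while events:
        events.popleft()
        analytics["orders_counted"] += 1


place_order("o-1")
place_order("o-2")

# Right after the events: the two parts disagree.
snapshot_before = (len(orders), analytics["orders_counted"])

run_analytics_consumer()

# Once the consumer has caught up: they agree again.
snapshot_after = (len(orders), analytics["orders_counted"])

print(snapshot_before)  # (2, 0)
print(snapshot_after)   # (2, 2)
```

Anything that reads `analytics` between the two snapshots sees a stale count, so a screen built on it should be worded for that ("updating" rather than a hard total).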
A poisoned message needs somewhere to go
A letter no one can act on — smudged, impossible — can't just be retried forever, jamming the line. It gets set aside in a special tray for someone to look at.
Some messages can never be processed successfully — malformed, or for data that's gone. Left alone, a consumer retries them forever and blocks the queue. The standard answer is a dead-letter queue: after a few failed attempts, the message is moved aside to a separate queue for inspection, so it stops poisoning the flow. Plan for messages that fail, or one bad message stalls everything behind it.
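A dead-letter queue can be sketched as a retry counter plus a second list. The limit of three attempts, the message shape, and the function names are assumptions for illustration; real brokers usually offer this as configuration rather than code you write yourself.

```python
import json
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit for this sketch


def drain(main_queue: deque, dead_letters: list, handler) -> list:
    """Process the queue; park messages that keep failing in `dead_letters`."""
    results = []
    while main_queue:
        message = main_queue.popleft()
        try:
            results.append(handler(message["body"]))
        except Exception as error:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Give up: set it aside for a human to inspect.
                dead_letters.append({**message, "error": str(error)})
            else:
                main_queue.append(message)  # retry later, behind the others
    return results


def parse_order(body: str) -> str:
    """Example handler: fails on anything that is not valid JSON."""
    return json.loads(body)["order_id"]


main_queue = deque([
    {"body": '{"order_id": "o-1"}', "attempts": 0},
    {"body": "not json", "attempts": 0},  # can never succeed
    {"body": '{"order_id": "o-2"}', "attempts": 0},
])
dead_letters: list = []

handled = drain(main_queue, dead_letters, parse_order)
```

The malformed message is tried three times and then moved aside, while the two good orders around it are processed normally.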
Async messaging brings duplicates (handle with idempotency), no guaranteed order, eventual consistency, and failing messages (use a dead-letter queue). Plan for all four.
A queue is a power tool, not a default. The skill is reaching for it when decoupling genuinely pays, and keeping the direct call when simplicity is worth more.
Use a queue when waiting is the problem
You take a number and leave for slow service you'll collect later — but for a quick yes-or-no you just ask at the desk, because taking a ticket would be sillier than waiting.
Reach for a queue when work is slow, spiky, or non-urgent, when you want the caller to stay fast, or when several services should react to one thing. Keep a direct call when you need an immediate answer to continue — a price lookup, a permission check — because there the simplicity and the instant result are worth more than decoupling. Match the tool to whether waiting is actually the problem.
Don't reach for it by reflex
Installing a full mailroom and sorting system to pass a note to the person at the next desk — the machinery now costs more than the problem it solves.
Queues add real complexity: a broker to run, duplicates and ordering to handle, eventual consistency to design around, and a harder time tracing a request that now hops through messages. For a simple system with a quick call between two parts, that overhead isn't worth it. Add a queue when a concrete benefit — resilience, scale, decoupling — justifies the cost, not because event-driven sounds advanced.
- Is waiting the problem: is the work slow, spiky, or non-urgent enough to decouple?
- Command or event: one worker doing a task, or many services reacting to a fact?
- Is the work idempotent, that is, safe to run twice, since messages can arrive twice?
- Does it assume ordering, and have I arranged that or designed it away?
- Can the system tolerate brief eventual inconsistency?
- Where do failed messages go? Is there a dead-letter queue?

- synchronous / asynchronous: call and wait, versus message and continue.
- coupling / decoupling: tied together in time, versus free of each other.
- queue / message / broker: the buffer, the unit of work, the software that holds them.
- producer / consumer: the side that makes messages, the side that processes them.
- command / event: do this task, versus this happened.
- pub/sub / topic / fan-out: broadcast one event to many subscribers.
- at-least-once / idempotency / dead-letter queue / eventual consistency: the async hazards and their fixes.

- You queue the slow, spiky, non-urgent work and keep direct calls for instant answers.
- Your consumers are idempotent, so a duplicate message does no harm.
- You don't assume ordering unless you've explicitly arranged it.
- You design for eventual consistency instead of expecting instant agreement.
- Failed messages land in a dead-letter queue instead of jamming the line.
Queues are deliberate: reach for one when decoupling in time buys resilience, scale, or fan-out — and keep the direct call when an instant answer matters more than all of it.