Software Architect · Module 12

Sync gives you a simple mental model. Async gives you resilience and decoupling — but demands new guarantees.

Queue · event · outbox · backpressure · retry

§ 01

Choosing sync or async is a choice between an immediate answer and durable processing over time.

A synchronous call fits a question

If the cashier asks the price of an item, the answer is needed now. If the store sends a report to accounting, that can wait in a queue.

A synchronous request is good when the user needs the result right now: check the password, show the price, confirm access. It's simpler for UX and for debugging, but it couples the availability of the caller and the callee.

Async fits work that can be done later: email, indexing, analytics, webhook delivery, report generation, enrichment. It lowers coupling but demands statuses, retries, a dead-letter queue, and observability.
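The difference can be sketched in a few lines. This is a minimal illustration, not a production design: the names check_price, request_report, run_worker_once, and the in-memory jobs table are all hypothetical, and a real system would keep job status in durable storage.

```python
import queue
import uuid

jobs = {}             # job_id -> "queued" | "done" | "failed" (in-memory, for illustration only)
work = queue.Queue()  # stands in for a real message queue


def check_price(sku: str) -> int:
    # Synchronous: the caller blocks and gets the answer immediately.
    prices = {"apple": 120}  # hypothetical price list, in cents
    return prices[sku]


def request_report(month: str) -> str:
    # Asynchronous: accept the work, record a status, return a handle.
    job_id = str(uuid.uuid4())
    jobs[job_id] = "queued"
    work.put((job_id, month))
    return job_id


def build_report(month: str) -> str:
    # Placeholder for the slow work that runs outside the request path.
    return f"report for {month}"


def run_worker_once() -> None:
    # Take one job and record the outcome so the caller can poll for it.
    job_id, month = work.get_nowait()
    try:
        build_report(month)
        jobs[job_id] = "done"
    except Exception:
        jobs[job_id] = "failed"


price = check_price("apple")        # answer is available right away
handle = request_report("2024-05")  # only a handle is available right away
run_worker_once()                   # later, in a background worker
```

The synchronous call returns the value itself; the asynchronous one returns a job id, and the status table is what lets the caller find out what happened later.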

An event is a fact, not a command

"The door is open" is a fact. "Open the door" is a command. Mix the two, and the system starts arguing about intent.

An event describes something that already happened: OrderPlaced, PaymentCaptured, UserRegistered. A command asks for something to happen: CapturePayment, SendEmail. In architecture, it pays not to mix these forms.

If an event is used as a hidden command, the consumer becomes dependent on the producer's internal intent. That breaks loose coupling.
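One way to keep the two forms apart is to give them separate types. The sketch below assumes hypothetical field names (order_id, total_cents, amount_cents) and a hypothetical consumer function on_order_placed; only the names OrderPlaced and CapturePayment come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a fact that already happened. Past tense, immutable, no addressee."""
    order_id: str
    total_cents: int
    occurred_at: datetime


@dataclass(frozen=True)
class CapturePayment:
    """Command: a request addressed to one handler, which may refuse it."""
    order_id: str
    amount_cents: int


def on_order_placed(event: OrderPlaced) -> CapturePayment:
    # The payment side decides for itself what the fact implies.
    # The producer of OrderPlaced knows nothing about this decision.
    return CapturePayment(order_id=event.order_id, amount_cents=event.total_cents)


event = OrderPlaced("o-1", 4999, datetime.now(timezone.utc))
command = on_order_placed(event)
```

The event carries no instruction. The decision to capture a payment lives in the consumer, so the producer can change its internals without breaking anyone who listens.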

§ 02

Async shouldn't be the place where errors quietly disappear.

Example: transactional outbox

The clerk records the letter in the log first, then sends it. If the mailroom goes down, the record isn't lost.

When an order is created, the service writes the order and the outbox event in one transaction. A separate worker reads the outbox and publishes the event to the broker. If the broker is temporarily unavailable, the event stays in the database and will be sent later.

The outbox closes a classic hole: the data was saved, but the event got lost between commit and publish.
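A minimal sketch of the pattern, using an in-memory SQLite database from the Python standard library. The table layout, the function names place_order and relay_once, and the publish callback are assumptions made for illustration; a real relay would talk to an actual broker and run on a schedule.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: str, total_cents: int) -> None:
    # One transaction: either both rows are committed or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(publish) -> int:
    """Publish pending outbox rows in id order; stop at the first failure."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, event_type, payload in rows:
        try:
            publish(event_type, payload)
        except ConnectionError:
            break  # broker is down: the row stays pending for the next run
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        sent += 1
    return sent


def broker_down(event_type, payload):
    raise ConnectionError("broker unavailable")


delivered = []
place_order("o-1", 4999)
first_try = relay_once(broker_down)                                # broker down, nothing sent
second_try = relay_once(lambda t, p: delivered.append((t, p)))     # sent on the next run
```

Note the order inside the relay: publish first, mark as published second. If the worker crashes between the two steps, the event is published again on the next run, so this gives at-least-once delivery and consumers still need to tolerate duplicates.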

Anti-example: the queue as a trash bin

An inbox isn't a process. If nobody works through it, the work is just hidden.

The team sends every heavy operation to a queue, but doesn't define a retry policy, idempotency, ordering, visibility timeout, DLQ, or alerts. In production, messages get stuck, are delivered twice, or take hours to process.

A queue doesn't make the system reliable on its own. It moves the complexity from the request path into background processing.
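As a rough sketch of what that background complexity looks like, the consumer below handles three of the missing pieces: duplicate delivery, a bounded retry budget, and a dead-letter list. Everything here is assumed for illustration: the message shape, the MAX_ATTEMPTS value, and the in-memory deque standing in for a real queue. Backoff, ordering, and visibility timeouts are left out.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget


def drain(queue: deque, handler, processed_ids: set, dead_letters: list) -> None:
    while queue:
        message = queue.popleft()
        if message["id"] in processed_ids:
            continue  # duplicate delivery: already handled, skip it
        try:
            handler(message)
        except Exception as exc:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                # Give up visibly: park the message with its error for inspection.
                dead_letters.append({**message, "error": str(exc)})
            else:
                queue.append(message)  # retry later; a real system would add backoff
            continue
        processed_ids.add(message["id"])


sent = []


def send_email(message):
    if message["to"] is None:
        raise ValueError("missing recipient")
    sent.append(message["to"])


inbox = deque([
    {"id": "m1", "to": "a@example.com"},
    {"id": "m1", "to": "a@example.com"},  # the same message delivered twice
    {"id": "m2", "to": None},             # a message that can never succeed
])
processed, dead = set(), []
drain(inbox, send_email, processed, dead)
```

After the run, the email has been sent once despite the duplicate, and the broken message sits in the dead-letter list with its error instead of looping forever or vanishing. The length of that list and of the queue itself are the numbers an alert would watch.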

Self-check
  • Does the user need the result right now?
  • What happens if a message is delivered twice?
  • Where is the fact stored that this event must be published?
  • How does the system show backlog and failed jobs?