Software Architect · Module 13

Performance isn't a feeling that the code is fast. It's measurable targets, profiling, and a clear price for optimization.

Latency · throughput · profiling · caching · budget

§ 01

Optimize the bottleneck — not the place that just looks ugly.

Latency and throughput are different things

A taxi gets one person there fast. A metro moves thousands. Comparing them with one word — "fast" — is meaningless.

Latency is the response time for one operation. Throughput is how many operations the system handles over a period. You can lower the latency of a single request and make the system's throughput worse — for example, with aggressive parallelism that doesn't control resources.

The architect has to state a performance requirement concretely: p95 response time, p99 tail latency, requests per second, batch duration, cold start, memory budget. "It should be fast" isn't a requirement.

A cache is responsibility for staleness

The copy of the menu in the window is useful while the prices match the register. After that, it's a source of arguments.

Caching speeds up reads, but it adds invalidation, TTL, a consistency model, and the risk of stale data. Before adding a cache, ask: what exactly is expensive, how often does it change, can a stale value be shown, how do we invalidate the cache when the source of truth changes.

Caching without understanding the data model is dangerous: you can speed up the wrong answer.

§ 02

Good optimization starts with measurement and ends with verifying the result.

Example: a slow endpoint through profiling

A doctor makes a diagnosis first, instead of jumping to surgery because the patient said "it hurts somewhere."

The /dashboard endpoint has a p95 of 2.8 seconds. Tracing shows 80% of the time goes to N+1 queries against the database and a missing index on workspace_id, created_at. The team adds eager loading, the index, and a performance test.

The result: p95 drops to 380 ms. The fix is targeted, measured, and adds no extra architectural complexity.

Anti-example: rewrite in Go without a profile

Swapping the engine is pointless if the car is stuck in traffic because the road is closed.

The team decides to rewrite the service in another language because "Python is slow." Later it turns out 90% of the time was external APIs and badly designed SQL queries.

The language can be a bottleneck, but it has to be proven. Otherwise optimization turns into an expensive refactor with no business impact.

Self-check
  • Which performance metric matters here specifically? - Where is the bottleneck according to the profiler/tracing? - What budget is acceptable? - What will break when you add caching?