Software Architect · Module 08

Code can be rewritten in a quarter. Data, contracts, and historical mistakes sometimes live for the entire life of the product.

Invariants · normalization · migration · OLTP · OLAP

§ 01

The data model has to protect the meaning of the business, not just conveniently serve the current screen.

Data ages slower than code

You can redo the interior quickly. A foundation poured wrong stays with the house for a long time.

The framework, the UI, even the programming language can change. But tables, events, keys, historical records, and external reports keep living. That's why the data model is one of the most expensive architectural decisions.

The architect has to be explicit about invariants — things that must always be true. For example: an order can't be paid without a payment intent; a ledger entry can't disappear; a balance can't go negative without an allowed overdraft.

Normalization and denormalization aren't a religion

An archive stores the original documents. A storefront makes copies for quick browsing. Both forms are needed — for different jobs.

Normalization reduces duplication and the risk of drift. Denormalization speeds up reads and simplifies the read model. OLTP systems care about transactions and the consistency of current state. OLAP systems care about analytical queries, history, and aggregations.

The mistake is trying to perfectly serve checkout, the realtime dashboard, and the accounting report with one model. Often you need a source-of-truth model plus separate projection / read models.

§ 02

A good model makes impossible states impossible — or at least explicitly illegal.

Example: pricing snapshot on the order

A store receipt records the price at the moment of purchase, even if the item gets more expensive tomorrow.

An order stores not just product_id, but a snapshot of price, currency, taxes, discount, and name at the time of purchase. The catalog can change, but the historical order stays correct.

It's a small denormalization that protects business meaning. Without it, reporting and customer support start depending on the current state of the catalog.

Anti-example: the universal events_json table

A warehouse with no shelves feels flexible — you can put anything anywhere. Then nobody can find what they need.

The team dumps every business entity into one table with a JSON column because "it's more flexible." Fast at first. Then come migrations inside the JSON, implicit schemas in the code, slow queries, no constraints, and no way to tell which data is valid.

JSON is useful for extensible attributes and events, but it shouldn't replace the model where there are strict invariants.

Self-check
  • Which invariants does the database protect, and which only the code? - Which data is the source of truth? - What happens to history when a reference table changes? - What does a safe migration of this model look like?