BUSINESS · July 3, 2026

The model got commoditized. The chip didn't.

For two years the whole game was which model is smartest. That game is ending: Claude Sonnet 5 landed near Opus-class quality at a fraction of the price, the labs are racing on cost instead of IQ, and swapping providers is now a config change. When the capability layer commoditizes, the moat slides down the stack, to the inference silicon and the racks. OpenAI just taped out its own chip. Here's what that means for the rest of us building on top: your per-token price floor is set two layers below you, by people you'll never meet, so design like it.

Look at what the labs are actually competing on this quarter. Not "ours can reason and theirs can't." It's price. Claude Sonnet 5 shipped at close to Opus-class agentic quality for $3 per million input tokens against Opus's $5 — and it's the default now. The frontier models are converging on "good enough for almost everything," the gaps between them are shrinking, and — as I keep saying — swapping one for another is a config change, not a rewrite.

That's what commoditization looks like. And when the layer everyone obsessed over turns into a commodity, the interesting money moves somewhere else.

When the top of the stack flattens, the moat drops down it

It dropped to the silicon. Days before Sonnet 5, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom inference chip — an accelerator built from scratch for LLM inference, aimed at squeezing out far better performance per watt. That's not a vanity project. It's a recognition that if the model is a commodity, the durable advantage is in serving it cheaper than anyone else — which means owning the chip, the rack, the power contract, and the fab allocation.

The model was never your moat. In 2026 it stopped being the lab's moat too. The moat is whoever can run inference for a tenth of a cent less — and that's decided in a foundry, not a prompt.

What that means if you build on top

You are renting compute from people who are now in an arms race to own the physical layer. You don't control the chip, the datacenter, or the price they set on top of them. So stop pretending the token price is a fixed input and start treating it as the volatile, upstream-controlled variable it is:

Right-size relentlessly. The default should be the smallest model that clears the bar, not the biggest one that's available. Reach for the small model first and reserve the expensive one for the calls that genuinely need it.
Route by difficulty, cache by default. Cheap model for the 90%, strong model for the hard 10%, and a cache in front so you never pay twice for the same answer. Cost is an architecture decision, not a billing surprise.
Keep two providers warm. Portability isn't just insurance against a model going dark — it's how you chase the cheapest adequate substrate as the price war plays out. Locked to one vendor, you eat whatever margin they need this quarter.
Own the parts that don't commoditize. Your data, your evals, your product, your taste. The chip and the model are becoming interchangeable utilities. What you build around them is the only thing that isn't.
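
To make the first three habits concrete, here is a minimal sketch of a router that checks a cache first, sends easy prompts to a cheap tier and hard ones to a strong tier, and falls through to the next-cheapest provider when one fails. Everything in it is an assumption for illustration: Provider, Router, the is_hard heuristic, the prices, and the lambda "models" are hypothetical stand-ins, not any vendor's SDK or rate card.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Provider:
    """Hypothetical wrapper around one vendor's model endpoint."""
    name: str
    usd_per_million_input: float      # illustrative number, not a real quote
    complete: Callable[[str], str]    # stand-in for a real API call


@dataclass
class Router:
    """Cache first, route by difficulty, fail over within a tier."""
    cheap: List[Provider]             # candidates for the easy majority
    strong: List[Provider]            # candidates for the hard minority
    is_hard: Callable[[str], bool]    # your own difficulty heuristic
    cache: Dict[str, str] = field(default_factory=dict)
    calls: List[str] = field(default_factory=list)  # who actually served

    def ask(self, prompt: str) -> str:
        # Never pay twice for the same answer.
        if prompt in self.cache:
            return self.cache[prompt]

        tier = self.strong if self.is_hard(prompt) else self.cheap
        last_error: Optional[Exception] = None

        # Try the cheapest adequate provider first, then the next one.
        for provider in sorted(tier, key=lambda p: p.usd_per_million_input):
            try:
                answer = provider.complete(prompt)
            except Exception as exc:  # timeout, outage, rate limit, etc.
                last_error = exc
                continue
            self.cache[prompt] = answer
            self.calls.append(provider.name)
            return answer

        raise RuntimeError("every provider in the tier failed") from last_error


def down(prompt: str) -> str:
    """Simulates a provider that is currently unavailable."""
    raise TimeoutError("provider unavailable")


router = Router(
    cheap=[
        Provider("small-a", 1.0, down),
        Provider("small-b", 1.5, lambda p: "small-b: " + p),
    ],
    strong=[Provider("large-a", 5.0, lambda p: "large-a: " + p)],
    is_hard=lambda p: len(p) > 40,    # toy heuristic: long prompt means hard
)

print(router.ask("Summarize this ticket"))  # small-a fails, small-b answers
print(router.ask("Summarize this ticket"))  # served from the cache
print(router.calls)                         # only one paid call was made
```

The point of the sketch is the shape, not the details: the price lives in config next to the provider, so when the price war moves a number you reorder a list instead of rewriting a product. A real version would need a smarter difficulty check, cache expiry, and evals that confirm the cheap tier actually clears your bar.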

The bottom line

The "which model is smartest" era is closing, and it's being replaced by a fight over who can serve intelligence cheapest — fought in fabs and datacenters you have no seat in. That's fine. You don't need a seat. You need to build like the thing you're renting is a commodity with a price you don't set.

Treat the model as interchangeable and the token price as upstream weather. Right-size, route, cache, stay portable — and pour your moat into the layer above the commodity, because that's the only layer that's yours.
