June 4, 2026
The labs are racing on price now, not IQ
For two years a flagship model reveal had one headline: we're the smartest, here's the benchmark we beat. At Microsoft Build 2026 the headline changed — same league as Opus, but ~10x more output per dollar and 60% fewer tokens. The boast moved from IQ to efficiency, and the whole industry is reorganizing around price, not peak capability. Here's why the axis flipped, and what it means if you build.
For two years, a flagship model launch had exactly one headline: we are the smartest. Here is the benchmark we topped, here is the rival we edged out by a point. Intelligence was the whole scoreboard. At Microsoft's Build conference in 2026, the headline was different, and the difference is the story.
What Microsoft actually bragged about
Microsoft unveiled its first in-house models, led by a reasoning model, MAI-Thinking-1, and on the benchmarks it lands respectably — 97% on AIME, 53% on SWE-Bench Pro, roughly alongside Opus. But notice that the capability number wasn't the pitch. The pitch was the price. Its companion coding model, MAI-Code-1-Flash, solves harder problems with up to 60% fewer tokens — lower latency, lower cost, what Microsoft kept calling "return on token." And Microsoft projected a 10x improvement in output tokens per dollar versus GPT-5.5. The boast moved from "smarter than" to "same quality, a tenth of the cost."
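To see what those two numbers do to a bill, here is a rough sketch in Python. The function name and the cost model (cost equals tokens used divided by tokens bought per dollar, with everything else held fixed) are my own simplifying assumptions, not anything Microsoft published, and nothing above establishes that both claims apply to the same model at once.

```python
def relative_cost(token_reduction: float = 0.0,
                  tokens_per_dollar_gain: float = 1.0) -> float:
    """Cost of a task relative to a baseline of 1.0.

    token_reduction: fraction of tokens saved (0.6 means 60% fewer tokens).
    tokens_per_dollar_gain: multiplier on tokens bought per dollar (10 means 10x).
    Illustrative model only: cost = tokens used / tokens per dollar.
    """
    if not 0.0 <= token_reduction < 1.0:
        raise ValueError("token_reduction must be in [0, 1)")
    if tokens_per_dollar_gain <= 0:
        raise ValueError("tokens_per_dollar_gain must be positive")
    return (1.0 - token_reduction) / tokens_per_dollar_gain


# Each claim taken on its own, against a baseline bill of 1.0.
fewer_tokens = relative_cost(token_reduction=0.6)
cheaper_tokens = relative_cost(tokens_per_dollar_gain=10)
print(round(fewer_tokens, 3), round(cheaper_tokens, 3))
```

Under those assumptions, 60% fewer tokens at an unchanged per-token price means paying 0.4x the baseline, and 10x the output tokens per dollar means paying 0.1x for the same output.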
Why the axis flipped
Two forces pushed the competition off the IQ axis and onto the price axis, and I've written about both.
The first is the cost panic. When companies are burning a year's AI budget in four months, "10x cheaper at the same quality" is the sentence that closes the deal — not "two points higher on a benchmark nobody in finance has heard of." Cost became the binding constraint, so cost became the thing labs sell to.
The second is that intelligence is commoditizing. When a free open-weight model is already within a few percent of the frontier, being marginally smarter is worth almost nothing, while being dramatically cheaper at the same quality is worth a fortune. Literally: Google says its Gemini 3.5 Flash could save enterprises more than $1 billion a year, and it's cheap for structural reasons. Google runs its own chips and a token flywheel so large that efficiency improves as it scales. A billion dollars is a better headline than a benchmark point.
The whole industry is reorganizing around price
This isn't one keynote. It's the shape of the field now. Microsoft built its own models specifically to stop paying OpenAI's bill and offer cheaper inference on Azure. Google leans on its own TPUs to undercut on serving cost. NVIDIA's next chip platform is being sold on a 10x reduction in inference cost, not a 10x jump in capability. And per-token prices have been falling on the order of 200x per year. The race is an efficiency race, top to bottom.
What it means if you build
Here's the good news, and it rewards exactly the discipline I keep arguing for. If you never hard-coded the model and never bet your moat on one, then every one of these efficiency releases is a free margin upgrade. A 10x-cheaper model at equal quality ships, you change a config value, and your token bill drops while your product stays the same. You don't have to do anything clever — you just have to be swappable, so the price war happens for you. The teams that welded themselves to one premium frontier model for everything are the ones getting squeezed while the rest of us collect the savings.
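Being swappable can be as small as this. What follows is a minimal sketch, not a prescription: the `APP_MODEL` variable, the catalog entries, and the prices are all hypothetical placeholders I made up for illustration, and a real app would call its provider's client wherever the model name gets used.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    name: str
    usd_per_million_output_tokens: float  # placeholder price, not a real quote


# Hypothetical catalog; the names and prices are illustrative only.
CATALOG = {
    "premium-frontier": ModelConfig("premium-frontier", 30.0),
    "cheap-efficient": ModelConfig("cheap-efficient", 3.0),
}


def active_model(env: Optional[Mapping[str, str]] = None) -> ModelConfig:
    """Pick the model from configuration instead of hard-coding it."""
    env = os.environ if env is None else env
    key = env.get("APP_MODEL", "premium-frontier")
    if key not in CATALOG:
        raise KeyError(f"unknown model {key!r}; add it to CATALOG")
    return CATALOG[key]


def output_cost_usd(model: ModelConfig, output_tokens: int) -> float:
    """Output-token bill for a given volume at the model's listed price."""
    return output_tokens / 1_000_000 * model.usd_per_million_output_tokens


if __name__ == "__main__":
    model = active_model()
    print(model.name, output_cost_usd(model, 2_000_000))
```

Setting `APP_MODEL=cheap-efficient` is the whole migration in this sketch. The product code never mentions a model by name, so the bill changes and nothing else does.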
The honest caveat
This is not "frontier intelligence stopped mattering." For the genuinely hard slice of a problem, you still want the best brain, and top-end reasoning is still priced like a luxury. What changed is the center of gravity: the marginal IQ point got cheap to match, and the marginal dollar got expensive to waste. So the right shape is the one I described before — a smart model for the hard 10%, cheap efficient models for the rest, and nothing hard-wired, so you can drop in each new cheaper option the day it lands.
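That split can be sketched as a tiny router. Everything here is an assumption for illustration: the tier names, the relative costs (10 versus 1), the 0 to 100 difficulty score, and the threshold of 90 are hypothetical, and how you actually score difficulty is the hard part this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tier:
    model: str
    cost_per_task: float  # relative units, not real prices


# Hypothetical tiers: the smart one costs 10x the cheap one per task.
SMART = Tier("smart-model", 10.0)
CHEAP = Tier("cheap-model", 1.0)


def route(difficulty: int, hard_threshold: int = 90) -> Tier:
    """Send a task to the smart tier only if its score clears the threshold."""
    return SMART if difficulty >= hard_threshold else CHEAP


def blended_cost(difficulties: Sequence[int]) -> float:
    """Average cost per task when every task goes through the router."""
    if not difficulties:
        raise ValueError("need at least one task")
    return sum(route(d).cost_per_task for d in difficulties) / len(difficulties)


# Ten tasks scored 0, 10, ..., 90: only the last one counts as hard.
scores = list(range(0, 100, 10))
print(blended_cost(scores))
```

With one task in ten going to the smart tier, the average cost in this toy setup is 1.9 units per task against 10 for sending everything to the premium model. The smart model still handles the hard slice, so the saving comes from the routine 90%.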
The leaderboard everyone screenshots still ranks IQ. But the race the labs are actually running has moved to price, because "who's smartest" is becoming a settled, commoditizing question and "who's cheapest at good-enough" is the open one worth a billion dollars a year. Build so that when the answer changes next quarter — and it will — you collect the savings by changing a line, not by rewriting your product.