One model for everything is ending

June 10, 2026

Microsoft just shipped seven AI models at once — not one bigger brain, but a reasoning model, a coding model, a transcription model, a voice model, and more, each built for a single job. Meanwhile the frontier generalists keep getting more capable. Both things are true, and the gap between them is the point: the headline race is about one model doing everything, but the thing that actually works in production is a curated stack of specialists. Picking 'the best model' is the wrong question now.

Two stories from this month sit oddly next to each other. Anthropic released Claude Fable 5, its most capable model available to the public — the generalist frontier, climbing as always. And Microsoft shipped not one model but seven at once: a reasoning model, a smaller reasoning model, a coding model, an image model, a transcription model, and two voice models — each built for one capability, not for general breadth.

Those aren't contradictory; they're the two halves of where this is going. The race everyone watches is about one model getting better at everything. The thing that actually ships in production is the opposite: a stack of specialized models, each doing the one job it's best at. And once you see that split, "which model is the best" stops being a sensible question.

The era of the single do-everything model is closing

For a couple of years the mental model was simple: there's a best model, you use it for everything, and when a better one comes out you switch. That's quietly ending. As one roundup put it, the field has fragmented into specialized models, each dominating a modality — and the days when one model handled everything are gone. The practitioners building real systems increasingly run a curated stack of models chosen per task, not a single anointed champion.

Microsoft's seven-model launch is that thesis made concrete by one of the largest enterprise software vendors in the world. They didn't try to build one model that transcribes, reasons, codes, and generates voice equally well. They built a transcription model that's great at transcription and a coding model tuned to be cheap and fast at coding, and designed them to work together. That's not a hedge; it's a statement that the best results come from specialists, assembled.

Why "the best model" is the wrong question

Here's the trap the single-model mindset walks into. A general-purpose frontier model is, almost by definition, overqualified and inefficient for most individual jobs. Using your most powerful reasoning model to transcribe audio or reformat a date is like hiring a surgeon to apply a band-aid: it works, and it's absurd. The Cannes AI festival surfaced exactly this — enterprises aren't failing because AI isn't powerful enough, but because they keep forcing general-purpose models into production systems that punish inefficiency.

So the question "which model is best" has no answer, because it's missing the second half: best at what. The right transcription model is bad at coding. The cheapest code model is useless at vision. The frontier generalist is the strongest at the genuinely hard, open-ended stuff, and it is overkill, slow, and expensive everywhere else. This is the model-layer version of a lesson that already holds for agents: a narrow tool built for one job beats a general one trying to do all of them.

Build a stack, not a bet

The practical move is to stop choosing a model and start composing a stack, and this is exactly what the routing pattern is for. Routing means a thin layer sends each request to the model assigned to that kind of job. You map jobs to models: the specialist where it wins, the cheap model for the routine, the frontier generalist reserved for the hard core. A few principles:

  • Match the model to the job, deliberately. Transcription to a transcription model, bulk classification to a small fast one, the genuinely hard reasoning to the frontier. Don't default everything to your most powerful model — that's the expensive, slow way to be lazy.
  • Route, don't pick. A product isn't one model call; it's many. Send each one to the smallest, most suitable model that handles it, and escalate only when you must. The stack, not the single choice, is the architecture.
  • Save the generalist for what only it can do. The frontier model earns its cost on open-ended, multi-step, novel problems. For everything narrow and well-defined, a specialist is faster, cheaper, and often better.
  • Keep each slot swappable. Specialists get replaced even faster than generalists. Behind a clean seam, swapping the transcription model or the code model is a config change, and your stack quietly improves part by part.
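
To make those principles concrete, here is a minimal sketch of a routing layer in Python. Everything in it is hypothetical: the model names, the task names, and the convention that a model returns None when it can't handle a request are assumptions made for illustration, not any vendor's API. Each "model" is a plain function standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed interface: a model takes a prompt and returns an answer,
# or None when it declines (e.g. low confidence). Purely illustrative.
Model = Callable[[str], Optional[str]]


@dataclass
class Slot:
    """One slot in the stack: model names to try, smallest/cheapest first."""
    chain: List[str]


class Router:
    def __init__(self, models: Dict[str, Model], routes: Dict[str, Slot]):
        self.models = models
        self.routes = routes

    def run(self, task: str, prompt: str) -> Tuple[str, str]:
        """Try each model in the task's chain; escalate only when one declines."""
        for name in self.routes[task].chain:
            answer = self.models[name](prompt)
            if answer is not None:
                return name, answer
        raise RuntimeError(f"no model in the stack handled task {task!r}")


# --- Hypothetical stand-ins for real model calls ---------------------------
def small_classifier(prompt: str) -> Optional[str]:
    # Pretend the small model is only reliable on short inputs.
    return "label:short" if len(prompt) < 40 else None


def frontier(prompt: str) -> Optional[str]:
    return f"frontier-answer({len(prompt)} chars)"


def transcriber_v1(prompt: str) -> Optional[str]:
    return "transcript-v1"


def transcriber_v2(prompt: str) -> Optional[str]:
    return "transcript-v2"


MODELS: Dict[str, Model] = {
    "small_classifier": small_classifier,
    "frontier": frontier,
    "transcriber_v1": transcriber_v1,
    "transcriber_v2": transcriber_v2,
}

# The routing table is the architecture: one entry per job the product does.
ROUTES: Dict[str, Slot] = {
    "transcribe": Slot(["transcriber_v1"]),            # specialist only
    "classify": Slot(["small_classifier", "frontier"]),  # cheap first, then escalate
    "open_ended": Slot(["frontier"]),                  # generalist for the hard core
}

router = Router(MODELS, ROUTES)
```

Calling router.run("classify", "short text") stays on the small model, while a longer input falls through to the frontier model. Swapping the transcription specialist is a one-line change to the table, ROUTES["transcribe"] = Slot(["transcriber_v2"]), with no change to the calling code. In a real system the "decline" signal would come from something like a confidence score or a validation check; the None convention here is just the simplest way to show the escalation step.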

The bottom line

The headlines will keep crowning a "best model," because a single climbing number is a good story. But the way value actually gets built is moving the other way: toward a stack of specialists, each excellent at one thing, composed into a system — exactly what Microsoft signaled by shipping seven models instead of one. The frontier generalist still matters; it's just one slot in the stack now, not the whole answer.

So when you reach for "the best model," stop and ask the better question: best at which of the jobs my product actually does? Answer that per task, assemble the specialists, and route between them. You'll get a product that's cheaper, faster, and better than one that forces a single brilliant generalist to do everything. The one-model era was simpler. The stack era is just better.
