June 4, 2026
The bill came due
For two years the cost of AI felt like a rounding error. In 2026 the invoice arrived — Uber burned a year's AI budget in four months, Microsoft yanked Claude Code from its own engineers, JPMorgan says tokens are eating internet profits. This isn't a blip. Token billing inverts the economics software was built on, and the cruel twist is that a better agent costs more. Here's what actually changed, honestly.
For about two years, the cost of using AI was something you didn't think about. In a demo it was pennies. In a pitch deck it was a footnote. Everyone built as if the intelligence in their product was, for all practical purposes, free — call the best model for everything, let the agent loop as long as it likes, scale first and worry about the bill never.
In 2026 the bill arrived, and it is not a footnote.
The cost panic is real
The numbers this spring stopped being abstract. Uber burned through its entire annual AI budget in four months. Microsoft quietly revoked Claude Code licenses for most of its own engineers and pushed them to a cheaper in-house tool. An AI consultant told Axios that one of its clients spent half a billion dollars in a single month on Claude. JPMorgan published a note with the cheerful title "AI Token Costs Are Eating Up Internet Profits." And only 14% of CFOs say they can see a clear, measurable return on what they're spending.
The bluntest version came from an Nvidia executive, quoted as AI costs blew past payroll: the cost of compute is now far beyond the cost of the employees. Sit with that. The tool meant to make people cheaper became more expensive than the people.
Why this isn't a blip
It would be comforting to call this a temporary spike — prices will fall, it'll sort itself out. Per-token prices are falling, and it still won't save you, because the problem is structural. Token billing inverts the economics that software was built on.
The magic of software was near-zero marginal cost. You build it once, and the ten-thousandth user costs you almost nothing more than the first. That's why "scale now, monetize later" works — serving more people is nearly free. AI breaks that. Every single use burns tokens, so cost rises with usage and never flattens. Worse, an agent isn't one model call — it loops, plans, calls tools, reflects. Agentic AI can consume up to 1,000× more tokens than a simple query, and Goldman Sachs expects agents to drive a 24-fold rise in token demand by 2030.
Here's the part that should rewire how you think: with tokens, making the agent better often makes it cost more. More reasoning, more steps, more context — more capability — is more tokens. The usual software instinct, "make it smarter," now has a meter attached. One analysis found the token tax eats 23% of revenue at scaling-stage AI companies, locking gross margins about 30 points below the SaaS norm. That gap isn't a bug you optimize away. It's the new physics.
The part we did to ourselves
Now the honest half, because the panic isn't only the model's price — a lot of it is self-inflicted. Several companies incentivized token burn. Meta and others ran internal leaderboards ranking employees by how many tokens they used, and people did the rational thing: they threw everything at the agents all day to climb the board. They called it "tokenmaxxing," and it produced exactly what it measured: maximum spend, not maximum value. The analytics firm Faros AI even found that under heavy AI adoption, "code churn" — lines written and then deleted — jumped more than 800%. A lot of those expensive tokens generated code that was thrown straight back out.
So the bill is two things stacked: the real, structural cost of inference, and a discipline failure on top of it — treating a metered resource as if it were free, and even rewarding people for wasting it.
What it actually means
This is not "AI doesn't work." It's "AI was never free, and we built two years of products pretending it was." The correction is healthy, and it kills exactly the right habits. The lazy playbook — point the most expensive model at everything, let agents loop unbounded, never look at the meter — is what's dying. What replaces it is the oldest virtue in engineering: efficiency, treated as a feature instead of an afterthought.
And notice who isn't panicking. The teams that were already routing the boring 90% of work to cheap models, grounding so the agent doesn't burn tokens flailing, and keeping agents narrow instead of letting one god-agent loop forever — they priced this in from the start, because they treated tokens as the real cost they always were. The cost panic is mostly a reckoning for everyone who didn't.
The era of cheap demos is over. Every AI product now has to answer the question it should have answered all along: is this worth the tokens it burns? That's not a crisis. That's just the math showing up — a little late, and very loud.
Comments
No comments yet
Sign in to join the conversation.
Be the first to share a thought.