Tokenmaxxing, or Goodhart's law comes for AI

June 4, 2026

Companies wanted 'AI adoption,' so they measured the easiest proxy — token usage — put it on a leaderboard, and got exactly what they measured: people burning tokens to climb the board, not to do better work. It's a fifty-year-old law eating a brand-new strategy, and now it's expensive twice: you pay for the wasted tokens and you poison the signal you wanted. The fix is old too — measure outcomes, not activity.

In 2026, several big tech companies started ranking their own employees by how many AI tokens they used. Amazon built an internal leaderboard; workers responded by running pointless, low-value tasks all day to inflate their scores — not because the work needed doing, but because activity moved the number. At Meta, an employee built a dashboard called "Claudeonomics" that ranked the company's ~85,000 workers by token consumption; in one 30-day window it logged over 60 trillion tokens (neither Zuckerberg nor the CTO cracked the top 250). People started calling it tokenmaxxing, and it is one of the cleanest management own-goals I've seen in years.

This is just Goodhart's law, on a faster meter

What happened here isn't new or mysterious. It's Goodhart's law, a principle roughly fifty years old: when a measure becomes a target, it ceases to be a good measure. The instant "tokens used" went from a number someone glanced at to a number people were ranked on, it stopped measuring productivity and started measuring something else entirely: the human capacity to game a leaderboard. As one analysis put it, that is now all the metric measures.

Why did smart companies walk straight into it? Because "AI adoption" is what the board wanted to see, and usage is the easiest thing in the world to count. Tokens are visible, countable, dashboard-able. Whether the AI actually made the work better is hard to measure. So they measured the easy proxy instead of the hard truth — and got the proxy, maximized. Put plainly: when you measure usage, you get waste.

We have made this exact mistake forever

If this feels familiar, it should. Pay engineers by lines of code and you get bloated, padded code. Set commit quotas and developers split one change into five fragments. Rank a call center by handle time and you get customers rushed off the phone with nothing solved. Tokenmaxxing is the AI-era reskin of the oldest management mistake there is: rewarding activity because it's easy to count, and getting activity instead of results. Nothing about it is new except that the meter runs faster.

Except now the vanity metric also burns money

Here's what makes the AI version worse than lines of code. A bad metric used to just waste effort. This one literally burns cash — every gamed token is a token you paid for, which is no small part of why the 2026 cost panic got as loud as it did. So you pay twice: once for the wasted compute, and again in the corrupted signal. You bought a number that tells you nothing about whether AI is helping, and you put it on your own invoice. That's a remarkable way to spend money — funding the destruction of your own data.
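
To put rough numbers on the first half of "you pay twice," here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function names, the token volume, the price per million tokens, and the share of usage that was leaderboard padding are all made up, not taken from any company mentioned above.

```python
# All figures are hypothetical; none of them come from the companies discussed.

def wasted_spend(total_tokens: int, gamed_fraction: float,
                 price_per_million: float) -> float:
    """Dollars paid for tokens burned only to move a leaderboard score."""
    if not 0.0 <= gamed_fraction <= 1.0:
        raise ValueError("gamed_fraction must be between 0 and 1")
    return total_tokens * gamed_fraction / 1_000_000 * price_per_million


def effective_price_per_million(price_per_million: float,
                                gamed_fraction: float) -> float:
    """Price per million *useful* tokens once the gamed share is excluded."""
    if not 0.0 <= gamed_fraction < 1.0:
        raise ValueError("gamed_fraction must be in [0, 1)")
    return price_per_million / (1.0 - gamed_fraction)


if __name__ == "__main__":
    tokens = 2_000_000_000   # assumed monthly usage
    price = 4.0              # assumed dollars per million tokens
    gamed = 0.25             # assumed share of usage that was padding
    print(f"wasted spend: ${wasted_spend(tokens, gamed, price):,.2f}")
    print("effective price per million useful tokens: "
          f"${effective_price_per_million(price, gamed):.2f}")
```

Note what the sketch cannot price: the second cost, the corrupted signal, never shows up on the invoice at all. In practice you also would not know the gamed fraction, which is exactly the problem with ranking people on the total.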

The fix is the boring right answer: measure outcomes

The way out isn't a cleverer usage metric. Usage is the wrong thing to measure at all. You have to measure outcomes — did the work actually get done, is the result good, did the customer's problem get solved — which is harder to count and much harder to game. The data backs this up: organizations that connected AI to real outcomes were nearly four times more likely to report AI-driven revenue growth than those still in pilots — and the differentiator was explicitly not who had the most usage.

This is the same discipline I keep arguing for with models, pointed at your org instead. You don't judge an agent by how many times it ran; you judge it on a held-out set of real outcomes. Same rule for people and teams: reward the result, not the token count. And there's a quiet human cost a usage dashboard will never surface — surveys found roughly a quarter of workers would consider leaving over being pushed to use AI in ways they didn't believe in, while only a tiny fraction of employers noticed any resistance at all. People comply visibly and resist quietly, and your leaderboard cheerfully reports the compliance as success.
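
As a rough illustration of "reward the result, not the token count," the sketch below ranks the same three teams two ways: by tokens consumed, and by pass rate on a held-out set of tasks with a known pass/fail check. The team names, the numbers, and the `TeamStats` structure are all hypothetical; this is one simple way to frame the comparison, not a recommended scoring system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TeamStats:
    name: str
    tokens_used: int       # what a usage leaderboard counts
    tasks_attempted: int   # held-out tasks with a known pass/fail check
    tasks_passed: int


def outcome_rate(team: TeamStats) -> float:
    """Share of held-out tasks the team's work actually passed."""
    if team.tasks_attempted == 0:
        return 0.0
    return team.tasks_passed / team.tasks_attempted


def rank_by(teams: List[TeamStats],
            key: Callable[[TeamStats], float]) -> List[str]:
    """Team names ordered from highest to lowest on the given metric."""
    return [t.name for t in sorted(teams, key=key, reverse=True)]


# Hypothetical data: the heaviest token user has the weakest results.
teams = [
    TeamStats("alpha", tokens_used=900_000_000, tasks_attempted=40, tasks_passed=12),
    TeamStats("bravo", tokens_used=150_000_000, tasks_attempted=40, tasks_passed=31),
    TeamStats("charlie", tokens_used=400_000_000, tasks_attempted=40, tasks_passed=22),
]

usage_ranking = rank_by(teams, lambda t: t.tokens_used)
outcome_ranking = rank_by(teams, outcome_rate)

if __name__ == "__main__":
    print("by usage:  ", usage_ranking)
    print("by outcome:", outcome_ranking)
```

With this made-up data the two rankings come out in opposite order, which is the point: the usage board and the outcome board can disagree completely, and only one of them is about the work. An outcome metric can be gamed too, so the held-out tasks need to stay held out.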

The takeaway

AI didn't break your metrics. It took the oldest measurement mistake there is and made it faster and more expensive. Usage was never the goal — better work was. The rule is unchanged and unforgiving: whatever number you put on a dashboard, someone will optimize it, so make very sure that number is the thing you actually wanted and not just the thing that was easy to count. Measure whether the work got better — or pay top dollar to watch a leaderboard climb while nothing does.
