ARCHITECTURE · June 15, 2026

The pilot was cheap. Production won't be.

Your AI pilot ran great and cost almost nothing. That number lied to you. When teams take an AI feature from pilot to production, infrastructure costs routinely run three to five times the original projection — and it's a big reason 95% of generative-AI pilots never turn into anything that shows up in the P&L. The pilot is cheap because it's small, watched, and runs on the easy cases. Production is none of those. Here's where the 3-to-5x hides, and how to price it before it ambushes you.

The AI pilot goes beautifully. It works, the demo lands, and the bill is so small it's almost a rounding error. So you greenlight production with that number in your head — and that number is the most misleading thing in the whole project. When teams actually scale an AI feature, costs routinely come in at three to five times the original projection, and that gap is a major reason 95% of generative-AI pilots never produce a measurable financial result.

This isn't bad luck or sloppy estimating. The pilot is cheap for structural reasons, and every one of those reasons disappears the moment you go to production. If you don't know where the 3-to-5x comes from, it looks like the project got more expensive. It didn't. It just stopped hiding its real cost. Let me show you where it lives.

A pilot is cheap because it's small, watched, and easy

Three things make a pilot inexpensive, and all three are temporary. It's small — a handful of users, a trickle of requests, a token bill you barely notice. It's watched — a human is right there to catch the weird outputs, so you don't yet need the guardrails, retries, and monitoring that catching them automatically requires. And it runs on the easy cases — the clean, happy-path inputs you naturally test with first.

Production inverts all three. Small becomes thousands of requests a day, and the token bill grows in step with that volume for as long as the feature runs. Watched becomes unwatched, so now you're paying for the monitoring, the fallback logic, and the second model that checks the first. And the easy cases become the real ones: messy, long, adversarial inputs that need bigger context windows, more retries, and more expensive calls to get right. None of that was in the pilot. All of it is in the bill.

The multipliers nobody puts in the estimate

The 3-to-5x isn't one big surprise. It's a stack of quiet multipliers, each reasonable, that compound:

Retries and failures. Real inputs fail and get retried. Every retry is another paid call, and at scale the failure rate is never zero.
Context growth. The happy-path prompt was short. Real requests drag in history, documents, and context, and you pay per token for all of it, every time.
The checking layer. Production needs to catch its own mistakes — a second model, a validation pass, a guardrail. That can double the calls behind a single user action.
Edge cases that need the expensive model. The easy 80% runs on a cheap model. The hard 20% quietly gets routed to the pricey one, and hard cases make up a larger share of real traffic than they did of pilot traffic.

Each of these is sensible on its own. Stacked, they're how a pilot that cost pennies becomes a production system that costs real money.
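
To make the compounding concrete, here is a minimal Python sketch of how the four multipliers above might stack. Every number in it is a hypothetical placeholder, not a measurement, and the model is deliberately simple: it assumes the pilot had no retries, no checking calls, and ran entirely on the cheap model, and that a failed call is retried at most once.

```python
from dataclasses import dataclass


@dataclass
class Multipliers:
    """Illustrative inputs only; measure your own before trusting any of them."""
    retry_rate: float        # fraction of calls that get retried once
    context_growth: float    # production tokens per call / pilot tokens per call
    checking_calls: float    # extra validation calls per user action
    hard_share: float        # share of traffic routed to the expensive model
    hard_price_ratio: float  # expensive-model price / cheap-model price


def production_multiplier(m: Multipliers) -> float:
    """Return production cost per user action relative to the pilot's."""
    retries = 1.0 + m.retry_rate
    checking = 1.0 + m.checking_calls
    # Blended price: cheap model for easy traffic, expensive model for the rest.
    routing = (1.0 - m.hard_share) + m.hard_share * m.hard_price_ratio
    return retries * m.context_growth * checking * routing


# Hypothetical values: 10% retries, 1.5x longer prompts, one checking call
# for every two user actions, 20% of traffic on a model that costs 4x more.
example = Multipliers(
    retry_rate=0.10,
    context_growth=1.5,
    checking_calls=0.5,
    hard_share=0.20,
    hard_price_ratio=4.0,
)

print(f"{production_multiplier(example):.2f}x the pilot's cost per action")
```

With these made-up inputs, no single factor exceeds 1.6x, yet the product lands just under 4x. That is the shape of the problem: the total is a product, not a sum.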

How to price it before it ambushes you

You can't make production as cheap as the pilot, but you can stop being surprised by it:

Estimate cost per request on the hard cases, not the easy ones. Price the messy, long, retried request — that's what production actually looks like.
Add the checking and monitoring calls to your math. If catching mistakes doubles your calls, put the doubling in the estimate now, not in the invoice later.
Multiply by realistic volume, then add a margin. Take your honest per-request cost, scale it to real traffic, and assume it lands higher than that. Planning for 3-to-5x is planning for what usually happens.
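
The three steps above can be sketched as a small Python function. This is one rough way to do the arithmetic, not a standard formula: the function name, the parameters, the 25% default margin, and the example figures are all assumptions chosen for illustration.

```python
def monthly_estimate(
    hard_case_cost: float,
    calls_per_action: float,
    actions_per_day: int,
    margin: float = 0.25,
    days: int = 30,
) -> float:
    """Rough monthly cost estimate priced on the hard case.

    hard_case_cost   -- cost of one model call on a messy, long, retried request
    calls_per_action -- model calls behind one user action, checking included
    actions_per_day  -- realistic production volume, not pilot volume
    margin           -- buffer added on top (hypothetical default of 25%)
    """
    if hard_case_cost < 0 or calls_per_action < 0 or actions_per_day < 0:
        raise ValueError("costs and volumes must be non-negative")
    per_action = hard_case_cost * calls_per_action
    return per_action * actions_per_day * days * (1.0 + margin)


# Hypothetical inputs: 0.02 per hard-case call, 2 calls per user action
# (the original call plus a checking call), 5,000 user actions a day.
estimate = monthly_estimate(
    hard_case_cost=0.02,
    calls_per_action=2.0,
    actions_per_day=5000,
)
print(f"Estimated monthly cost: {estimate:,.0f}")
```

With these placeholder inputs the estimate comes to roughly 7,500 per month in whatever currency the per-call cost uses. The useful part is not the number but the inputs: each one forces a question the pilot never had to answer.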

A production AI feature can absolutely be worth its cost. But only if you knew the cost going in.

The bottom line

The cheap pilot is the most expensive lie in AI projects, because it sets an expectation production can't meet and makes the real number look like a failure instead of the truth.

A pilot is cheap because it's small, supervised, and tested on the easy cases — and production is none of those, which is where the 3-to-5x comes from. Price the hard case, count the hidden calls, and plan for the multiplier up front. The teams that get ambushed aren't the ones who spent too much. They're the ones who believed the pilot.
