ARCHITECTURE · June 10, 2026

Your AI provider is going to have a bad day

Two reminders landed this week. Anthropic is retiring Claude Sonnet 4 and Opus 4 from the API on June 15 — if you pinned those versions, your calls just start returning errors, with no automatic failover. And this morning, Gemini went down. Same lesson from opposite directions: the model under your product is a third-party service that will, on a schedule you don't control, change, vanish, or break. The fix isn't strategy. It's the boring resilience engineering most AI products skip.

Two small news items this week say the same thing. First: Anthropic is retiring Claude Sonnet 4 and Opus 4 from the API on June 15. If your code calls those pinned versions, requests after that date return errors. There is no automatic failover to a newer model; the call simply fails. Second: this morning, Google's Gemini went down, serving "something went wrong" to everyone who depends on it.

One is a planned retirement, the other an unplanned outage, but they're the same lesson pointed two ways: the model under your product is a third-party service, and it will fail you on a schedule you don't control. We talk about AI like it's a magic oracle that's always awake and never changes. It's infrastructure now — and infrastructure has bad days. The question is whether yours takes your product down with it.

AI is infrastructure, and infrastructure goes down

This isn't rare. One status tracker logged 47 incidents across major AI providers in a single month late last year, with Anthropic and OpenAI each accumulating over 180 hours of total impact. OpenAI had a 34-hour outage in 2025. Even at a healthy 99.76% uptime, that's roughly 21 hours a year (0.24% of 8,760 hours) where the thing your product is built on simply isn't there. Add deprecations like June 15, where a model doesn't go down but goes away, and the picture is clear: depending on a single provider with no plan is depending on their best day, forever.
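To make the uptime arithmetic easy to rerun with your own provider's numbers, here is a minimal sketch. The function name and the sample percentages are illustrative choices, not figures from any provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Expected hours per year a service is unavailable at a given uptime."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


# Illustrative uptime levels; substitute the figure you actually observe.
for pct in (99.0, 99.76, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% uptime -> {hours:.1f} hours down per year")
```

Small differences in the percentage translate into large differences in hours, which is why the measured figure matters more than the advertised one.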

Most AI products are built exactly that way — one provider, one model string, called directly, with a happy-path assumption that it always answers. That's fine in the demo. It's a time bomb in production.

Know your failure type before you react

The first thing resilience engineering teaches is that not all failures are the same, and reacting wrong makes it worse. A transient failure — a 429 rate limit, a brief network blip — clears in seconds, and a retry fixes it. A persistent failure — a full provider outage, a model returning garbage for twenty minutes — won't clear by waiting, and retrying it just burns money and time for nothing. Blindly retrying every error is how a provider's bad hour becomes your runaway bill. So the layers stack: retry the transient, fall back on the persistent, and trip a circuit breaker when the whole thing is degrading.
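One way to encode that distinction is a small classifier that decides, per error, whether a retry is worth attempting. This is a sketch under assumptions: the set of retryable status codes and the "three failures in a row means persistent" threshold are illustrative defaults, and `classify` is a hypothetical helper, not part of any provider SDK. Check your provider's documentation for the codes it actually returns.

```python
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    TRANSIENT = "transient"    # likely to clear in seconds; retry
    PERSISTENT = "persistent"  # will not clear by waiting; fall back


# Assumed set of retryable HTTP statuses (rate limits, timeouts, gateway errors).
TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}


def classify(
    status_code: Optional[int],
    consecutive_failures: int = 0,
    persistent_after: int = 3,
) -> FailureKind:
    """Classify one failed call.

    status_code is None when no HTTP response arrived (network error, timeout).
    consecutive_failures counts earlier failures in the current streak.
    """
    # A long enough streak is treated as an outage, whatever the status says.
    if consecutive_failures >= persistent_after:
        return FailureKind.PERSISTENT
    if status_code is None or status_code in TRANSIENT_STATUS:
        return FailureKind.TRANSIENT
    # Everything else (e.g. 401, 404 for a retired model) will not fix itself.
    return FailureKind.PERSISTENT


print(classify(429))                          # FailureKind.TRANSIENT
print(classify(404))                          # FailureKind.PERSISTENT
print(classify(503, consecutive_failures=3))  # FailureKind.PERSISTENT
```

The streak threshold is what keeps a provider's bad hour from turning into an endless retry loop: the same 503 that deserves a retry the first time deserves a fallback the fourth.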

The four moves that keep you up when they go down

You don't need anything exotic — this is standard distributed-systems hygiene, applied to the model call:

Retry with backoff for transient errors. Rate limits and network hiccups are normal; a couple of retries with increasing delay absorbs most of them without a human noticing.
A fallback model or provider for persistent ones. This is the big one, and the math is striking: a provider at 99.3% uptime is down 0.7% of the time, so two independent providers are both down at once only 0.007 × 0.007 = 0.000049, or 0.0049%, of the time. That is about 99.995% effective uptime, provided their failures really are independent. A second path turns an outage into a slightly worse answer instead of no answer. (It's also why keeping the model behind a swappable seam pays off operationally, not just strategically.)
A circuit breaker so that when a provider is clearly down, you stop hammering it, fail fast, and recover cleanly instead of piling up timeouts.
Graceful degradation to a deterministic path. If the AI feature dies, don't show a spinner forever — fall back to something dumber that still works: keyword search instead of semantic search, a templated response instead of a generated one, a plain form instead of the smart flow. Tell the user "advanced features are briefly offline" and keep the product usable.
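The first three moves can be sketched together in a few dozen lines. Everything here is an assumption made for illustration: `TransientError`, `CircuitBreaker`, `call_with_retry`, and `complete` are hypothetical names, the provider calls are plain functions that take a prompt and return text, and the thresholds are arbitrary defaults. A real integration would map the provider SDK's own exceptions onto `TransientError`.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple


class TransientError(Exception):
    """Hypothetical: raised by a provider call on rate limits or network blips."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits a trial call
    once `cooldown` seconds have passed since it opened."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


def call_with_retry(call: Callable[[str], str], prompt: str, attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry only TransientError, with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return call(prompt)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller fall back
            # Delay doubles each attempt; jitter spreads it over [d, 2d).
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    raise RuntimeError("attempts must be at least 1")


Provider = Tuple[Callable[[str], str], CircuitBreaker]


def complete(prompt: str, providers: Sequence[Provider],
             sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Try providers in order; return None if every path is unavailable."""
    for call, breaker in providers:
        if not breaker.allow():
            continue  # fail fast instead of piling up timeouts
        try:
            result = call_with_retry(call, prompt, sleep=sleep)
        except Exception:
            # In this sketch, any error that survives retries counts as persistent.
            breaker.record_failure()
            continue
        breaker.record_success()
        return result
    return None


# Usage with stand-in providers (no network involved).
def primary_down(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def secondary_ok(prompt: str) -> str:
    return f"[secondary] {prompt}"


answer = complete("hello", [(primary_down, CircuitBreaker()),
                            (secondary_ok, CircuitBreaker())])
print(answer)  # [secondary] hello
```

A `None` return is the signal to degrade to the deterministic path rather than to show an error. Injecting `sleep` and `clock` is a design choice that keeps the logic testable without real waiting.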

That last one matters most and gets skipped most. An AI feature with no non-AI fallback means your whole product's uptime is now capped by your flakiest dependency.
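As a sketch of what a non-AI fallback can look like, the wrapper below tries a semantic search and drops to plain keyword matching if the model call fails. The names `search`, `keyword_search`, and `DEGRADED_NOTICE` are hypothetical, and `semantic_search` stands in for whatever model-backed function your product already has.

```python
from typing import Callable, List, Optional, Tuple

DEGRADED_NOTICE = "Advanced features are briefly offline."


def keyword_search(query: str, docs: List[str]) -> List[str]:
    """Deterministic fallback: rank docs by how many query words they
    contain (case-insensitive substring match), dropping non-matches."""
    words = set(query.lower().split())
    scored = [(sum(w in doc.lower() for w in words), doc) for doc in docs]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]


def search(
    query: str,
    docs: List[str],
    semantic_search: Callable[[str, List[str]], List[str]],
) -> Tuple[List[str], Optional[str]]:
    """Return (results, notice). notice is None on the normal path and a
    user-facing message when the deterministic fallback was used."""
    try:
        return semantic_search(query, docs), None
    except Exception:
        # The AI path is unavailable; keep the product usable.
        return keyword_search(query, docs), DEGRADED_NOTICE


# Usage with a stand-in for a model-backed search that is currently failing.
def unavailable(query: str, docs: List[str]) -> List[str]:
    raise TimeoutError("model unavailable")


docs = ["Reset your password", "Billing and invoices", "Password rules"]
results, notice = search("password reset", docs, unavailable)
print(results)  # ['Reset your password', 'Password rules']
print(notice)   # Advanced features are briefly offline.
```

Returning the notice alongside the results lets the interface tell the user what happened without the caller needing to know which path ran.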

And put deprecation on your calendar

The June 15 retirement is the quieter failure mode, and it catches people who thought they were safe. Pinning a model version feels responsible — you froze the behavior. But a pinned version is a version that can be retired, and when it is, your call doesn't degrade, it dies. So either track the deprecation notices for every model you pin, or don't pin a version you're not prepared to migrate off on the provider's timeline. Migrating is usually a one-line change — but only if you find out before June 15, not from your error logs on June 16.
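One low-effort way to "put it on the calendar" is a check that runs in CI or at startup and complains when a pinned model is close to retirement. This is a sketch: the registry, the model identifiers, and the 30-day window are placeholders, and the dates have to be maintained by hand from your providers' deprecation notices.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical registry of pinned model strings and announced retirement dates.
# None means no retirement has been announced yet.
PINNED_MODELS: Dict[str, Optional[date]] = {
    "example-model-v1": date(2026, 6, 15),  # placeholder identifier
    "example-model-v2": None,               # placeholder identifier
}


def retirement_warnings(
    pinned: Dict[str, Optional[date]],
    today: date,
    warn_days: int = 30,
) -> List[str]:
    """List pinned models that are retired or retire within warn_days."""
    warnings = []
    for model, retires_on in pinned.items():
        if retires_on is None:
            continue
        days_left = (retires_on - today).days
        if days_left < 0:
            warnings.append(f"{model}: RETIRED {-days_left} day(s) ago")
        elif days_left <= warn_days:
            warnings.append(f"{model}: retires in {days_left} day(s)")
    return warnings


for line in retirement_warnings(PINNED_MODELS, today=date.today()):
    print(line)
```

Failing the build when the list is non-empty turns a deprecation notice into something a person has to act on, rather than something discovered in the error logs.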

The bottom line

The exciting story about AI is everything it can do. The unglamorous story, the one this week quietly told twice, is that it's a remote service that will be slow, down, or gone at moments you didn't pick — and a product that assumes otherwise is one bad morning away from an incident. None of the fixes are clever. Retry the blips, fall back on the outages, degrade to something deterministic, and watch the deprecation calendar. It's the same boring reliability work we'd do for any critical dependency — and the model is now exactly that.

So before you ship, ask the question the demo never forces you to: when my provider has its bad day — and it will — what does my user see? If the answer is "an error and a spinner," you haven't built a product on AI. You've built one that borrows its uptime from someone else, and they didn't promise you their good day.
