All notes
Your model has values baked in — and you inherit them

June 8, 2026

Your model has values baked in — and you inherit them

Anthropic refused to let the Pentagon use Claude for mass surveillance or autonomous weapons. The Defense Secretary called it 'arrogance' and an attempt to 'seize veto power' over the military, declared the company a supply-chain risk, and cut ties. Whatever you think of who's right, the fight exposes something every builder glosses over: a model isn't a neutral tool. It ships with refusals, limits, and a worldview its maker chose. Pick a model and you've quietly adopted its values — they become your product's values too.

One of the strangest fights in tech this year is between Anthropic and the Pentagon. The short version: the Department of Defense wanted to use Claude in ways Anthropic wouldn't allow — including processing Americans' commercial data and operating weapons systems — and Anthropic said no. The company's position is that Claude was not built for lethal autonomous weapons without human oversight, nor to spy on US citizens, and using it that way is an abuse of the tool. The Defense Secretary's response was blunt: he accused Anthropic of trying to seize "veto power over the operational decisions of the United States military," declared the company a "supply chain risk," and ordered contractors to cut ties.

You can argue all day about who's right. The point I want to make is underneath that argument, and it's one almost every builder ignores: the model had an opinion, baked in, that overrode what its biggest potential customer wanted. A model is not a neutral tool. And if you build on one, its opinions are now yours.

There is no values-free model

We talk about models as if they were calculators — neutral machines that just process input. They aren't. Every model ships with a set of refusals, limits, and defaults that its maker deliberately chose: what it won't help with, what it hedges on, what it treats as harmful, what worldview it quietly assumes when a question is ambiguous. Those choices are values, and they differ from model to model. One will refuse a request another answers happily. One leans cautious, one leans permissive. None of them is the "default neutral" one, because that doesn't exist — someone decided where every line goes.

The Anthropic–Pentagon clash is just the loudest possible version of this: a values decision so firm it cost the company a defense contract and got it branded a national risk. But the same thing operates quietly inside every model you might build on, on a thousand smaller questions, every day.

When you pick a model, you adopt its refusals

Here's why this matters for you specifically, even if you'll never go near a defense contract. When you wire your product to a model, you inherit its values wholesale. Its refusals become your product's refusals. If the model won't discuss something your users legitimately need, your product won't either — and they'll blame you, not the lab. Its blind spots and biases become yours. Its idea of what's "appropriate" silently becomes your app's policy, whether or not you ever chose that policy.

This is a different axis from the one I usually push. I've argued the model is a commodity you should keep swappable for price and capability. True — but the swap isn't values-neutral. Two models at the same price and benchmark can have meaningfully different personalities and limits, and switching between them quietly changes what your product will and won't do. The brain is interchangeable; the opinions that come with each brain are not identical.

And the values aren't even stable

There's a twist that makes this stranger. The refusals you're relying on may not stay put. With open-weight models, the guardrails live in the weights — and this year a free tool demonstrated it could strip the safety protections out of open models from Meta, Google, and others in under ten minutes on a normal laptop. So a model's "values" are both real enough to lose you a contract and fragile enough to be removed by someone downstream. If your safety story is "the model refuses bad stuff," remember that the refusal is a component, not a law of nature — it can be present in one deployment and gone in another.

What to actually do about it

You can't make a model neutral, but you can stop being surprised by its values:

  • Know your model's positions before you ship. Probe what it refuses, where it hedges, what it assumes on ambiguous questions. Those behaviors are now your product's behaviors; you should discover them, not your users.
  • Choose the values, not just the benchmark. When you pick a model, you're picking a stance. Match it to your use case on purpose — a permissive model and a cautious one are different products, not just different scores.
  • Don't outsource your own policy to the model's defaults. If something genuinely matters for your users, enforce it yourself — at your layer — rather than hoping the model's built-in line happens to sit where you need it. The model's line will move; yours shouldn't.
  • Treat "the model handles safety" as a starting point, not the answer. Its refusals can be inconsistent, and on open weights, removable. Your guarantees have to live somewhere you control.

The bottom line

The Anthropic–Pentagon fight will be remembered as a story about AI and the military. But the quieter lesson is for everyone building on top of these models: there is no view-from-nowhere model. Each one carries a set of values its maker chose, strong enough to refuse the most powerful customer on earth — and when you build on it, you've adopted those values as your own, usually without noticing.

So choose with that in mind. You're not just picking the smartest or cheapest brain. You're picking whose judgment about what's allowed gets embedded in your product. Pick deliberately, find out what it believes before your users do, and keep the lines that truly matter on your side of the wall — because the model's values are real, they're not yours by default, and they were never neutral.

Comments

No comments yet

Sign in to join the conversation.

Be the first to share a thought.