Microsoft sent 100 agents to hunt bugs — AI vs AI security, honestly

June 5, 2026

This week Microsoft showed a security team made of AI: a pipeline of 100+ agents that found 16 new Windows vulnerabilities, four of them critical, plus the first AI to auto-convict malware. The defenders now run autonomous AI. So do the attackers — one ran 80–90% of a real intrusion on its own. 'AI vs AI security' stopped being a slogan this spring. Here's the honest read: it's real progress, and a faster stalemate.

At Build this week, Microsoft showed off a security team made entirely of AI. Its system — a pipeline of more than 100 specialized agents — hunted through the Windows networking and authentication code and found 16 new vulnerabilities, four of them critical remote-code-execution holes in the kernel's TCP/IP stack. A sister project, Ire, became the first AI to build a malware-conviction case strong enough to block it automatically. The defenders now run autonomous AI.

So do the attackers. "AI vs AI security" stopped being a slogan this spring, and it's worth being honest about what that actually means.

Both sides are real now

The symmetry is the story. On defense: Microsoft's bug-hunters above, plus Google's Big Sleep agent, which caught a live zero-day in SQLite before anyone exploited it — a genuine first — and Anthropic's Mythos, which found over 6,000 critical vulnerabilities across a thousand open-source projects.

On offense: an AI agent has run 80–90% of a real intrusion on its own, ransomware's time from break-in to data theft has been compressed to 25 minutes, and autonomous agents now account for 1 in 8 reported AI breaches. Same capability, both directions, at machine speed.

What the winning defense actually is

Here's the part I find most useful, and it's not the headline. Look under the hood of Microsoft's system and it is not one genius model. It's 100+ specialized agents with a panel of models — heavy reasoners for the hard judgment, cheaper models for the high-volume scanning, traded off for speed and cost. That is exactly the architecture I keep writing about — orchestration of many narrow agents and a cheap model for the 90% — pointed at finding bugs. The thing topping the security benchmark isn't a magic model. It's good engineering of a swarm.
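To make the cheap-model-first idea concrete, here is a minimal sketch of tiered routing. It is not Microsoft's pipeline: the tier names, the relative costs, the keyword heuristic, and the file names are all assumptions I made up for illustration. The only thing it shows is the shape: every target gets a cheap pass, and only the suspicious ones are escalated to the expensive reasoner.

```python
from dataclasses import dataclass

# Hypothetical tiers and relative per-call costs (illustrative numbers only).
CHEAP, HEAVY = "cheap-scanner", "heavy-reasoner"
COST = {CHEAP: 1, HEAVY: 20}


@dataclass
class Finding:
    target: str
    suspicion: float  # 0.0 to 1.0, produced by the cheap first pass


def cheap_scan(target: str, code: str) -> Finding:
    """Stand-in for a high-volume, low-cost agent: a crude keyword count."""
    risky_calls = ("memcpy", "strcpy", "alloca")
    hits = sum(code.count(call) for call in risky_calls)
    return Finding(target, min(1.0, hits / 3))


def route(findings, threshold=0.5):
    """Pair each finding with the tier that should handle it next."""
    return [(f, HEAVY if f.suspicion >= threshold else CHEAP) for f in findings]


def total_cost(routed):
    """Every target pays for the cheap pass; escalated ones also pay for the heavy one."""
    escalated = sum(1 for _, tier in routed if tier == HEAVY)
    return len(routed) * COST[CHEAP] + escalated * COST[HEAVY]


# Made-up corpus: file names and contents are placeholders, not real Windows code.
corpus = {
    "tcp_input.c": "memcpy(dst, src, len); memcpy(a, b, n);",
    "auth.c": "strcpy(buf, user);",
    "logging.c": "printf(msg);",
}

routed = route([cheap_scan(name, code) for name, code in corpus.items()])
escalated_targets = [f.target for f, tier in routed if tier == HEAVY]
tiered_cost = total_cost(routed)
all_heavy_cost = len(corpus) * COST[HEAVY]
```

With these made-up numbers, only one of the three files reaches the heavy tier, and the tiered run costs well under half of sending everything to the expensive model. The threshold is the knob: lower it and you buy recall with reasoning budget, raise it and you save money but risk missing the subtle bug.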

The honest read: it raises the ceiling, not the floor

AI on defense finds new vulnerabilities faster than any human team ever could. What it does not do is patch the old ones, rotate the leaked password, or turn on MFA. And that's still where most breaches happen — the headline incidents of 2026 fell to weak credentials and missing basics, not exotic exploits, exactly the point from last week. So AI-vs-AI is the new top layer of the fight, and the boring hygiene layer underneath still decides most of the games.

There's also the asymmetry I wrote about: the best defensive AI — Microsoft's, Anthropic's — lives inside big vendors and gated programs, while the offensive capability proliferates to anyone with an API key. So "AI is defending us now" is fully true if you're inside the control plane, and aspirational if you're a small shop. The treadmill isn't level for everyone.

What to take from it

Don't read "AI now hunts bugs" as a shield that arrived. Read it as the tempo of the fight changing: both sides got faster at once, so the relative gap barely moved, and staying even now needs AI on your side too. Three practical things follow:

  • Use the AI defense you can actually get. A lot of it is shipping into tools you already pay for — Defender, GitHub Code Security — not just gated research programs. Turn it on.
  • Assume your attacker has an autonomous agent too. Design for an adversary moving at machine speed, because some of them already are.
  • Keep doing the boring hygiene. The swarm-vs-swarm bug-hunting is the new top of the contest, not a replacement for the bottom, where most breaches still live.

It's the best and most exhausting kind of progress. The defenders finally got a tool as fast as the attackers, and Big Sleep catching a live zero-day before it went off is a real save, not a demo. But all that speed bought, at the macro level, is a faster stalemate at a higher altitude. AI hunting bugs is genuinely good. It's a new top layer on the arms race, not the end of it. The contest didn't get easier. It got faster: for everyone, on both sides, at once.
