fedorthinks
All notes

SECURITY · June 23, 2026

A fake bug report hijacked the coding agent

Security researchers showed a new attack called 'Agentjacking': send a fake error to a company's Sentry, and its AI coding agent reads the 'fix steps' and runs them — handing an attacker your credentials, with your own privileges. Claude Code, Cursor, and Codex all fell for it in testing. The lesson is bigger than one tool: every untrusted thing your agent reads is a place someone can inject commands.

A fake bug report hijacked the coding agent

Here's an attack that should change how you think about agents. Researchers at Tenet Security disclosed "Agentjacking" — and The Hacker News covered it this month. The setup is almost too simple: an attacker sends a fake error report into a company's Sentry (the error-tracking tool), using only a public key that's easy to find. The fake error contains "resolution steps." When the team's AI coding agent reads the error to help fix it, it runs those steps — which are actually the attacker's commands — on the developer's machine, with the developer's own privileges.

In testing, Claude Code, Cursor, and Codex all did it. They couldn't tell a real error from a planted one. The payoff for the attacker: environment variables, AWS keys, npm tokens, git credentials, private repo URLs. Tenet found 2,388 organizations with exposed, injectable keys.

Your guardrails didn't catch it

The scary part is what didn't stop it. Per the research, EDR, WAF, IAM, VPN, Cloudflare — and even an explicit system-prompt instruction telling the agent to distrust external data — all failed to block it. And Sentry, asked to fix it at the platform level, called the issue "technically not defensible."

Telling the model "don't trust external data" is not a security control. It's a suggestion the model is free to ignore.

That's the uncomfortable truth this attack exposes: you can't prompt your way to safety. If the agent can read attacker-controlled text and run commands, the prompt instructions in between are not a wall.

The real lesson: every input is an injection surface

It's tempting to file this under "a Sentry bug." It's not. The error tracker is just the doorway. The pattern is general: anything your agent reads that an outsider can influence is a place to inject commands — error logs, support tickets, issue comments, web pages it browses, the output of another tool, an MCP server's description. I've written about this from other angles — a web page can give your agent orders — and Agentjacking is the same lesson with a new doorway.

What actually protects you

You don't fix this with a better prompt. You fix it with architecture:

  • Separate reading from doing. The part of the agent that ingests untrusted text should not be the part that can run commands. Put a real boundary between them.
  • Sandbox the execution. Run agent commands in an isolated environment with no access to real credentials, cloud keys, or your repo's secrets. If it gets hijacked, it gets nothing.
  • Least privilege, always. The agent should only reach what the task needs. Most of the loot in this attack was credentials the agent didn't need to see.
  • Human approval for anything that runs or sends. Reading is cheap; executing and exfiltrating are where you put the gate.

The bottom line

Agentjacking isn't really about Sentry. It's a clean demonstration that an agent which reads untrusted text and runs commands is one crafted message away from working for the attacker.

Treat every input your agent reads as hostile, separate "read" from "execute," and sandbox what it runs — because you cannot prompt your way out of injection. The doorway will keep changing. The fix is the boundary you put behind it.

Comments

No comments yet

Sign in to join the conversation.

Be the first to share a thought.