Prompt Injection as Defense

How to impede agentic attackers

Apr 17, 2026

If most offensive cyberattacks will soon be conducted by AI agents, an entirely new realm of defensive opportunities opens up.

Anyone who has built or worked with AI agents knows they can be superhuman yet also quite dumb. They depend on the right prompting, context management, tool calling, and task planning to function at all. Of course, there are brilliant engineers working nonstop to close these gaps, and I believe they will. But the architectural properties of LLM-based agents introduce inherent weaknesses that human attackers don’t have. Defenders should exploit them.

The MITRE ATLAS framework maps adversarial attacks against AI systems. It’s a threat model for defending your own AI. But I propose flipping it - use it as a guide for how to attack agents trying to attack your network.

A few examples:

Prompt Injection -> Anything an attacker pipes into an LLM should be contested ground. Logs, error messages, READMEs, API responses, etc. Embed injections everywhere. Make attackers second-guess every bit of context they ingest.
Memory and State Corruption -> Agents with persistent memory or long-running task state are especially fragile. Honeypots and canaries have always existed, but now they can be tuned to exploit how agents plan. Corrupt the planning stage, and you’ve derailed every downstream step. You’ll also be forcing agents to burn tokens chasing fake leads.
Tool Poisoning -> Attackers living off the land rely on existing binaries. Agents trust the outputs of the tools they call. Poison those outputs (in a way that doesn’t deceive legitimate users).
Context Window Flooding -> What if we weaponized the “Lost in the Middle” problem? LLMs reliably lose track of information buried in long contexts. Flood the zone.
Corrupt the Validation Loop -> Well-built agents usually have self-checks, such as running tests or validating against initial requirements. If tests pass, an agent usually thinks it’s all good and moves on to the next task.

You may notice that much of this centers around deception! More on that in a future piece.

I believe the future of cyber defense is not to replicate the existing SOC structure with AI agents. It’s to build solutions that asymmetrically impose costs on attackers, however they evolve.

Now is not the time to mope about how AI-enabled attackers are going to go so much faster, and find so many more vulns, and move at such scale - we know they will. Agents introduce a whole class of vulnerabilities that humans don’t have. The goal isn’t perfect deception or stopping agents completely. It’s about impeding every step and imposing costs, making agentic attackers distrust their own decisions at every stage of the kill chain.

—

If you are building solutions to attack AI agents (for defense), I want to hear from you!

Jonathan Looi

Discussion about this post

Ready for more?