Security

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz

Published February 23, 2023 arXiv: 2310.03025 View on arXiv → PDF →

TL;DR

The paper that named and benchmarked indirect prompt injection. Demonstrated end-to-end attacks against Bing Chat, GitHub Copilot Chat, and other production LLM integrations via poisoned web content, email, and code comments. The practical wake-up call for agent security.

Why it matters

If you ship an agent that reads untrusted content -- fetching URLs, summarizing emails, processing uploaded documents -- you are exposed to this attack class. The paper documents real exploits against major products, not theoretical risks. It's the canonical reference for why LLM apps need real security engineering, not "the prompt says to be good" as a control.

How you'd use this

Read this before shipping any content-fetching agent. Pair the findings with OWASP LLM Top 10 and then design layered defenses: privilege isolation, content delimiting, output constraints, user-in-the-loop for destructive actions.

Read the authors' abstract

We show that real-world LLM-integrated applications are vulnerable to indirect prompt injection attacks where adversaries embed malicious instructions in content consumed by the model.