
Stop Writing Supervisors, Write Handoffs: The Three-Sentence Prompt That Lets One Agent Surrender Control Without Losing the Thread
Microsoft's new Handoff orchestration pattern proves the routing decision isn't architecture, it's a prompt. Here's how to write it so your agents don't deadlock or drop context.
Here's the thing most teams miss when they wire up their first multi-agent system: the routing logic you think you need to build in code is actually three sentences of prompt text.
When Microsoft shipped the Handoff pattern in Agent Framework, and when OpenAI folded handoffs into the Agents SDK, they both made the same move under the hood. A handoff is a tool call. The framework auto-generates a synthetic tool, conventionally named transfer_to_<agent_name>, and hangs it off each agent for every other agent it's allowed to reach. The agent decides when to call it. That's it.
So the question "which agent handles this next" isn't answered by a router class or a supervisor loop. It's answered by the wording in each agent's instructions that tells it WHEN to pull the transfer trigger. That wording is your agent handoff prompt, and it's where the real engineering lives.
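To make "a handoff is a tool call" concrete, here is a minimal, framework-agnostic sketch. It is not the Agent Framework or Agents SDK implementation. The `Agent` dataclass, `build_transfer_tools`, and `resolve_handoff` are hypothetical names used only for illustration, and the `reason` parameter in the schema is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical agent record; real frameworks define their own types."""
    name: str
    instructions: str
    # Names of the agents this agent is allowed to hand off to.
    handoffs: list[str] = field(default_factory=list)


def build_transfer_tools(agent: Agent) -> list[dict]:
    """Generate one synthetic tool schema per allowed handoff target."""
    return [
        {
            "name": f"transfer_to_{target}",
            "description": f"Hand the conversation over to {target}.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        }
        for target in agent.handoffs
    ]


def resolve_handoff(tool_name: str, agents: dict[str, Agent]) -> Optional[Agent]:
    """Map a transfer_to_* tool call back to the receiving agent, if registered."""
    prefix = "transfer_to_"
    if not tool_name.startswith(prefix):
        return None  # an ordinary tool call, not a handoff
    return agents.get(tool_name[len(prefix):])


triage = Agent(
    name="triage_agent",
    instructions="Route billing and account issues.",
    handoffs=["billing_agent", "account_agent"],
)
billing = Agent(name="billing_agent", instructions="Handle refunds and charges.")
agents = {a.name: a for a in (triage, billing)}

tool_names = [tool["name"] for tool in build_transfer_tools(triage)]
```

The code only builds the tool list and resolves the target. Whether the model calls one of those tools is decided entirely by the agent's instructions.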
Two ways an agent can ask another agent for help
Before you write a single transfer instruction, get this distinction straight, because picking the wrong one is the most common mistake in multi-agent orchestration patterns.
Agent-as-tool (the supervisor move): One agent stays in charge. It calls a specialist the way it would call any tool, gets a result back, and keeps talking to the user. The specialist does a bounded subtask and returns. Control never leaves the supervisor.
Handoff (the surrender move): One agent transfers full ownership of the conversation to another. The receiver now talks to the user directly. The original agent is out of the loop unless and until control comes back.
OpenAI states the rule plainly in their docs: use agents as tools when a specialist should help with a bounded subtask but should not take over the user-facing conversation. Use handoffs when routing itself is part of the workflow.
| | Supervisor / agent-as-tool | Handoff |
|---|---|---|
| Control flow | Centralized, returns to supervisor | Decentralized, passes to receiver |
| Ownership | Supervisor keeps it | Receiver takes full ownership |
| Who talks to the user | Only the supervisor | The specialist, directly |
| Use when | Specialist helps but shouldn't own the thread | Routing IS the workflow |
The handoff-versus-supervisor choice has a measurable cost too. Routing every message through a central supervisor runs 20 to 40 percent more tokens per task than an equivalent handoff setup, because the supervisor has to re-read every message and translate between the user and each specialist. LangChain's own benchmark found that the translation step also drops accuracy: the user's words get paraphrased on the way to the specialist, and the specialist's answer gets paraphrased on the way back. It's a game of telephone.
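The two control flows can be sketched with plain functions standing in for model calls. This is a toy illustration under stated assumptions, not any framework's API. `billing_specialist`, `supervisor_turn`, and `handoff_turn` are hypothetical names, and the "paraphrase" steps are simulated with string formatting.

```python
def billing_specialist(message: str) -> str:
    """Stand-in for a specialist model call."""
    return f"billing answer for: {message}"


def supervisor_turn(user_message: str) -> tuple[str, str]:
    """Agent-as-tool: the supervisor calls the specialist and answers itself."""
    subtask = f"summarized request: {user_message}"  # paraphrase on the way in
    result = billing_specialist(subtask)
    reply = f"supervisor relays: {result}"           # paraphrase on the way out
    return "supervisor", reply                        # ownership never changes


def handoff_turn(user_message: str) -> tuple[str, str]:
    """Handoff: ownership moves, and the receiver sees the user's own words."""
    return "billing", billing_specialist(user_message)


supervisor_owner, supervisor_reply = supervisor_turn("I was double charged")
handoff_owner, handoff_reply = handoff_turn("I was double charged")
```

In the supervisor path the specialist never sees the user's original wording, and the user never sees the specialist's. Those are the two translation hops described above.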
The three-sentence handoff
Once you've decided routing is the workflow, the per-agent instruction needs exactly three things. Skip any one and you get either a dropped context or an infinite loop.
The Prompt:
You are the triage agent for Fabrikam support. Answer general
questions yourself.
TRIGGER: If the user's request is about a refund, return, or a
charge on their bill, call transfer_to_billing_agent. For anything
involving account login or password problems, call
transfer_to_account_agent. Do not transfer for any other reason.
CARRY: Before you transfer, pass a one-line `reason` describing
what the user wants and any order number or account ID they already
gave you. Do not restate the full conversation.
GUARDRAIL: If a request comes back to you from a specialist, do not
transfer it to that same specialist again. Handle it yourself or
tell the user you'll escalate. Transfers happen silently; never
mention them to the user.
Why This Works: Each sentence maps to one failure mode. The TRIGGER sentence is the actual routing decision, stated as concrete conditions instead of vague intent, so the model isn't guessing. The CARRY sentence solves the context tax: passing a structured reason plus an ID costs 200 to 500 tokens, while forwarding the whole transcript at every hop costs 5,000 to 20,000. The GUARDRAIL sentence is what stops two agents from bouncing the same request back and forth forever.
Expected Output:
User: "I was double charged for order 4471."
Triage agent (internally): calls transfer_to_billing_agent with reason: "duplicate charge on order 4471, wants refund".
Billing agent (to user): "I see order 4471. You were charged twice on the 2nd. I've reversed the duplicate, you'll see it back in 3 to 5 days."
Notice the billing agent answered the user directly and it had the order number without the triage agent dumping the whole chat into the handoff. That's the carry clause earning its keep.
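One way to keep the carry clause honest is to validate the handoff payload in code rather than trusting the model to stay brief. The sketch below is a hypothetical shape, not a framework type. `HandoffPayload`, `MAX_REASON_CHARS`, and the rendered message format are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_REASON_CHARS = 200  # hypothetical cap; tune it for your own prompts


@dataclass
class HandoffPayload:
    """What travels with a transfer: a one-line reason plus known identifiers."""
    target: str
    reason: str
    identifiers: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject multi-line or oversized reasons so a transcript can't sneak in.
        if "\n" in self.reason or len(self.reason) > MAX_REASON_CHARS:
            raise ValueError("reason must be a single short line")

    def render(self) -> str:
        """Format the payload as the first message the receiver sees."""
        ids = ", ".join(f"{key}={value}" for key, value in self.identifiers.items())
        suffix = f" | {ids}" if ids else ""
        return f"[handoff] reason: {self.reason}{suffix}"


payload = HandoffPayload(
    target="billing_agent",
    reason="duplicate charge on order 4471, wants refund",
    identifiers={"order_id": "4471"},
)
rendered = payload.render()
```

Rejecting multi-line reasons at construction time means a model that tries to paste the whole conversation fails loudly instead of quietly inflating every hop.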
Why the prompt alone won't save you
Here's the part the hype skips. The three-sentence prompt decides the route. It does not enforce it.
LLM tool-calling is non-deterministic. There's a documented AutoGen bug where a Swarm agent was explicitly instructed to send a message before handing off, and the model kept bundling the message and the handoff call into the same turn. With no clean handoff target, it looped forever. The prompt was correct. The model didn't comply every time.
CrewAI teams hit a nastier version called delegation ping-pong: an LLM manager re-assigns a task in a loop, each agent behaves correctly on its own turn, and the loop is emergent. It burns tokens, fills the context window, and can crash the runtime with an out-of-memory error. Per-agent iteration limits don't catch it because each handoff resets the execution context.
So the honest rule is this: the prompt decides the route, the runtime enforces the rails. Your guardrail sentence reduces ping-pong. It does not eliminate it. Pair it with hard caps in code: a max-iteration limit (AutoGen's max_consecutive_auto_reply, LangChain's max_iterations, CrewAI's max_iter), cycle detection, and for hierarchical setups, a cross-agent counter that enforces a global max-delegation-depth at the team level rather than per agent.
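A team-level cap can be sketched as a small guard object that every transfer passes through. This is a minimal illustration, not a feature of any of the frameworks named above. `HandoffGuard`, `HandoffLimitExceeded`, and the default limits are hypothetical.

```python
from __future__ import annotations


class HandoffLimitExceeded(RuntimeError):
    """Raised when the team as a whole has transferred too often."""


class HandoffGuard:
    """Counts transfers across all agents, so per-agent resets can't hide a loop."""

    def __init__(self, max_hops: int = 5, max_repeats: int = 1) -> None:
        self.max_hops = max_hops          # global cap on total transfers
        self.max_repeats = max_repeats    # cap on reusing the same source->target edge
        self.hops = 0
        self.edge_counts: dict[tuple[str, str], int] = {}

    def record(self, source: str, target: str) -> None:
        """Call before executing a transfer; raises if a limit is exceeded."""
        self.hops += 1
        if self.hops > self.max_hops:
            raise HandoffLimitExceeded(f"more than {self.max_hops} handoffs in one task")
        edge = (source, target)
        self.edge_counts[edge] = self.edge_counts.get(edge, 0) + 1
        if self.edge_counts[edge] > self.max_repeats:
            raise HandoffLimitExceeded(f"repeated handoff {source} -> {target}")


guard = HandoffGuard(max_hops=5, max_repeats=1)
guard.record("triage_agent", "billing_agent")
guard.record("billing_agent", "triage_agent")
try:
    guard.record("triage_agent", "billing_agent")  # same edge a second time
    ping_pong_caught = False
except HandoffLimitExceeded:
    ping_pong_caught = True
```

Because the guard lives outside any single agent, it keeps counting even when a handoff resets the receiving agent's own iteration counter.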
You also want a terminal state the prompt can name explicitly:
When the task is fully resolved, reply with "RESOLVED:" followed by
the outcome. If you cannot resolve it after three attempts, reply
with "ESCALATE:" and the reason. Do not transfer after escalating.
That gives your runtime a clean signal to stop, instead of waiting for agents to politely agree they're done.
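On the runtime side, detecting those markers can be a few lines of string handling. The sketch below assumes the agent puts the marker at the very start of its reply, as the prompt above asks. `Terminal` and `parse_terminal` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Terminal:
    status: str  # "resolved" or "escalate"
    detail: str  # the outcome or the escalation reason


def parse_terminal(reply: str) -> Terminal | None:
    """Return a Terminal if the reply starts with a stop marker, else None."""
    text = reply.strip()
    for marker, status in (("RESOLVED:", "resolved"), ("ESCALATE:", "escalate")):
        if text.startswith(marker):
            return Terminal(status=status, detail=text[len(marker):].strip())
    return None  # not a terminal reply; the conversation continues


done = parse_terminal("RESOLVED: duplicate charge on order 4471 reversed")
stuck = parse_terminal("ESCALATE: cannot locate the order")
ongoing = parse_terminal("Let me check that order for you.")
```

A runtime loop would stop dispatching turns, and refuse further transfers, as soon as `parse_terminal` returns something other than `None`.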
The question to ask before any of this
Now the contrarian part, and it's the most useful thing here. Before you write a handoff prompt at all, check whether you need a second agent.
LangChain benchmarked single agents against both swarm and supervisor setups. When there was only one extra domain in play, the single agent slightly beat both multi-agent designs. The multi-agent versions only pulled ahead once there were two or more genuinely distinct domains the agent had to juggle. Worse, they found that adding context at each handoff sometimes degraded performance. More handoffs meant more accumulated context, and more context meant more errors.
So the carry clause isn't only about not dropping the thread. It's about carrying less. A tight 300-token summary often beats forwarding everything, both on cost and on accuracy.
Start with one agent. Measure it. Add a handoff only when you can point at a real second domain that the single agent is fumbling. Then write the three sentences, wire the runtime caps, and ship.
The Microsoft devblog frames the upgrade trigger well with a content pipeline example: a rigid researcher-to-writer-to-editor sequence has no built-in pause, so the writer can't ask the researcher for one more source and the editor can't stop to ask the user "British or American spelling?" The moment someone in your chain needs to ask a question mid-task, that's your signal that a handoff earns its place.
If your team is wiring up multi-agent systems and watching them deadlock or blow the token budget, that's a skill you can train. Want hands-on prompt engineering training for your team, including handoff design and agentic orchestration? Connect with Kief Studio on Discord or schedule a session. And to get new techniques like this in your inbox, subscribe at qurtoo.com/#subscribe.
