sub-agent prompting • April 21, 2026 • 6 min read

Sub-Agent Delegation Prompts for Claude 4.7's 1M Context: When to Spawn and When to Stay

Stop burning your million-token window on grep output. Prompt patterns that decide what gets delegated.

A 1M-token context window feels like permission to dump everything into the main agent. Repo tree, full files, all the grep output, every test log. Why spawn a sub-agent when you have room?

Because room is not the point. Signal density is.

Anthropic's own analysis of their multi-agent research system found that token budget alone explained about 80% of performance variance. But the same post makes a quieter point: orchestrators that accumulate noise reason worse, even when they haven't run out of space. A 1M window filled with 400k of grep output has 400k tokens of distraction weighing on every subsequent decision. Sub-agents exist to keep the orchestrator's working memory clean, not to work around a size ceiling.

This post is about the prompts that make that split work. When to delegate, when to stay, and how to write return-format contracts that don't just shove the noise back up one level.

The delegation decision rule

Here's the rule I use. Delegate when the task produces more tokens than the orchestrator needs to read. Stay when the task requires reasoning about the orchestrator's in-progress state.

That's it. Everything else is a refinement.

A file search produces 200 matches you'll scan once and discard. Delegate. A refactor that depends on three decisions you just made two turns ago stays in the orchestrator, because shipping that context to a sub-agent is more expensive than just doing the work.

Put this rule in the orchestrator's system prompt explicitly. Models don't default to it.

The Prompt:

You are an orchestrator agent coordinating work across sub-agents.

DELEGATION RULES:
- Delegate read-only search and discovery tasks (grep, file listing, "where
  is X defined") to the Explore sub-agent. Do NOT read the output directly
  into your context. Ask for a summary.
- Delegate verification tasks (does this test pass, does this file compile,
  does this API return what we expect) to a worker sub-agent. Ask for a
  pass/fail and one-line reason, not the full output.
- Stay in-context for: decisions that depend on your in-flight reasoning,
  single-file edits, one-line fact lookups, any task where delegation
  overhead exceeds the token savings.

When you delegate, you MUST specify:
1. The exact question to answer
2. A return-format contract (what shape, what length)
3. What NOT to return (raw tool output, reasoning traces, full file dumps)

Why This Works: Most orchestrators either fan out too aggressively or under-delegate because the default behavior was never written down. Making the rule explicit, with concrete examples of "stay" cases, prevents the model from spawning 50 sub-agents for trivial queries (a real failure mode Anthropic had to patch with effort-scaling rules in their own prompts).

Expected Output:

The orchestrator now triages each incoming task against the rule before acting. For "find all callers of parseInvoice," it spawns an Explore sub-agent with a word-limited brief. For "given the three decisions we just made, draft the migration," it stays in-context and writes the migration itself.
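
The rule itself reduces to arithmetic. As a minimal sketch (the function name, token estimates, and 500-token overhead figure are all illustrative assumptions, not any framework's API):

```python
def should_delegate(est_output_tokens: int,
                    summary_tokens: int,
                    overhead_tokens: int = 500,
                    needs_inflight_state: bool = False) -> bool:
    """Delegate only when the task produces more tokens than the
    orchestrator needs to read, and doesn't depend on in-flight state."""
    if needs_inflight_state:
        return False  # packaging the state costs more than doing the work
    # Savings = the raw output we avoid reading, minus the summary we read instead.
    savings = est_output_tokens - summary_tokens
    return savings > overhead_tokens

# A grep producing ~200 matches (~8,000 tokens) vs. a 150-token summary: delegate.
print(should_delegate(8000, 150))  # True
# A one-line fact lookup: stay in-context.
print(should_delegate(40, 40))     # False
```

The `needs_inflight_state` short-circuit encodes the second half of the rule: no token savings justify shipping your in-progress reasoning to an agent that starts cold.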

The return-format contract

A sub-agent without a return contract will hand you back the problem you delegated, plus extra framing.

Every delegation prompt needs four things. What to answer. What shape to answer in. A length cap. A "what NOT to include" list.

The Prompt:

Task: Find every file that imports `legacyAuth` and classify each as
(a) safe to migrate now, (b) blocked by a TODO or test gap, or
(c) already migrated but import not yet removed.

Return format:
- A markdown table with columns: file path, classification, one-sentence reason
- Under 400 words total
- Sort by classification, then alphabetically by path

Do NOT return:
- The grep output itself
- File contents
- Your reasoning about individual files beyond the one-sentence reason
- Any preamble like "I'll start by..." or "Here are the results"

If you find fewer than 5 files, return the table anyway and note the count.
If you find more than 50, stop and return a summary with the first 20.

Why This Works: The "Do NOT return" list is the load-bearing part. Without it, the sub-agent defaults to showing its work, which is exactly the noise you paid the delegation overhead to avoid. The overflow clause (fewer than 5, more than 50) prevents the sub-agent from making the wrong judgment call under edge conditions.

Expected Output:

A clean 18-row table. The orchestrator reads it in one pass, makes a migration plan, and never sees the 200 grep hits, the file contents, or the sub-agent's internal deliberation about whether auth_test.ts counts as a test gap.
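
Contracts like this are also mechanically checkable before the reply enters the orchestrator's context. A sketch of a validator, assuming the word cap, banned preambles, and column names from the example prompt above (everything else, including the function name, is made up for illustration):

```python
BANNED_PREFIXES = ("I'll start by", "Here are the results")
REQUIRED_COLUMNS = ("file path", "classification", "reason")

def validate_return(reply: str, max_words: int = 400) -> list[str]:
    """Check a sub-agent reply against its return-format contract.
    Returns a list of violations; an empty list means the contract held."""
    violations = []
    if len(reply.split()) > max_words:
        violations.append(f"over the {max_words}-word cap")
    for prefix in BANNED_PREFIXES:
        if reply.lstrip().startswith(prefix):
            violations.append(f"banned preamble: {prefix!r}")
    # The contract says the reply IS the table, so line one must be the header.
    header = reply.splitlines()[0].lower() if reply.strip() else ""
    if not all(col in header for col in REQUIRED_COLUMNS):
        violations.append("first line is not the expected table header")
    return violations

reply = "| file path | classification | reason |\n| src/a.ts | (a) | no blockers |"
print(validate_return(reply))  # []
```

A non-empty violation list is a cheap signal to re-prompt the sub-agent rather than let the noise through.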

The artifact return pattern

For anything larger than a short report, skip the return payload entirely. Have the sub-agent write to disk and return only the path plus a headline.

This is what Anthropic's research system does, and it inverts the common pattern of "return a summary to the parent." The orchestrator reads the artifact only when it needs to, and only the sections it needs.

The Prompt:

Task: Audit the entire /api directory for endpoints that don't validate
request bodies against a schema.

Write your findings to /tmp/audit-{run_id}/unvalidated-endpoints.md in this
structure:

# Unvalidated Endpoints Audit
## Summary
[3 bullets: total endpoints scanned, unvalidated count, highest-risk file]
## Findings
[One H3 per file, with line numbers and the missing validation]
## Suggested Fix Order
[Ordered list, ranked by exposure]

Return to me ONLY:
- The artifact path
- The three summary bullets
- Nothing else. No preamble, no sign-off.

Why This Works: The orchestrator now carries about 80 tokens of context from this sub-agent instead of 4,000. It decides based on the summary whether to read the artifact, and if so, can read only the Findings section for the file it cares about. The compression loss of a traditional summary return is avoided because nothing was summarized. The full fidelity lives on disk.

Expected Output:

The orchestrator receives: "Audit complete. /tmp/audit-a7f2/unvalidated-endpoints.md. 142 endpoints scanned, 9 unvalidated, highest risk in routes/billing.ts." It can now plan the next step without the full report bloating its window.
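
The orchestrator side of this pattern can be a single section reader: given the path from the sub-agent's return, pull one H2 section into context and leave the rest on disk. A sketch, assuming the heading structure from the prompt above (the function name is hypothetical):

```python
from pathlib import Path

def read_section(artifact_path: str, heading: str) -> str:
    """Read only one '## heading' section of a markdown artifact, so the
    rest of the report never enters the orchestrator's context."""
    capturing, section = False, []
    for line in Path(artifact_path).read_text().splitlines():
        if line.startswith("## "):               # H2 boundary: start or stop capturing
            capturing = line[3:].strip() == heading
            continue
        if capturing:
            section.append(line)                 # H3 subsections pass through intact
    return "\n".join(section).strip()

# The sub-agent returned only the path and three summary bullets; if the
# orchestrator needs detail on one file, it reads just the Findings section:
# findings = read_section("/tmp/audit-a7f2/unvalidated-endpoints.md", "Findings")
```

Pairing the artifact convention in the prompt with a reader like this is what makes the "nothing was summarized" property usable in practice.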

When to stay

Three cases where you should not delegate, even though the task looks like a candidate.

Coding with tight dependencies. Research tasks are embarrassingly parallel. You can spawn ten sub-agents to look up ten companies and merge the results. Coding rarely works this way. Most refactors require agents to share an evolving file state, and sub-agents start clean without the context of what just changed. Anthropic's post makes this explicit. The 90% research-eval win does not transfer to code.

One-line fact lookups. Delegation has fixed overhead: the prompt you write, the sub-agent's context warm-up, the return parse. For a task that takes three seconds in-context, that overhead is larger than the savings.

Reasoning about in-flight decisions. If the task is "given what we just decided about the schema, write the migration," a sub-agent can't help. It doesn't have what we just decided, and packaging that state into a prompt is equivalent to just doing the task.

The hierarchy question

One more pattern worth knowing. If your orchestrator is about to spawn more than four peer sub-agents, don't. Spawn two feature leads and let them each spawn their specialists.

Single-layer fan-out fragments the orchestrator's context with every return. Hierarchical delegation keeps the top-level conversation narrow (two children, not six) and pushes the coordination work down to the leads. It's how engineering orgs actually work. VPs don't task individual engineers.

Claude Code hard-limits this at two layers (sub-agents cannot spawn sub-agents), so for a third layer you escalate to an Agent Team. But most work fits in two.
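
A sketch of the fan-out rule itself (the threshold of four comes from this post; the grouping key and function name are illustrative assumptions):

```python
from collections import defaultdict

MAX_PEERS = 4  # beyond this, insert a layer of feature leads

def plan_spawns(tasks: list[dict]) -> dict[str, list[dict]]:
    """Group tasks under feature leads when a flat fan-out would exceed
    MAX_PEERS, keeping the orchestrator's conversation narrow."""
    if len(tasks) <= MAX_PEERS:
        return {"direct": tasks}        # flat fan-out is fine at this size
    leads = defaultdict(list)
    for task in tasks:
        leads[task["feature"]].append(task)  # each lead spawns its own specialists
    return dict(leads)

tasks = [{"feature": "auth", "goal": g} for g in ("audit", "migrate", "test")] + \
        [{"feature": "billing", "goal": g} for g in ("audit", "refactor", "test")]
plan = plan_spawns(tasks)
print(sorted(plan))  # ['auth', 'billing'] -- two leads, not six peers
```

The orchestrator's top-level conversation now has two return payloads to absorb instead of six, and the leads do the merging.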

The shortest version

A 1M context is not a license to stop delegating. It's an argument for delegating better, because the orchestrator's reasoning quality still degrades with noise regardless of how much space you have left. Write delegation rules into the orchestrator's system prompt. Give every sub-agent a return-format contract with a "do not return" list. Use artifacts for anything longer than a paragraph. Stay in-context when the task depends on state you're holding right now.

If your team is standing up multi-agent workflows and the orchestrator keeps losing the thread, that's usually a prompt problem, not a model problem. Want hands-on training on delegation prompts and agentic workflows for your team? Connect with Kief Studio on Discord or schedule a session.
