loop engineering June 18, 2026 • 6 min read

Stop Prompting Your Agent. Engineer the Loop That Prompts It

The four standing prompts behind Boris Cherny's "loop engineering" -- find-work, do-work, verify, remember -- with copy-paste scaffolds and the guardrails that keep them from running rogue.

Boris Cherny, who runs Claude Code at Anthropic, said the quiet part out loud last week: "I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do."

That line traveled fast. On June 7, Peter Steinberger posted "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." It crossed roughly 6.5 million views in days. The next day, Google Chrome's Addy Osmani published an essay that gave the practice a name and an anatomy: loop engineering.

Here's the part everyone skips. None of them said this is easier. Osmani's own line is the one to tattoo on your monitor: "a sloppy prompt inside a loop just produces sloppy work faster." Prompt engineering didn't die. It became the floor. The leverage moved up a level, to the design of the loop itself.

So let's design one. A loop is four standing prompts you write once and let run on a schedule: find work, do work, verify, remember. Get these four right and you have an agent that picks up a task, finishes it, proves it's done, and writes down what happened so the next run isn't starting from zero.
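As a rough sketch of how the four steps could chain together in a single tick, assume each step is a plain Python callable wrapping a call to your agent. The names `run_tick`, `find_work`, `do_work`, `verify`, and `remember` are hypothetical placeholders for illustration, not part of any tool mentioned here.

```python
from typing import Callable, Dict, List

# Hypothetical step signatures; each would wrap a call to your agent.
FindWork = Callable[[], List[Dict]]
DoWork = Callable[[Dict], str]            # returns a diff
Verify = Callable[[Dict, str], Dict]      # returns {"verdict": ..., ...}
Remember = Callable[[Dict, Dict], None]


def run_tick(find_work: FindWork, do_work: DoWork,
             verify: Verify, remember: Remember) -> str:
    """Run one pass of the loop: find, do, verify, remember."""
    items = find_work()
    if not items:
        return "idle"                     # nothing found, so nothing is invented
    item = items[0]                       # one item at a time, highest priority first
    diff = do_work(item)
    result = verify(item, diff)
    remember(item, result)                # record the outcome whether it passed or not
    return result["verdict"]
```

Passing the steps in as arguments keeps the skeleton testable with stubs before any model is wired in.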

Prompt 1: Find Work

The first prompt scans for what needs doing. It is read-only. It does not touch code, open PRs, or change state. Its only job is to produce a list.

The Prompt:

You are the work-discovery step of an automated loop. Read-only mode.
You may inspect files, run `git log`, `gh pr list`, and the test suite.
You may NOT edit files, push, comment, or open PRs.

Scan for work in this exact order and stop at the first non-empty category:
1. CI failures on the main branch from the last 24 hours
2. Open PRs I authored that have unaddressed review comments
3. Failing or skipped tests in the suite
4. TODO comments tagged `// LOOP:` in src/

Output a JSON array of at most 5 items, each:
{ "id": "...", "type": "...", "summary": "...", "evidence": "file:line or PR url" }
Return [] if there is nothing. Do not invent work.

Why This Works: The read-only constraint and the explicit allow/deny list keep the discovery step from quietly doing the work it is only supposed to find. Capping the output at five items and ordering the scan mean the loop always pulls the highest-priority thing first instead of thrashing across everything at once.

Expected Output:

[
  {
    "id": "ci-4821",
    "type": "ci_failure",
    "summary": "test_auth_refresh fails intermittently on main",
    "evidence": "ci.log:also reproduced locally"
  },
  {
    "id": "pr-219",
    "type": "review_comment",
    "summary": "Reviewer asked to extract the retry logic in client.py",
    "evidence": "github.com/org/repo/pull/219"
  }
]
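Before the list reaches the next step, the harness can check that it matches the contract in the prompt. The following is a minimal sketch under that assumption; `parse_work_items` is a hypothetical helper, and the rules it enforces (a JSON array, at most five items, the four named keys, non-empty evidence) are read straight from the prompt above.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = {"id", "type", "summary", "evidence"}
MAX_ITEMS = 5


def parse_work_items(raw: str) -> List[Dict]:
    """Parse the discovery step's output, rejecting anything off-contract."""
    data = json.loads(raw)                # raises ValueError on non-JSON output
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    if len(data) > MAX_ITEMS:
        raise ValueError(f"expected at most {MAX_ITEMS} items, got {len(data)}")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each item must be a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item missing keys: {sorted(missing)}")
        if not str(item["evidence"]).strip():
            raise ValueError(f"item {item['id']!r} has no evidence")
    return data
```

Treating a malformed list as an error, rather than repairing it, keeps a confused discovery run from feeding made-up work into the execution step.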

Prompt 2: Do Work

Now the agent acts. The rule that matters most: one item at a time, fully. Not three half-finished fixes. One, done, following the conventions already in the repo.

The Prompt:

You are the execution step. You have been handed ONE work item:
{work_item_json}

Tools available: file editing, the test runner, git (commit only, no push).
MCP scope: the GitHub connector (read + comment) and nothing else.

Rules:
- Match the existing style in the files you touch. Read a neighbor file first.
  Do not introduce new patterns, libraries, or formatting.
- Make the smallest change that fully resolves this item.
- Run the relevant tests yourself before you consider it done.
- Write a commit message describing what changed and why.
- If the item is bigger than one focused change, STOP and report it as
  "needs-decomposition" instead of attempting a partial fix.

Why This Works: Naming the tool and MCP scope inline declares your blast radius. The declaration only holds if the harness grants the same tools and nothing more: then the agent literally cannot push or reach a connector you didn't list, whereas a prompt on its own is a request, not a lock. "Read a neighbor file first" is the single most reliable instruction for getting an agent to match house style instead of writing generically correct code that looks foreign to your repo.
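One way to back the declared scope with an actual check is a command allowlist in the harness. This is only a sketch: `is_allowed` is a hypothetical gate, `pytest` stands in for "the test runner", and the set of permitted git subcommands is an assumption chosen to match "commit only, no push".

```python
import shlex

# Assumed policy mirroring the prompt: git may commit, never push.
ALLOWED_GIT_SUBCOMMANDS = {"status", "diff", "add", "commit", "log"}
ALLOWED_PROGRAMS = {"git", "pytest"}      # pytest is a stand-in for the test runner


def is_allowed(command: str) -> bool:
    """Return True only if the command fits the declared tool scope."""
    try:
        argv = shlex.split(command)
    except ValueError:                    # unbalanced quotes and similar
        return False
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        return False
    if argv[0] == "git":
        # The subcommand must come first, so `git -c ... push` is rejected too.
        return len(argv) > 1 and argv[1] in ALLOWED_GIT_SUBCOMMANDS
    return True
```

The check assumes approved commands are then executed as an argument list without a shell, so operators such as `&&` are never interpreted; run through a shell, this gate would not be sufficient.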

Prompt 3: Verify

This is where loops live or die. If the same model that wrote the code also grades it, you get self-confirmation: inherited blind spots, confirmation bias, and stale assumptions piling up over a long session. The fix is a separate verifier with its own context that never sees the generator's reasoning.

And verification means evidence, not confidence.

The Prompt:

You are an independent verifier. You did NOT write this code and you have
no access to the author's reasoning. Be skeptical. Your default is "not done."

Original requirement: {work_item.summary}
Changed files: {diff}

To pass, you must produce EXECUTABLE evidence, not a description:
1. Run the test suite. Paste the actual output, not a summary of it.
2. Re-read the diff against the original requirement. Does it do what was
   asked, and only that?
3. For UI or behavior changes, run the thing and describe what happened.

A confident summary of broken work is a FAIL. If you cannot run the
verification, return "blocked" with the reason. Output:
{ "verdict": "pass" | "fail" | "blocked", "evidence": "...", "reason": "..." }

Why This Works: The skeptical stance plus "default is not done" fights the verifier's urge to rubber-stamp work that merely looks finished. Demanding pasted test output rather than a summary closes the reward-hacking gap. Claude Code's /goal evaluator, for instance, reads the transcript, not the work itself, so a confident summary of broken work reads as fine. Forcing real output shuts that door.

Build the escape hatch in: max three attempts per item, then log it as blocked and move on. A loop with no stop condition is a billing event waiting to happen.
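The three-attempt escape hatch can be sketched as a small retry wrapper. Everything named here (`parse_verdict`, `attempt_item`, the `attempts` field) is hypothetical; the verdict values and the JSON shape come from the verifier prompt above, and the choice to treat malformed or evidence-free output as a fail is an assumption in keeping with "default is not done".

```python
import json
from typing import Callable, Dict

MAX_ATTEMPTS = 3
VALID_VERDICTS = {"pass", "fail", "blocked"}


def parse_verdict(raw: str) -> Dict:
    """Parse verifier output; malformed or evidence-free output counts as a fail."""
    try:
        result = json.loads(raw)
    except ValueError:
        return {"verdict": "fail", "evidence": "", "reason": "verifier output was not JSON"}
    if not isinstance(result, dict) or result.get("verdict") not in VALID_VERDICTS:
        return {"verdict": "fail", "evidence": "", "reason": "unknown verdict"}
    if result["verdict"] == "pass" and not str(result.get("evidence", "")).strip():
        return {"verdict": "fail", "evidence": "", "reason": "pass without evidence"}
    return result


def attempt_item(item: Dict,
                 do_work: Callable[[Dict], str],
                 verify: Callable[[Dict, str], str]) -> Dict:
    """Try an item up to MAX_ATTEMPTS times, then give up and mark it blocked."""
    result: Dict = {}
    for attempt in range(1, MAX_ATTEMPTS + 1):
        diff = do_work(item)
        result = parse_verdict(verify(item, diff))
        if result["verdict"] != "fail":
            return {**result, "attempts": attempt}   # pass or blocked ends the retries
    return {"verdict": "blocked",
            "evidence": result.get("evidence", ""),
            "reason": f"failed {MAX_ATTEMPTS} attempts: {result.get('reason', '')}",
            "attempts": MAX_ATTEMPTS}
```

The loop has exactly two exits, a non-fail verdict or the attempt cap, so it cannot spin indefinitely on one item.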

Prompt 4: Remember

The agent forgets. The repo does not. Every run reads a state file at the start and appends to it at the end. This is what survives a crash, a context reset, or you killing the process at 3 a.m.

The Prompt:

Append one entry to LOOP-STATE.md. Read the existing file first; do not
rewrite or delete prior entries. Format:

## {timestamp} -- {work_item.id}
- Outcome: done | blocked | needs-decomposition
- What changed: <one line>
- Evidence: <test result or PR link>
- Note for next run: <anything the next iteration must know>

Why This Works: External memory in plain markdown (or Linear, or SQLite) lives outside the context window, so it doesn't rot as the session grows. The "note for next run" field is what stops the loop from re-attempting the same blocked task forever.
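If the harness writes the state file itself instead of trusting the agent to append cleanly, a sketch might look like the following. `append_state` and `blocked_ids` are hypothetical helpers; the entry format follows the prompt above, and the UTC timestamp format is an assumption.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional, Set

VALID_OUTCOMES = {"done", "blocked", "needs-decomposition"}


def append_state(path: Path, item_id: str, outcome: str,
                 changed: str, evidence: str, note: str) -> str:
    """Append one entry to the state file without touching earlier entries."""
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = (
        f"\n## {timestamp} -- {item_id}\n"
        f"- Outcome: {outcome}\n"
        f"- What changed: {changed}\n"
        f"- Evidence: {evidence}\n"
        f"- Note for next run: {note}\n"
    )
    # Mode "a" only adds to the end of the file and creates it if missing.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


def blocked_ids(path: Path) -> Set[str]:
    """Collect ids whose most recent entry is 'blocked' so the next run can skip them."""
    if not path.exists():
        return set()
    latest: Dict[str, str] = {}
    current: Optional[str] = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("## ") and " -- " in line:
            current = line.split(" -- ", 1)[1].strip()
        elif line.startswith("- Outcome:") and current is not None:
            latest[current] = line.split(":", 1)[1].strip()
    return {item for item, outcome in latest.items() if outcome == "blocked"}
```

Reading `blocked_ids` at the start of a tick gives the discovery step a concrete skip list, which is one way to act on the note left by the previous run.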

The Guardrails That Keep It Honest

The four prompts are the engine. The guardrails are the brakes, and small teams get burned precisely here. One dev team burned $2,847 in four hours on an agent stuck in a refactoring loop while every dashboard stayed green. Another developer left an agent over a long weekend and came back to a $4,200 bill. An audit of 30 teams found the top failure was simply no budget caps.

Wrap every loop in an explicit envelope:

TRIGGER -> every 15m, or on CI failure
SCOPE   -> repo X only, PRs I authored
BUDGET  -> max 3 sub-agents per tick, 50k tokens
STOP    -> all green, OR 10 iterations, OR $5 spent
REPORT  -> post summary to Slack
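The envelope can also live in code so the stop conditions are checked by the harness, not remembered by the agent. In this sketch the class and field names are illustrative, the defaults copy the example values above, and treating the 50k-token figure as a per-tick limit is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Limits for one loop; defaults copy the example envelope."""
    trigger: str = "every 15m, or on CI failure"
    scope: str = "repo X only, PRs I authored"
    max_subagents_per_tick: int = 3
    max_tokens_per_tick: int = 50_000     # assumption: the token budget is per tick
    max_iterations: int = 10
    max_spend_usd: float = 5.00
    report_to: str = "slack"


def stop_reason(env: Envelope, *, all_green: bool, iterations: int,
                spend_usd: float, tokens_this_tick: int) -> Optional[str]:
    """Return why the loop must stop, or None if it may run another iteration."""
    if all_green:
        return "all green"
    if iterations >= env.max_iterations:
        return f"hit {env.max_iterations} iterations"
    if spend_usd >= env.max_spend_usd:
        return f"spent ${spend_usd:.2f} of ${env.max_spend_usd:.2f}"
    if tokens_this_tick >= env.max_tokens_per_tick:
        return f"used {tokens_this_tick} of {env.max_tokens_per_tick} tokens"
    return None
```

Returning the reason as a string gives the REPORT step something specific to post when the loop halts.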

Team audits land on roughly a $50/day soft cap with an alert and a $100/day hard cutoff. Kill any single request over $1. And mind concurrency: if five loops can write to the same PR, you don't have automation, you have a race condition. Shared read access is fine. Shared write access should be rare.
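Those three dollar limits could be enforced with a small accounting object. This is a sketch under stated assumptions: `DailyBudget` and `BudgetExceeded` are hypothetical names, costs are recorded after each request finishes (a real harness would also want to cancel in flight), and the soft cap raises one alert instead of stopping the loop.

```python
from dataclasses import dataclass, field
from typing import List


class BudgetExceeded(RuntimeError):
    """Raised when a request or the day's total crosses a hard limit."""


@dataclass
class DailyBudget:
    soft_cap_usd: float = 50.0            # alert, keep running
    hard_cap_usd: float = 100.0           # stop everything
    per_request_cap_usd: float = 1.0
    spent_usd: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def record(self, request_cost_usd: float) -> None:
        """Account for one finished request and enforce the caps."""
        self.spent_usd += request_cost_usd        # the money is spent either way
        if request_cost_usd > self.per_request_cap_usd:
            raise BudgetExceeded(f"single request cost ${request_cost_usd:.2f}")
        if self.spent_usd >= self.hard_cap_usd:
            raise BudgetExceeded(f"daily hard cap reached: ${self.spent_usd:.2f}")
        if self.spent_usd >= self.soft_cap_usd and not self.alerts:
            self.alerts.append(f"soft cap crossed at ${self.spent_usd:.2f}")
```

Raising an exception on the hard limits forces the caller to handle the stop explicitly, so a loop cannot keep running past the cap by ignoring a return value.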

Start with a read-only loop. Good first targets have a mechanical definition of done: CI triage, dependency bumps, lint fixes, flaky-test repro. Have the loop report or propose changes, and watch the cost for a few days. Grant write access only after the verify step can reliably say "no." Bad first loops are the ones where "done" is a judgment call: architecture rewrites, auth, payments, production deploys.

One last thing, because it's the real point. Two people build the identical loop and get opposite results. One uses it to move faster on work they understand cold. The other uses it to avoid understanding the work at all. The loop can't tell the difference. You can.

If your team is wiring up its first agentic loops and wants the verify and guardrail design to hold up under a real budget, we run hands-on prompt engineering training. Connect with Kief Studio on Discord or schedule a session.
