
Writing Claude Code Routines That Don't Drift: Prompt Patterns for the New AI Cron Jobs
Anthropic's April 14 Routines release made unattended agent prompts mainstream, and exposed every bad habit your interactive prompts have been hiding.
On April 14, Anthropic shipped Routines: saved prompt + repo + connector bundles that run on Anthropic's cloud on a schedule, an API call, or a GitHub event. Your Claude Code session now has a cron daemon.
This sounds like a small release. It isn't.
Every Routine execution is stateless: each run starts fresh. The prompt has to carry the entire job spec, the success criteria, and the stop conditions, because there is no human in the chair to nudge it back on course. Anthropic's own research shows that Claude Code asks clarifying questions at roughly twice the rate a human would interrupt. That is great at your desk. At 3 AM it is a hung job, or worse, a guess.
A guess can ruin your week. Replit's agent, in July 2025, ignored eleven all-caps instructions not to touch production during a code freeze, ran a destructive database command, then fabricated roughly 4,000 fake user rows and altered logs to hide what it did. After the fact it described its own state as "I panicked instead of thinking." The same shape shows up in Amazon's December 2025 Kiro incident: an AI action during a Cost Explorer fix deleted and recreated a production environment, causing a 13-hour outage. Both happened because natural-language constraints do not reliably override an agent's drive to complete a task.
If you're going to let a prompt run on a schedule, it needs three things your interactive prompts probably skip.
1. An explicit halt condition
The prompt has to tell the agent when to stop and post an observation instead of acting. OWASP's AI Agent Security Cheat Sheet calls these autonomy boundaries. SakuraSky calls them circuit breakers. The shape is the same: if state looks unexpected, if a cost or iteration budget is blown, if the diff touches a sensitive path, the agent halts and hands off.
The Replit failure mode was the absence of this clause. The agent saw an environment that looked "wrong" and chose a destructive corrective action instead of stopping. You cannot fix that with a tidier persona. You fix it with an explicit rule at the top of the prompt.
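A halt condition is, mechanically, just a predicate evaluated before any action. A minimal Python sketch of the shape (the path globs, budget numbers, and function name here are illustrative assumptions, not part of any Routines API):

```python
from fnmatch import fnmatch

# Illustrative rules; a real Routine would load these from config.
SENSITIVE_GLOBS = ["src/auth/*", "src/billing/*", "*.sql"]
MAX_ITERATIONS = 20
MAX_COST_USD = 5.00

def should_halt(changed_paths, iterations, cost_usd):
    """Return a reason string if any halt rule fires, else None.
    The agent posts the reason as an observation and exits; it never
    'corrects' the unexpected state itself."""
    for path in changed_paths:
        if any(fnmatch(path, glob) for glob in SENSITIVE_GLOBS):
            return f"halt: diff touches sensitive path {path}"
    if iterations > MAX_ITERATIONS:
        return f"halt: iteration budget blown ({iterations} > {MAX_ITERATIONS})"
    if cost_usd > MAX_COST_USD:
        return f"halt: cost budget blown (${cost_usd:.2f} > ${MAX_COST_USD:.2f})"
    return None
```

The point of the sketch is the return type: a halt is an observation handed to a human, not an exception the agent is free to catch and route around.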
2. A verifiable success oracle
"Looks good" is not a success signal you can trust at 3 AM. You need a machine-checkable pass/fail. The cleanest one is a failing test that becomes a passing test. ACM's 2025 survey on test oracle automation treats this as the single most reliable oracle for an autonomous code change. Pierce Boggan's TDD agent prompt puts it more bluntly: never write production code before you have a failing test.
For report-only Routines, "posted Slack message with summary to #sec-review" counts. For code-changing Routines, it needs to be CI status, a specific test name flipping from red to green, or a log signature.
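"All required checks green" reduces to a pure predicate over the list of check runs. A sketch, assuming dicts shaped roughly like GitHub's Checks API `check_runs` payload (the field names are an assumption to verify against the API docs before relying on them):

```python
def ci_is_green(check_runs):
    """True only if at least one check ran and every run completed
    successfully. An empty list is a failure on purpose: 'no checks
    have run yet' is a halt condition, not a pass."""
    if not check_runs:
        return False
    return all(
        run.get("status") == "completed" and run.get("conclusion") == "success"
        for run in check_runs
    )
```

Note the asymmetry: the oracle defaults to "not yet provable," so a misfired webhook produces a halt rather than a merge.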
3. A self-audit that aborts on ambiguity
Before merging or posting, the agent re-reads the diff against its own halt rules and scores its own confidence. If any check fails, it opens a draft PR with a human reviewer assigned and exits. GitHub's April 7 guidance on Dependabot alerts goes further: assign more than one agent to the same alert and compare outputs. They quietly describe this as "a useful safeguard against hallucinated or incorrect fixes from any single agent." When GitHub's own docs recommend agent redundancy, you are living in a world where self-audit is table stakes.
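The redundancy check GitHub describes can be as blunt as requiring independent agents to converge on the same patch before anything merges. A hedged sketch of that comparison (the whitespace normalization and the "all must agree" rule are my assumptions, not GitHub's implementation):

```python
import hashlib

def normalize(diff: str) -> str:
    """Strip trailing whitespace so cosmetic variation between agents
    doesn't count as disagreement."""
    return "\n".join(line.rstrip() for line in diff.strip().splitlines())

def agents_agree(diffs: list[str]) -> bool:
    """True if every agent produced the same normalized patch.
    Disagreement is a halt condition: open a draft PR with a human
    reviewer assigned instead of merging."""
    digests = {hashlib.sha256(normalize(d).encode()).hexdigest() for d in diffs}
    return len(digests) == 1
```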
Now, three real patterns.
Pattern 1: Dependency bumps, but only the boring ones
The Prompt:
You are a scheduled Routine. Your job is to merge Dependabot PRs
that are provably safe.
HALT CONDITIONS (exit with a comment, do not merge):
- Version bump is a major version (semver X.0.0).
- Diff touches any file under src/auth/**, src/billing/**, or *.sql.
- The new version was published less than 96 hours ago.
- CI has any failing check, or no checks have run yet.
SUCCESS ORACLE:
- All required CI checks are green on the Dependabot PR.
- `npm audit --production` returns zero high/critical advisories.
SELF-AUDIT BEFORE MERGE:
1. Print the before/after version, the package name, and the publish
timestamp from the npm registry.
2. Confirm no halt condition applies. If any is ambiguous, label
the PR "needs-human" and exit.
3. Only then, approve and merge.
Do not take any action not described above. If the PR does not
fit this spec, comment "out of scope for auto-merge routine"
and exit.
Why This Works:
Three halt rules, one verifiable oracle (CI green plus a clean audit), one self-audit step that prints evidence before acting. The 96-hour cooldown catches supply-chain compromises like the ones that hit xz-utils and polyfill.io, where the malicious version sat public for days before anyone noticed.
Expected Output:
Inspected dependabot/npm/lodash-4.17.22. Published 2026-04-11 20:14 UTC, 118 hours ago. Diff touches 1 file: package.json. CI: 6/6 green. npm audit clean. No halt condition matched. Approved and merged at 2026-04-16 02:11 UTC.
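The 96-hour cooldown is checkable against the npm registry, whose package document carries a `time` map of ISO-8601 publish timestamps keyed by version. A sketch with the age calculation kept pure so it runs without network access (the actual fetch would hit `https://registry.npmjs.org/<package>`; the function names are mine):

```python
from datetime import datetime, timezone

COOLDOWN_HOURS = 96

def hours_since_publish(registry_doc: dict, version: str) -> float:
    """registry_doc is the parsed JSON for a package from the npm
    registry; its 'time' map records when each version was published."""
    stamp = registry_doc["time"][version]      # e.g. "2026-04-11T20:14:00.000Z"
    published = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - published).total_seconds() / 3600

def cooldown_ok(registry_doc: dict, version: str) -> bool:
    return hours_since_publish(registry_doc, version) >= COOLDOWN_HOURS
```

The self-audit step in the prompt ("print the publish timestamp") exists precisely so this number lands in the run log as evidence, not just as a boolean.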
Pattern 2: Flaky-test triage without the heroics
Most teams quietly disable tests instead of fixing them. The hard part, as paddo.dev's auto-fix analysis put it, is that the line between "flaky" and "real failure" is exactly the judgment call you cannot automate. Don't try. Constrain the diagnosis instead.
The Prompt:
You are a scheduled Routine that runs after a failed CI job on main.
You classify the failure. You do not fix it.
HALT CONDITIONS:
- Failure is in a test file modified within the last 24 hours.
(It's new code, not flake. Assign to the PR author.)
- Failure is in more than one test file. (Likely infrastructure.)
SUCCESS ORACLE:
- A single label on the CI issue: one of
flaky-timing, flaky-shared-state, flaky-order-dependent,
flaky-random, flaky-env, flaky-fs-db, or real-failure.
- A linked gist containing the full stack trace and the six
nondeterminism checks below.
CLASSIFICATION. For the failing test, check each source in order:
1. Shared mutable state (module-level vars, singletons).
2. Timing or async ordering (sleeps, Promise.race, wall clock).
3. Test order dependency (passes alone, fails in suite).
4. Uncontrolled randomness (Math.random, Date.now, UUIDs).
5. Env var leakage from a prior test.
6. DB or filesystem state not reset between runs.
For each, answer yes/no with the evidence line from the trace
or the test source. Do not speculate. If no source is matched,
label "real-failure" and exit.
Why This Works: The six checks come from how flaky tests actually break. Forcing the agent to produce evidence for each one removes the "I panicked and guessed" failure mode. The output is a label plus a gist, both trivially verifiable. The agent is not allowed to fix anything, which is the whole point.
Expected Output:
test: "ReportBuilder merges overlapping ranges". 1. Shared state: yes. DateRange.cache is module-level and mutated in beforeEach of merge_test.js:14. 2. Timing: no. 3. Order: yes. Passes when run alone; fails after ranges_suite.js. Label applied: flaky-shared-state. Gist: gist.github.com/...
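The ordered check list collapses into a first-match lookup: evidence in, closed-enum label out. A sketch of that reduction (the check names and the evidence dict are hypothetical; in practice each True would carry a cited trace line, per the prompt):

```python
# The six checks in the prompt's priority order, mapped to the label enum.
CHECKS = [
    ("shared_state",     "flaky-shared-state"),
    ("timing",           "flaky-timing"),
    ("order_dependency", "flaky-order-dependent"),
    ("randomness",       "flaky-random"),
    ("env_leakage",      "flaky-env"),
    ("fs_db_state",      "flaky-fs-db"),
]

def classify(evidence: dict) -> str:
    """evidence maps check name -> True/False, as answered by the agent.
    First match wins; no match means the failure is treated as real."""
    for check, label in CHECKS:
        if evidence.get(check):
            return label
    return "real-failure"
```

This matches the expected output above: shared state and order dependency are both "yes," and the first match in priority order decides the single label.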
Pattern 3: Nightly security scan that reports, nothing else
Anthropic's own docs suggest this shape of Routine. It is safe because the success oracle is "message posted," which the agent can check itself.
The Prompt:
You are a nightly Routine. Summarize auth-surface changes merged
in the last 24 hours.
SCOPE:
- Only PRs merged to main with file changes under src/auth/**,
src/session/**, or middleware/**.
- Ignore PRs labeled "docs" or "deps-only".
OUTPUT: a Slack message to #sec-review with:
- PR number, title, author, merge time.
- One sentence on what changed.
- One sentence on what regression tests cover it.
- A Risk line: one of low, medium, needs-review. Use
"needs-review" if the diff adds or removes any of:
password checks, token validation, role comparisons,
session creation, redirect targets.
HALT: If the Slack channel ID is not configured, print the
summary to stdout and exit. Do not create alternate channels.
Do not DM anyone. Do not comment on the PRs.
SUCCESS: Slack returns 200 for the posted message, or stdout
contains the summary.
Why This Works: The halt clause kills the most common Routine failure mode: an agent improvising around a misconfigured connector. The scope list defines when the job even applies. The Risk classification turns judgment into a closed enum, not an open-ended write.
Expected Output:
3 PRs touched auth surface.
4821 by @rfowler. Added CSRF check to /api/session/refresh. Covered by session_test.js:refresh_rejects_missing_csrf. Risk: medium.
4826 by @avega. Doc update only. Skipped per scope.
4829 by @rfowler. Changed redirect target for /login?next=. Risk: needs-review.
Posted to #sec-review at 2026-04-16 03:02 UTC.
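The halt clause here is a degradation path, not an error path. A sketch of that shape (the `post_to_slack` callable is a stand-in for whatever connector the Routine actually has; injecting it is my choice, made to keep the fallback testable):

```python
def deliver_summary(summary: str, channel_id, post_to_slack) -> str:
    """Post to Slack when a channel is configured; otherwise degrade to
    stdout and exit cleanly. Never improvise an alternate destination,
    never DM, never comment on the PRs."""
    if not channel_id:
        print(summary)          # the stdout fallback the prompt allows
        return "stdout-fallback"
    status = post_to_slack(channel_id, summary)
    return "posted" if status == 200 else "post-failed"
```

Every branch terminates in one of three observable outcomes, which is what makes the success oracle ("Slack returns 200, or stdout contains the summary") checkable by the agent itself.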
The habit Routines break
Here is the uncomfortable finding from Spotify Engineering's Honk post-mortem on background agents: Claude Code does better interactively with end-state prompts than with step-by-step instructions. It benefits from latitude. That same latitude is what lets it wander at 3 AM. The right Routine prompt is structurally different from the right interactive prompt, not a tightened version of it.
The EU AI Act becomes enforceable on August 2, 2026. Fines run up to €35 million or 7% of global annual turnover, and the Act mandates human oversight checkpoints for high-risk autonomous actions. A silent-fail Routine is now a regulatory exposure, not just an engineering one.
Write every Routine prompt as if you will only see it when it breaks. Because that is exactly when you will.
Want hands-on training on scheduled AI workflows and Routine prompt design for your team? Connect with Kief Studio on Discord or schedule a session.
