Playbooks
Multi-step prompt recipes for specific outcomes. Each playbook delivers something concrete -- a RAG eval harness, an injection test suite, a code-review loop -- with the full prompt chain, failure modes, and adjacent variations.
5 playbooks
Build a Classification Prompt That Beats Fine-Tuning
A production-ready classification prompt for your own labels, with self-consistency voting, structured output, and an eval harness -- typically reaches 90%+ accuracy without any fine-tuning.
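The voting step can be sketched in a few lines. This is a minimal illustration of self-consistency voting, not the playbook's full implementation: it assumes you have already sampled N labels from your model at temperature > 0 and simply takes the majority.

```python
from collections import Counter

def majority_label(samples: list[str]) -> str:
    """Self-consistency voting: return the label most of the N
    independently sampled classifications agree on. `samples` stands
    in for N model calls at temperature > 0 (hypothetical setup)."""
    label, _count = Counter(samples).most_common(1)[0]
    return label

# 5 sampled labels, 3 of which agree -> "billing" wins the vote
print(majority_label(["billing", "billing", "refund", "billing", "refund"]))
```

Odd sample counts avoid two-way ties; with an even N, break ties however your eval harness scores "abstain".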
Token Budget & Cost Estimator for Production LLM Apps
A pre-flight cost calculator that tells you -- before you make the API call -- how many tokens a given prompt plus expected response will burn, across multiple models, so your app can choose the cheapest model whose context window still fits.
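The core of such an estimator fits in a short function. A minimal sketch, assuming a rough 4-characters-per-token heuristic and a made-up pricing table -- real numbers come from your provider's price sheet and tokenizer:

```python
# Hypothetical pricing/window table -- substitute your provider's current numbers.
MODELS = {
    "small":  {"window": 16_000,  "usd_per_1k_in": 0.0002, "usd_per_1k_out": 0.0006},
    "medium": {"window": 128_000, "usd_per_1k_in": 0.003,  "usd_per_1k_out": 0.015},
}

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    # Use your model's real tokenizer for anything load-bearing.
    return max(1, len(text) // 4)

def cheapest_fit(prompt: str, expected_out_tokens: int):
    """Return (model_name, estimated_usd) for the cheapest model whose
    context window fits prompt + expected response, or None if none fit."""
    in_tok = estimate_tokens(prompt)
    candidates = [
        (name,
         in_tok / 1000 * m["usd_per_1k_in"]
         + expected_out_tokens / 1000 * m["usd_per_1k_out"])
        for name, m in MODELS.items()
        if in_tok + expected_out_tokens <= m["window"]
    ]
    if not candidates:
        return None  # nothing fits -- truncate or chunk the prompt instead
    return min(candidates, key=lambda c: c[1])
```

Running this before every call lets the app degrade gracefully (pick a bigger window, or chunk) instead of failing on a context-length error.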
Prompt Injection Test Suite for Production Agents
A regression-runnable suite of 40+ injection probes that catches the classes of attacks your agent will actually face. Run nightly, track pass rate per prompt/model version, catch regressions before users do.
Supervisor-Worker Agent Pattern for Cost Optimization
Reduce agent costs by 70-90% by routing cheap sub-tasks to Haiku-class models and reserving the frontier model for planning. Includes a full Python reference implementation, cost accounting, and the fallback logic.
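The routing decision at the heart of the pattern is simple. A minimal sketch -- the task shape, category names, and token threshold here are illustrative assumptions, not the playbook's reference implementation:

```python
# Sub-task kinds cheap enough for a Haiku-class worker (hypothetical taxonomy).
CHEAP_KINDS = {"extract", "summarize", "format", "lookup"}

def route(task: dict) -> str:
    """Supervisor-worker routing: send mechanical sub-tasks to the cheap
    model; keep planning and synthesis on the frontier model.
    `task` is assumed to carry a "kind" and an estimated "tokens" count."""
    if task.get("kind") in CHEAP_KINDS and task.get("tokens", 0) < 4_000:
        return "haiku-class"
    return "frontier"
```

In practice the supervisor (the frontier model) emits the sub-tasks, and a fallback re-runs a sub-task on the frontier model when the cheap worker's output fails validation.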
Build a RAG Eval Harness That Catches Regressions
A repeatable test suite measuring faithfulness, answer relevance, and context precision across 50-200 Q&A pairs. Gate releases on it; your RAG quality stops being vibes-based and becomes observable.
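The harness loop itself is just map-and-average over metrics. A minimal sketch under stated assumptions: `answer_fn` stands in for your RAG pipeline and each scorer for a metric implementation (the real faithfulness and relevance scorers are typically LLM-judged, not shown here):

```python
from typing import Callable

def run_eval(
    pairs: list[tuple[str, str]],
    answer_fn: Callable[[str], str],
    scorers: dict[str, Callable[[str, str], float]],
) -> dict[str, float]:
    """Run every question through the pipeline and average each metric
    over the suite. Returns {metric_name: mean_score in [0, 1]}."""
    totals = {name: 0.0 for name in scorers}
    for question, gold in pairs:
        answer = answer_fn(question)
        for name, score in scorers.items():
            totals[name] += score(answer, gold)
    return {name: total / len(pairs) for name, total in totals.items()}
```

A CI gate then becomes a one-line assertion, e.g. fail the build if any metric's mean drops below last release's baseline.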
