Guardrails are code around the LLM, not prompts inside it. "Reject any output mentioning a competitor by name," "block responses containing personal emails," "require JSON schema match." The model generates; the guardrail decides if the output ships.
Patterns: input guardrails (filter prompts for toxicity or injection attempts), output guardrails (validate shape, redact PII, run a secondary classifier), topical guardrails (refuse off-scope queries). Libraries like Guardrails AI and NeMo Guardrails, as well as homegrown Zod/Pydantic validators, all implement variants.
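The homegrown-validator variant is only a few lines of code. Here is a library-free sketch of the Zod/Pydantic idea; the expected field names and types are hypothetical:

```python
import json

# Homegrown output validator: the expected shape (field -> type) is an
# illustrative example, not a real product schema.
EXPECTED = {"answer": str, "confidence": float}

def validate_schema(raw: str) -> bool:
    """True only if raw parses to a JSON object with exactly the expected fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(EXPECTED):
        return False
    return all(isinstance(data[key], typ) for key, typ in EXPECTED.items())
```

The same check falls out for free from a Pydantic model or a Zod schema; the point is that it runs as deterministic code after the model, not as an instruction inside the prompt.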
Example
# Output guardrail example (pseudocode)
output = call_llm(prompt)

# 1. Schema check
if not validate_schema(output, ResponseSchema):
    return retry_or_reject()

# 2. Content classifier
if pii_detected(output) or toxicity_score(output) > 0.3:
    return redact_or_reject()

# 3. Topical relevance
if is_off_topic(output, allowed_topics):
    return fallback_response()

return output
When to use it
- High-stakes deployments where the model alone is too risky
- Regulated domains (finance, medical, legal)
- Known failure modes you can detect programmatically
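The pseudocode above can be made concrete. This sketch chains the PII and topical checks with stub detectors; the regex, fallback message, and topic-matching rule are placeholder assumptions, where a real deployment would use a trained classifier:

```python
import re

# Stub PII detector: flags anything that looks like an email address.
PII_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pii_detected(text: str) -> bool:
    return bool(PII_EMAIL.search(text))

def run_output_guardrails(output: str, allowed_topics: set[str]) -> str:
    """Chain the checks from the pseudocode; the first failing check wins."""
    if pii_detected(output):
        return "[redacted: output contained PII]"
    # Naive topical check: require at least one allowed topic keyword.
    if not any(topic in output.lower() for topic in allowed_topics):
        return "Sorry, I can only help with billing questions."
    return output

print(run_output_guardrails("Contact bob@example.com", {"billing"}))
# prints "[redacted: output contained PII]"
```

Ordering matters: the PII check runs before the topical check so that a redaction is never leaked into a fallback path.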
When NOT to use it
- As a substitute for good prompts -- guardrails are defense, not design
- Low-stakes deployments where the overhead isn't justified
- Guardrails that are so aggressive they kill the product UX
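Input guardrails follow the same shape, running before the LLM call instead of after it. A naive injection filter might look like this; the marker phrases are illustrative only, since real deployments pair phrase lists with a classifier:

```python
# Input guardrail sketch: reject prompts with obvious injection markers
# before they ever reach the model. Phrase list is a hypothetical example.
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard your system prompt",
    "you are now",
)

def allow_input(user_prompt: str) -> bool:
    """True if the prompt passes the (deliberately simple) injection filter."""
    lowered = user_prompt.lower()
    return not any(marker in lowered for marker in INJECTION_MARKERS)
```

A phrase list like this is trivially bypassed, which is the point of the "defense, not design" caveat above: input filters reduce exposure, they don't replace careful prompt and product design.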
