Security

Guardrails

Programmatic checks -- before, during, or after LLM generation -- that enforce policy, block unsafe outputs, or validate shape and content independently of what the model would produce on its own.

First published April 14, 2026

Guardrails are code around the LLM, not prompts inside it. "Reject any output mentioning a competitor by name," "block responses containing personal emails," "require JSON schema match." The model generates; the guardrail decides if the output ships.

Patterns: input guardrails (filter prompts for toxicity or injection attempts), output guardrails (validate shape, redact PII, run a secondary classifier), topical guardrails (refuse off-scope queries). Libraries like Guardrails AI and NeMo Guardrails, as well as homegrown Zod/Pydantic validators, implement variants of all three.

Example

# Output guardrail example (pseudocode)

output = call_llm(prompt)

# 1. Schema check
if not validate_schema(output, ResponseSchema):
    return retry_or_reject()

# 2. Content classifier
if pii_detected(output) or toxicity_score(output) > 0.3:
    return redact_or_reject()

# 3. Topical relevance
if is_off_topic(output, allowed_topics):
    return fallback_response()

return output
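The pseudocode above can be made concrete with only the standard library. This is a minimal sketch, not a production implementation: the JSON shape (`topic`/`answer` keys), the email regex standing in for a PII classifier, and the allowed-topic set are all hypothetical stand-ins; a real system would use Pydantic or Zod for the schema step and a trained classifier for content.

```python
import json
import re

ALLOWED_TOPICS = {"billing", "shipping"}           # hypothetical scope
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude stand-in for PII detection


def validate_schema(output: str):
    """Check that the raw LLM output parses as JSON with the expected keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    if {"topic", "answer"} <= data.keys():
        return data
    return None


def guard(output: str) -> str:
    # 1. Schema check -- reject anything that isn't well-formed
    data = validate_schema(output)
    if data is None:
        return "REJECTED: malformed output"
    # 2. Content check -- here just an email regex; swap in a real classifier
    if EMAIL_RE.search(data["answer"]):
        return "REJECTED: PII detected"
    # 3. Topical relevance -- off-scope topics get a canned fallback
    if data["topic"] not in ALLOWED_TOPICS:
        return "Sorry, I can only help with billing or shipping."
    return data["answer"]
```

Only the last branch lets the model's own words through, which is the point: the guardrail, not the model, decides what ships.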

When to use it

  • High-stakes deployments where the model alone is too risky
  • Regulated domains (finance, medical, legal)
  • Known failure modes you can detect programmatically
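The "known failure modes" case is often the easiest win: when a failure is deterministically detectable, the guardrail needs no ML at all. A minimal sketch of the "reject any output mentioning a competitor by name" rule from the intro, with a hypothetical blocklist:

```python
import re

# Hypothetical competitor names; the guardrail reduces to a
# case-insensitive blocklist scan over the model's output.
COMPETITORS = ["AcmeCorp", "Globex"]
BLOCKLIST_RE = re.compile("|".join(map(re.escape, COMPETITORS)), re.IGNORECASE)


def mentions_competitor(output: str) -> bool:
    """True if the output names any blocklisted competitor."""
    return BLOCKLIST_RE.search(output) is not None
```

`re.escape` keeps blocklist entries literal, so names containing regex metacharacters can't break or widen the pattern.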

When NOT to use it

  • As a substitute for good prompts -- guardrails are defense, not design
  • Low-stakes deployments where the overhead isn't justified
  • Checks so aggressive they kill the product UX