Temperature

A sampling parameter (typically 0.0-2.0) that controls how deterministic vs. creative an LLM's output is. Lower = more predictable, higher = more varied.

First published April 14, 2026

Temperature divides the model's logits before the softmax that turns them into next-token probabilities. At temperature=0, the model always picks the most likely next token (near-deterministic greedy decoding). At temperature=1, it samples in proportion to the model's raw probabilities. Above 1, unlikely tokens are relatively boosted, producing more varied and surprising output.
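The scaling above can be sketched in a few lines of plain Python (a minimal standalone illustration of the math, not tied to any particular API; the logit values are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax.

    Lower temperature sharpens the distribution toward the argmax;
    higher temperature flattens it toward uniform.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.2)  # mass piles onto the top token
warm = softmax_with_temperature(logits, 1.0)  # proportional to raw probabilities
hot = softmax_with_temperature(logits, 2.0)   # flatter, tail tokens boosted
```

Printing the three distributions shows the top token's probability shrinking as temperature rises, which is exactly the "more varied output" effect.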

Production defaults:

  • Classification / extraction / structured output → 0.0-0.2
  • General chat → 0.5-0.7
  • Creative writing → 0.7-1.0

Above 1.0 is usually noise. Note that temperature=0 is not fully deterministic on frontier models (other sources of stochasticity remain, such as non-deterministic GPU kernels and batching effects); rely on a `seed` parameter, when the API offers one, for reproducibility.

Example

# Choosing temperature by task
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],  # prompt elided
    temperature=0.0  # classification -> low
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    temperature=0.7  # draft marketing copy -> medium
)

When to use it

  • You want reproducibility -- use 0.0
  • You want variety across multiple samples -- use 0.5-0.8
  • Creative brainstorming -- up to 1.0
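The "variety across multiple samples" point can be demonstrated with a small self-contained simulation: draw repeatedly from a temperature-scaled distribution and compare how spread out the samples are (made-up logits, no API call; the RNG seed is fixed so the run is reproducible):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1  # guard against floating-point rounding

logits = [3.0, 1.5, 1.0, 0.2]  # hypothetical next-token logits
rng = random.Random(0)  # fixed seed for a reproducible demo
low_t = [sample_token(logits, 0.1, rng) for _ in range(100)]
high_t = [sample_token(logits, 1.5, rng) for _ in range(100)]
# Low temperature collapses onto the top token;
# high temperature spreads the samples across the vocabulary.
```

Counting distinct indices in `low_t` versus `high_t` makes the trade-off concrete: the low-temperature run is near-uniformly the argmax, while the high-temperature run visits several tokens.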

When NOT to use it

  • Setting temperature > 1 without a specific reason (quality falls off a cliff)
  • Assuming temperature=0 guarantees fully deterministic output (it does not on most modern APIs)