Temperature scales the logit distribution before sampling the next token. At temperature=0, the model always picks the most likely next token (near-deterministic). At temperature=1, it samples proportionally to probability. Above 1, unlikely tokens get relatively boosted, producing more varied and surprising output.
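The scaling described above can be sketched in plain Python (the function name and logit values here are illustrative, not from any library):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits after temperature scaling."""
    if temperature == 0:
        # Greedy decoding: always take the highest logit
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide every logit by the temperature, then apply softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample an index proportionally to its probability
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Made-up logits for a 3-token vocabulary
logits = [2.0, 1.0, 0.1]
sample_with_temperature(logits, 0.0)  # greedy: always returns index 0
```

Dividing by a temperature above 1 shrinks the gaps between logits, so after softmax the distribution flattens and unlikely tokens gain probability mass; below 1 the gaps widen and the top token dominates.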
Production defaults: classification / extraction / structured output → 0.0-0.2. General chat → 0.5-0.7. Creative writing → 0.7-1.0. Above 1.0 is usually noise. Temperature=0 is not fully deterministic on frontier models (there's other stochasticity); rely on `seed` parameters when available for reproducibility.
Example Prompt

```python
# Choosing temperature by task
from openai import OpenAI

client = OpenAI()

# Classification -> low temperature for consistent labels
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    temperature=0.0,
)

# Draft marketing copy -> medium temperature for more varied phrasing
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    temperature=0.7,
)
```

When to use it
- You want reproducibility -- use 0.0
- You want variety across multiple samples -- use 0.5-0.8
- Creative brainstorming -- up to 1.0
When NOT to use it
- Setting temperature > 1 without a specific reason (quality cliff)
- Assuming temperature=0 means fully deterministic output (it doesn't on most modern APIs)
