Temperature scales the logit distribution before sampling the next token. At temperature=0, the model always picks the most likely next token (near-deterministic). At temperature=1, it samples proportionally to probability. Above 1, unlikely tokens get relatively boosted, producing more varied and surprising output.
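The scaling described above can be sketched in plain Python (the function name and logit values here are illustrative, not from any library):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits after temperature scaling."""
    if temperature == 0:
        # Greedy decoding: always take the highest logit
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide every logit by the temperature, then apply softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample an index proportionally to its probability
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Made-up logits for a 3-token vocabulary
logits = [2.0, 1.0, 0.1]
sample_with_temperature(logits, 0.0)  # greedy: always returns index 0
```

Dividing by a temperature above 1 shrinks the gaps between logits, so after softmax the distribution flattens and unlikely tokens gain probability mass; below 1 the gaps widen and the top token dominates.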
Production defaults: classification / extraction / structured output → 0.0-0.2. General chat → 0.5-0.7. Creative writing → 0.7-1.0. Above 1.0 is usually noise. Temperature=0 is not fully deterministic on frontier models (there's other stochasticity); rely on `seed` parameters when available for reproducibility.
Example Prompt

```python
# Choosing temperature by task
from openai import OpenAI

client = OpenAI()

# Classification -> low temperature for consistent labels
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    temperature=0.0,
)

# Draft marketing copy -> medium temperature for more varied phrasing
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    temperature=0.7,
)
```

When to use it
- You want reproducibility -- use 0.0
- You want variety across multiple samples -- use 0.5-0.8
- Creative brainstorming -- up to 1.0
When NOT to use it
- Setting temperature > 1 without a specific reason (quality cliff)
- Assuming temperature=0 means fully deterministic output (it doesn't on most modern APIs)
