Development

Top-P (Nucleus Sampling)

An alternative to temperature: the model samples only from the smallest set of most-likely tokens whose cumulative probability reaches P, and ignores every token outside that set.

First published April 14, 2026

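Top-p (nucleus sampling) restricts the candidate pool to the most likely tokens that together account for a fraction P of the probability mass; the kept probabilities are then renormalized before sampling. At top_p=0.1, only the top-ranked tokens covering 10% of the mass are considered, which is often a single token; at 0.95, only the least likely 5% of the mass is cut. Because it directly excludes the low-probability tail, it can be a cleaner quality control than temperature, which reshapes the distribution but never removes a token.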
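The selection step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the names `nucleus` and `renormalize` and the five-token distribution are hypothetical, and the cutoff uses "reaches or exceeds" (`>=`), a boundary choice that varies between implementations.

```python
def nucleus(probs, top_p):
    """Return indices of the smallest top-ranked set whose mass reaches top_p."""
    # Rank token indices from most to least likely (ties keep original order).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # assumption: inclusive cutoff
            break
    return kept


def renormalize(probs, kept):
    """Zero out tokens outside the nucleus and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# Hypothetical next-token distribution over a five-token vocabulary.
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
small = nucleus(probs, 0.1)  # the top token alone already covers 10% of the mass
large = nucleus(probs, 0.9)  # four tokens are needed to reach 90%
```

With this toy distribution, top_p=0.1 keeps a single token, so sampling from it is effectively greedy, while top_p=0.9 drops only the last token.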

Industry convention: adjust temperature or top_p, not both, and never push both to extreme values -- their effects compound. A common default for general chat is top_p=0.9-0.95 with temperature=0.7. For reproducibility, temperature=0 with top_p=1 is cleanest: temperature 0 selects the most likely token at every step (greedy decoding), so top_p has no effect.
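To see why the two settings compound, it may help to look at how they could be combined in a single sampling step. The sketch below assumes temperature is applied to the logits first and top-p truncation second, which is a common ordering but not a universal one; `sample_token` and the logits are hypothetical, and treating temperature 0 as greedy decoding is likewise an assumption about how a given API behaves.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index from raw logits using temperature, then top-p."""
    if temperature == 0:
        # Assumption: temperature 0 means greedy decoding (take the argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of top-ranked tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # choices() treats the weights as relative, so no explicit renormalization.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits, four-token vocabulary
rng = random.Random(0)          # seeded only to make the demo repeatable
greedy = sample_token(logits, temperature=0)
narrow = sample_token(logits, temperature=1.0, top_p=0.1, rng=rng)
chat = [sample_token(logits, temperature=0.7, top_p=0.95, rng=rng) for _ in range(200)]
```

Because temperature changes the probabilities that top_p then accumulates, lowering temperature also shrinks the nucleus for a fixed top_p. In this example the chat-style setting never emits the least likely token, even though top_p is close to 1.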

Example Prompt

# top_p vs temperature
# Both control variance, but differently:

# temperature 1.0, top_p 1.0      -> unmodified model distribution, nothing truncated
# temperature 0.7, top_p 1.0      -> sharpened distribution, no token excluded
# temperature 1.0, top_p 0.1      -> only the top tokens covering ~10% of probability mass
# temperature 0.0, top_p irrelevant -> deterministic (always the most likely token)
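The four settings above can be checked against a toy distribution by counting how many tokens remain eligible under each. This is an illustrative sketch only: `candidate_count` and the six logits are hypothetical, temperature is assumed to be applied before top-p truncation, and temperature 0 is assumed to mean greedy decoding.

```python
import math


def candidate_count(logits, temperature, top_p):
    """Count the tokens that remain eligible under the given settings."""
    if temperature == 0:
        return 1  # greedy decoding: only the argmax is ever chosen
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = sorted((w / total for w in weights), reverse=True)
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)  # rounding left the sum just under top_p


logits = [3.0, 2.0, 1.0, 0.0, -1.0, -2.0]  # hypothetical six-token vocabulary
for temperature, top_p in [(1.0, 1.0), (0.7, 1.0), (1.0, 0.1), (0.0, 1.0)]:
    print(temperature, top_p, candidate_count(logits, temperature, top_p))
```

On these logits the first two settings leave all six tokens eligible, because temperature reweights tokens without removing any, while top_p=0.1 and temperature=0 each leave exactly one.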

When to use it

  • You want to exclude low-probability tail tokens, which can otherwise surface as rare off-topic or nonsensical continuations
  • You need a quality-over-variety tradeoff that temperature alone doesn't hit cleanly

When NOT to use it

  • You're already tuning temperature -- pick one
  • You're combining an extreme top_p (< 0.3) with a high temperature, which can produce incoherent output