Top-p (nucleus sampling) caps the candidate pool to the smallest set of tokens whose probabilities together reach the threshold p. At top_p=0.1, only the most likely tokens making up 10% of the probability mass are considered; at 0.95, nearly all tokens are. It gives cleaner quality control than temperature because it excludes the low-probability tail directly rather than just flattening or sharpening the whole distribution.
Industry convention: tune either temperature OR top_p, not both at extreme values -- their effects compound. Common defaults: top_p=0.9-0.95 with temperature=0.7 work well for general chat. For reproducibility, top_p=1 with temperature=0 is cleanest.
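The truncation step can be sketched in a few lines. This is a minimal illustration of the nucleus rule over a toy distribution, not any particular library's implementation; the function name and the example probabilities are made up for the demo.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; return renormalized probabilities over that set."""
    # Walk token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:  # nucleus reached; drop the remaining tail
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens sum to 1 again.
    return {i: probs[i] / total for i in kept}

# Toy distribution over 5 tokens.
probs = [0.5, 0.25, 0.15, 0.07, 0.03]
# Tokens 0, 1, 2 survive: 0.5 + 0.25 < 0.8, adding 0.15 crosses it.
print(nucleus_filter(probs, 0.8))
```

Note that a lower top_p shrinks the surviving set: at top_p=0.5 only the single most likely token would remain here.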
Example
# top_p vs temperature
# Both control variance, but differently:
# temperature 1.0, top_p 1.0 -> full distribution, max variance
# temperature 0.7, top_p 1.0 -> smoothed distribution
# temperature 1.0, top_p 0.1 -> only the most likely ~10% of tokens
# temperature 0.0, top_p irrelevant -> deterministic
When to use it
- You want to exclude low-probability tail tokens (rare hallucinations)
- Quality-over-variety tradeoff that temperature alone doesn't hit cleanly
When NOT to use it
- You're already tuning temperature -- pick one
- Very low top_p (< 0.3) combined with high temperature produces incoherent output
