Self-consistency (Wang et al., 2022) is the cheapest robustness technique: sample the model 5-10 times and take the mode of the final answers. Errors from individual generations wash out, while correct answers cluster.
Works best when there's a discrete final answer (a number, a label, a yes/no). Prose outputs don't "vote" cleanly. In 2026 practice: cheapest reliability upgrade for classification, math, and extraction tasks. Modern reasoning models blunt the gains somewhat because they already reason multiple ways internally.
Example Prompt
Call the API 5 times at temperature=0.7 with this same prompt:
"Classify the sentiment (positive/neutral/negative) of this review: {review}"
Then return the label that appears most often across the 5 calls. If tied, default to neutral.
When to use it
- The final answer is a discrete value (number, label, enum)
- You can afford 5-10x the cost for a meaningful accuracy bump
- The task has known single-sample failure rate >5%
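The voting procedure from the example prompt can be sketched as below. This is a minimal sketch, not a definitive implementation: `classify` stands in for whatever function wraps your LLM API call at temperature ~0.7 (the function name and signature are assumptions, not a specific provider's API), and ties fall back to neutral as the prompt specifies.

```python
from collections import Counter

def self_consistent_label(classify, review, n=5):
    """Call `classify` n times and return the majority label.

    `classify` is any function review -> label; in production it would
    wrap an LLM call sampled at temperature ~0.7 (hypothetical wrapper).
    On a tie for the top count, default to "neutral".
    """
    votes = Counter(classify(review) for _ in range(n))
    top = votes.most_common()
    # A tie means the two most common labels share the same count.
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "neutral"
    return top[0][0]

# Usage with a stub classifier standing in for 5 API calls:
samples = iter(["positive", "positive", "negative", "positive", "neutral"])
label = self_consistent_label(lambda r: next(samples), "great phone")
print(label)  # → positive (3 of 5 votes)
```

The key design choice is that voting happens over the final discrete label only, never over the full generation text, which is why the technique fails for prose outputs.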
When NOT to use it
- Prose / open-ended outputs where "most common answer" doesn't apply
- Latency-sensitive paths
- The base single-sample accuracy is already >99%
