TL;DR
Sample the model N times at nonzero temperature and take the mode of the final answers. It is the cheapest known reliability upgrade for discrete-answer tasks -- substantial accuracy gains on math reasoning at the cost of N× compute.
Why it matters
In 2026 with fast / cheap inference, self-consistency is a free quality lever for any task with a discrete answer. Classification, extraction, yes/no -- sample 5x, vote, done. The only reasons not to use it are latency budgets and hard single-sample determinism requirements.
How you'd use this
Implement as a wrapper over your normal call path: run the same prompt N times at temperature=0.7, tally answers, return the plurality. For production use, pair with a confidence check -- if the plurality is thin, fall back to a smarter model.
Read the authors' abstract
We propose a simple decoding strategy -- self-consistency -- which samples diverse reasoning paths and selects the most consistent answer by taking a majority vote.
