TL;DR
Sample the model N times at nonzero temperature and take the mode of the final answers. It is the cheapest known reliability upgrade for discrete-answer tasks -- substantial accuracy gains on math reasoning at the cost of N× compute.
Why it matters
In 2026 with fast / cheap inference, self-consistency is a free quality lever for any task with a discrete answer. Classification, extraction, yes/no -- sample 5x, vote, done. The only reasons not to use it are latency budgets and hard single-sample determinism requirements.
How you'd use this
Implement as a wrapper over your normal call path: run the same prompt N times at temperature=0.7, tally answers, return the plurality. For production use, pair with a confidence check -- if the plurality is thin, fall back to a smarter model.
Read the authors' abstract
We propose a simple decoding strategy -- self-consistency -- which samples diverse reasoning paths and selects the most consistent answer by taking a majority vote.
