TL;DR
Appending "let's think step by step" to a prompt, or including worked-example reasoning in it, dramatically improves LLM accuracy on math and multi-step problems. This is the paper that named and formalized chain-of-thought (CoT) prompting, and it remains the standard citation for CoT despite being from 2022.
Why it matters
This work is the practical foundation under every reasoning-heavy LLM deployment today. Even reasoning-tuned 2026 models (Claude 4.6, GPT-5, o-series) owe their internal thinking behavior to the CoT family of techniques.
For practitioners, the lesson isn't "add let's think step by step" -- it's that explicit intermediate reasoning can be engineered, measured, and improved, and that non-reasoning models benefit enormously from it.
How you'd use this
If you're using a non-reasoning model (Haiku, GPT-4o-mini, smaller open-weight), always enable CoT for math, logic, and multi-hop Q&A. If you're using a reasoning model, CoT in the prompt is redundant -- the model is already doing it.
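To make the two classic variants concrete, here is a minimal sketch of CoT prompt construction as plain string building: the zero-shot trigger phrase and a few-shot prompt with a worked example. No model API is called; the function names and the prompt layout are my own illustrative choices, not from the paper.

```python
# Sketch of the two classic chain-of-thought prompt styles.
# Function names are illustrative, not from the paper or any library.

ZERO_SHOT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the reasoning trigger after the question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"

def few_shot_cot(examples: list[tuple[str, str, str]], question: str) -> str:
    """Few-shot CoT: prepend worked examples of (question, reasoning, answer),
    then leave the final answer open for the model to complete."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# One worked example (paraphrasing the style of the paper's demonstrations),
# followed by the question we actually want answered.
prompt = few_shot_cot(
    [(
        "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
        "How many balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "11",
    )],
    "A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?",
)
print(prompt)
```

Either string would then be sent to a non-reasoning model as the user message; the few-shot form is what the paper evaluates, while the zero-shot trigger is the cheaper follow-up trick.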
Read the authors' abstract
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
