TL;DR
Appending "let's think step by step" to a prompt, or including worked-example reasoning in it, dramatically improves LLM accuracy on math and multi-step problems. This is the paper that named and formalized chain-of-thought (CoT) prompting, and it remains the standard citation for CoT despite being from 2022.
Why it matters
This work is the practical foundation under every reasoning-heavy LLM deployment today. Even reasoning-tuned 2026 models (Claude 4.6, GPT-5, o-series) owe their internal thinking behavior to the CoT family of techniques.
For practitioners, the lesson isn't "add let's think step by step" -- it's that explicit intermediate reasoning can be engineered, measured, and improved, and that non-reasoning models benefit enormously from it.
How you'd use this
If you're using a non-reasoning model (Haiku, GPT-4o-mini, smaller open-weight), always enable CoT for math, logic, and multi-hop Q&A. If you're using a reasoning model, CoT in the prompt is redundant -- the model is already doing it.
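To make the two classic variants concrete, here is a minimal sketch of CoT prompt construction as plain string building: the zero-shot trigger phrase and a few-shot prompt with a worked example. No model API is called; the function names and the prompt layout are my own illustrative choices, not from the paper.

```python
# Sketch of the two classic chain-of-thought prompt styles.
# Function names are illustrative, not from the paper or any library.

ZERO_SHOT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the reasoning trigger after the question."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"

def few_shot_cot(examples: list[tuple[str, str, str]], question: str) -> str:
    """Few-shot CoT: prepend worked examples of (question, reasoning, answer),
    then leave the final answer open for the model to complete."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# One worked example (paraphrasing the style of the paper's demonstrations),
# followed by the question we actually want answered.
prompt = few_shot_cot(
    [(
        "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
        "How many balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "11",
    )],
    "A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?",
)
print(prompt)
```

Either string would then be sent to a non-reasoning model as the user message; the few-shot form is what the paper evaluates, while the zero-shot trigger is the cheaper follow-up trick.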
Read the authors' abstract
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning.
