TL;DR
Coined the term RAG. Showed that combining a language model with retrieval over an external knowledge base beats fine-tuning alone on knowledge-intensive tasks, while remaining updatable without retraining: you swap the index, not the weights. The foundation of every modern document-chat, support bot, and enterprise search app.
Why it matters
RAG is how you keep an LLM current without retraining, how you integrate private data without fine-tuning, and how you force grounded answers with citations. Every production LLM app over proprietary documentation is running some descendant of this 2020 paper's architecture.
The modern stack has diverged (hybrid search, rerankers, chunking strategies), but the core pattern -- retrieve context, inject into prompt, generate grounded answer -- is this paper.
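The core pattern above can be sketched in a few lines. This is a toy illustration, not the paper's method: the retriever is a bag-of-words overlap scorer standing in for a dense vector index, the corpus and function names are made up for the example, and the final LLM call is left as a placeholder.

```python
# Minimal sketch of the RAG pattern: retrieve -> inject into prompt -> generate.
# The overlap scorer is a stand-in for dense retrieval; all names are illustrative.

def score(query: str, doc: str) -> int:
    """Toy relevance score: count query words that appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by overlap score (a real system uses a vector index)."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Inject retrieved passages into the prompt so the answer stays grounded."""
    ctx = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

corpus = [
    "RAG combines a parametric seq2seq model with a non-parametric memory.",
    "Dense retrieval indexes Wikipedia passages as vectors.",
    "Fine-tuning bakes knowledge into model weights.",
]
query = "What does RAG combine?"
prompt = build_prompt(query, retrieve(query, corpus))
# `prompt` now carries the retrieved context; pass it to any LLM completion API.
```

Updating the system's knowledge means editing `corpus` (the index), not retraining anything, which is the architectural point the paper makes.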
How you'd use this
Start here before building any knowledge-backed LLM product. The paper is a bit dated on mechanics (modern vector DBs, embeddings, rerankers are much better), but the architectural lesson holds: retrieval + generation beats either alone for knowledge-intensive tasks.
Read the authors' abstract
We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.
