
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Published May 22, 2020 · arXiv: 2005.11401

TL;DR

Coined the term RAG. Showed that pairing a language model with retrieval over a knowledge base beats fine-tuning on knowledge-intensive tasks, while remaining updatable without retraining. The foundation of every modern document-chat, support bot, and enterprise search app.

Why it matters

RAG is how you keep an LLM current without retraining, how you integrate private data without fine-tuning, and how you force grounded answers with citations. Every production LLM app with proprietary documentation is running some descendant of this 2020 paper.

The modern stack has diverged (hybrid search, rerankers, chunking strategies), but the core pattern -- retrieve context, inject into prompt, generate grounded answer -- is this paper.
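That core pattern can be sketched in a few lines. Everything here is illustrative, not from the paper: a toy word-overlap scorer stands in for a real retriever, and a stub stands in for the LLM call.

```python
# Minimal sketch of the RAG loop: retrieve context, inject into prompt, generate.
# All names (retrieve, build_prompt, generate) are hypothetical illustrations.

KNOWLEDGE_BASE = [
    "RAG combines a parametric seq2seq model with a non-parametric memory.",
    "The non-parametric memory is a dense vector index of Wikipedia.",
    "Retrieval lets the model stay current without retraining.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved passages so the answer is grounded in them."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stub for the LLM call; a real app would send `prompt` to a model."""
    return f"[model answer grounded in {prompt.count('- ')} retrieved passages]"

docs = retrieve("stay current without retraining", KNOWLEDGE_BASE)
answer = generate(build_prompt("How does RAG stay current?", docs))
```

In production the toy scorer is replaced by a vector database (often with hybrid search and a reranker), but the three-step shape is unchanged.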

How you'd use this

Start here before building any knowledge-backed LLM product. The paper is a bit dated on mechanics (modern vector DBs, embeddings, rerankers are much better), but the architectural lesson holds: retrieval + generation beats either alone for knowledge-intensive tasks.

Read the authors' abstract

We explore retrieval-augmented generation (RAG) -- models that combine a pre-trained seq2seq model (parametric memory) with a dense vector index of Wikipedia (non-parametric memory), accessed with a pre-trained neural retriever, for language generation.
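The "dense vector index" amounts to maximum inner product search (MIPS) over embedded passages. A pure-Python sketch, where the 3-d vectors are hand-made stand-ins for real embeddings (the paper uses DPR's BERT-based encoders over Wikipedia):

```python
# Dense retrieval as maximum inner product search (MIPS).
# Vectors below are toy stand-ins for learned passage embeddings.

index = {
    "passage_a": [0.9, 0.1, 0.0],
    "passage_b": [0.1, 0.8, 0.3],
    "passage_c": [0.0, 0.2, 0.9],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mips_top_k(query_vec, index, k=2):
    """Return the k passage ids with the highest inner product to the query."""
    return sorted(index, key=lambda pid: dot(query_vec, index[pid]),
                  reverse=True)[:k]

# A query vector pointing near passage_c's direction retrieves it first.
print(mips_top_k([0.1, 0.1, 1.0], index))  # ['passage_c', 'passage_b']
```

Real systems use approximate MIPS libraries (e.g. FAISS, which the paper relies on) instead of this exhaustive scan.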