Retrieval-Augmented Generation (RAG)

A pattern where relevant context is retrieved from a knowledge base (vector store, search index, graph) and injected into the prompt before the model generates its answer.

First published April 14, 2026

RAG closes the gap between a model's training data and your specific, private, or recent information. The flow: user query → retrieval (semantic, keyword, or hybrid search) → top-k documents packed into the prompt → the model generates an answer grounded in those documents.
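The flow above can be sketched end-to-end. This is a toy in-memory version: the bag-of-words "embedding" and cosine scorer stand in for a real embedding model and ANN index, and the document strings are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase token counts. A real system would call
    # a learned embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank every document against the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Pack the retrieved chunks into a numbered context block.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our standard return window is 30 days from purchase date.",
    "Electronics have an extended 45-day return window.",
    "Shipping is free on orders over $50.",
]
query = "What is the return window for electronics?"
top = retrieve(query, docs)
prompt = build_prompt(query, top)
```

The prompt string would then go to the model; everything upstream of that call is the retrieval half the next paragraph focuses on.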

Good RAG is mostly about the retrieval half, not the generation half: the model can only answer as well as the context you hand it. Common failure modes: weak embeddings, poor chunking, no query rewriting, no re-ranking, no citation forcing. A strong 2026 baseline: hybrid search (BM25 plus vectors), reciprocal rank fusion, an LLM re-ranker over the top 50 candidates, and citation-forced generation.
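The fusion step in that baseline is simple enough to show in full. Reciprocal rank fusion merges the keyword and vector rankings using only each document's rank position, so BM25 scores and cosine similarities never need to be put on a common scale. The doc IDs below are made up; k=60 is the constant from the original RRF paper, not something tuned here.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores 1 / (k + rank) per ranking it appears in;
    # summing across rankings rewards items ranked high by both systems.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_c", "doc_b"]      # keyword ranking
vector_hits = ["doc_b", "doc_a", "doc_d"]    # semantic ranking
fused = rrf([bm25_hits, vector_hits])
# doc_a sits near the top of both lists, so it fuses highest.
```

The fused list is what you would feed to the LLM re-ranker before packing the survivors into the prompt.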

Example Prompt

System: Answer the user's question using ONLY the context below. If the answer isn't in the context, say "I don't have that information."
Cite sources inline as [1], [2], etc.

Context:
[1] "Our standard return window is 30 days from purchase date..." (source: return-policy.md)
[2] "Electronics have an extended 45-day window..." (source: electronics-policy.md)

User: What's the return policy for a TV?
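Citation forcing is only half the story; you also want to check the citations that come back. A minimal post-check, assuming the inline [n] format from the example prompt: reject answers that cite nothing or cite a chunk number that was never in the context. The sample answers are invented for illustration.

```python
import re

def citations_valid(answer: str, num_chunks: int) -> bool:
    # Collect every [n] citation; valid answers cite at least one chunk,
    # and every cited index must exist in the supplied context.
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= num_chunks for c in cited)

good = "TVs are electronics, so the return window is 45 days [2]."
bad = "The return window is 60 days [3]."  # cites a nonexistent chunk
```

Answers that fail the check can be retried or routed to the "I don't have that information" fallback.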

When to use it

  • Your content changes more often than model retraining cycles
  • Private/proprietary documentation the model can't know
  • Regulatory or accuracy requirements that demand citations
  • Long-tail domain questions where the model hallucinates

When NOT to use it

  • Your knowledge fits in the context window: just put it all in the prompt
  • The model's parametric knowledge is sufficient for the task
  • Latency or complexity of the retrieval infrastructure outweighs the accuracy gain