Retrieval-Augmented Generation (RAG)

A pattern where relevant context is retrieved from a knowledge base (vector store, search index, graph) and injected into the prompt before the model generates its answer.

First published April 14, 2026

RAG closes the gap between a model's training data and your specific, private, or recent information. The flow: user query → retrieval (semantic, keyword, or hybrid search) → top-k documents packed into the prompt → the model generates an answer grounded in those documents.
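The flow above can be sketched end-to-end. This is a toy in-memory version: the bag-of-words "embedding" and cosine scorer stand in for a real embedding model and ANN index, and the document strings are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase token counts. A real system would call
    # a learned embedding model here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank every document against the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Pack the retrieved chunks into a numbered context block.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our standard return window is 30 days from purchase date.",
    "Electronics have an extended 45-day return window.",
    "Shipping is free on orders over $50.",
]
query = "What is the return window for electronics?"
top = retrieve(query, docs)
prompt = build_prompt(query, top)
```

The prompt string would then go to the model; everything upstream of that call is the retrieval half the next paragraph focuses on.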

Good RAG is mostly about the retrieval half, not the generation half: the model can only answer as well as the context you hand it. Common failure modes: weak embeddings, poor chunking, no query rewriting, no re-ranking, no citation forcing. A strong 2026 baseline: hybrid search (BM25 plus vectors), reciprocal rank fusion, an LLM re-ranker over the top 50 candidates, and citation-forced generation.
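The fusion step in that baseline is simple enough to show in full. Reciprocal rank fusion merges the keyword and vector rankings using only each document's rank position, so BM25 scores and cosine similarities never need to be put on a common scale. The doc IDs below are made up; k=60 is the constant from the original RRF paper, not something tuned here.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document scores 1 / (k + rank) per ranking it appears in;
    # summing across rankings rewards items ranked high by both systems.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_c", "doc_b"]      # keyword ranking
vector_hits = ["doc_b", "doc_a", "doc_d"]    # semantic ranking
fused = rrf([bm25_hits, vector_hits])
# doc_a sits near the top of both lists, so it fuses highest.
```

The fused list is what you would feed to the LLM re-ranker before packing the survivors into the prompt.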

Example Prompt

System: Answer the user's question using ONLY the context below. If the answer isn't in the context, say "I don't have that information."
Cite sources inline as [1], [2], etc.

Context:
[1] "Our standard return window is 30 days from purchase date..." (source: return-policy.md)
[2] "Electronics have an extended 45-day window..." (source: electronics-policy.md)

User: What's the return policy for a TV?
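Citation forcing is only half the story; you also want to check the citations that come back. A minimal post-check, assuming the inline [n] format from the example prompt: reject answers that cite nothing or cite a chunk number that was never in the context. The sample answers are invented for illustration.

```python
import re

def citations_valid(answer: str, num_chunks: int) -> bool:
    # Collect every [n] citation; valid answers cite at least one chunk,
    # and every cited index must exist in the supplied context.
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= num_chunks for c in cited)

good = "TVs are electronics, so the return window is 45 days [2]."
bad = "The return window is 60 days [3]."  # cites a nonexistent chunk
```

Answers that fail the check can be retried or routed to the "I don't have that information" fallback.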

When to use it

  • Your content changes more often than model retraining cycles
  • Private/proprietary documentation the model can't know
  • Regulatory or accuracy requirements that demand citations
  • Long-tail domain questions where the model hallucinates

When NOT to use it

  • Your knowledge fits in the context window: just put it all in the prompt
  • The model's parametric knowledge is sufficient for the task
  • Latency or complexity of the retrieval infrastructure outweighs the accuracy gain