Classic RAG: one query, one retrieval, one answer. Agentic RAG: the model drives the retrieval loop. If the first batch of results doesn't cover the question, it issues a reformulated query. If a retrieved doc is low-quality, it skips that doc. It can call multiple retrieval backends, combine their results, and decide when it has enough context to answer.
Requires a tool-using model and a retrieval tool exposed via function calling. More flexible than one-shot RAG, but more expensive and harder to reason about. It usually earns its keep on ambiguous or multi-part questions whose scope a single retrieval can't cover.
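The loop described above can be sketched in Python. This is a minimal, deterministic toy under stated assumptions, not a reference implementation: the three tool names are borrowed from the example prompt, but the in-memory corpora, the keyword-overlap `search`, and the `plan`, `reformulate`, `min_score`, and `max_calls` pieces are hypothetical stand-ins for decisions a tool-using model would make. The caller also passes the question already split into sub-questions, a step the model would normally do itself.

```python
import re

# Hypothetical in-memory corpora standing in for the three retrieval backends
# named in the example prompt. A real system would call a search service here.
CORPORA = {
    "search_docs": {
        "doc-1": "The export API supports CSV and JSON formats.",
        "doc-2": "Exports larger than 1 GB are split into parts.",
    },
    "search_faq": {
        "faq-7": "Refunds are processed within 5 business days.",
    },
    "search_knowledge_base": {
        "kb-3": "Exports run nightly at 02:00 UTC.",
    },
}
STOPWORDS = {"which", "what", "how", "when", "are", "is", "does", "do", "the", "a"}
REWRITES = {"big": "larger"}  # hypothetical synonym table standing in for an LLM rewrite


def terms(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(tool: str, query: str, k: int = 2) -> list[tuple[int, str, str]]:
    """Toy retrieval: rank one corpus by word overlap; returns (score, doc_id, text)."""
    q = terms(query) - STOPWORDS
    scored = [(len(q & terms(text)), doc_id, text) for doc_id, text in CORPORA[tool].items()]
    scored = [hit for hit in scored if hit[0] > 0]
    scored.sort(reverse=True)
    return scored[:k]


def plan(query: str) -> list[str]:
    """Stand-in for the model's planning step: order tools by likely relevance."""
    if terms(query) & {"refund", "refunds", "billing"}:
        return ["search_faq", "search_docs", "search_knowledge_base"]
    return ["search_docs", "search_knowledge_base", "search_faq"]


def reformulate(query: str) -> str:
    """Stand-in for the model rewriting a query that retrieved too little."""
    return " ".join(REWRITES.get(word, word) for word in query.split())


def agentic_rag(sub_questions, max_calls=10, min_score=2):
    """Retrieve until each sub-question is covered or the call budget runs out."""
    evidence = {}  # sub-question -> (doc_id, text)
    trace = []     # (tool, query, top_score) for every retrieval call
    for question in sub_questions:
        queries = [question]
        if reformulate(question) != question:
            queries.append(reformulate(question))
        for query in queries:
            for tool in plan(query):
                if len(trace) >= max_calls:
                    return evidence, trace  # budget exhausted: stop early
                hits = search(tool, query)
                top_score = hits[0][0] if hits else 0
                trace.append((tool, query, top_score))
                # Results below min_score are treated as low-quality and skipped.
                if top_score >= min_score:
                    evidence[question] = (hits[0][1], hits[0][2])
                    break
            if question in evidence:
                break  # covered: no need to try the reformulated query
    return evidence, trace


def answer(evidence) -> str:
    """Ground the answer in retrieved text and cite each claim's source doc."""
    return "\n".join(f"{text} [{doc_id}]" for doc_id, text in evidence.values())


if __name__ == "__main__":
    parts = ["which formats does export support",
             "how are big exports handled",
             "when do exports run"]
    evidence, trace = agentic_rag(parts)
    for step in trace:
        print(step)
    print(answer(evidence))
```

Run as a script, the trace should show the second sub-question scoring too low in all three backends, being rewritten ("big" to "larger"), and then landing in `search_docs`; the third is covered by `search_knowledge_base` on its second call. The `max_calls` cap is the knob that keeps the loop's cost bounded.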
Example Prompt
You have retrieval tools: search_docs(query), search_faq(query),
search_knowledge_base(query). The user will ask a question.
You can retrieve as many times as you need. For each question:
1. Plan which tool(s) are most likely to have the answer.
2. Query them.
3. If results don't cover the question, reformulate and try again.
4. When you have enough context, answer grounded in the retrieved docs.
5. Cite each factual claim.
Do NOT answer from your own training data.
When to use it
- Multi-part questions where one retrieval won't cover every part
- Ambiguous queries that benefit from reformulation
- Multiple knowledge sources with different content types
When NOT to use it
- Simple direct queries -- classic RAG is simpler and faster
- Cost-sensitive paths (agentic loops burn tokens)
- Systems whose retrieval trace you can't observe or debug at runtime
