Retrieval is fast but noisy; rerankers are slow but precise. The pattern: retrieve top 50-100 with cheap methods, then rerank to top 5-10 with an expensive cross-encoder that jointly scores query + doc. Cross-encoders catch relevance nuances embeddings miss (entailment, temporal fit, exact entity match).
2026 options: open-weight cross-encoders (BAAI/bge-reranker, Voyage rerank-2, Cohere Rerank 3), or LLM-as-reranker (send top-50 to a smart model and ask it to order by relevance). LLM rerankers cost more but handle complex query intent better. RAG quality almost always improves with a reranker; it's the single highest-leverage retrieval upgrade.
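The LLM-as-reranker route mentioned above can be sketched as follows. This is a minimal illustration, assuming a `call_llm(prompt) -> str` wrapper around whatever model API you use; the function names and prompt wording here are hypothetical, not any provider's SDK:

```python
# Sketch: LLM-as-reranker. `call_llm` is a hypothetical wrapper you supply
# around your model API (OpenAI, Anthropic, a local model, etc.).
import re

def build_rerank_prompt(query, docs):
    """Number the candidates and ask the model for a relevance ordering."""
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs))
    return (
        f"Query: {query}\n\nCandidates:\n{numbered}\n\n"
        "Order the candidate indices from most to least relevant to the query. "
        "Reply with a comma-separated list of indices only, e.g. 2,0,1."
    )

def parse_ranking(reply, n):
    """Extract indices; drop duplicates and out-of-range values;
    append any candidates the model omitted so none are lost."""
    seen, order = set(), []
    for tok in re.findall(r"\d+", reply):
        i = int(tok)
        if i < n and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(n) if i not in seen]
    return order

def llm_rerank(query, docs, call_llm, top_k=10):
    reply = call_llm(build_rerank_prompt(query, docs))
    order = parse_ranking(reply, len(docs))
    return [docs[i] for i in order[:top_k]]
```

Injecting `call_llm` as a parameter keeps the reranker provider-agnostic and makes the prompt-building and parsing logic testable without a live model; the defensive parsing matters because LLMs sometimes return malformed or partial index lists.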
Example Code
```python
# Two-stage retrieve + rerank
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

cross_encoder = CrossEncoder("BAAI/bge-reranker-v2-m3")  # open-weight reranker
candidates = hybrid_search(query, k=50)  # your first-stage retriever: cheap + fast, noisy
# Cross-encoder jointly scores each (query, doc) pair
scores = cross_encoder.predict([(query, doc.text) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
top_10 = [doc for doc, _ in ranked[:10]]
```
When to use it
- RAG quality plateau despite tuning retrieval
- Queries with nuanced intent that embeddings miss
- You can afford 100-300ms of additional latency
When NOT to use it
- Very cheap retrieval budgets (reranker is 5-50x the cost of embed-only)
- Your retrieval is already near-perfect
- Hard-real-time use cases
