Applied

Reranker

A model that takes a candidate list of retrieved documents and a query, and produces a more accurate ranking -- applied after initial retrieval (BM25, vector, or hybrid) as a second pass.

First published April 14, 2026

Retrieval is fast but noisy; rerankers are slow but precise. The pattern: retrieve top 50-100 with cheap methods, then rerank to top 5-10 with an expensive cross-encoder that jointly scores query + doc. Cross-encoders catch relevance nuances embeddings miss (entailment, temporal fit, exact entity match).

2026 options: open-weight cross-encoders (BAAI/bge-reranker), hosted reranking APIs (Voyage rerank-2, Cohere Rerank 3), or LLM-as-reranker (send the top 50 to a capable model and ask it to order them by relevance). LLM rerankers cost more but handle complex query intent better. RAG quality almost always improves with a reranker; it's the single highest-leverage retrieval upgrade.

Example

# Two-stage retrieve + rerank
candidates = hybrid_search(query, k=50)   # cheap + fast, noisy first pass

# Cross-encoder rerank (e.g. a sentence-transformers CrossEncoder loaded
# from "BAAI/bge-reranker-v2-m3"; the model choice is illustrative)
scores = cross_encoder.predict([(query, doc.text) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
top_10 = [doc for doc, _ in ranked[:10]]
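The LLM-as-reranker variant mentioned above can be sketched as two small helpers: one that numbers the candidates into a ranking prompt, and one that parses the model's reply (e.g. "3, 1, 2") back into document order. The prompt wording and reply format here are assumptions, not a specific provider's API; the actual chat-completion call is omitted.

```python
# Hypothetical LLM-as-reranker plumbing: prompt construction + reply parsing.
# The "comma-separated numbers" reply format is an assumed convention.
import re

def build_rerank_prompt(query, docs):
    """Number each candidate and ask the model to rank them by relevance."""
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, 1))
    return (
        f"Query: {query}\n\nDocuments:\n{numbered}\n\n"
        "Rank the documents from most to least relevant to the query. "
        "Reply with only the document numbers, comma-separated."
    )

def parse_ranking(response, docs):
    """Turn a reply like '3, 1, 2' back into an ordered list of docs."""
    order = [int(n) - 1 for n in re.findall(r"\d+", response)]
    ranked = [i for i in order if 0 <= i < len(docs)]
    # Append anything the model omitted so no candidate is silently dropped
    ranked += [i for i in range(len(docs)) if i not in ranked]
    return [docs[i] for i in ranked]
```

Parsing defensively matters in practice: models sometimes skip or repeat indices, so the fallback that re-appends omitted candidates keeps the pipeline from losing documents.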

When to use it

  • RAG quality plateau despite tuning retrieval
  • Queries with nuanced intent embeddings miss
  • You can afford 100-300ms of additional latency

When NOT to use it

  • Very cheap retrieval budgets (reranker is 5-50x the cost of embed-only)
  • Your retrieval is already near-perfect
  • Hard-real-time use cases