Hybrid Search -- Qurtoo Glossary

Semantic search catches paraphrase; keyword search catches exact terms. Neither wins alone on realistic queries. Hybrid search runs both, then fuses the results.

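Dominant fusion method: Reciprocal Rank Fusion (RRF) -- score each doc as the sum of 1/(k + rank) over the retrievers that returned it, where rank is the doc's 1-based position in each result list and k is a smoothing constant (60 is a common default). Simple, robust, and it usually beats weighted-score fusion in practice because it is scale-invariant: it uses only ranks, so BM25 scores and embedding similarities never have to be normalized onto a common scale. 2026 best practice: BM25 + dense retrieval → RRF → LLM reranker on the top 50 fused results.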
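To make the scale-invariance point concrete, here is a minimal sketch comparing a naive score sum with RRF. The doc ids and score values are invented for illustration, and the helper names (`ranks`, `rrf_fuse`) are hypothetical, not part of any library.

```python
# Hypothetical scores for the same three docs from two retrievers.
bm25_scores = {"a": 12.0, "b": 11.5, "c": 2.0}    # unbounded BM25-style scores
dense_scores = {"a": 0.35, "b": 0.30, "c": 0.90}  # cosine-style similarities

def ranks(scores):
    """Map each doc id to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(ordered, start=1)}

def rrf_fuse(score_maps, k=60):
    """Sum 1 / (k + rank) over every retriever's ranking."""
    fused = {}
    for scores in score_maps:
        for doc_id, rank in ranks(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused

# Naive fusion: add raw scores, so the larger-scale retriever dominates.
naive = {d: bm25_scores[d] + dense_scores[d] for d in bm25_scores}

# RRF looks only at ranks, so rescaling one retriever changes nothing.
fused = rrf_fuse([bm25_scores, dense_scores])
rescaled_dense = {d: s * 100 for d, s in dense_scores.items()}
fused_rescaled = rrf_fuse([bm25_scores, rescaled_dense])

print(sorted(naive, key=naive.get, reverse=True))  # naive ordering
print(sorted(fused, key=fused.get, reverse=True))  # RRF ordering
```

In this toy data the naive sum ranks doc "c" last even though it is the top dense hit, because its BM25 score is small in absolute terms. RRF places it second, and the fused scores are identical after multiplying the dense scores by 100.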

Example

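# Pseudocode: hybrid search with reciprocal rank fusion.
# bm25_search, vector_search, and embed are placeholders for your own retrievers.
bm25_results = bm25_search(query, k=50)
dense_results = vector_search(embed(query), k=50)

def rrf(rank, k=60):
    # rank is 1-based; k is the RRF smoothing constant, not the retrieval depth
    return 1 / (k + rank)

scores = {}
for results in (bm25_results, dense_results):
    # start=1 so the top hit gets rank 1, matching the 1/(k + rank) formula
    for rank, doc in enumerate(results, start=1):
        scores[doc.id] = scores.get(doc.id, 0) + rrf(rank)

# Top 10 (doc_id, fused_score) pairs, highest fused score first
top = sorted(scores.items(), key=lambda x: -x[1])[:10]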
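
For a version that runs as-is, the sketch below swaps the retriever calls for hard-coded ranked lists of doc ids. The ids and the function name `reciprocal_rank_fusion` are assumptions made for illustration; real code would feed in the output of a BM25 index and a vector store.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse ranked lists of doc ids (best first) into (doc_id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first, truncated to top_n.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Toy stand-ins for retriever output (hypothetical doc ids).
bm25_ranking = ["d1", "d2", "d3", "d4"]
dense_ranking = ["d3", "d1", "d5"]

results = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
result_ids = [doc_id for doc_id, _ in results]
print(result_ids)
```

Docs returned by both retrievers ("d1" and "d3" here) accumulate two contributions and rise to the top, while docs found by only one retriever keep their relative order below them.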

When to use it

Production RAG where either mode alone is missing recall
Corpora with mixed content (code + prose, product + policy)
You want a robust default before fancier retrieval

When NOT to use it

Tiny corpora where one retrieval mode is already perfect
Latency budget can't absorb running two retrievers