Semantic search is the retrieval mechanism behind most modern RAG systems. Embed the documents at index time, embed the query at query time, and return the K nearest neighbors by cosine distance. A user asking "how do I cancel" matches a policy doc titled "account termination procedure" -- zero keyword overlap, high semantic match.
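The embed-and-rank step can be sketched in a few lines of plain Python. This is a minimal illustration, not a production retriever: the three-dimensional vectors are made up for the example (real embeddings have hundreds of dimensions and come from a model), and `cosine_distance` and `top_k` are hypothetical helper names.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents closest to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_distance(query_vec, doc_vecs[doc_id]),
    )
    return ranked[:k]

# Hypothetical embeddings, invented for illustration only.
doc_vecs = {
    "account termination procedure": [0.9, 0.1, 0.0],
    "billing faq": [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stand-in for embed("how do I cancel")

nearest = top_k(query_vec, doc_vecs, k=2)
print(nearest)
```

With these toy vectors the termination doc ranks first even though it shares no words with the query, which is the behavior described above. A vector database does the same ranking with an approximate index instead of a full sort.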
Not a universal upgrade over keyword search: semantic search misses exact-match cases (error codes, SKUs, API names) and gets unreliable when queries are very short, because one or two words give the embedding little context. Best-in-class retrieval is hybrid search: semantic + BM25 results fused, then reranked.
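One common way to fuse a semantic ranking with a BM25 ranking is reciprocal rank fusion (RRF). The text above does not say which fusion method to use, so RRF here is an assumption, as are the document ids and the conventional constant k=60. The sketch covers only the fusion step; the reranking stage that would follow is omitted.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists, best match first.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
print(fused)
```

RRF works on ranks rather than raw scores, so it sidesteps the problem that cosine distances and BM25 scores live on different scales.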
Example Prompt
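# Query-time semantic search (pgvector)
# Step 1: embed the query with the same model used to embed the chunks
query_vec = embed(user_query)
# Step 2: bind query_vec as $1 and fetch the 10 most similar chunks
# (<=> is pgvector's cosine distance operator, so ascending order puts the closest first)
SELECT id, title, chunk
FROM document_chunks
ORDER BY embedding <=> $1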
LIMIT 10;
When to use it
- Queries that phrase things differently than source docs
- Natural-language question answering over documentation
- Long-tail queries where keyword match misses
When NOT to use it
- Exact-match queries (error codes, part numbers, function names) -- use keyword search
- Very short queries (1-2 words) where the embedding is ambiguous
- Small corpus where string matching is fast and good enough
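The criteria above can be turned into a rough query router. This is a heuristic sketch under stated assumptions: `choose_retriever` is a hypothetical name, the regex patterns are illustrative guesses at what an exact-match token looks like, the 1000-chunk cutoff for a "small corpus" is arbitrary, and sending very short queries to keyword search is one possible fallback rather than something the list prescribes.

```python
import re

# Illustrative patterns for exact-match tokens; not a complete list.
EXACT_TOKEN = re.compile(
    r"""
      \b[A-Z]{2,}[-_]?\d{2,}\b    # error codes / SKUs such as ERR-4012
    | \b\w+\(\)                   # function calls such as parse_config()
    | \b\w+(?:\.\w+)+\b           # dotted names such as os.path.join
    """,
    re.VERBOSE,
)

def choose_retriever(query, corpus_size, small_corpus=1000):
    """Return "keyword" or "semantic" for a query, using simple heuristics."""
    if corpus_size <= small_corpus:
        return "keyword"   # small corpus: string matching is good enough
    if EXACT_TOKEN.search(query):
        return "keyword"   # looks like a code, SKU, or identifier
    if len(query.split()) <= 2:
        return "keyword"   # too short for a reliable embedding (assumed fallback)
    return "semantic"

print(choose_retriever("how do I cancel my subscription", corpus_size=50_000))
print(choose_retriever("ERR-4012", corpus_size=50_000))
```

In a hybrid setup the same signals could instead adjust how much weight each retriever gets, rather than picking one outright.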
