Embeddings convert text into high-dimensional float vectors (1,536 dims for OpenAI's text-embedding-3-small; 3,072 for -3-large). "king" and "queen" land near each other; "king" and "banana" don't. This makes similarity a math operation (cosine distance, dot product) instead of a string match.
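The math is nothing exotic. A minimal pure-Python cosine similarity; the toy 3-dim vectors below are made up for illustration (real embeddings have thousands of dims):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, hand-picked so that king/queen point the same way:
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(king, queen))   # near 1.0
print(cosine_similarity(king, banana))  # much lower
```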
Every semantic search, RAG system, and recommendation engine built on LLMs depends on embeddings. Production options as of 2026: OpenAI text-embedding-3-large, Cohere embed-v4, Voyage voyage-3, and the open-weight BAAI/bge-m3. Cost is trivial (on the order of $0.10/M tokens). The hard problems are chunking strategy and reranking, not the embedding call itself.
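Since chunking is the part that actually hurts, here is a minimal fixed-size chunker with overlap. A sketch only: the 500/50 character defaults are made-up starting points, and real pipelines often split on semantic boundaries (headings, paragraphs) instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (by character count).

    Overlap keeps a sentence that straddles a boundary retrievable
    from either neighboring chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk then gets its own embed() call and its own row in the vector table.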
Example
# Embed a chunk, store in pgvector
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
    ).data[0].embedding

# Later, at query time:
query_vec = embed(user_question)
# SELECT slug, content FROM docs ORDER BY embedding <=> $1 LIMIT 10

When to use it
- Building semantic search or RAG
- Clustering / deduplication / recommendation
- Similarity-based routing (find the closest FAQ answer)
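The SQL comment in the example relies on pgvector's <=> cosine-distance operator (smaller distance = more similar). A sketch of wiring the query vector into that statement; the table and column names match the example, the literal-formatting helper is my own (pgvector accepts vectors as '[x,y,z]' text literals), and in production you'd likely use a driver adapter such as pgvector-python instead:

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(x) for x in vec) + "]"

def build_knn_query(query_vec: list[float], limit: int = 10) -> tuple[str, tuple]:
    """Return (sql, params) for a cosine-distance nearest-neighbor search."""
    sql = (
        "SELECT slug, content FROM docs "
        "ORDER BY embedding <=> %s::vector LIMIT %s"
    )
    return sql, (to_pgvector_literal(query_vec), limit)

sql, params = build_knn_query([0.1, 0.2, 0.3])
# Then: cur.execute(sql, params) with any Postgres driver, e.g. psycopg.
```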
When NOT to use it
- You actually need keyword / lexical matching -- BM25 is simpler and often better
- The domain is highly specialized and generic embeddings miss the nuance (fine-tune or use a specialist embedder)
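For the lexical case, a bare-bones scorer implementing the standard Okapi BM25 formula (k1=1.5 and b=0.75 are conventional defaults). A pure-Python sketch; in practice you'd reach for Postgres full-text search, Elasticsearch, or the rank_bm25 package:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)               # term frequency in this doc
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [t.split() for t in ["error code 0x80070057 fix",
                            "kings and queens of england",
                            "fix disk error on windows"]]
print(bm25_scores("error 0x80070057".split(), docs))
```

An embedding model would likely miss the exact error code; BM25 nails it, which is why "just use embeddings" is the wrong default for lookup-style queries.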
