Embedding

A fixed-length vector representation of a piece of text (or image, audio, etc.) produced by an embedding model -- where semantic similarity maps to geometric proximity.

First published April 14, 2026

Embeddings convert text into fixed-length float vectors, typically a few hundred to a few thousand dimensions (OpenAI's text-embedding-3-large defaults to 3072; many models use 1536 or 1024). "king" and "queen" land near each other; "king" and "banana" don't. This makes similarity a math operation (cosine distance, dot product) instead of a string match.
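That math operation is simple to state. A minimal sketch of cosine similarity in pure Python, using invented 3-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions; the numbers here are for illustration only):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors -- in practice an embedding model produces these.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(king, queen))   # close to 1.0
print(cosine_similarity(king, banana))  # much lower
```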

Every semantic search, RAG system, and recommendation engine built on LLMs depends on embeddings. 2026 production options: OpenAI text-embedding-3-large, Cohere embed-v4, Voyage voyage-3, open-weight BAAI/bge-m3. Cost is trivial (~$0.10/M tokens). The harder problems are chunking strategy and reranking, not the embedding call itself.
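Chunking is where the design effort goes. A minimal fixed-size character-window chunker with overlap -- a sketch only: the 500/50 numbers are arbitrary, and production systems often split on sentence or section boundaries instead:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks
    # share `overlap` characters so context isn't cut off mid-thought.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Each chunk then gets its own embedding call (or one batched call -- embedding APIs generally accept a list of inputs).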

Example

# Embed a chunk, store in pgvector
from openai import OpenAI
client = OpenAI()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-large",
        input=text
    ).data[0].embedding

# Later, at query time:
query_vec = embed(user_question)
# `<=>` is pgvector's cosine-distance operator:
# SELECT slug, content FROM docs ORDER BY embedding <=> $1 LIMIT 10
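For small corpora you can skip the database entirely. A brute-force in-memory equivalent of that top-10 query, assuming rows stored as `(slug, content, embedding)` tuples (an illustrative sketch, not the pgvector implementation):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # What pgvector's <=> computes: 1 minus cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec: list[float], rows: list[tuple], k: int = 10) -> list[tuple]:
    # Equivalent of: SELECT ... ORDER BY embedding <=> $query LIMIT k
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r[2]))[:k]
```

At a few hundred thousand rows and up, hand this off to pgvector (or another vector index) so you get approximate-nearest-neighbor search instead of a full scan.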

When to use it

  • Building semantic search or RAG
  • Clustering / deduplication / recommendation
  • Similarity-based routing (find the closest FAQ answer)
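For the deduplication case, a greedy sketch: keep an item only if its embedding is not too similar to anything already kept. The 0.95 threshold is an arbitrary illustration (tune per domain), and the 2-dim toy vectors stand in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dedupe(items: list[str], vectors: list[list[float]], threshold: float = 0.95) -> list[str]:
    # Greedy near-duplicate removal: O(n^2) pairwise comparisons --
    # fine for thousands of items, not millions.
    kept: list[tuple[str, list[float]]] = []
    for item, vec in zip(items, vectors):
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((item, vec))
    return [item for item, _ in kept]
```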

When NOT to use it

  • Keyword / lexical match is what you actually need -- BM25 is simpler and often better
  • The domain is highly specialized and generic embeddings miss the nuance (fine-tune or use a specialist embedder)
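When lexical match is the real requirement, BM25 needs no model at all. A minimal single-field BM25 scorer with the conventional k1=1.5, b=0.75 parameters -- a sketch for intuition; in practice a library like rank_bm25 or a Postgres/Elasticsearch full-text index does this for you:

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    # Score each pre-tokenized doc against the tokenized query.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

Exact identifiers like error codes or SKUs are where this wins: an embedding model may place "E404" near unrelated codes, while BM25 matches it literally. Many production systems run both and merge the results (hybrid search).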