Chunking

Splitting a document into smaller pieces before embedding and indexing -- so retrieval returns coherent, focused passages instead of entire docs.

First published April 14, 2026

Chunking is where most RAG systems quietly lose quality. Naive fixed-size chunks split mid-sentence, strand context, and embed meaningless fragments. Good chunking respects document structure: section headings, paragraphs, code blocks.

2026 practice: semantic chunking (split at topic boundaries via embedding similarity), parent-document retrieval (embed small chunks, but return the surrounding parent section at generation time), hierarchical chunking (index at multiple granularities). Classic fixed 512-token chunks are a last resort.
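A minimal sketch of the semantic-chunking idea above: walk adjacent sentences and start a new chunk wherever their similarity drops below a threshold. A real system would use an embedding model (e.g. sentence-transformers); the `embed` function here is a toy bag-of-words stand-in so the example runs on its own, and the threshold value is illustrative.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences; open a new chunk when the
    similarity between neighbours falls below `threshold`."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "Transformers use attention layers.",
    "Attention layers weigh token relevance.",
    "Pasta should boil in salted water.",
]
print(semantic_chunks(sentences))
```

With a real embedding model the boundary detection is far more robust, but the control flow is the same: similarity between neighbours decides where topic boundaries fall.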

Example

# Recursive character-based chunker with overlap
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150,
    separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_text(doc_text)
# Each chunk respects heading/paragraph boundaries where possible,
# falls back progressively to smaller splits only if needed.
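The parent-document pattern mentioned above can be sketched in a few lines: index small chunks for retrieval, but keep a map back to the enclosing section so generation sees full context. All names here (`build_index`, `retrieve`, the word-overlap scorer) are illustrative, not a library API.

```python
def build_index(sections, splitter_fn):
    """Map each small chunk to the index of the section it came from."""
    chunk_to_parent = {}
    for sec_id, section in enumerate(sections):
        for chunk in splitter_fn(section):
            chunk_to_parent[chunk] = sec_id
    return chunk_to_parent

def retrieve(query, chunk_to_parent, sections, score_fn):
    # Score the small, focused chunks ...
    best_chunk = max(chunk_to_parent, key=lambda c: score_fn(query, c))
    # ... but hand the full parent section to the generator.
    return sections[chunk_to_parent[best_chunk]]

sections = [
    "Cats purr when content. They sleep most of the day.",
    "Rust ownership prevents data races. The borrow checker enforces it.",
]
split = lambda s: s.split(". ")
overlap = lambda q, c: len(set(q.lower().split()) & set(c.lower().split()))

index = build_index(sections, split)
print(retrieve("borrow checker", index, sections, overlap))
```

In production the scorer would be vector similarity against embedded chunks, and the parent map would live in the vector store's metadata rather than a Python dict.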

When to use it

  • Documents longer than your embedding model's effective context
  • Retrieval quality correlates with chunk focus (too big = noisy embedding, too small = missing context)
  • Diverse document types (mix code, prose, tables)

When NOT to use it

  • Docs are already short and focused -- chunk = whole doc
  • You're splitting on raw character counts without respecting structure -- that's a regression, not chunking done well