Chunking is where most RAG systems quietly lose quality. Naive fixed-size chunks split mid-sentence, strand context, and embed meaningless fragments. Good chunking respects document structure: section headings, paragraphs, code blocks.
2026 practice: semantic chunking (split at topic boundaries via embedding similarity), parent-document retrieval (embed small chunks, but return the surrounding parent section at generation time), hierarchical chunking (index at multiple granularities). Classic fixed 512-token chunks are a last resort.
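The first of those, semantic chunking, can be sketched in a few lines: score the similarity of adjacent sentences and start a new chunk wherever similarity drops below a threshold. This is a minimal sketch; the bag-of-words `embed` function and the `0.2` threshold are stand-in assumptions, and in practice you would swap in a real embedding model and a tuned (or percentile-based) threshold.

```python
# Semantic chunking sketch: split at topic boundaries, i.e. wherever the
# similarity between consecutive sentences drops below a threshold.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[str]:
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            # Similarity dropped: treat this as a topic boundary.
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

The same drop-detection idea generalizes: compare each sentence to a running window of the current chunk rather than just the previous sentence to make boundaries less jittery.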
Example
```python
# Recursive character-based chunker with overlap
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150,
    separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_text(doc_text)
# Each chunk respects heading/paragraph boundaries where possible,
# falling back progressively to smaller splits only if needed.
```
When to use it
- Documents longer than your embedding model's effective context
- Retrieval quality correlates with chunk focus (too big = noisy embedding, too small = missing context)
- Diverse document types (mix code, prose, tables)
When NOT to use it
- Docs are already short and focused -- chunk = whole doc
- You'd be splitting on raw character counts without respecting structure -- that's a regression, not chunking
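Parent-document retrieval, mentioned above, deserves its own sketch: embed small chunks for precise matching, but hand the generator the enclosing parent section. This is a hedged illustration, not a library API; the keyword scoring stands in for real vector search, and all names (`build_index`, `retrieve_parents`) are hypothetical.

```python
# Parent-document retrieval sketch: match on small chunks, return parents.
def build_index(sections: dict[str, str], chunk_size: int = 200):
    """Split each parent section into small chunks; remember each chunk's parent."""
    chunk_texts, chunk_parent = {}, {}
    for sec_id, text in sections.items():
        for i, start in enumerate(range(0, len(text), chunk_size)):
            cid = f"{sec_id}:{i}"
            chunk_texts[cid] = text[start:start + chunk_size]
            chunk_parent[cid] = sec_id
    return chunk_texts, chunk_parent

def retrieve_parents(query, chunk_texts, chunk_parent, sections, k: int = 2):
    """Naive keyword scoring stands in for vector search over the small chunks."""
    scored = sorted(chunk_texts, key=lambda cid: -sum(
        w in chunk_texts[cid].lower() for w in query.lower().split()))
    # Deduplicate: several matching chunks may share one parent section.
    parents, seen = [], set()
    for cid in scored[:k * 3]:
        pid = chunk_parent[cid]
        if pid not in seen:
            seen.add(pid)
            parents.append(sections[pid])  # return the full parent, not the chunk
        if len(parents) == k:
            break
    return parents
```

The key property: retrieval precision comes from the small, focused chunk embeddings, while the generator still sees enough surrounding context to answer coherently.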
