Development

Token

The atomic unit of input and output in an LLM. Not a word or a character -- a chunk produced by the model's tokenizer, roughly 3-4 characters of English.

First published April 14, 2026

Everything an LLM sees and produces is tokens. Your prompt, its response, the context window limit, the per-request cost -- all measured in tokens. Different models use different tokenizers (BPE, SentencePiece, tiktoken variants), so "1000 tokens" on GPT is not identical to 1000 on Claude.

Rules of thumb for English prose: 1 token ≈ 0.75 words ≈ 4 characters. Code tokenizes more densely (more tokens per character) because of punctuation and identifiers. Whitespace counts. A 1000-word article ≈ 1300 tokens.

Example

# Approximating cost before you call
text = "your prompt or document here"
INPUT_PRICE_PER_M = 2.50  # USD per 1M input tokens; substitute your model's rate

word_count = len(text.split())
estimated_tokens = int(word_count * 1.3)  # English prose ≈ 1.3 tokens/word
estimated_cost_usd = estimated_tokens * (INPUT_PRICE_PER_M / 1_000_000)

# Exact count via the model's tokenizer (pip install tiktoken):
import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
exact_tokens = len(enc.encode(text))
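The same estimate can drive a fit check before a request goes out. A minimal sketch using the ~1.3 tokens-per-word rule of thumb; MODEL_CONTEXT_LIMIT and RESPONSE_HEADROOM are illustrative values, not any model's real limits:

```python
MODEL_CONTEXT_LIMIT = 128_000   # hypothetical total-token limit for the model
RESPONSE_HEADROOM = 4_000       # tokens reserved for the model's reply

def fits_context(text: str, limit: int = MODEL_CONTEXT_LIMIT,
                 headroom: int = RESPONSE_HEADROOM) -> bool:
    """Approximate fit check: ~1.3 tokens per English word, plus reply headroom."""
    estimated_tokens = int(len(text.split()) * 1.3)
    return estimated_tokens + headroom <= limit

print(fits_context("a short prompt"))  # a small input easily fits
```

For anything close to the limit, swap the estimate for an exact tokenizer count; the approximation is only for cheap first-pass filtering.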

When to use it

  • Estimating cost before calling
  • Deciding whether content fits the context window
  • Budgeting prompts and retrievals in a RAG pipeline
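For the RAG case, a common pattern is to pack retrieved chunks greedily, highest-ranked first, until a token budget runs out. A minimal sketch using the ~4-characters-per-token approximation from above; the chunk contents and budget are made up for illustration:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per English token."""
    return max(1, len(text) // 4)

def pack_chunks(chunks: list[str], budget: int) -> list[str]:
    """Keep retrieved chunks (assumed sorted by relevance) within a token budget."""
    kept, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

# Illustrative: three retrieved passages, 20-token budget.
retrieved = ["alpha " * 10, "beta " * 10, "gamma " * 10]
print(len(pack_chunks(retrieved, budget=20)))  # only the top chunk fits
```

Stopping at the first over-budget chunk keeps the highest-ranked context intact; a real pipeline might instead truncate the last chunk or re-rank before packing.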

When NOT to use it

  • Exact counting when you don't have a tokenizer handy -- a rough character-count approximation is good enough for a first pass