Reference

Prompt Engineering Glossary

Every term you need, with working example prompts and practical notes on when each technique earns its keep.

10 terms in Development

Development

Context Window

The maximum number of tokens (input + output combined) a model can process in a single request. Content past this limit is truncated or rejected, depending on the provider, so longer inputs must be chunked or summarized first.
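As a rough illustration of chunking, the sketch below splits text into pieces that fit a token budget. It assumes one whitespace-separated word counts as one token, which is only an approximation; the function name `chunk_by_token_budget` is hypothetical, and a real pipeline would count tokens with the target model's own tokenizer.

```python
def chunk_by_token_budget(text, max_tokens):
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word ~ one token.
    Real tokenizers usually produce more tokens than words.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


chunks = chunk_by_token_budget("one two three four five six seven", max_tokens=3)
print(chunks)
```

Remember that the budget has to leave room for the model's output as well as the prompt.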

Development

Embedding

A fixed-length vector representation of a piece of text (or image, audio, etc.) produced by an embedding model -- where semantic similarity maps to geometric proximity.
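A minimal sketch of "similarity maps to proximity", using cosine similarity. The three-dimensional vectors are invented for illustration; real embedding models return hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings, not output from any real model.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated
```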

Development

Fine-Tuning

Additional training of a pretrained LLM on task-specific or domain-specific data -- updating the model's weights to specialize it, rather than prompting a generic model.

Development

Function Calling

A provider-specific API feature where the model returns a structured tool-call request (function name + JSON arguments); your runtime executes the call and feeds the result back to the model.
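The runtime side of that loop might look like the sketch below. The tool-call shape (a `name` plus a JSON string of `arguments`) is an assumption modeled loosely on common provider formats, and `get_weather` is a hypothetical stub; check your provider's documentation for the exact schema.

```python
import json


def get_weather(city):
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "temp_c": 21}


# Registry mapping tool names the model may request to local functions.
TOOLS = {"get_weather": get_weather}


def execute_tool_call(tool_call):
    """Run the requested function and return its result as JSON text."""
    func = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return json.dumps(func(**args))


# Assumed shape of what the model returned instead of plain text.
model_tool_call = {"name": "get_weather", "arguments": '{"city": "Oslo"}'}
tool_result = execute_tool_call(model_tool_call)
print(tool_result)  # this string is sent back to the model as the tool result
```

The model never runs code itself; it only names the function and supplies arguments, so validating both before execution is the runtime's job.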

Development

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning technique that freezes the base model and trains small low-rank adapter matrices alongside it -- sharply reducing the number of trainable parameters, and with it GPU memory and training cost.
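To see where the savings come from, the sketch below counts trainable parameters for one weight matrix. It assumes the usual formulation in which a frozen d_out x d_in matrix gets an update B·A, with B of shape d_out x r and A of shape r x d_in; the layer size and rank are illustrative, not taken from any particular model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a rank-`rank` adapter."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return full, lora


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
reduction = full / lora
print(full, lora, reduction)
```

This counts trainable parameters for a single layer only. End-to-end memory savings are smaller, because the frozen base weights and activations still have to fit on the GPU.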

Development

Model Context Protocol (MCP)

An open protocol introduced by Anthropic in late 2024 for exposing tools, resources, and prompts from external servers to any compatible LLM client, standardizing the integration layer.
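MCP messages are JSON-RPC 2.0. As a hedged sketch, the snippet below builds the kind of request a client sends to invoke a server-side tool; the tool name `get_weather` and its arguments are hypothetical, and transport, initialization, and error handling are omitted.

```python
import json


def build_tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call_request(1, "get_weather", {"city": "Oslo"})
wire = json.dumps(request)
print(wire)
```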

Development

Temperature

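A sampling parameter (typically 0.0-2.0) that rescales the model's next-token probabilities before sampling. Lower values concentrate probability on the likeliest tokens, so output is more predictable; higher values flatten the distribution, so output is more varied.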
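Under the common formulation, temperature divides the logits before the softmax. The sketch below shows the effect on a made-up three-token distribution; the logit values are illustrative, and providers may treat a temperature of exactly 0 as a special case (greedy decoding).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)
print(cold)  # sharper: the top token dominates
print(hot)   # flatter: probability spreads across tokens
```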

Development

Token

The atomic unit of input and output in an LLM. Not a word or a character -- a chunk produced by the model's tokenizer, roughly 3-4 characters of English.
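To make "a chunk, not a word" concrete, here is a toy greedy longest-match tokenizer. It is not how production tokenizers such as BPE are trained or applied, and the vocabulary is invented; it only shows how one word can split into several sub-word pieces.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match split; unknown characters become 1-char tokens."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


# Invented vocabulary, for illustration only.
vocab = {"token", "iz", "ation", " is", " fun"}
tokens = toy_tokenize("tokenization is fun", vocab)
print(tokens)
```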

Development

Top-P (Nucleus Sampling)

A sampling control often used alongside or instead of temperature: the model samples only from the smallest set of top-ranked tokens whose cumulative probability reaches P, discarding the lower-probability tail.
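A minimal sketch of the nucleus filter: sort tokens by probability, keep them until the running total reaches P, then renormalize. The distribution is made up, and the function name `nucleus` is hypothetical.

```python
def nucleus(probs, p):
    """Keep the smallest top-ranked set with cumulative probability >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}  # made-up values
kept = nucleus(probs, p=0.75)
print(kept)
```

Sampling then proceeds as usual, but only over the surviving tokens.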

Development

Vector Database

A storage system optimized for similarity search over high-dimensional embeddings -- returning the K nearest neighbors of a query vector, usually approximately, in sublinear time by means of a specialized index.
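For contrast, the sketch below is the brute-force baseline a vector database is designed to avoid: it scores every stored vector against the query, which is linear in collection size. The document IDs and vectors are invented; a real system would use an approximate index rather than a full scan.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, vectors, k):
    """Return the IDs of the k stored vectors most similar to the query."""
    return heapq.nlargest(k, vectors, key=lambda doc_id: cosine(query, vectors[doc_id]))


# Made-up toy collection: ID -> embedding.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], store, k=2)
print(nearest)
```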