Reference

Prompt Engineering Glossary

Every term you need, with working example prompts and practical notes on when each technique earns its keep.

52 terms

Agentic

Agent Handoff

Transferring control (and the relevant context) from one agent to another mid-workflow -- so each segment of a task runs on the agent best equipped for it.

Applied

Agentic RAG

RAG where the model doesn't passively consume retrieved context, but actively decides what to retrieve, iterates queries, and judges retrieval quality before answering.

Agentic

Agentic Workflow

A system where one or more LLMs plan, act, and adapt autonomously over multiple steps -- versus a single-turn Q&A where the model just responds.

Agentic

Agent Memory

Persisted context an agent can recall across turns, sessions, or tasks -- beyond what fits in a single prompt's context window.

Techniques

Chain-of-Thought (CoT)

Prompting the model to produce intermediate reasoning steps before its final answer, improving accuracy on multi-step problems.

Applied

Chunking

Splitting a document into smaller pieces before embedding and indexing -- so retrieval returns coherent, focused passages instead of entire docs.
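
A minimal fixed-size chunker with overlap, sketched in Python. The sizes are illustrative; production chunkers usually split on sentence, paragraph, or section boundaries rather than raw character offsets:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Overlap preserves context that would otherwise be cut at chunk edges.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded and indexed separately.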

Agentic

Context Engineering

The practice of deciding what goes into an LLM's context window, in what order, and in what form -- so the model has exactly the signal it needs and none of the noise.

Development

Context Window

The maximum number of tokens (input + output combined) a model can process in a single request. Content past this limit is truncated or requires chunking.

Development

Embedding

A fixed-length vector representation of a piece of text (or image, audio, etc.) produced by an embedding model -- where semantic similarity maps to geometric proximity.

Applied

Evals

Automated test suites for LLM outputs -- measuring accuracy, safety, or task-specific quality metrics against labeled inputs, so you know when a prompt or model change helps or regresses.

Techniques

Few-Shot Prompting

Including 1-5 worked examples of the desired input/output pattern in the prompt so the model can infer the task format and style.
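
Assembling a few-shot prompt is mostly string templating. A hedged sketch (the `Input:`/`Output:` labels are one common convention, not a requirement):

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Interleave worked examples before the real query so the model
    can infer the input/output pattern in-context."""
    lines = [task, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)
```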

Development

Fine-Tuning

Additional training of a pretrained LLM on task-specific or domain-specific data -- updating the model's weights to specialize it, rather than prompting a generic model.

Development

Function Calling

A provider-specific API feature where the model returns a structured tool-call request (function name + JSON arguments) that your runtime executes and feeds back.
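
The runtime side of that loop can be sketched as follows. The `{"name": ..., "arguments": ...}` shape is a generic illustration -- each provider's actual tool-call format differs -- and `get_weather` is a stub, not a real API:

```python
import json

def get_weather(city: str) -> str:
    # Stub: a real implementation would call a weather API.
    return f"Sunny in {city}"

# Registry mapping tool names (as exposed to the model) to callables.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(tool_call_json: str) -> str:
    """Run a model-issued tool call and return the result to feed back."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```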

Applied

Grounding

Constraining an LLM's response to information provided in the prompt -- typically retrieved documents -- rather than the model's parametric (trained-in) knowledge.

Security

Guardrails

Programmatic checks -- before, during, or after LLM generation -- that enforce policy, block unsafe outputs, or validate shape / content independent of what the model would produce on its own.
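
A toy post-generation guardrail; the length cap and key-leak regex are placeholder policies, not a complete defense:

```python
import re

def output_guardrail(text: str, max_len: int = 1000) -> tuple[bool, str]:
    """Post-generation check: enforce policy independent of what the
    model produced. Returns (allowed, reason)."""
    if len(text) > max_len:
        return False, "output too long"
    # Illustrative pattern for a leaked API-key-like string.
    if re.search(r"sk-[A-Za-z0-9]{20,}", text):
        return False, "possible credential in output"
    return True, "ok"
```

Real deployments layer several such checks before, during, and after generation.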

Applied

Hallucination

An LLM confidently producing content that sounds plausible but is factually wrong, fabricated, or ungrounded -- a fundamental failure mode of all current generative models.

Applied

Hybrid Search

Retrieval that combines sparse (keyword, BM25) and dense (embedding, semantic) signals, typically fusing their rankings into one list -- in practice it tends to outperform either method alone on real corpora.
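
One common fusion method is reciprocal rank fusion (RRF), which needs only the rank positions from each retriever. A minimal sketch (k=60 is the conventional default from the RRF literature):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists: each document scores the sum of
    1 / (k + rank) across every list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```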

Techniques

In-Context Learning (ICL)

The capability of a pretrained LLM to learn a task at inference time from examples provided in the prompt -- no weight updates, no fine-tuning.

Security

Indirect Prompt Injection

An attack where malicious instructions are embedded in content the model fetches (webpages, documents, emails) rather than typed directly by the attacker into the prompt.

Security

Jailbreak

A prompt that bypasses a model's safety training to elicit content or behavior the model was aligned to refuse.

Agentic

LLM-as-Judge

Using an LLM to evaluate or score the output of another LLM (or the same one) against criteria -- automating quality control without human review in the loop.

Development

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning technique that freezes the base model and trains small low-rank adapter matrices alongside it -- cutting GPU memory and training cost by 10-100x.

Techniques

Meta-Prompting

Using a language model to write or improve prompts that another model (or the same one) will execute -- treating prompt engineering itself as a prompt engineering task.

Development

Model Context Protocol (MCP)

An open protocol introduced by Anthropic in 2024 for exposing tools, resources, and prompts from external servers to any LLM client, standardizing the integration layer.

Agentic

Multi-Agent System

A system where two or more specialized LLM agents interact -- via a shared workspace, message passing, or tool output -- to solve a task together.

Techniques

Negative Prompting

Explicitly stating what the model should NOT do, NOT include, or NOT sound like -- in addition to (or instead of) describing the desired output.

Security

OWASP LLM Top 10

A community-maintained catalog of the ten most critical security risks for LLM-integrated applications -- the industry-standard starting point for LLM threat modeling.

Applied

Persona Prompting

Instructing the model to adopt a specific role, expertise level, or character -- typically via the system prompt -- to shape tone, depth, and domain framing.

Techniques

Prompt Chaining

Splitting a complex task into a sequence of smaller prompts where each step's output feeds the next -- versus asking one mega-prompt to do everything.
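
A two-step chain, with a stub standing in for the real model call (the stub just echoes, so the data flow is visible):

```python
def llm(prompt: str) -> str:
    # Stub standing in for a real model call -- echoes for demonstration.
    return f"[model answer to: {prompt}]"

def summarize_then_translate(document: str) -> str:
    """Step 1's output feeds step 2 -- the essence of prompt chaining."""
    summary = llm(f"Summarize in one sentence:\n{document}")
    return llm(f"Translate to French:\n{summary}")
```

Each step gets a small, focused prompt, which is usually easier to debug and eval than one mega-prompt.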

Techniques

Prompt Decomposition

Identifying the sub-problems inside a single prompt and addressing each explicitly -- in one prompt -- rather than asking the model to figure out the structure itself.

Security

Prompt Injection

An attack where user-controlled or third-party content smuggles new instructions into an LLM's context, overriding or subverting the system prompt.

Security

Prompt Injection Defense

A layered set of mitigations -- input filtering, output constraints, privilege boundaries, behavioral monitoring -- that reduce prompt injection impact. No single defense is sufficient.

Security

Prompt Leaking

An attack that extracts the system prompt, tool definitions, or other hidden context from a deployed LLM -- exposing proprietary prompt IP, credentials, or hints at injection vectors.

Agentic

ReAct Pattern

A prompting pattern where the model alternates between reasoning ("Thought"), acting ("Action" = tool call), and observing ("Observation" = tool result), looping until a final answer.
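
A stripped-down ReAct driver loop. The `Action:` / `Observation:` / `Final Answer:` line format follows the classic pattern; real implementations parse model output far more robustly than this sketch:

```python
def react_loop(question: str, tools: dict, model, max_steps: int = 5) -> str:
    """Minimal ReAct driver: the model emits Action lines, we run the
    tool and feed the Observation back, until it emits a Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition(" ")
            observation = tools[name](arg)
            transcript += f"Observation: {observation}\n"
    return "no answer within step budget"
```

The `max_steps` cap matters: without it, a confused model can loop forever.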

Security

Red Teaming

Adversarial testing of an LLM system -- deliberately trying to break it -- to surface safety failures, injection vectors, and misuse paths before real users (or attackers) find them.

Agentic

Reflection (Self-Critique)

A pattern where the model reviews its own output against the task, identifies flaws, and produces an improved revision -- in the same prompt or a follow-up call.

Applied

Reranker

A model that takes a candidate list of retrieved documents and a query, and produces a more accurate ranking -- applied after initial retrieval (BM25, vector, or hybrid) as a second pass.

Applied

Retrieval-Augmented Generation (RAG)

A pattern where relevant context is retrieved from a knowledge base (vector store, search index, graph) and injected into the prompt before the model generates its answer.
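
The generation-side prompt assembly might look like this; the exact wording of the grounding instruction and the `[n]` citation markers are illustrative conventions:

```python
def build_rag_prompt(question: str, retrieved: list[str]) -> str:
    """Inject retrieved passages above the question, with an explicit
    grounding instruction so the model stays within the context."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not present, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```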

Techniques

Role Prompting

Assigning the model a concrete professional role (e.g. "senior security engineer", "copy editor") to shape its framing, vocabulary, and priorities -- distinct from broader persona prompting.

Techniques

Self-Consistency

Generating N independent answers to the same prompt (with temperature > 0) and picking the majority answer -- trading compute for reliability.
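
The voting step is a simple majority count; this sketch assumes `model` is a stochastic callable that may return different answers per call:

```python
from collections import Counter

def self_consistent_answer(prompt: str, model, n: int = 5) -> str:
    """Sample n independent answers and return the most common one --
    trading n times the compute for better reliability."""
    answers = [model(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

In practice you normalize answers (strip whitespace, extract the final number, etc.) before counting, so superficially different completions still vote together.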

Applied

Semantic Search

Search that matches on meaning (via embeddings) rather than exact keywords -- finding relevant documents even when they share no surface words with the query.
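
At its core this is nearest-neighbor ranking over embedding vectors. A toy version with hand-made 2-D vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Rank document IDs by embedding similarity to the query vector."""
    return sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                  reverse=True)
```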

Techniques

Step-Back Prompting

Before answering a specific question, asking the model to derive the general principle or high-level abstraction that the question falls under -- then using that principle to answer.

Techniques

Structured Output

Constraining a model to produce JSON, XML, or other machine-parseable output conforming to a schema -- so downstream code can consume it reliably.
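
A minimal validation step on the consuming side; real systems often enforce a full JSON Schema or a Pydantic model instead of this bare key check:

```python
import json

def parse_structured(raw: str, required_keys: set[str]) -> dict:
    """Validate model output as JSON containing the expected keys;
    raise rather than let malformed output flow downstream."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```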

Agentic

Supervisor-Worker Pattern

An agentic architecture where one high-capability model (supervisor) plans and decomposes, delegating narrow sub-tasks to smaller/cheaper models (workers) that execute.

Techniques

System Prompt

The high-priority, typically hidden instruction that sets a model's persona, rules, and constraints for an entire conversation.

Development

Temperature

A sampling parameter (typically 0.0-2.0) that controls how deterministic vs. creative an LLM's output is. Lower = more predictable, higher = more varied.

Development

Token

The atomic unit of input and output in an LLM. Not a word or a character -- a chunk produced by the model's tokenizer, roughly 3-4 characters of English.

Agentic

Tool Use

Giving a model the ability to call external functions (APIs, databases, code execution) during generation, with the model deciding when and how to invoke them.

Development

Top-P (Nucleus Sampling)

An alternative to temperature: the model samples only from the smallest set of tokens whose cumulative probability exceeds P, ignoring everything below that threshold.
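
The filtering step can be sketched over a toy token distribution (real samplers work on logits inside the model, but the cutoff logic is the same):

```python
def top_p_filter(probs: dict[str, float], p: float = 0.9) -> dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalize -- the nucleus that
    sampling actually draws from."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}
```

With p=0.9 a long tail of unlikely tokens is dropped entirely, which is why top-p cuts off nonsense continuations even at higher temperatures.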

Techniques

Tree of Thoughts (ToT)

A prompting strategy where the model explores multiple reasoning branches, evaluates each, and selects (or combines) the best -- versus linear chain-of-thought which commits to one path.

Development

Vector Database

A storage system optimized for similarity search over high-dimensional embeddings -- returning the K nearest neighbors of a query vector in sublinear time.

Techniques

Zero-Shot Prompting

Asking a model to perform a task with no examples of the expected output -- relying entirely on the instruction and the model's pretrained knowledge.