Reference
Prompt Engineering Glossary
Every term you need, with working example prompts and practical notes on when each technique earns its keep.
52 terms
Agent Handoff [Agentic]
Transferring control (and the relevant context) from one agent to another mid-workflow -- so each segment of a task runs on the agent best equipped for it.
Agentic RAG [Applied]
RAG where the model doesn't passively consume retrieved context, but actively decides what to retrieve, iterates queries, and judges retrieval quality before answering.
Agentic Workflow [Agentic]
A system where one or more LLMs plan, act, and adapt autonomously over multiple steps -- versus a single-turn Q&A where the model just responds.
Agent Memory [Agentic]
Persisted context an agent can recall across turns, sessions, or tasks -- beyond what fits in a single prompt's context window.
Chain-of-Thought (CoT) [Techniques]
Prompting the model to produce intermediate reasoning steps before its final answer, improving accuracy on multi-step problems.
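A chain-of-thought trigger can be as simple as a suffix appended to the question. A minimal sketch in Python (the wording and the `Answer:` convention are illustrative, not a fixed API):

```python
def with_cot(question: str) -> str:
    """Append a reasoning trigger so the model works step by step before answering."""
    return (
        f"{question}\n\n"
        "Think through this step by step, showing your intermediate reasoning, "
        "then give the final answer on its own line prefixed with 'Answer:'."
    )

prompt = with_cot("A train travels 120 km in 1.5 hours. What is its average speed?")
```

The `Answer:` prefix makes the final answer easy to extract with a string split, which matters once outputs feed downstream code.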
Chunking [Applied]
Splitting a document into smaller pieces before embedding and indexing -- so retrieval returns coherent, focused passages instead of entire docs.
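A character-window chunker with overlap is the simplest workable version; a sketch (the sizes are arbitrary defaults, and production systems usually split on sentence or section boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size windows before embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Overlap keeps a sentence that straddles a boundary retrievable
    # from both neighboring chunks.
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "".join(chr(65 + i % 26) for i in range(1200))  # stand-in document
chunks = chunk_text(doc)
```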
Context Engineering [Agentic]
The practice of deciding what goes into an LLM's context window, in what order, and in what form -- so the model has exactly the signal it needs and none of the noise.
Context Window [Development]
The maximum number of tokens (input + output combined) a model can process in a single request. Content past this limit is truncated or requires chunking.
Embedding [Development]
A fixed-length vector representation of a piece of text (or image, audio, etc.) produced by an embedding model -- where semantic similarity maps to geometric proximity.
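Proximity between embeddings is usually measured with cosine similarity; a sketch with toy 3-dimensional vectors (real embedding models emit hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Hand-made vectors standing in for embedding-model output.
cat = [1.0, 0.9, 0.1]
kitten = [0.9, 1.0, 0.2]
invoice = [0.0, 0.1, 1.0]
```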
Evals [Applied]
Automated test suites for LLM outputs -- measuring accuracy, safety, or task-specific quality metrics against labeled inputs, so you know when a prompt or model change helps or regresses.
Few-Shot Prompting [Techniques]
Including 1-5 worked examples of the desired input/output pattern in the prompt so the model can infer the task format and style.
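Assembling the shots programmatically keeps the format consistent; a minimal sketch (the `Input:`/`Output:` labels are one common convention, not a requirement):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Prepend worked input/output pairs so the model infers the task format."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\n\nInput: {query}\nOutput:"

prompt = few_shot_prompt(
    [("I loved it", "positive"), ("Total waste of money", "negative")],
    "Shipping was slow but the product itself is great",
)
```

Ending the prompt at `Output:` nudges the model to complete the pattern rather than chat.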
Fine-Tuning [Development]
Additional training of a pretrained LLM on task-specific or domain-specific data -- updating the model's weights to specialize it, rather than prompting a generic model.
Function Calling [Development]
A provider-specific API feature where the model returns a structured tool-call request (function name + JSON arguments) that your runtime executes and feeds back.
Grounding [Applied]
Constraining an LLM's response to information provided in the prompt -- typically retrieved documents -- rather than the model's parametric (trained-in) knowledge.
Guardrails [Security]
Programmatic checks -- before, during, or after LLM generation -- that enforce policy, block unsafe outputs, or validate shape / content independent of what the model would produce on its own.
Hallucination [Applied]
An LLM confidently producing content that sounds plausible but is factually wrong, fabricated, or ungrounded -- a fundamental failure mode of all current generative models.
Hybrid Search [Applied]
Retrieval that combines sparse (keyword, BM25) and dense (embedding, semantic) signals, typically fusing their rankings into one list -- beats either alone on most real corpora.
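Reciprocal Rank Fusion (RRF) is a common way to merge the two rankings without tuning score weights; a sketch (k = 60 is the conventional damping constant):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc as the sum of 1 / (k + rank) over every list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["doc_a", "doc_b", "doc_c"]   # BM25 / keyword ranking
dense = ["doc_b", "doc_d", "doc_a"]    # embedding / semantic ranking
fused = reciprocal_rank_fusion([sparse, dense])
```

A document ranked well by both lists (doc_b here) rises above one ranked first by only one of them.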
In-Context Learning (ICL) [Techniques]
The capability of a pretrained LLM to learn a task at inference time from examples provided in the prompt -- no weight updates, no fine-tuning.
Indirect Prompt Injection [Security]
An attack where malicious instructions are embedded in content the model fetches (webpages, documents, emails) rather than typed directly by the attacker into the prompt.
Jailbreak [Security]
A prompt that bypasses a model's safety training to elicit content or behavior the model was aligned to refuse.
LLM-as-Judge [Agentic]
Using an LLM to evaluate or score the output of another LLM (or the same one) against criteria -- automating quality control without human review in the loop.
LoRA (Low-Rank Adaptation) [Development]
A parameter-efficient fine-tuning technique that freezes the base model and trains small low-rank adapter matrices alongside it -- drastically reducing trainable parameters and GPU memory versus full fine-tuning.
Meta-Prompting [Techniques]
Using a language model to write or improve prompts that another model (or the same one) will execute -- treating prompt engineering itself as a prompt engineering task.
Model Context Protocol (MCP) [Development]
An open protocol introduced by Anthropic in 2024 for exposing tools, resources, and prompts from external servers to any LLM client, standardizing the integration layer.
Multi-Agent System [Agentic]
A system where two or more specialized LLM agents interact -- via a shared workspace, message passing, or tool output -- to solve a task together.
Negative Prompting [Techniques]
Explicitly stating what the model should NOT do, NOT include, or NOT sound like -- in addition to (or instead of) describing the desired output.
OWASP LLM Top 10 [Security]
A community-maintained catalog of the ten most critical security risks for LLM-integrated applications -- the industry-standard starting point for LLM threat modeling.
Persona Prompting [Applied]
Instructing the model to adopt a specific role, expertise level, or character -- typically via the system prompt -- to shape tone, depth, and domain framing.
Prompt Chaining [Techniques]
Splitting a complex task into a sequence of smaller prompts where each step's output feeds the next -- versus asking one mega-prompt to do everything.
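The plumbing is just a loop that formats each template with the previous output. A sketch with a stubbed model so it runs without an API key (`fake_llm` and the templates are illustrative):

```python
def run_chain(templates: list[str], initial_input: str, llm) -> str:
    """Feed each step's output into the next prompt via the {input} placeholder."""
    output = initial_input
    for template in templates:
        output = llm(template.format(input=output))
    return output

# Stub standing in for a real model call: just echoes the prompt in uppercase.
fake_llm = lambda prompt: prompt.upper()

result = run_chain(
    ["Extract the product name from: {input}", "Write a slogan for: {input}"],
    "Order #812: one ThermoFlask, blue",
    fake_llm,
)
```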
Prompt Decomposition [Techniques]
Identifying the sub-problems inside a single prompt and addressing each explicitly -- in one prompt -- rather than asking the model to figure out the structure itself.
Prompt Injection [Security]
An attack where user-controlled or third-party content smuggles new instructions into an LLM's context, overriding or subverting the system prompt.
Prompt Injection Defense [Security]
A layered set of mitigations -- input filtering, output constraints, privilege boundaries, behavioral monitoring -- that reduce prompt injection impact. No single defense is sufficient.
Prompt Leaking [Security]
An attack that extracts the system prompt, tool definitions, or other hidden context from a deployed LLM -- exposing proprietary prompt IP, credentials, or hints at injection vectors.
ReAct Pattern [Agentic]
A prompting pattern where the model alternates between reasoning ("Thought"), acting ("Action" = tool call), and observing ("Observation" = tool result), looping until a final answer.
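The control loop underneath ReAct is small: parse the model's step, run the tool, append the observation, repeat. A sketch with a scripted stand-in model and one fake tool (all names here are illustrative):

```python
def react_loop(question: str, model, tools: dict, max_steps: int = 5) -> str:
    """Alternate model steps and tool observations until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # a real model emits Thought/Action/Answer text
        transcript += step + "\n"
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        if step.startswith("Action:"):
            tool_name, arg = step.removeprefix("Action:").strip().split(" ", 1)
            transcript += f"Observation: {tools[tool_name](arg)}\n"
    return "no answer within step budget"

# Scripted stand-in for the model: one tool call, then a final answer.
scripted = iter(["Action: lookup capital of France", "Answer: Paris"])
answer = react_loop(
    "What is the capital of France?",
    model=lambda transcript: next(scripted),
    tools={"lookup": lambda query: "Paris is the capital of France."},
)
```

The `max_steps` budget matters in practice: without it, a confused model can loop on the same failing action indefinitely.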
Red Teaming [Security]
Adversarial testing of an LLM system -- deliberately trying to break it -- to surface safety failures, injection vectors, and misuse paths before real users (or attackers) find them.
Reflection (Self-Critique) [Agentic]
A pattern where the model reviews its own output against the task, identifies flaws, and produces an improved revision -- in the same prompt or a follow-up call.
Reranker [Applied]
A model that takes a candidate list of retrieved documents and a query, and produces a more accurate ranking -- applied after initial retrieval (BM25, vector, or hybrid) as a second pass.
Retrieval-Augmented Generation (RAG) [Applied]
A pattern where relevant context is retrieved from a knowledge base (vector store, search index, graph) and injected into the prompt before the model generates its answer.
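The injection step itself is plain string assembly; a sketch of a grounded prompt template (the citation and refusal instructions are common practice, not a standard):

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Place retrieved passages ahead of the question, with a grounding rule."""
    sources = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below, citing source numbers. "
        "If the sources do not contain the answer, say you cannot answer.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was the warranty extended?",
    ["Policy v2 (2023): warranty extended to 3 years.", "Policy v1: 1-year warranty."],
)
```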
Role Prompting [Techniques]
Assigning the model a concrete professional role (e.g. "senior security engineer", "copy editor") to shape its framing, vocabulary, and priorities -- distinct from broader persona prompting.
Self-Consistency [Techniques]
Generating N independent answers to the same prompt (with temperature > 0) and picking the majority answer -- trading compute for reliability.
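The harness is a sampler plus a majority vote. A sketch with a stubbed sampler (a real `sample` would call the model with temperature around 0.7):

```python
from collections import Counter

def self_consistent_answer(prompt: str, sample, n: int = 5) -> str:
    """Draw n independent answers and return the most common one."""
    answers = [sample(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stubbed sampler: pretend five temperature-0.7 runs produced these answers.
draws = iter(["42", "41", "42", "42", "17"])
best = self_consistent_answer("hard arithmetic question", lambda p: next(draws))
```

Voting only works when answers are short and comparable (numbers, labels); free-form prose needs a clustering or judge step instead.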
Semantic Search [Applied]
Search that matches on meaning (via embeddings) rather than exact keywords -- finding relevant documents even when they share no surface words with the query.
Step-Back Prompting [Techniques]
Before answering a specific question, asking the model to derive the general principle or high-level abstraction that the question falls under -- then using that principle to answer.
Structured Output [Techniques]
Constraining a model to produce JSON, XML, or other machine-parseable output conforming to a schema -- so downstream code can consume it reliably.
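On the consuming side, parse and validate before trusting anything; a minimal sketch (a schema library such as Pydantic or JSON Schema does this more thoroughly):

```python
import json

def parse_structured(raw: str, required: set[str]) -> dict:
    """Parse a model's JSON reply and fail fast if required keys are missing."""
    data = json.loads(raw)  # raises an error on malformed JSON
    missing = required - data.keys()
    if missing:
        raise ValueError(f"model omitted required keys: {sorted(missing)}")
    return data

reply = '{"sentiment": "positive", "confidence": 0.92}'  # simulated model output
record = parse_structured(reply, {"sentiment", "confidence"})
```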
Supervisor-Worker Pattern [Agentic]
An agentic architecture where one high-capability model (supervisor) plans and decomposes, delegating narrow sub-tasks to smaller/cheaper models (workers) that execute.
System Prompt [Techniques]
The high-priority, typically hidden instruction that sets a model's persona, rules, and constraints for an entire conversation.
Temperature [Development]
A sampling parameter (typically 0.0-2.0) that controls how deterministic vs. creative an LLM's output is. Lower = more predictable, higher = more varied.
Token [Development]
The atomic unit of input and output in an LLM. Not a word or a character -- a chunk produced by the model's tokenizer, roughly 3-4 characters of English.
Tool Use [Agentic]
Giving a model the ability to call external functions (APIs, databases, code execution) during generation, with the model deciding when and how to invoke them.
Top-P (Nucleus Sampling) [Development]
An alternative to temperature: the model samples only from the smallest set of tokens whose cumulative probability exceeds P, ignoring everything below that threshold.
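The mechanics over a toy distribution (real sampling happens inside the inference engine over the full vocabulary):

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest high-probability set whose mass reaches p, renormalized."""
    kept, mass = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        mass += prob
        if mass >= p:
            break
    return {token: prob / mass for token, prob in kept.items()}

nucleus = top_p_filter({"the": 0.5, "a": 0.3, "cat": 0.15, "zzz": 0.05}, p=0.9)
# The low-probability tail ("zzz") falls outside the nucleus and is never sampled.
```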
Tree of Thoughts (ToT) [Techniques]
A prompting strategy where the model explores multiple reasoning branches, evaluates each, and selects (or combines) the best -- versus linear chain-of-thought which commits to one path.
Vector Database [Development]
A storage system optimized for similarity search over high-dimensional embeddings -- returning the K nearest neighbors of a query vector in sublinear time.
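Conceptually it does what this brute-force scan does, but with approximate indexes (HNSW, IVF) so queries stay fast at millions of vectors; a sketch of the exact baseline:

```python
import math

def nearest_neighbors(query: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Exact k-NN by Euclidean distance: O(n) per query, the baseline a vector DB beats."""
    return sorted(index, key=lambda doc_id: math.dist(query, index[doc_id]))[:k]

index = {
    "doc_a": [0.0, 1.0],
    "doc_b": [1.0, 1.0],
    "doc_c": [5.0, 5.0],
}
hits = nearest_neighbors([0.1, 0.9], index, k=2)
```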
Zero-Shot Prompting [Techniques]
Asking a model to perform a task with no examples of the expected output -- relying entirely on the instruction and the model's pretrained knowledge.
