
Context Engineering Is the New Prompt Engineering: How to Load Your LLM's RAM Like an Operating System
Karpathy was right -- the highest-leverage skill in 2026 isn't writing better prompts, it's deciding what goes into the context window and what stays out
Last June, Shopify CEO Tobi Lutke posted a quiet observation on X: he preferred the term "context engineering" over prompt engineering because it describes the core skill better. A week later, Andrej Karpathy replied with an analogy that spread everywhere.
"The LLM is a CPU. The context window is RAM. You are the operating system."
That framing changed the conversation. Not because it was novel, but because it was precise. Prompt engineering asks "how do I phrase this question?" Context engineering asks "what information does this model need to solve this problem, and what should I keep out?"
Those are very different skills.
Why "Write Better Prompts" Stopped Being Enough
A 2026 DataHub report found that 82% of IT and data leaders say prompt engineering alone is no longer sufficient for AI at scale. And 89% of those leaders plan to invest in context management infrastructure within the next 12 months.
The reason is simple. Modern LLMs are good at following instructions. They are bad at working with the wrong information. A perfectly worded prompt fed into a context window stuffed with irrelevant documents will produce confident, well-structured garbage.
MIT's 2025 State of AI in Business found that 95% of enterprise AI pilots delivered zero measurable ROI. The diagnosis wasn't model quality. It was a lack of context. Teams were throwing documents at models without thinking about what those models actually needed to see.
Here's the uncomfortable finding: even with 100% perfect retrieval, performance drops by anywhere from 13.9% to 85% as input length increases. In a study posted to arXiv, researchers replaced all irrelevant tokens with whitespace to make the target information trivially obvious. Performance still degraded. Sheer length itself imposes a tax on LLM reasoning.
More context is not better context. Less, chosen well, almost always wins.
The Four Operations: Write, Select, Compress, Isolate
LangChain published a practical framework that maps cleanly to how you should think about context assembly. Four operations, each solving a different problem.
Write means saving information outside the context window so you can retrieve it later. An agent that dumps its research into a scratchpad file and keeps only a summary in-context is using Write. Your context window is expensive real estate. Treat it that way.
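A minimal sketch of the Write operation, assuming a JSON file as the scratchpad. The `Scratchpad` class and its method names are hypothetical, not taken from LangChain or any agent framework; the point is only that the full notes live on disk while one-line summaries stay in context.

```python
import json
import tempfile
from pathlib import Path


class Scratchpad:
    """Hypothetical helper: full notes go to disk, short summaries stay in context."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.summaries: list[str] = []

    def write(self, key: str, full_text: str, summary: str) -> None:
        # Persist the full text outside the context window.
        notes = json.loads(self.path.read_text()) if self.path.exists() else {}
        notes[key] = full_text
        self.path.write_text(json.dumps(notes))
        # Only the one-line summary is kept for the prompt.
        self.summaries.append(f"[{key}] {summary}")

    def read(self, key: str) -> str:
        # Pull the full note back in only when a later step needs it.
        return json.loads(self.path.read_text())[key]

    def in_context_view(self) -> str:
        return "\n".join(self.summaries)


pad = Scratchpad(Path(tempfile.mkdtemp()) / "scratchpad.json")
pad.write(
    "rate-limits",
    "Full research notes on rate limits. " * 50,
    "Limits are per key, not per account.",
)
print(pad.in_context_view())
```

The prompt carries `in_context_view()`; `read()` is called only when a step actually needs the detail.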
Select means pulling in only what's relevant for this specific step. This is where RAG lives, but it's bigger than RAG. Tool filtering is a Select operation too. One study found that providing 5 relevant tools from a set of 50 (instead of all 50) improved task accuracy by 3x. The model spent less time confused about which tool to use and more time using the right one.
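Tool filtering can be sketched with nothing more than word overlap. This is an illustration under loud assumptions: the tool names, descriptions, and the `select_tools` function are hypothetical, and a production system would more likely rank with embeddings than with shared words.

```python
STOPWORDS = {"a", "an", "the", "to", "for", "of", "or", "how", "from"}


def keywords(text: str) -> set[str]:
    # Lowercased words minus a tiny stopword list (an assumption for the demo).
    return set(text.lower().split()) - STOPWORDS


def select_tools(task: str, tools: dict[str, str], k: int = 5) -> list[str]:
    """Return up to k tool names whose descriptions share the most keywords with the task."""
    task_words = keywords(task)

    def overlap(name: str) -> int:
        return len(task_words & keywords(tools[name]))

    ranked = sorted(tools, key=overlap, reverse=True)
    # Tools with no overlap at all are dropped rather than padded in.
    return [name for name in ranked[:k] if overlap(name) > 0]


# Hypothetical tool registry.
TOOLS = {
    "read_file": "read a file from disk",
    "search_docs": "search the product docs for a query",
    "send_email": "send an email to a customer",
    "create_invoice": "create a billing invoice",
    "rotate_key": "reset or rotate an api key",
}

selected = select_tools("search the docs for how to reset an api key", TOOLS, k=2)
print(selected)
```

Only the selected tool definitions would then be passed to the model for this step.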
Compress means reducing token count without losing signal. Claude Code does this automatically -- when context hits ~115K tokens, it compacts down to ~60K while preserving the information that matters. You should be doing this deliberately in your own pipelines.
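One way to do this deliberately is a threshold-triggered compaction step. The sketch below is not how Claude Code implements compaction; it assumes a crude 4-characters-per-token estimate and a stubbed summarizer (`stub_summarize` stands in for a real LLM call), and simply folds older messages into one summary once the history crosses the trigger.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Use a real tokenizer in practice.
    return len(text) // 4


def compact(
    messages: list[str],
    summarize: Callable[[list[str]], str],
    trigger: int = 115_000,
    keep_recent: int = 4,
) -> list[str]:
    """Fold everything except the most recent messages into one summary once past the trigger."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= trigger or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(old)] + recent


def stub_summarize(old: list[str]) -> str:
    # Placeholder for an LLM summarization call.
    return f"Summary of {len(old)} earlier messages."


history = ["x" * 40_000 for _ in range(20)]  # roughly 200K estimated tokens
compacted = compact(history, stub_summarize)
print(len(history), "->", len(compacted))
```

Keeping the last few messages verbatim is a design choice: recent turns usually carry the details the next step depends on.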
Isolate means splitting work across sub-agents with separate context windows. Cognition (the team behind the Devin coding agent) revealed they use fine-tuned summarization models specifically at agent-to-agent handoff boundaries. When one sub-agent finishes and passes work to another, they compress the context rather than forwarding raw transcripts. Each agent gets a clean, focused window.
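The handoff pattern can be sketched as follows. This is an illustration, not Cognition's implementation: `SubAgent`, `handoff`, and the stub functions are hypothetical, and `last_finding` stands in for the fine-tuned summarization model described above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubAgent:
    name: str
    context: list[str] = field(default_factory=list)  # private window, never shared raw

    def run(self, brief: str, work: Callable[[str], list[str]]) -> None:
        # The agent starts from a short brief, not from another agent's transcript.
        self.context = [brief]
        self.context.extend(work(brief))


def handoff(sender: SubAgent, summarize: Callable[[list[str]], str]) -> str:
    """Compress the sender's transcript into a brief for the next agent."""
    return summarize(sender.context)


def fake_research(brief: str) -> list[str]:
    # Stand-in for a long tool-using session.
    return ["opened docs", "read 40 pages", "finding: keys rotate from the dashboard"]


def last_finding(transcript: list[str]) -> str:
    # Stand-in for a summarization model.
    return transcript[-1]


researcher = SubAgent("researcher")
researcher.run("How are API keys reset?", fake_research)

writer = SubAgent("writer")
writer.run(handoff(researcher, last_finding), lambda brief: ["drafted answer"])
print(writer.context)
```

The writer never sees the researcher's intermediate steps, only the compressed brief.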
These aren't theoretical categories. They're operations you can implement today.
A Prompt Example: Context-Aware System Instructions
Most people write system prompts that tell the model what to do. Context engineering means also telling it what it's working with and what to ignore.
The Prompt (naive version):
You are a helpful assistant. Answer the user's question about our product.
[Entire 200-page product documentation dump]
User question: How do I reset my API key?
The Prompt (context-engineered version):
You are a technical support agent for Acme API Platform.
## What you're looking at
The user is asking about API key management. Below is the relevant
section from our docs (v3.2, last updated 2026-05-01). ONLY use
this section to answer. If the answer isn't here, say so.
## Relevant documentation
[2-page excerpt: API Key Management section only]
## What to ignore
Do not reference pricing, billing, or account deletion workflows.
The user has not asked about those topics.
## Response format
Step-by-step instructions. Max 5 steps. Include the exact URL for
the dashboard page where this action happens.
User question: How do I reset my API key?
Why This Works: The second prompt does three things the first doesn't. It scopes the context (only the relevant docs section), sets explicit boundaries on what to ignore, and specifies the output format. The model isn't swimming through 200 pages trying to figure out what matters -- you already made that decision.
Expected Output:
To reset your API key:
1. Log in to your dashboard at dashboard.acme.dev/settings/api
2. Click "API Keys" in the left sidebar
3. Find the key you want to reset and click the three-dot menu
4. Select "Regenerate Key"
5. Copy the new key immediately -- it won't be shown again
Your old key will stop working within 60 seconds of regeneration.
The difference isn't the quality of the instruction. It's the quality of the context selection.
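The context-engineered prompt above can also be assembled in code, so the scoping decisions become explicit parameters. This is a sketch: the `build_prompt` function and its parameter names are hypothetical, and the excerpt placeholder is kept exactly as it appears in the example rather than filled with real documentation.

```python
def build_prompt(
    role: str,
    topic: str,
    doc_label: str,
    excerpt: str,
    ignore: list[str],
    response_format: str,
    question: str,
) -> str:
    """Assemble a scoped prompt: role, what the model sees, what to ignore, output format."""
    sections = [
        f"You are {role}.",
        "## What you're looking at\n"
        f"The user is asking about {topic}. Below is the relevant section from our docs "
        f"({doc_label}). ONLY use this section to answer. If the answer isn't here, say so.",
        f"## Relevant documentation\n{excerpt}",
        "## What to ignore\nDo not reference " + ", ".join(ignore) + ".",
        f"## Response format\n{response_format}",
        f"User question: {question}",
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    role="a technical support agent for Acme API Platform",
    topic="API key management",
    doc_label="v3.2, last updated 2026-05-01",
    excerpt="[2-page excerpt: API Key Management section only]",
    ignore=["pricing", "billing", "account deletion workflows"],
    response_format="Step-by-step instructions. Max 5 steps.",
    question="How do I reset my API key?",
)
print(prompt)
```

Because the excerpt and the ignore list are arguments, a retrieval step can fill them per question instead of a person hand-editing the prompt.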
Each Model Rewards Different Context Strategies
This is where context engineering gets interesting. The three major frontier models in May 2026 respond differently to how you assemble context.
Claude Opus 4.7 (1M token window) holds accuracy across its full context range with less than 5% degradation. It handles large codebases well. If you need to load 15 source files and reason across all of them, Claude is where you do that. The strategy: front-load critical files, but don't over-compress. Claude can handle the full picture.
Gemini 3.1 Pro (1M tokens, 2M for enterprise) scored 94.3% on GPQA Diamond and excels at multimodal context. If your task involves images, diagrams, code, and text all mixed together, Gemini's retrieval across that mixture is the strongest. The strategy: use mixed-modality context when you have it. A screenshot of an error is worth more than a text description of that error.
GPT-5.5 (400K tokens) is the strongest at agentic terminal work, scoring 82.7% on Terminal-Bench. It's best when you give it tools and let it fetch its own context rather than pre-loading everything. The strategy: define tools clearly, give it access to search or file read capabilities, and let it pull what it needs on each step.
There's no single "best" approach. The right context strategy depends on which model you're using and what kind of task you're running.
The Budget That Actually Matters
Here's a number that should change how you think about context: input tokens account for 70-85% of total API spend.
Every irrelevant document you stuff into context isn't just hurting accuracy. It's burning money. Context compression techniques (reducing token count by 50-70%), prompt caching (saving ~90% on repeated context blocks), and model routing (using smaller models for simple tasks) can cut total costs by 60-80%.
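To see how those levers combine, here is a back-of-the-envelope calculation. Every number in it is an illustrative assumption picked from the ranges above (a $10,000 monthly bill, an 80% input share, 50% compression, half of the remaining input served from cache at a 90% discount); `estimate_spend` is a hypothetical helper, not a provider pricing formula, and model routing is left out.

```python
def estimate_spend(
    total: float,
    input_share: float,
    compression: float,
    cached_fraction: float,
    cache_discount: float,
) -> float:
    """Estimate spend after compressing input tokens and caching part of what remains."""
    input_cost = total * input_share
    output_cost = total - input_cost  # output spend is assumed unchanged
    compressed = input_cost * (1 - compression)
    cached = compressed * cached_fraction * (1 - cache_discount)
    uncached = compressed * (1 - cached_fraction)
    return output_cost + cached + uncached


before = 10_000.0
after = estimate_spend(
    total=before,
    input_share=0.80,
    compression=0.50,
    cached_fraction=0.50,
    cache_discount=0.90,
)
savings = 1 - after / before
print(f"before=${before:,.0f} after=${after:,.0f} savings={savings:.0%}")
```

Under these assumptions, compression and caching alone get close to the low end of the 60-80% range quoted above.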
Also, effective context capacity is roughly 60-70% of the advertised maximum. A model claiming 200K tokens typically becomes unreliable around 130K. The drop isn't gradual -- it's sudden. Budget for 130K, not 200K.
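That budgeting rule can be enforced with a small guard. The sketch below assumes the midpoint of the range (65% usable) and a rough 4-characters-per-token estimate; `effective_budget` and `pack` are hypothetical names, and the documents are assumed to arrive already sorted by priority.

```python
def effective_budget(advertised_tokens: int, usable_fraction: float = 0.65) -> int:
    # Assumption: 65% is the midpoint of the 60-70% range, not a measured value.
    return round(advertised_tokens * usable_fraction)


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return len(text) // 4


def pack(docs: list[str], advertised_tokens: int) -> list[str]:
    """Add docs in priority order, stopping before the effective budget is exceeded."""
    budget = effective_budget(advertised_tokens)
    chosen: list[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        chosen.append(doc)
        used += cost
    return chosen


docs = ["a" * 200_000, "b" * 200_000, "c" * 200_000]  # about 50K estimated tokens each
packed = pack(docs, advertised_tokens=200_000)
print(effective_budget(200_000), len(packed))
```

Stopping at the first document that does not fit keeps the priority order intact, at the cost of sometimes leaving part of the budget unused.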
Context Engineering Is Systems Thinking
Lutke made another observation that stuck with me. He said experienced engineers turned out to be far better at prompting AI systems than people with "prompt engineer" in their title. The reason: they'd accumulated thousands of problem-solving reps. They knew what information mattered and what was noise.
Context engineering isn't a new discipline you need to hire for. It's systems thinking applied to LLM interactions. What does this model need to see? What will confuse it? How do I structure information so the critical parts are impossible to miss?
If you've ever debugged a system by narrowing down the inputs until you found the one that caused the failure, you already know how to do this.
The shift from prompt engineering to context engineering is the shift from "how do I ask" to "what do I load." Start treating your context window like RAM. Be deliberate about every token you put in it.
Want hands-on training on context engineering and prompt design for your team? Connect with Kief Studio on Discord or schedule a session.
