Agent memory divides into three flavors: short-term (this conversation's history, usually just context management), long-term (facts about the user and prior decisions, persisted across sessions), and semantic (retrievable knowledge the agent has accumulated).
Short-term is handled by growing context windows plus summary compression. Long-term is typically a key-value store keyed on user and fact type. Semantic is a vector store plus retrieval. Most production agents need at least short-term memory and some form of long-term; frameworks such as LangGraph's persistent store or Claude's Projects ship purpose-built memory layers.
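The three flavors can be sketched as one small class: a truncated message list for short-term, a dict for long-term, and a list of (text, vector) pairs for semantic retrieval. The class name, method names, and the bag-of-words "embedding" are all illustrative stand-ins; a real system would use an actual embedding model and a proper vector store.

```python
from collections import Counter
import math

class AgentMemory:
    """Toy sketch of short-term, long-term, and semantic memory."""

    def __init__(self, window=10):
        self.short_term = []   # recent messages in this conversation
        self.long_term = {}    # stable facts, keyed like "user.timezone"
        self.semantic = []     # (text, vector) pairs for retrieval
        self.window = window

    def add_message(self, msg):
        # Crude context management: keep only the last `window` messages.
        self.short_term = (self.short_term + [msg])[-self.window:]

    def memory_write(self, key, value):
        # Long-term: overwrite-on-write key-value semantics.
        self.long_term[key] = value

    def _embed(self, text):
        # Toy bag-of-words vector; stands in for a real embedding model.
        return Counter(text.lower().split())

    def remember_fact(self, text):
        # Semantic: embed and store a factual claim for later retrieval.
        self.semantic.append((text, self._embed(text)))

    def _cosine(self, a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def semantic_search(self, query, k=5):
        q = self._embed(query)
        ranked = sorted(self.semantic,
                        key=lambda pair: self._cosine(q, pair[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```

The design choice to keep three separate stores (rather than one) matches how they're queried: short-term by recency, long-term by key, semantic by similarity.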
Example Prompt
Memory API:
Short-term: full conversation transcript in context (up to window limit).
Long-term: after each turn, call memory_write(key, value) for any stable fact:
memory_write("user.timezone", "America/Los_Angeles")
memory_write("user.preferences.tone", "direct, no fluff")
Semantic: embed and store any factual claim the user made.
At the start of each turn, auto-inject:
- Last 10 messages of short-term
- All long-term memories matching user_id
- Top 5 semantic matches to the current message
When to use it
- Agents where continuity matters (assistants, long-running workflows)
- Tasks that outlive a single context window
- Personalization that requires stable user state
When NOT to use it
- Stateless single-turn tasks
- You can't manage privacy / retention / deletion of stored memory
- The memory store becomes a liability (stale or wrong retrieved context causes regressions)
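The turn-start injection described in the example prompt can be sketched as a single context-assembly step. The function name and the convention of prefixing long-term keys with the user id are assumptions for illustration; `semantic_search` is any retrieval callable.

```python
def build_context(short_term, long_term, semantic_search, user_id, message,
                  n_recent=10, k=5):
    """Assemble the memory block auto-injected at the start of a turn."""
    recent = short_term[-n_recent:]                 # last 10 messages
    facts = {key: val for key, val in long_term.items()
             if key.startswith(f"{user_id}.")}      # long-term for this user
    retrieved = semantic_search(message, k=k)       # top-5 semantic matches
    return {"recent": recent, "facts": facts, "retrieved": retrieved}
```

A usage example: `build_context(history, store, memory.semantic_search, "u1", user_msg)` returns a dict the agent can serialize into its system prompt before responding.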
