Introduction
Every developer knows the problem: AI agents forget. You have a conversation, come back the next day, and it’s like meeting someone with complete amnesia. The agent doesn’t remember who you are, what you discussed, or what it promised to do.
This is the memory problem - and it’s one of the most critical challenges in building useful AI agents. This guide covers everything about agent memory systems, from basic context windows to sophisticated multi-layered architectures.
The Memory Problem
Why Agents Forget
┌───────────────────────────────────────────────────────────────┐
│                       WHY AGENTS FORGET                       │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Context Window Limit                                         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ [User: Hi] [Agent: Hello!] [User: My name is Bob]       │  │
│  │ [Agent: Nice to meet you, Bob]                          │  │
│  │ [User: What's my name?]                                 │  │
│  │ [Agent: I don't know]  ← FORGOT!                        │  │
│  │                                                         │  │
│  │ Context full - oldest messages dropped                  │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
│  Session Boundaries                                           │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Monday:  "Remember my preferences..."                   │  │
│  │ Tuesday: "What were my preferences?"  ← NO MEMORY       │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
└───────────────────────────────────────────────────────────────┘
Types of Memory
| Type | Duration | Example |
|---|---|---|
| Working | Seconds | Current conversation |
| Episodic | Days | Specific interactions |
| Semantic | Forever | Facts and knowledge |
| Procedural | Forever | How to use tools |
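The durations in the table can be made concrete as retention policies. The sketch below is illustrative: the `MemoryType` enum and the specific TTL values are assumptions for demonstration, not fixed constants from any framework.

```python
from enum import Enum

class MemoryType(Enum):
    WORKING = "working"        # seconds: the current conversation
    EPISODIC = "episodic"      # days: specific interactions
    SEMANTIC = "semantic"      # indefinite: facts and knowledge
    PROCEDURAL = "procedural"  # indefinite: how to use tools

# Illustrative retention windows in seconds (None = keep indefinitely)
RETENTION = {
    MemoryType.WORKING: 60,
    MemoryType.EPISODIC: 86400 * 7,
    MemoryType.SEMANTIC: None,
    MemoryType.PROCEDURAL: None,
}

def is_expired(mem_type: MemoryType, age_seconds: float) -> bool:
    ttl = RETENTION[mem_type]
    return ttl is not None and age_seconds > ttl
```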
Memory Architecture
The Memory Hierarchy
┌───────────────────────────────────────────────────────────────┐
│                       MEMORY HIERARCHY                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                   LONG-TERM MEMORY                      │  │
│  │               (Weeks, Months, Forever)                  │  │
│  │                                                         │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐         │  │
│  │  │  Semantic  │  │ Procedural │  │  Episodic  │         │  │
│  │  │   Memory   │  │   Memory   │  │   Memory   │         │  │
│  │  │  (Facts)   │  │  (Skills)  │  │  (Events)  │         │  │
│  │  └────────────┘  └────────────┘  └────────────┘         │  │
│  └─────────────────────────────────────────────────────────┘  │
│                            │                                  │
│                            ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                  SHORT-TERM MEMORY                      │  │
│  │                 (Minutes to Hours)                      │  │
│  │                                                         │  │
│  │  ┌────────────┐  ┌────────────┐                         │  │
│  │  │  Working   │  │  Context   │                         │  │
│  │  │   Memory   │  │   Window   │                         │  │
│  │  │  (Active)  │  │ (LLM Input)│                         │  │
│  │  └────────────┘  └────────────┘                         │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
└───────────────────────────────────────────────────────────────┘
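The core mechanic of the hierarchy is overflow handling: when a fast tier fills up, the oldest items spill into the tier below. A minimal sketch of that promotion step, with the tiers modeled as plain lists for illustration:

```python
def promote_overflow(short_term: list, long_term: list, capacity: int) -> None:
    """When short-term memory exceeds capacity, move the oldest
    items into the long-term store (oldest first)."""
    while len(short_term) > capacity:
        long_term.append(short_term.pop(0))
```

Real systems attach extra work to this step, such as embedding the item before archiving it, but the control flow is the same.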
Implementation Patterns
1. Simple Context Management
# Basic context window management
class SimpleContextManager:
    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens
        self.messages = []

    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim_if_needed()

    def _count_tokens(self) -> int:
        # Rough estimate: ~4 characters per token
        return sum(len(m["content"]) // 4 for m in self.messages)

    def _trim_if_needed(self):
        while self._count_tokens() > self.max_tokens:
            # Remove the oldest non-system message
            for i, msg in enumerate(self.messages):
                if msg["role"] != "system":
                    self.messages.pop(i)
                    break
            else:
                break  # only system messages remain; stop trimming

    def get_context(self) -> list:
        return self.messages
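The `_count_tokens` heuristic above is the usual rule of thumb of roughly four characters per token for English text. As a standalone helper it is one line; a production system would use the model's actual tokenizer (e.g. tiktoken for OpenAI models) instead of this approximation.

```python
def estimate_tokens(text: str) -> int:
    # ~4 characters per token: a common rule of thumb for English text.
    # Swap in the model's real tokenizer for accurate budgeting.
    return max(1, len(text) // 4)
```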
2. MemGPT-Style Architecture
# MemGPT-inspired memory architecture
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import List

class MemoryLevel(Enum):
    CORE = "core"          # Always in context
    RECENT = "recent"      # Last N messages
    ARCHIVAL = "archival"  # Everything else

@dataclass
class MemoryItem:
    content: str
    level: MemoryLevel
    importance: float
    timestamp: float

class MemGPTMemory:
    def __init__(
        self,
        core_capacity: int = 2000,
        recent_capacity: int = 6000,
        vector_store=None,
        embed_fn=None
    ):
        self.core = []      # Always in context
        self.recent = []    # Recent messages
        self.archival = []  # Vector-stored
        self.core_capacity = core_capacity
        self.recent_capacity = recent_capacity
        self.vector_store = vector_store
        self.embed_fn = embed_fn  # injected embedding model

    def add(self, content: str, importance: float = 0.5):
        item = MemoryItem(
            content=content,
            level=MemoryLevel.RECENT,
            importance=importance,
            timestamp=time.time()
        )
        self.recent.append(item)
        self._manage_memory()

    def _estimate_tokens(self, text: str) -> int:
        return max(1, len(text) // 4)  # rough heuristic

    def _manage_memory(self):
        # Move overflow from recent to archival, oldest first
        while sum(self._estimate_tokens(m.content) for m in self.recent) > self.recent_capacity:
            oldest = self.recent.pop(0)
            oldest.level = MemoryLevel.ARCHIVAL
            self.archival.append(oldest)
            # Embed and store, if a vector store is configured
            if self.vector_store and self.embed_fn:
                self.vector_store.add(
                    id=str(uuid.uuid4()),
                    embedding=self.embed_fn(oldest.content),
                    content=oldest.content
                )

    def get_context(self) -> str:
        # Build context from all levels
        parts = []
        # Core memory (always included)
        core_text = "\n".join(m.content for m in self.core)
        parts.append(f"CORE MEMORY:\n{core_text}")
        # Recent (last N)
        recent_text = "\n".join(m.content for m in self.recent[-10:])
        parts.append(f"RECENT:\n{recent_text}")
        return "\n\n".join(parts)

    def search(self, query: str, k: int = 5) -> List[str]:
        # Semantic search in archival memory
        results = self.vector_store.search(self.embed_fn(query), k=k)
        return [r["content"] for r in results]
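The `vector_store` the class delegates to only needs `add` and `search`. For local experiments, a toy in-memory implementation ranked by cosine similarity is enough; the class below is a hypothetical stand-in, not a real database client.

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for a vector store: brute-force cosine similarity."""

    def __init__(self):
        self.items = []  # (id, embedding, content)

    def add(self, id: str, embedding: list, content: str):
        self.items.append((id, embedding, content))

    def search(self, query_embedding: list, k: int = 5) -> list:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(
            self.items,
            key=lambda it: cosine(query_embedding, it[1]),
            reverse=True,
        )
        return [{"id": i, "content": c} for i, _, c in ranked[:k]]
```

Anything exposing the same two methods, such as Chroma or Pinecone behind a thin adapter, can be dropped in for production use.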
3. Multi-Layer Memory
# Complete multi-layer memory system
import asyncio
import uuid
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryContext:
    recent: str
    episodic: List[dict]
    semantic: List[str]
    procedural: List[str]

class AgentMemory:
    def __init__(self, user_id: str):
        self.user_id = user_id
        # Layers
        self.working_memory = WorkingMemory()
        self.episodic_memory = EpisodicMemory(user_id)
        self.semantic_memory = SemanticMemory(user_id)
        self.procedural_memory = ProceduralMemory()

    def extract_facts(self, interaction: dict) -> List[str]:
        # Placeholder: a real implementation would use an LLM or a
        # rule-based extractor to pull facts out of the interaction
        return interaction.get("facts", [])

    async def remember_interaction(self, interaction: dict):
        """Store a complete interaction"""
        # Extract important facts
        facts = self.extract_facts(interaction)
        for fact in facts:
            await self.semantic_memory.add(fact)
        # Store episode
        await self.episodic_memory.add(interaction)
        # Update working memory
        self.working_memory.update(interaction)

    async def retrieve(self, query: str) -> MemoryContext:
        """Retrieve relevant memories"""
        # Search all layers
        recent = self.working_memory.get_recent()
        episodic = await self.episodic_memory.search(query)
        semantic = await self.semantic_memory.search(query)
        procedural = await self.procedural_memory.get_relevant(query)
        return MemoryContext(
            recent=recent,
            episodic=episodic,
            semantic=semantic,
            procedural=procedural
        )

    def build_prompt(self, query: str) -> str:
        """Build the full context prompt (synchronous entry point)"""
        context = asyncio.run(self.retrieve(query))
        parts = []
        if context.recent:
            parts.append(f"Recent conversation:\n{context.recent}")
        if context.semantic:
            facts = "\n".join(f"- {s}" for s in context.semantic[:5])
            parts.append(f"Known facts:\n{facts}")
        if context.procedural:
            skills = "\n".join(context.procedural)
            parts.append(f"Relevant skills:\n{skills}")
        return "\n\n".join(parts)
class WorkingMemory:
    """Short-term memory for the current session"""
    def __init__(self, capacity: int = 20):
        self.messages = []
        self.capacity = capacity

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.capacity:
            self.messages.pop(0)

    def update(self, interaction: dict):
        # Assumes the interaction dict carries a "messages" list
        for msg in interaction.get("messages", []):
            self.add(msg["role"], msg["content"])

    def get_recent(self, n: int = 10) -> str:
        msgs = self.messages[-n:]
        return "\n".join(f"{m['role']}: {m['content']}" for m in msgs)
class EpisodicMemory:
    """Memory for specific interactions"""
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.db = TinyVectorDB()  # placeholder vector store

    async def add(self, interaction: dict):
        # `embed` is a placeholder for your embedding model
        embedding = await embed(interaction["summary"])
        await self.db.add(
            id=str(uuid.uuid4()),
            embedding=embedding,
            metadata=interaction
        )

    async def search(self, query: str, k: int = 3) -> List[dict]:
        embedding = await embed(query)
        results = await self.db.search(embedding, k=k)
        return [r["metadata"] for r in results]
class SemanticMemory:
    """Facts and knowledge about a user"""
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.store = {}  # key -> fact

    def extract_key(self, fact: str) -> str:
        # Simple key: the first three words, lowercased
        return " ".join(fact.lower().split()[:3])

    async def add(self, fact: str):
        key = self.extract_key(fact)
        self.store[key] = fact

    async def search(self, query: str) -> List[str]:
        # Simple keyword matching
        results = []
        for fact in self.store.values():
            if any(word in fact.lower() for word in query.lower().split()):
                results.append(fact)
        return results[:5]
class ProceduralMemory:
    """How to do things"""
    def __init__(self):
        self.procedures = {}

    async def add(self, name: str, steps: List[str]):
        self.procedures[name] = steps

    async def get_relevant(self, query: str) -> List[str]:
        results = []
        for name, steps in self.procedures.items():
            if name.lower() in query.lower():
                results.append(f"{name}: " + " -> ".join(steps))
        return results
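The `extract_facts` step above is left as a placeholder. The cheapest real implementation is pattern matching; the sketch below is a deliberately naive rule-based extractor (the patterns and output phrasing are my own assumptions) that a production system would replace with an LLM call.

```python
import re

def extract_facts(text: str) -> list:
    """Naive pattern-based fact extraction; a stand-in for an
    LLM-based extractor. Patterns here are illustrative only."""
    facts = []
    m = re.search(r"my name is (\w+)", text, re.IGNORECASE)
    if m:
        facts.append(f"User's name is {m.group(1)}")
    m = re.search(r"i (?:like|prefer) ([\w ]+)", text, re.IGNORECASE)
    if m:
        facts.append(f"User likes {m.group(1).strip()}")
    return facts
```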
Memory Storage Solutions
Redis
# Redis for fast memory storage
import json
import redis.asyncio as redis  # async client (redis-py >= 4.2)

class RedisMemoryStore:
    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url)

    async def save_conversation(self, user_id: str, messages: list):
        key = f"memory:conversation:{user_id}"
        # Push each message as its own JSON-encoded list entry
        await self.redis.rpush(key, *[json.dumps(m) for m in messages])
        await self.redis.expire(key, 86400 * 30)  # 30 days

    async def get_conversation(self, user_id: str, limit: int = 50):
        key = f"memory:conversation:{user_id}"
        messages = await self.redis.lrange(key, -limit, -1)
        return [json.loads(m) for m in messages]

    async def save_fact(self, user_id: str, fact: dict):
        key = f"memory:facts:{user_id}"
        await self.redis.hset(key, fact["key"], json.dumps(fact))

    async def get_facts(self, user_id: str) -> dict:
        key = f"memory:facts:{user_id}"
        facts = await self.redis.hgetall(key)
        return {k: json.loads(v) for k, v in facts.items()}
Vector Database
# Pinecone for semantic search (Weaviate offers a similar API)
from pinecone import Pinecone

class VectorMemoryStore:
    def __init__(self, api_key: str, index_name: str):
        self.pc = Pinecone(api_key=api_key)
        self.index = self.pc.Index(index_name)

    async def add_memory(self, memory_id: str, content: str, metadata: dict):
        embedding = await get_embedding(content)  # your embedding helper
        # Note: the Pinecone client itself is synchronous
        self.index.upsert(vectors=[{
            "id": memory_id,
            "values": embedding,
            "metadata": {"content": content, **metadata}
        }])

    async def search(self, query: str, k: int = 5, filter: dict = None) -> list:
        query_embedding = await get_embedding(query)
        results = self.index.query(
            vector=query_embedding,
            top_k=k,
            filter=filter,
            include_metadata=True
        )
        return [match["metadata"] for match in results["matches"]]
Memory Best Practices
Good: Prioritize Important Information
# Good: Importance-based memory
import time

class PrioritizedMemory:
    def __init__(self):
        self.memories = []

    def add(self, content: str, importance: float = 0.5):
        # Higher importance = kept longer
        self.memories.append({
            "content": content,
            "importance": importance,
            "timestamp": time.time()
        })
        self.memories.sort(key=lambda x: x["importance"], reverse=True)

    def get_context(self, max_tokens: int) -> str:
        # Fill the budget starting with the most important memories
        selected = []
        total_tokens = 0
        for mem in self.memories:
            tokens = len(mem["content"]) // 4  # rough token estimate
            if total_tokens + tokens <= max_tokens:
                selected.append(mem)
                total_tokens += tokens
            elif mem["importance"] > 0.8:
                # Critical info is included even over budget
                selected.append(mem)
        return "\n".join(m["content"] for m in selected)
Bad: Store Everything
# Bad: No organization, just append
class BadMemory:
    def __init__(self):
        self.all = []

    def add(self, content: str):
        self.all.append(content)  # Never cleaned up!
        # Will eventually blow past token limits
Good: Automatic Summarization
class SummarizingMemory:
    def __init__(self, llm, max_messages: int = 50):
        self.llm = llm  # any client exposing an async summarize()
        self.messages = []
        self.max_messages = max_messages

    async def add(self, message: dict):
        self.messages.append(message)
        if len(self.messages) > self.max_messages:
            # Summarize the older half and keep the rest verbatim
            half = len(self.messages) // 2
            summary = await self.llm.summarize(self.messages[:half])
            self.messages = (
                [{"role": "system", "content": f"Summary: {summary}"}]
                + self.messages[half:]
            )
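The compaction arithmetic is easy to get wrong, so here is the same split-and-replace step isolated as a pure function, with a stub summarizer standing in for the LLM call (the stub is my own placeholder, not part of any library):

```python
def compact(messages: list, summarize) -> list:
    """Replace the older half of `messages` with one summary message."""
    half = len(messages) // 2
    summary = summarize(messages[:half])
    return [{"role": "system", "content": f"Summary: {summary}"}] + messages[half:]

# Stub summarizer: just reports a count (a real one would call an LLM)
stub_summarizer = lambda msgs: f"{len(msgs)} earlier messages condensed"
```

Fifty messages compact to twenty-six: one summary plus the newer twenty-five.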
Memory Evaluation
Benchmarking Memory Systems
# Simple memory evaluation
class MemoryEvaluator:
    def __init__(self, memory_system, test_cases: list):
        self.memory = memory_system
        self.tests = test_cases

    async def run(self) -> dict:
        results = {
            "retention": [],
            "retrieval_accuracy": [],
            "context_relevance": []
        }
        for test in self.tests:
            # Store memories
            for fact in test["facts"]:
                await self.memory.add(fact)
            # Test retrieval
            retrieved = await self.memory.retrieve(test["query"])
            # Calculate metrics
            results["retrieval_accuracy"].append(
                self.calculate_accuracy(retrieved, test["expected"])
            )
        return results

    def calculate_accuracy(self, retrieved, expected) -> float:
        # Fraction of expected items found among the retrieved results
        hits = sum(1 for e in expected if any(e in r for r in retrieved))
        return hits / len(expected) if expected else 0.0
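The metric above is essentially recall: the share of expected facts that appear in at least one retrieved snippet. As a standalone function with a worked example (the sample strings are illustrative):

```python
def retrieval_recall(retrieved: list, expected: list) -> float:
    """Fraction of expected facts found as substrings of any retrieved item."""
    hits = sum(1 for e in expected if any(e in r for r in retrieved))
    return hits / len(expected) if expected else 0.0
```

With `retrieved = ["User's name is Bob", "User likes hiking"]` and `expected = ["Bob", "tea"]`, only "Bob" is found, so recall is 0.5.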
Advanced Patterns
1. Memory Consolidation
# Periodic memory consolidation
from datetime import datetime, timedelta

class MemoryConsolidator:
    def __init__(self, llm):
        self.llm = llm

    async def consolidate(self, memory: AgentMemory):
        # Get episodic memories from the last week
        # (assumes the episodic store exposes a get_range() query)
        now = datetime.now()
        recent = await memory.episodic_memory.get_range(
            start=now - timedelta(days=7),
            end=now - timedelta(days=1)
        )
        # Extract patterns
        patterns = await self.extract_patterns(recent)
        # Store as semantic memory
        for pattern in patterns:
            await memory.semantic_memory.add(pattern)
        # Archive original episodes
        await self.archive(recent)

    async def extract_patterns(self, episodes: list) -> List[str]:
        # Use an LLM to find recurring patterns
        prompt = f"""
        Analyze these interactions and extract recurring patterns:
        {episodes}
        Return 3-5 patterns as short statements.
        """
        return await self.llm.extract(prompt)
2. Memory Decay
# Important memories last longer
import time
from typing import Optional

class DecayingMemory:
    def __init__(self):
        self.memories = {}

    def add(self, key: str, content: str, importance: float):
        ttl = 3600 * 24 * importance  # importance 1.0 -> 24-hour lifetime
        self.memories[key] = {
            "content": content,
            "importance": importance,
            "expires": time.time() + ttl
        }

    def get(self, key: str) -> Optional[str]:
        mem = self.memories.get(key)
        if not mem:
            return None
        if time.time() > mem["expires"]:
            del self.memories[key]
            return None
        # Accessing a memory extends its lifetime
        mem["expires"] = time.time() + 3600 * mem["importance"]
        return mem["content"]
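Hard TTL expiry is one option; another common approach is continuous decay, where a memory's retrieval score fades smoothly with age instead of vanishing at a cutoff. A sketch using exponential half-life decay (the `half_life` default is an illustrative choice, not a standard value):

```python
import math

def decay_score(importance: float, age_seconds: float,
                half_life: float = 86400.0) -> float:
    """Recency-weighted relevance: the score halves every `half_life`
    seconds, so important-but-old and unimportant-but-fresh memories
    can be ranked on one scale."""
    return importance * 0.5 ** (age_seconds / half_life)
```

At retrieval time, rank candidates by `decay_score` and keep the top k; nothing is ever deleted outright, it just sinks in the ranking.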
Tools & Frameworks
| Tool | Type | Best For |
|---|---|---|
| MemGPT | Framework | Research, experimentation |
| Letta | Platform | Production agents |
| LangChain | Library | Integration |
| Redis | Storage | Fast, ephemeral |
| Pinecone | Vector DB | Semantic search |
| Chroma | Vector DB | Local development |
Conclusion
Memory is what separates truly useful AI agents from glorified chatbots. The key patterns are:
- Layer your memory - Working, episodic, semantic, procedural
- Manage context actively - Don’t just append, organize and prioritize
- Use vector search - For finding relevant past information
- Consolidate periodically - Extract patterns from interactions
The best memory system depends on your use case. Start simple, measure, and add complexity as needed.
Related Articles
- Building Production AI Agents
- Agent-to-Agent Protocol: A2A
- Model Context Protocol: MCP
- Introduction to Agentic AI