Memory Systems and Persistence
Overview
Memory systems enable AI agents to retain and utilize information across interactions, improving their ability to maintain context and learn from experience [19].
Memory Types
| Type | Duration | Purpose | Implementation |
|---|---|---|---|
| Short-term | Single session | Working context | In-memory buffer |
| Long-term | Persistent | Knowledge retention | Vector stores, databases |
| Episodic | Persistent | Experience recall | Structured logs |
| Semantic | Persistent | Factual knowledge | Knowledge graphs |
| Procedural | Persistent | How-to knowledge | Tool definitions |
Short-term Memory
Short-term memory maintains context within a single conversation or task.
Implementation Strategies
- Buffer Memory: Store recent N messages
- Window Memory: Sliding window of recent context
- Summary Memory: Compress older context into summaries
- Token Buffer: Limit by token count rather than message count
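The window and token-buffer strategies above can be sketched as a small token-bounded message window. This is an illustrative implementation, not a specific library's API, and the whitespace token count is a deliberately crude stand-in for a real tokenizer:

```python
class TokenWindowMemory:
    """Keep the most recent messages, bounded by an approximate token budget."""

    def __init__(self, max_tokens: int = 200):
        self.max_tokens = max_tokens
        self.messages = []  # list of {"role", "content"} dicts

    @staticmethod
    def _estimate_tokens(text: str) -> int:
        # Crude whitespace-based estimate; real systems use a tokenizer.
        return len(text.split())

    def _total_tokens(self) -> int:
        return sum(self._estimate_tokens(m["content"]) for m in self.messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Evict the oldest messages until we fit the budget again.
        while self._total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(0)

mem = TokenWindowMemory(max_tokens=6)
mem.add("user", "one two three four five")
mem.add("assistant", "six seven eight")
# The oldest message was evicted to stay within the 6-token budget.
print([m["role"] for m in mem.messages])  # ['assistant']
```

The same eviction loop works for message-count buffers by swapping the token estimate for a constant cost of 1 per message.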
Example
```python
# LangChain ConversationBufferMemory
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Add to conversation
memory.save_context(
    {"input": "Hello, I'm Alice"},
    {"output": "Hello Alice! How can I help you today?"}
)

# Retrieve history
history = memory.load_memory_variables({})
print(history["chat_history"])
```

Long-term Memory
Long-term memory persists information across sessions, enabling agents to remember user preferences, past interactions, and accumulated knowledge [20].
Storage Options
| Storage | Best For | Retrieval Method |
|---|---|---|
| Vector Stores | Semantic search | Similarity search |
| Key-Value Stores | Fast lookups | Exact match |
| Relational DBs | Structured data | SQL queries |
| Graph DBs | Relationships | Graph traversal |
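To make the vector-store row concrete, here is a toy similarity search over hand-written embeddings. A real system would call an embedding model and a vector database; the vectors and stored texts below are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings (a real system would compute these with a model).
store = {
    "User's favorite color is blue": [0.9, 0.1, 0.0],
    "User works in finance": [0.1, 0.9, 0.2],
}

def search(query_vec, k=1):
    # Rank stored memories by similarity to the query vector.
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.85, 0.15, 0.0]))  # ["User's favorite color is blue"]
```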
MemGPT Architecture
MemGPT (now Letta) introduces a hierarchical memory system inspired by operating systems [21].
Memory Hierarchy
| Level | Analogy | Characteristics |
|---|---|---|
| Main Context | RAM | Limited, fast access, in LLM context |
| Archival Memory | Disk | Unlimited, slower access, searchable |
| Recall Memory | Cache | Recent interactions, quick retrieval |
Key Concepts
- Self-Editing Memory: Agent can modify its own memory
- Memory Paging: Swap information in/out of context
- Memory Search: Query archival storage as needed
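These concepts can be illustrated with a toy model of paging between a bounded main context and archival memory. This is a sketch of the idea only, not the actual MemGPT/Letta API:

```python
class PagedMemory:
    """Toy MemGPT-style paging: a small 'main context' backed by an
    unbounded, searchable 'archival' store."""

    def __init__(self, context_slots: int = 3):
        self.context_slots = context_slots
        self.main_context = []  # analogous to RAM: what the LLM sees
        self.archival = []      # analogous to disk: searchable overflow

    def remember(self, fact: str) -> None:
        self.main_context.append(fact)
        # Page the oldest fact out to archival storage when full.
        if len(self.main_context) > self.context_slots:
            self.archival.append(self.main_context.pop(0))

    def search_archival(self, keyword: str) -> list:
        # Stand-in for semantic search over archival storage.
        return [f for f in self.archival if keyword.lower() in f.lower()]

mem = PagedMemory(context_slots=2)
for fact in ["Alice likes tea", "Bob likes coffee", "Carol likes cocoa"]:
    mem.remember(fact)
print(mem.main_context)            # ['Bob likes coffee', 'Carol likes cocoa']
print(mem.search_archival("tea"))  # ['Alice likes tea']
```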
LangGraph Memory and Persistence
LangGraph provides built-in support for memory and state persistence [22].
Checkpointing
LangGraph supports checkpointing for state persistence:
```python
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import StateGraph

# Create checkpointer
checkpointer = SqliteSaver.from_conn_string(":memory:")

# Create graph with checkpointing
graph = StateGraph(State)
# ... add nodes and edges ...
app = graph.compile(checkpointer=checkpointer)

# Run with a thread ID so state persists across invocations
config = {"configurable": {"thread_id": "user_123"}}
result = app.invoke({"messages": [...]}, config)
```

Memory Backends
- SQLite: Local file-based persistence
- PostgreSQL: Production-grade persistence
- Redis: Fast in-memory with optional persistence
- Custom: Implement your own checkpoint saver
Memory Management Strategies
Compression
- Summarize older conversations
- Extract key facts and discard details
- Use hierarchical summarization
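Hierarchical summarization amounts to summarizing chunks of the conversation, then summarizing the chunk summaries. In this sketch a trivial string function stands in for the LLM summarization call:

```python
def hierarchical_summarize(messages, chunk_size=4, summarize=None):
    """Two-level summarization: chunks first, then the chunk summaries.

    `summarize` is a stand-in for an LLM call; the default just keeps
    the leading clause of each item so the example runs without a model.
    """
    if summarize is None:
        summarize = lambda texts: " | ".join(t.split(".")[0] for t in texts)
    # Level 1: summarize fixed-size chunks of the conversation.
    chunk_summaries = [
        summarize(messages[i:i + chunk_size])
        for i in range(0, len(messages), chunk_size)
    ]
    # Level 2: compress the chunk summaries into one top-level summary.
    return summarize(chunk_summaries)

summary = hierarchical_summarize(
    ["Alice joined.", "Bob asked a question.",
     "Agent answered.", "Bob thanked the agent."],
    chunk_size=2,
)
print(summary)  # Alice joined | Bob asked a question | Agent answered | Bob thanked the agent
```

In practice the `summarize` callable would prompt a model, and deeper hierarchies can be built by applying the same step recursively.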
Prioritization
- Score memories by relevance and recency
- Implement forgetting curves
- Prioritize frequently accessed information
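A minimal scoring function combining relevance with an exponential forgetting curve might look like the following; the weights and half-life are illustrative assumptions, not values from any particular system:

```python
import time

def memory_score(relevance: float, last_access: float,
                 now: float, half_life: float = 3600.0) -> float:
    """Blend relevance with recency, where recency decays
    exponentially with half-life `half_life` seconds."""
    age = max(0.0, now - last_access)
    recency = 0.5 ** (age / half_life)
    # Weighting between relevance and recency is a tunable assumption.
    return 0.7 * relevance + 0.3 * recency

now = time.time()
fresh = memory_score(relevance=0.4, last_access=now, now=now)
stale = memory_score(relevance=0.4, last_access=now - 7200, now=now)
print(fresh > stale)  # True: equal relevance, but the fresher memory ranks higher
```

Ranking candidate memories by this score before retrieval implements both prioritization bullets at once: frequently refreshed memories stay near the top, and untouched ones decay away.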
Organization
- Categorize memories by type
- Use metadata for filtering
- Build knowledge graphs for relationships
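Categorization and metadata filtering can be as simple as typed records with tags. This is a sketch; the record fields and example memories are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    content: str
    kind: str                        # e.g. "episodic", "semantic"
    tags: set = field(default_factory=set)

def filter_memories(memories, kind=None, tag=None):
    """Filter a memory list by type and/or metadata tag."""
    out = memories
    if kind is not None:
        out = [m for m in out if m.kind == kind]
    if tag is not None:
        out = [m for m in out if tag in m.tags]
    return out

store = [
    Memory("User prefers dark mode", "semantic", {"preferences"}),
    Memory("Helped debug a login issue", "episodic", {"support"}),
]
print([m.content for m in filter_memories(store, kind="semantic")])
# ['User prefers dark mode']
```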
Implementation Example
```python
# Hybrid memory system combining a short-term buffer with a persistent store.
# `VectorStore` and `summarize` are placeholders for a real vector database
# client and an LLM summarization call.
class HybridMemory:
    def __init__(self):
        self.short_term = []            # recent messages
        self.long_term = VectorStore()  # persistent knowledge
        self.max_short_term = 10

    def add_message(self, message: str, role: str):
        # Add to the short-term buffer
        self.short_term.append({"role": role, "content": message})
        # When the buffer overflows, summarize the oldest messages
        # into long-term storage and drop them from the buffer
        if len(self.short_term) > self.max_short_term:
            summary = self.summarize(self.short_term[:5])
            self.long_term.add(summary)
            self.short_term = self.short_term[5:]

    def get_context(self, query: str) -> str:
        # Retrieve relevant long-term memories
        relevant = self.long_term.search(query, k=3)
        # Combine them with the short-term buffer
        context = "\n".join(m["content"] for m in relevant)
        context += "\n" + "\n".join(
            f"{m['role']}: {m['content']}" for m in self.short_term
        )
        return context
```

Best Practices
- Separate Concerns: Use different stores for different memory types
- Implement Forgetting: Not all information needs to be retained
- Index Appropriately: Use the right retrieval method for each use case
- Handle Conflicts: Decide how to handle contradictory memories
- Privacy Considerations: Be mindful of what information is stored