The Problem: Growing Memory Costs
When users have many memories, each conversation becomes more expensive. Imagine the following scenario:

User with 50 memories:
- Each memory: ~50 tokens
- Total memory context: 50 × 50 = 2,500 tokens per conversation

User with 200 memories:
- Each memory: ~50 tokens
- Total memory context: 200 × 50 = 10,000 tokens per conversation
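The scaling above can be checked with a few lines (the ~50 tokens per memory figure is the scenario's illustrative estimate, not a measurement):

```python
# Per-conversation cost of injecting stored memories, using the scenario's
# illustrative estimate of ~50 tokens per memory.
TOKENS_PER_MEMORY = 50

def memory_context_tokens(num_memories: int) -> int:
    """Tokens added to every conversation by the user's stored memories."""
    return num_memories * TOKENS_PER_MEMORY

print(memory_context_tokens(50))   # 2500 tokens per conversation
print(memory_context_tokens(200))  # 10000 tokens per conversation
```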
The Solution: Memory Optimization
Memory optimization uses LLM-based summarization to combine multiple memories into fewer, more efficient memories. The default Summarize strategy combines all memories into a single comprehensive summary, dramatically reducing token usage while preserving all factual information.

Benefits:
- Dramatically reduced token costs - Often 50-80% reduction
- Preserves all key information - Facts, preferences, and context maintained
- Automatic metadata preservation - Topics, user_id, agent_id, team_id retained
- Preview before applying - Test optimization without saving changes
- Flexible strategies - Extensible system for custom optimization approaches
How It Works
Memory optimization follows a simple pattern:

1. Retrieve Memories: All memories for a user are retrieved from the database.
2. Apply Strategy: The selected optimization strategy (e.g., Summarize) processes the memories using an LLM to create optimized versions.
3. Replace or Preview: If apply=True, optimized memories replace the originals. If apply=False, a preview is returned without saving.

Optimize Memories for a User
The simplest way to optimize memories is through the MemoryManager:
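The code example appears to have been lost here. As a stand-in, the sketch below simulates the described flow with a self-contained stub class; the real MemoryManager comes from the library and uses an LLM, and the exact `optimize_memories(user_id=..., apply=...)` signature is inferred from this page, so treat the names as assumptions:

```python
# Illustrative stub only: mimics the optimize_memories(user_id=..., apply=...)
# call described in the text. The real MemoryManager summarizes with an LLM;
# here a simple join stands in so the sketch stays self-contained.
class MemoryManager:
    def __init__(self, memories: dict[str, list[str]]):
        self.memories = memories  # user_id -> list of memory texts

    def optimize_memories(self, user_id: str, apply: bool = True) -> list[str]:
        originals = self.memories[user_id]
        optimized = ["; ".join(originals)]  # Summarize strategy: N memories -> 1
        if apply:
            self.memories[user_id] = optimized  # replace the originals
        return optimized  # with apply=False this is only a preview

manager = MemoryManager({"user_1": ["Likes hiking", "Works in Berlin", "Prefers tea"]})
optimized = manager.optimize_memories(user_id="user_1", apply=True)
print(optimized)  # ['Likes hiking; Works in Berlin; Prefers tea']
```

Passing apply=False instead returns the optimized list without touching the stored memories, which is what the preview workflow below relies on.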
Preview Optimization Results
Before applying optimization, you can preview the results:

Async Usage
For async applications, use aoptimize_memories:
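The async example also appears to be missing; below is a minimal sketch under the same assumptions (a self-contained stub class, the `aoptimize_memories` name taken from this page, and a simple join standing in for the LLM call):

```python
import asyncio

# Illustrative stub: an async counterpart to optimize_memories, mirroring
# the aoptimize_memories name mentioned in the text.
class MemoryManager:
    def __init__(self, memories: dict[str, list[str]]):
        self.memories = memories  # user_id -> list of memory texts

    async def aoptimize_memories(self, user_id: str, apply: bool = True) -> list[str]:
        # The real method would await an LLM call here; a join stands in.
        originals = self.memories[user_id]
        optimized = ["; ".join(originals)]
        if apply:
            self.memories[user_id] = optimized
        return optimized

async def main() -> None:
    manager = MemoryManager({"user_1": ["Likes hiking", "Prefers tea"]})
    result = await manager.aoptimize_memories(user_id="user_1", apply=True)
    print(result)  # ['Likes hiking; Prefers tea']

asyncio.run(main())
```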
Optimization Strategies
Summarize Strategy (Default)
The SUMMARIZE strategy combines all memories into a single comprehensive summary:
- Best for: Most use cases, maximum token reduction
- Result: All memories combined into 1 memory
- Preserves: All factual information, topics, metadata
- Before: ~50 tokens × 5 memories = 250 tokens
- After: ~40 tokens × 1 memory = 40 tokens
- Savings: 84% reduction
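The savings figure follows directly from the example's illustrative token counts:

```python
# Illustrative figures from the example above: five ~50-token memories
# summarized into one ~40-token memory.
before = 5 * 50   # 250 tokens
after = 1 * 40    # 40 tokens
savings = (before - after) / before
print(f"{savings:.0%} reduction")  # 84% reduction
```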
Custom Strategies
You can create custom optimization strategies by implementing the MemoryOptimizationStrategy interface:
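The interface example appears to be missing here. Below is a sketch of what such a strategy might look like, assuming the interface exposes a single method that maps a list of memory texts to an optimized list (the method name `optimize` and its signature are assumptions, as is the DeduplicateStrategy example):

```python
from abc import ABC, abstractmethod

class MemoryOptimizationStrategy(ABC):
    """Assumed shape of the strategy interface: one method that maps
    a list of memory texts to an optimized list."""

    @abstractmethod
    def optimize(self, memories: list[str]) -> list[str]: ...

class DeduplicateStrategy(MemoryOptimizationStrategy):
    """Hypothetical custom strategy: drop exact duplicates, keep order.

    Unlike Summarize, this needs no LLM call and keeps memories granular."""

    def optimize(self, memories: list[str]) -> list[str]:
        seen: set[str] = set()
        unique = []
        for memory in memories:
            if memory not in seen:
                seen.add(memory)
                unique.append(memory)
        return unique

strategy = DeduplicateStrategy()
print(strategy.optimize(["Likes tea", "Works remotely", "Likes tea"]))
# ['Likes tea', 'Works remotely']
```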
Measuring Token Savings
You can measure token savings by comparing the number of tokens before and after optimization (using the count_tokens method):
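The measurement example appears to be missing; here is a sketch with a naive whitespace tokenizer standing in for the library's count_tokens method (real tokenizers are model-specific, so actual counts will differ):

```python
def count_tokens(text: str) -> int:
    # Naive stand-in for the library's count_tokens: real tokenizers are
    # model-aware, so this only approximates relative savings.
    return len(text.split())

before_memories = ["User likes hiking in the mountains",
                   "User works as a software engineer",
                   "User prefers tea over coffee"]
after_memories = ["Likes mountain hiking; software engineer; prefers tea"]

before = sum(count_tokens(m) for m in before_memories)
after = sum(count_tokens(m) for m in after_memories)
print(f"Saved {before - after} tokens ({(before - after) / before:.0%})")
# Saved 10 tokens (59%)
```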
API Usage (AgentOS)
Memory optimization is also available through the AgentOS API:

Limitations
- Loss of granularity: Multiple memories become one, making individual memory retrieval less precise
- LLM cost: Optimization itself requires an LLM call (use cheaper models)
- One-way operation: Once optimized, original memories are replaced (use apply=False to preview)
Examples
See the Memory Optimization Example for a complete working example that demonstrates:
- Creating multiple memories for a user
- Optimizing memories using the summarize strategy
- Measuring token savings
- Viewing optimized results
- Summarize Strategy - Basic optimization example
- Custom Strategy - Creating custom optimization strategies
Related Documentation
- Memory Overview - Learn about memory basics
- Production Best Practices - Memory optimization strategies
- Working with Memories - Advanced memory patterns