As your application grows and users accumulate memories, token costs can increase significantly. Memory optimization helps you reduce these costs by combining multiple memories into fewer, more concise memories while preserving all key information.

The Problem: Growing Memory Costs

When users have many memories, each conversation becomes more expensive. Consider two users:
User with 50 memories:
  • Each memory: ~50 tokens
  • Total memory context: 50 × 50 = 2,500 tokens per conversation
User with 200 memories:
  • Each memory: ~50 tokens
  • Total memory context: 200 × 50 = 10,000 tokens per conversation
This gets loaded into every agentic memory operation!
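A quick back-of-the-envelope check makes the scaling obvious (a sketch; the per-memory token count is the rough average assumed above):
# Rough cost of loading the memory context, per conversation
avg_tokens_per_memory = 50
for num_memories in (50, 200):
    context_tokens = num_memories * avg_tokens_per_memory
    print(f"{num_memories} memories -> ~{context_tokens} tokens per conversation")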

The Solution: Memory Optimization

Memory optimization uses LLM-based summarization to combine multiple memories into fewer, more efficient memories. The default Summarize strategy combines all memories into a single comprehensive summary, dramatically reducing token usage while preserving all factual information. Benefits:
  • Dramatically reduced token costs - Often 50-80% reduction
  • Preserves all key information - Facts, preferences, and context maintained
  • Automatic metadata preservation - Topics, user_id, agent_id, team_id retained
  • Preview before applying - Test optimization without saving changes
  • Flexible strategies - Extensible system for custom optimization approaches

How It Works

Memory optimization follows a simple pattern:
1. Retrieve Memories: All memories for a user are retrieved from the database.
2. Apply Strategy: The selected optimization strategy (e.g., Summarize) processes the memories using an LLM to create optimized versions.
3. Replace or Preview: If apply=True, the optimized memories replace the originals. If apply=False, a preview is returned without saving.
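Conceptually, the flow looks like this (a sketch, not the library's actual implementation; memory_manager and the model are constructed as in the examples below):
from agno.memory.strategies.summarize import SummarizeStrategy

# 1. Retrieve: load all of the user's memories
memories = memory_manager.get_user_memories(user_id="user_123")

# 2. Apply strategy: the strategy uses the LLM to produce optimized versions
strategy = SummarizeStrategy()
optimized = strategy.optimize(memories, model=OpenAIChat(id="gpt-4o-mini"))

# 3. Replace or preview: with apply=True, optimize_memories persists these
#    in place of the originals; with apply=False it only returns them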

Optimize Memories for a User

The simplest way to optimize memories is through the MemoryManager:
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.memory import MemoryManager
from agno.memory.strategies.types import MemoryOptimizationStrategyType
from agno.models.openai import OpenAIChat

# Setup your database
db = SqliteDb(db_file="agno.db")

# Create agent with memory enabled
agent = Agent(
    db=db,
    enable_user_memories=True,
)

memory_manager = MemoryManager(
    db=db,
    model=OpenAIChat(id="gpt-4o-mini"),  # Use a cheaper model for optimization
)

# After a few conversations, optimize memories for a user
optimized = memory_manager.optimize_memories(
    user_id="user_123",
    strategy=MemoryOptimizationStrategyType.SUMMARIZE,
    apply=True,  # Set to False to preview without saving
)

print(f"Optimized {len(optimized)} memories")

Preview Optimization Results

Before applying optimization, you can preview the results:
# Preview optimization without saving
preview = memory_manager.optimize_memories(
    user_id="user_123",
    strategy=MemoryOptimizationStrategyType.SUMMARIZE,
    apply=False,  # Don't save changes
)

# Check the optimized memory
if preview:
    print(f"Original memories: {len(memory_manager.get_user_memories('user_123'))}")
    print(f"Optimized to: {len(preview)} memories")
    print(f"Content: {preview[0].memory}")

Async Usage

For async applications, use aoptimize_memories:
import asyncio
from agno.db.postgres import PostgresDb
from agno.memory import MemoryManager
from agno.models.openai import OpenAIChat

async def optimize_user_memories():
    db = PostgresDb(db_url="postgresql://...")
    memory_manager = MemoryManager(
        db=db,
        model=OpenAIChat(id="gpt-4o-mini"),
    )
    
    # Async optimization
    optimized = await memory_manager.aoptimize_memories(
        user_id="user_123",
        apply=True,
    )
    
    return optimized

# Run async optimization
optimized = asyncio.run(optimize_user_memories())
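Because aoptimize_memories is a coroutine, you can also fan optimization out across many users concurrently, for example with asyncio.gather (a sketch; cap the concurrency if you are subject to LLM rate limits):
async def optimize_many_users(user_ids: list[str]):
    db = PostgresDb(db_url="postgresql://...")
    memory_manager = MemoryManager(
        db=db,
        model=OpenAIChat(id="gpt-4o-mini"),
    )
    # One optimization task per user, run concurrently
    return await asyncio.gather(
        *(memory_manager.aoptimize_memories(user_id=uid, apply=True) for uid in user_ids)
    )

optimized_lists = asyncio.run(optimize_many_users(["user_123", "user_456"]))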

Optimization Strategies

Summarize Strategy (Default)

The SUMMARIZE strategy combines all memories into a single comprehensive summary:
  • Best for: Most use cases, maximum token reduction
  • Result: All memories combined into 1 memory
  • Preserves: All factual information, topics, metadata
Before Optimization:
Memory 1: "The user's name is John Doe"
Memory 2: "The user likes to play basketball"
Memory 3: "The user's favorite color is blue"
Memory 4: "The user works as a software engineer"
Memory 5: "The user lives in San Francisco"
After Optimization:
Memory 1: "John Doe is a software engineer who lives in San Francisco. 
          He enjoys playing basketball and his favorite color is blue."
Token Savings:
  • Before: ~50 tokens × 5 memories = 250 tokens
  • After: ~40 tokens × 1 memory = 40 tokens
  • Savings: 84% reduction

Custom Strategies

You can create custom optimization strategies by implementing the MemoryOptimizationStrategy interface:
from agno.memory.strategies import MemoryOptimizationStrategy
from agno.db.schemas import UserMemory
from agno.models.base import Model
from typing import List

class CustomOptimizationStrategy(MemoryOptimizationStrategy):
    def optimize(
        self,
        memories: List[UserMemory],
        model: Model,
    ) -> List[UserMemory]:
        # Your custom optimization logic goes here.
        # Must return the optimized list of UserMemory objects;
        # returning the input unchanged is a valid no-op baseline.
        return memories

    async def aoptimize(
        self,
        memories: List[UserMemory],
        model: Model,
    ) -> List[UserMemory]:
        # Async version; here it simply delegates to the sync implementation
        return self.optimize(memories, model)

# Use custom strategy
custom_strategy = CustomOptimizationStrategy()
optimized = memory_manager.optimize_memories(
    user_id="user_123",
    strategy=custom_strategy,
    apply=True,
)
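As a concrete (and deliberately simple) illustration, here is a strategy that removes exact-duplicate memories without calling the LLM at all. DeduplicateStrategy is a hypothetical name; the comparison uses the UserMemory.memory text seen in the preview example above:
class DeduplicateStrategy(MemoryOptimizationStrategy):
    """Drop memories whose text is an exact duplicate. No LLM call needed."""

    def optimize(
        self,
        memories: List[UserMemory],
        model: Model,
    ) -> List[UserMemory]:
        seen: set[str] = set()
        unique: List[UserMemory] = []
        for memory in memories:
            if memory.memory not in seen:
                seen.add(memory.memory)
                unique.append(memory)
        return unique

    async def aoptimize(
        self,
        memories: List[UserMemory],
        model: Model,
    ) -> List[UserMemory]:
        # Pure-Python logic, so the async version can simply delegate
        return self.optimize(memories, model)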

Measuring Token Savings

You can measure token savings by comparing token counts before and after optimization, using the strategy's count_tokens method:
from agno.memory.strategies.summarize import SummarizeStrategy

# Get original memories
original_memories = memory_manager.get_user_memories(user_id="user_123")

# Count tokens before
strategy = SummarizeStrategy()
tokens_before = strategy.count_tokens(original_memories)

# Optimize
optimized = memory_manager.optimize_memories(
    user_id="user_123",
    strategy=strategy,
    apply=True,
)

# Count tokens after
tokens_after = strategy.count_tokens(optimized)

# Calculate savings
tokens_saved = tokens_before - tokens_after
reduction_percentage = (tokens_saved / tokens_before * 100) if tokens_before > 0 else 0

print(f"Tokens before: {tokens_before}")
print(f"Tokens after: {tokens_after}")
print(f"Tokens saved: {tokens_saved} ({reduction_percentage:.1f}% reduction)")

API Usage (AgentOS)

Memory optimization is also available through the AgentOS API:
curl -X POST "https://os.agno.com/api/v1/memory/optimize-memories" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "user_id": "user_123",
    "apply": true,
    "model": "openai:gpt-4o-mini"
  }'
The API returns detailed statistics:
{
  "memories": [...],
  "memories_before": 50,
  "memories_after": 1,
  "tokens_before": 2500,
  "tokens_after": 500,
  "tokens_saved": 2000,
  "reduction_percentage": 80.0
}
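The same request from Python, mirroring the curl call above (a sketch using the requests library; substitute your own host and API key):
import requests

response = requests.post(
    "https://os.agno.com/api/v1/memory/optimize-memories",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
    json={
        "user_id": "user_123",
        "apply": True,
        "model": "openai:gpt-4o-mini",
    },
)
stats = response.json()
print(f"Saved {stats['tokens_saved']} tokens ({stats['reduction_percentage']}% reduction)")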

Limitations

  • Loss of granularity: Multiple memories become one, making individual memory retrieval less precise
  • LLM cost: Optimization itself requires an LLM call (use cheaper models)
  • One-way operation: Once optimized, original memories are replaced (use apply=False to preview)

Examples

See the Memory Optimization Example for a complete working example that demonstrates:
  • Creating multiple memories for a user
  • Optimizing memories using the summarize strategy
  • Measuring token savings
  • Viewing optimized results
You can also explore the cookbook examples.