Context Compression lets you manage your agent's context while it runs, helping the agent stay within its context window and avoid rate limits or degraded response quality. Think of it like a research assistant who reads lengthy reports and gives you the key bullet points instead of the full documents.

The Problem: Verbose Tool Results

Without compression, tools that return large responses quickly consume your context window:
Component        Cumulative Token Count    Notes
System Prompt    1,200 tokens
User Message     1,300 tokens
LLM Response     1,500 tokens
Tool Call 1      2,500 tokens
Tool Call 2      5,700 tokens              2,500 + 3,200 new
Tool Call 3      8,500 tokens              5,700 + 2,800 new
Tool Call 4      12,000 tokens             8,500 + 3,500 new
This quickly becomes expensive and hits context limits during complex workflows.

The Solution: Automatic Compression

Context compression summarizes tool results once a threshold of tool calls is reached:
Tool Call 1: 2,500 tokens
Tool Call 2: 5,700 tokens
Tool Call 3: 8,500 tokens
[Compression triggered]
Tool Call 4: 1,300 tokens (800 compressed + 500 new)
Benefits:
  • Dramatically reduced token costs
  • Stay within context window limits
  • Preserve critical facts and data
  • Fully automatic once enabled

How It Works

Context compression follows a simple pattern:
1. Enable Compression
   Set compress_tool_results=True on your agent or team. This comes with a default threshold of 3 tool calls. The system monitors tool call results as they come in.

2. Threshold Reached
   Once the threshold is reached, compression is triggered and each uncompressed tool call result is individually summarized.

3. Intelligent Summarization
   The compression model preserves key facts (numbers, dates, entities, URLs) while removing boilerplate, redundancy, and filler text.

4. The LLM Loop Continues
   The compressed tool results are used in subsequent LLM executions, reducing token usage and extending the life of your context window.
When using arun on an Agent or Team, compression is handled asynchronously and the uncompressed tool call results are summarized concurrently.
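For example, here is a minimal sketch of async execution. It assumes arun returns a run output object with a content attribute, as in recent Agno releases:
import asyncio

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    compress_tool_results=True,
)

async def main():
    # With arun, tool results past the threshold are summarized concurrently
    response = await agent.arun("Research each of the following topics: AI, Crypto, Web3, and Blockchain")
    print(response.content)

asyncio.run(main())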

Enable Compression

Turn on compress_tool_results=True to automatically compress tool results. This comes with a default threshold of 3 tool calls. For example:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    compress_tool_results=True,
)

agent.print_response("Research each of the following topics: AI, Crypto, Web3, and Blockchain")
You can also enable compress_tool_results=True on individual team members to compress their tool results independently.
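For instance, here is a minimal sketch of per-member compression, assuming the Team class from agno.team with a members parameter. Only the researcher member's tool results are compressed:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.team import Team
from agno.tools.duckduckgo import DuckDuckGoTools

# Only this member's tool results are compressed
researcher = Agent(
    name="Researcher",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    compress_tool_results=True,
)

team = Team(
    model=OpenAIChat(id="gpt-4o"),
    members=[researcher],
)

team.print_response("Research recent developments in AI, Crypto, Web3, and Blockchain")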

Custom Compression

Provide a CompressionManager to customize the compression behavior:
from agno.agent import Agent
from agno.compression.manager import CompressionManager
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools

compression_manager = CompressionManager(
    model=OpenAIChat(id="gpt-4o-mini"),  # Use a faster model for compression
    compress_tool_results_limit=2,  # Compress after 2 tool calls (default: 3)
    compress_tool_call_instructions="Your custom compression prompt here...",
)

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGoTools()],
    compression_manager=compression_manager,
)

agent.print_response("Find recent funding rounds for AI startups")
Use a faster, cheaper model like gpt-4o-mini for compression to reduce latency and cost while using a more capable model as your Agent’s main model.

When to Use Context Compression

Perfect for:
  • Agents with tools that return verbose results (web search, APIs)
  • Multi-step workflows with many tool calls
  • Long-running sessions where context accumulates
  • Production systems where cost matters

Developer Resources