Pass fallback_models to any Agent or Team. If the primary model fails after exhausting its retries, each fallback is tried in order until one succeeds.
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
)
If gpt-4o fails after exhausting its own retries, Claude is tried automatically. Model strings work too:
from agno.agent import Agent

agent = Agent(
    model="openai:gpt-4o",
    fallback_models=["anthropic:claude-sonnet-4-20250514"],
)

Usage with Teams

Fallback models apply to the team leader’s model calls. Member agents keep their own models and are not affected by the leader’s fallback config.
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
from agno.team import Team

researcher = Agent(
    name="Researcher",
    role="You research topics and provide detailed findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

writer = Agent(
    name="Writer",
    role="You write clear, concise summaries from research findings.",
    model=OpenAIChat(id="gpt-4o-mini"),
)

team = Team(
    name="Research Team",
    model=OpenAIChat(id="gpt-4o"),
    fallback_models=[Claude(id="claude-sonnet-4-20250514")],
    members=[researcher, writer],
    markdown=True,
)

Error-Specific Fallbacks

FallbackConfig lets you route different error types to different fallback models. Instead of a flat list, you specify which models to try for rate limits, context window overflows, and general errors separately.
from agno.agent import Agent
from agno.models.fallback import FallbackConfig
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        # On rate-limit (429/529) errors
        on_rate_limit=[
            OpenAIChat(id="gpt-4o-mini"),
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # On context-window-exceeded errors
        on_context_overflow=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
        # General fallback for any other retryable error
        on_error=[
            Claude(id="claude-sonnet-4-20250514"),
        ],
    ),
)

Error routing

When the primary model fails, the error is classified and routed to the matching fallback list:
| Error Type | Fallback List | Example |
| --- | --- | --- |
| Rate limit (429/529) | on_rate_limit | Provider throttling, Anthropic overloaded |
| Context window exceeded | on_context_overflow | Input too long for the model's context window |
| Other retryable errors | on_error | Server errors (5xx), network failures |
If a specific list (like on_rate_limit) is empty, on_error is used as a catch-all. Non-retryable client errors like 400, 401, 403, 404, and 422 are not caught by fallback. These indicate configuration problems (bad API key, invalid request) that need to be fixed rather than masked by switching models.
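The routing rules above can be sketched as a small classifier. This is an illustrative sketch of the described behavior, not agno's actual implementation; the function name and the dict-based config are invented for the example.

```python
# Sketch of the error routing described above (illustrative, not agno internals).
RATE_LIMIT_CODES = {429, 529}
NON_RETRYABLE_CODES = {400, 401, 403, 404, 422}


def pick_fallback_list(status_code, context_overflow, config):
    """Return the fallback list to try, or None for non-retryable errors.

    config is a dict with 'on_rate_limit', 'on_context_overflow',
    and 'on_error' lists, mirroring FallbackConfig.
    """
    if context_overflow:
        # Context overflow routes to its own list; on_error is the catch-all.
        return config["on_context_overflow"] or config["on_error"]
    if status_code in NON_RETRYABLE_CODES:
        return None  # configuration problems are surfaced, not masked
    if status_code in RATE_LIMIT_CODES:
        return config["on_rate_limit"] or config["on_error"]
    return config["on_error"]
```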

Fallback Callback

Use the callback parameter to get notified whenever a fallback model is activated. This is useful for logging, metrics, or alerting.
from agno.agent import Agent
from agno.models.fallback import FallbackConfig
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat


def on_fallback(primary_model_id: str, fallback_model_id: str, error: Exception) -> None:
    print(f"[fallback] {primary_model_id} -> {fallback_model_id} (reason: {error})")


agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    fallback_config=FallbackConfig(
        on_error=[Claude(id="claude-sonnet-4-20250514")],
        callback=on_fallback,
    ),
)
The callback fires after the fallback model succeeds. For streaming calls, it fires after the full stream completes.

Retry vs. Fallback

Retry and fallback are separate layers. Retry happens inside each model. Fallback only triggers after the primary model’s retry loop is fully exhausted.
Primary model
  └── _invoke_with_retry()        # retries N times (per model config)
On failure
  └── classify error type
  └── select matching fallback list
  └── try each fallback in order
        └── fallback._invoke_with_retry()   # each fallback retries independently
Each model controls its own retry behavior:
agent = Agent(
    model=OpenAIChat(id="gpt-4o", retries=3, exponential_backoff=True),
    fallback_models=[
        Claude(id="claude-sonnet-4-20250514", retries=2),
    ],
)
The primary model retries 3 times with exponential backoff. Only after all 3 attempts fail does the fallback kick in, and it gets 2 retries of its own.
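The attempt ordering implied by that config can be simulated in a few lines. This is a toy model of the layering, not agno's code; it assumes retries=N means N attempts per model, matching the "all 3 attempts" wording above.

```python
# Toy simulation of retry-then-fallback layering (not agno internals).
def run_with_fallback(models):
    """models: list of (name, retries, succeeds_on_attempt or None).

    Returns (attempt_log, winning_model or None).
    """
    log = []
    for name, retries, succeeds_on in models:
        # Each model exhausts its own retry budget before the next is tried.
        for attempt in range(1, retries + 1):
            log.append(f"{name}#{attempt}")
            if succeeds_on is not None and attempt >= succeeds_on:
                return log, name
    return log, None


# Primary (retries=3) always fails; fallback (retries=2) succeeds immediately.
log, winner = run_with_fallback([
    ("gpt-4o", 3, None),
    ("claude", 2, 1),
])
```

The log shows three gpt-4o attempts before the first claude attempt, matching the diagram above.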

Streaming

Fallback works with streaming responses. If the primary model fails mid-stream, the fallback model takes over and the response content is reset so the consumer receives a clean response from the fallback model only.
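The reset behavior can be pictured with a simplified sketch: buffer chunks from the primary, and on a mid-stream failure discard them before consuming the fallback. This is a minimal illustration of the described semantics, not agno's streaming internals.

```python
# Simplified sketch of mid-stream fallback with content reset (illustrative).
def stream_with_fallback(primary, fallback):
    """primary/fallback: callables returning iterators of text chunks."""
    chunks = []
    try:
        for chunk in primary():
            chunks.append(chunk)
    except Exception:
        chunks = []  # discard the partial primary output
        for chunk in fallback():
            chunks.append(chunk)
    return "".join(chunks)


def flaky_primary():
    yield "partial "
    raise RuntimeError("provider error mid-stream")


def healthy_fallback():
    yield "clean "
    yield "response"
```

The consumer sees only "clean response", with no residue from the failed primary stream.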

Parameters

Available on both Agent and Team:
| Parameter | Type | Description |
| --- | --- | --- |
| fallback_models | List[Model \| str] | Models tried in order on any failure. Shorthand for FallbackConfig(on_error=...). |
| fallback_config | FallbackConfig | Error-specific routing. Takes precedence over fallback_models if both are set. |

FallbackConfig

| Field | Type | Description |
| --- | --- | --- |
| on_error | List[Model \| str] | General fallback for any retryable error. |
| on_rate_limit | List[Model \| str] | Fallback for rate-limit (429/529) errors. Falls back to on_error if empty. |
| on_context_overflow | List[Model \| str] | Fallback for context-window-exceeded errors. Falls back to on_error if empty. |
| callback | Callable[[str, str, Exception], None] | Called when a fallback model is activated. Receives (primary_model_id, fallback_model_id, error). |
