Run large language models with Ollama, either locally or through Ollama Cloud.
  • Local Usage: Run models on your own hardware using the Ollama client.
  • Cloud Usage: Access cloud-hosted models via Ollama Cloud with an API key.
Ollama supports multiple open-source models; see the library here. Experiment with different models to find the best fit for your use case. Here are some general recommendations:
  • gpt-oss:120b-cloud is an excellent general-purpose cloud model for most tasks.
  • llama3.3 models are good for most basic use-cases.
  • qwen models perform particularly well with tool use.
  • deepseek-r1 models have strong reasoning capabilities.
  • phi4 models are powerful despite their small size.

Authentication (Ollama Cloud Only)

To use Ollama Cloud, set your OLLAMA_API_KEY environment variable. You can get an API key from Ollama Cloud.
export OLLAMA_API_KEY=***
When using Ollama Cloud, the host is automatically set to https://ollama.com. For local usage, no API key is required.
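If you prefer to configure the key from code rather than the shell, you can set the environment variable before constructing the agent. This is a minimal sketch; the key value below is a placeholder, and setting it this way is equivalent to the export command above.

import os

# Equivalent to `export OLLAMA_API_KEY=...`; set this before creating any
# cloud-backed agent. The value here is a hypothetical placeholder.
os.environ["OLLAMA_API_KEY"] = "your-api-key"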

Set up a model

Local Usage

Install Ollama and run a model:
run model
ollama run llama3.1
This starts an interactive session with the model. To download the model for use in an Agno agent:
pull model
ollama pull llama3.1

Cloud Usage

For Ollama Cloud, no local Ollama server installation is required. Install the Ollama library, set up your API key as described in the Authentication section above, and access cloud-hosted models directly.
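As a rough setup sketch, the install command below assumes the Agno library and the Ollama Python client are distributed as the `agno` and `ollama` packages; adjust to your environment if your dependencies differ.

pip install -U agno ollama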

Examples

Local Usage

Once the model is available locally, use the Ollama model class to access it:
from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(id="llama3.1"),
    markdown=True
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")

Cloud Usage

When using Ollama Cloud with an API key, the host is automatically set to https://ollama.com. You can omit the host parameter.
from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(id="gpt-oss:120b-cloud"),
    markdown=True
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")
View more examples here.

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "llama3.2" | The name of the Ollama model to use |
| name | str | "Ollama" | The name of the model |
| provider | str | "Ollama" | The provider of the model |
| host | str | "http://localhost:11434" | The host URL for the Ollama server |
| timeout | Optional[int] | None | Request timeout in seconds |
| format | Optional[str] | None | The format to return the response in (e.g., "json") |
| options | Optional[Dict[str, Any]] | None | Additional model options (temperature, top_p, etc.) |
| keep_alive | Optional[Union[float, str]] | None | How long to keep the model loaded (e.g., "5m" or 3600 seconds) |
| template | Optional[str] | None | The prompt template to use |
| system | Optional[str] | None | System message to use |
| raw | Optional[bool] | None | Whether to return the raw response without formatting |
| stream | bool | True | Whether to stream the response |
Ollama is a subclass of the Model class and has access to the same params.
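As an illustration of the parameters above, here is a minimal sketch of a locally hosted agent with a few common options set explicitly; the specific values are arbitrary examples, not recommendations.

from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(
        id="llama3.1",
        host="http://localhost:11434",  # default local Ollama server
        timeout=60,  # request timeout in seconds
        options={"temperature": 0.2, "top_p": 0.9},  # forwarded to Ollama
        keep_alive="5m",  # keep the model loaded for five minutes
    ),
    markdown=True,
)

agent.print_response("Share a 2 sentence horror story.")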