Run large language models with Ollama, either locally or through Ollama Cloud.
  • Local Usage: Run models on your own hardware using the Ollama client.
  • Cloud Usage: Access cloud-hosted models via Ollama Cloud with an API key.
Ollama supports multiple open-source models; see the library here. Experiment with different models to find the best fit for your use case. Here are some general recommendations:
  • gpt-oss:120b-cloud is an excellent general-purpose cloud model for most tasks.
  • llama3.3 models are good for most basic use-cases.
  • qwen models perform particularly well with tool use.
  • deepseek-r1 models have strong reasoning capabilities.
  • phi4 models are powerful despite their small size.

Authentication (Ollama Cloud Only)

To use Ollama Cloud, set your OLLAMA_API_KEY environment variable. You can get an API key from Ollama Cloud.
export OLLAMA_API_KEY=***
When using Ollama Cloud, the host is automatically set to https://ollama.com. For local usage, no API key is required.
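If you prefer to configure the key from code rather than the shell, you can set the environment variable before constructing the agent. This is a minimal sketch; the key value below is a placeholder, and setting it this way is equivalent to the export command above.

import os

# Equivalent to `export OLLAMA_API_KEY=...`; set this before creating any
# cloud-backed agent. The value here is a hypothetical placeholder.
os.environ["OLLAMA_API_KEY"] = "your-api-key"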

Set up a model

Local Usage

Install Ollama and run a model:
run model
ollama run llama3.1
This starts an interactive session with the model. To download the model for use in an Agno agent:
pull model
ollama pull llama3.1

Cloud Usage

For Ollama Cloud, no local Ollama server installation is required. Install the Ollama library, set up your API key as described in the Authentication section above, and access cloud-hosted models directly.
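As a rough setup sketch, the install command below assumes the Agno library and the Ollama Python client are distributed as the `agno` and `ollama` packages; adjust to your environment if your dependencies differ.

pip install -U agno ollama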

Examples

Local Usage

Once the model is available locally, use the Ollama model class to access it:
from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(id="llama3.1"),
    markdown=True
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")

Cloud Usage

When using Ollama Cloud with an API key, the host is automatically set to https://ollama.com. You can omit the host parameter.
from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(id="gpt-oss:120b-cloud"),
    markdown=True
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")
View more examples here.

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "llama3.2" | The name of the Ollama model to use |
| name | str | "Ollama" | The name of the model |
| provider | str | "Ollama" | The provider of the model |
| host | str | "http://localhost:11434" | The host URL for the Ollama server |
| timeout | Optional[int] | None | Request timeout in seconds |
| format | Optional[str] | None | The format to return the response in (e.g., "json") |
| options | Optional[Dict[str, Any]] | None | Additional model options (temperature, top_p, etc.) |
| keep_alive | Optional[Union[float, str]] | None | How long to keep the model loaded (e.g., "5m" or 3600 seconds) |
| template | Optional[str] | None | The prompt template to use |
| system | Optional[str] | None | System message to use |
| raw | Optional[bool] | None | Whether to return the raw response without formatting |
| stream | bool | True | Whether to stream the response |
Ollama is a subclass of the Model class and has access to the same params.
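As an illustration of the parameters above, here is a minimal sketch of a locally hosted agent with a few common options set explicitly; the specific values are arbitrary examples, not recommendations.

from agno.agent import Agent
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(
        id="llama3.1",
        host="http://localhost:11434",  # default local Ollama server
        timeout=60,  # request timeout in seconds
        options={"temperature": 0.2, "top_p": 0.9},  # forwarded to Ollama
        keep_alive="5m",  # keep the model loaded for five minutes
    ),
    markdown=True,
)

agent.print_response("Share a 2 sentence horror story.")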