Agno agents support multimodal input and output: text, images, audio, video, and files.

Guides

Image As Input

Analyze and describe images with agents.

Image As Output

Return generated images from agent responses.

Image to Text

Convert an input image to text.

OpenAI Image Generation

Generate images with the OpenAI tool.

Image Generation

Generate images with DALL-E.

Image Analysis in Same Run

Generate and analyze an image in the same run.

Image Analysis in Multi-turn Runs

Generate an image and analyze it across multi-turn runs.

Image I/O with Fal API

Use an input image and the Fal API to generate new images.

Image to Structured Output

Convert an input image to structured output using Pydantic models.

Generate Image with Intermediate Steps

Use DALL-E to generate an image with intermediate steps.

High Fidelity Image Analysis

Analyze images with high fidelity.

Image to Audio

Convert an input image to audio.

Image Input for Tools

Show how tools can receive and process images.