OpenAI

OpenAITools lets an Agent use OpenAI models for audio transcription, image generation, and text-to-speech.

Prerequisites

Before using OpenAITools, ensure you have the openai library installed and your OpenAI API key configured.

Install the library:
```
pip install -U openai
```
Set your API key: Obtain your API key from OpenAI and set it as an environment variable.
export OPENAI_API_KEY=xxx
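
A missing key usually only surfaces once the first tool call fails, so it can help to check for it before building the agent. The following is a minimal sketch of such a check; `require_openai_api_key` is a hypothetical helper written for illustration and is not part of agno or the openai library.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the agent."
        )
    return key
```

Calling `require_openai_api_key()` at the top of a script fails fast with a readable message instead of an authentication error partway through a run.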

Initialization

Import OpenAITools and add it to your Agent’s tool list.

from agno.agent import Agent
from agno.tools.openai import OpenAITools

agent = Agent(
    name="OpenAI Agent",
    tools=[OpenAITools()],
    markdown=True,
)

Usage Examples

1. Transcribing Audio

This example demonstrates an agent that transcribes an audio file.

transcription_agent.py

from pathlib import Path
from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import download_file

# Download a sample recording so the agent has a local file to transcribe.
audio_url = "https://agno-public.s3.amazonaws.com/demo_data/sample_conversation.wav"

local_audio_path = Path("tmp/sample_conversation.wav")
download_file(audio_url, local_audio_path)

agent = Agent(
    name="OpenAI Transcription Agent",
    tools=[OpenAITools(transcription_model="whisper-1")],
    markdown=True,
)

agent.print_response(f"Transcribe the audio file located at '{local_audio_path}'")

2. Generating Images

This example demonstrates an agent that generates an image based on a text prompt.

image_generation_agent.py

from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Image Generation Agent",
    tools=[OpenAITools(image_model="dall-e-3")],
    markdown=True,
)

# Run the agent, then save the first generated image if one was returned.
response = agent.run("Generate a photorealistic image of a cozy coffee shop interior")

if response.images:
    save_base64_data(response.images[0].content, "tmp/coffee_shop.png")

3. Generating Speech

This example demonstrates an agent that generates speech from text.

speech_synthesis_agent.py

from agno.agent import Agent
from agno.tools.openai import OpenAITools
from agno.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Speech Agent",
    tools=[OpenAITools(
        text_to_speech_model="tts-1",
        text_to_speech_voice="alloy",
        text_to_speech_format="mp3"
    )],
    markdown=True,
)

response = agent.run("Generate audio for the text: 'Hello, this is a synthesized voice example.'")
if response and response.audio:
    save_base64_data(response.audio[0].base64_audio, "tmp/hello.mp3")
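
The image and speech examples both hand base64-encoded data to `save_base64_data`. To make that step less opaque, here is a rough standard-library sketch of what saving such data involves: decode the string and write the bytes to disk. `write_base64_file` is a hypothetical stand-in written for illustration; it is not agno's implementation, and creating the parent directory is an assumption of this sketch rather than documented behavior of `save_base64_data`.

```python
import base64
from pathlib import Path


def write_base64_file(b64_data, output_path):
    """Decode a base64 string and write the resulting bytes to output_path."""
    path = Path(output_path)
    # Create the target directory (for example "tmp/") if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(b64_data))
    return path
```

The function returns the path it wrote, which is convenient for logging or for passing the file on to another step.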


Customization

You can customize the underlying OpenAI models used for transcription, image generation, and text-to-speech, along with the TTS voice and output format:

OpenAITools(
    transcription_model="whisper-1",
    image_model="dall-e-3",
    text_to_speech_model="tts-1-hd",
    text_to_speech_voice="nova",
    text_to_speech_format="wav"
)

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `None` | OpenAI API key. Uses the `OPENAI_API_KEY` env var if not provided |
| `enable_transcription` | `bool` | `True` | Enable audio transcription functionality |
| `enable_image_generation` | `bool` | `True` | Enable image generation functionality |
| `enable_speech_generation` | `bool` | `True` | Enable speech generation functionality |
| `all` | `bool` | `False` | Enable all tools when set to True |
| `transcription_model` | `str` | `whisper-1` | Model to use for audio transcription |
| `text_to_speech_voice` | `str` | `alloy` | Voice to use for text-to-speech (alloy, echo, fable, onyx, nova, shimmer) |
| `text_to_speech_model` | `str` | `tts-1` | Model to use for text-to-speech (tts-1, tts-1-hd) |
| `text_to_speech_format` | `str` | `mp3` | Audio format for TTS output (mp3, opus, aac, flac, wav, pcm) |
| `image_model` | `str` | `dall-e-3` | Model to use for image generation |
| `image_quality` | `str` | `None` | Quality setting for image generation |
| `image_size` | `str` | `None` | Size setting for image generation |
| `image_style` | `str` | `None` | Style setting for image generation (vivid, natural) |
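
Because the text-to-speech parameters are plain strings, a typo such as `"mp4"` may only show up when the API call fails. The sketch below validates the three TTS settings against the values listed in the table before they are passed on. `build_tts_kwargs` is a hypothetical helper, and the allowed sets are copied from this page; the toolkit or the OpenAI API may accept other values, so treat them as an assumption.

```python
# Allowed values copied from the Toolkit Params table (an assumption, not an
# authoritative list of what the API accepts).
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_tts_kwargs(model="tts-1", voice="alloy", audio_format="mp3"):
    """Validate TTS settings and return them as keyword arguments."""
    checks = (
        ("text_to_speech_model", model, TTS_MODELS),
        ("text_to_speech_voice", voice, TTS_VOICES),
        ("text_to_speech_format", audio_format, TTS_FORMATS),
    )
    kwargs = {}
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        kwargs[name] = value
    return kwargs
```

The returned dictionary uses the parameter names from the table, so it can be unpacked directly, for example `OpenAITools(**build_tts_kwargs("tts-1-hd", "nova", "wav"))`.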

Toolkit Functions

The OpenAITools toolkit provides the following functions:

| Function | Description |
| --- | --- |
| `transcribe_audio` | Transcribes audio from a local file path or a public URL |
| `generate_image` | Generates images based on a text prompt |
| `generate_speech` | Synthesizes speech from text |
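
An agent that only needs one of these functions can leave the others switched off through the `enable_*` parameters. The sketch below builds those flags from a list of function names. `feature_kwargs` is a hypothetical helper, and the pairing of each function with a flag is inferred from the parameter names on this page rather than confirmed by the toolkit's source.

```python
# Function name -> enable flag. The pairing is inferred from the names in the
# Toolkit Params and Toolkit Functions tables (an assumption).
FEATURE_FLAGS = {
    "transcribe_audio": "enable_transcription",
    "generate_image": "enable_image_generation",
    "generate_speech": "enable_speech_generation",
}


def feature_kwargs(*functions):
    """Return enable_* keyword arguments that turn on only the named functions."""
    unknown = set(functions) - set(FEATURE_FLAGS)
    if unknown:
        raise ValueError(f"Unknown toolkit functions: {sorted(unknown)}")
    return {flag: name in functions for name, flag in FEATURE_FLAGS.items()}
```

For a transcription-only agent, `OpenAITools(**feature_kwargs("transcribe_audio"))` would pass `enable_transcription=True` and set the other two flags to `False`.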
