Speech Generation

Generate speech (text-to-speech) using Gemini and OpenAI models through the OpenAI-compatible audio API

LLMGateway supports text-to-speech (TTS) through the OpenAI-compatible /v1/audio/speech endpoint, powered by Google Gemini and OpenAI speech models.

Available Models

Browse all speech generation models, with up-to-date pricing, on the models page.

Billing varies by model family. Some models are billed on the token usage reported by the provider (input text tokens and output audio tokens). Others return audio bytes without usage data and are billed on input character count instead. See the models page for each model's exact pricing.
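The two billing shapes described above can be sketched as a small estimator. This is an illustration only: the `Billing` and `Usage` types, the `estimateCost` helper, and the per-million pricing unit are assumptions, and real rates must be taken from the models page. How the gateway counts characters is also not stated here, so the string length used below is an approximation.

```typescript
// Hypothetical types: the two billing shapes described in the text.
// Rates are caller-supplied placeholders, not real prices.
type Billing =
  | { kind: "tokens"; inputPerMillion: number; outputPerMillion: number }
  | { kind: "characters"; perMillionChars: number };

// Token counts as reported by the provider for token-billed models.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Estimate the cost of one request under either billing shape.
function estimateCost(billing: Billing, input: string, usage?: Usage): number {
  if (billing.kind === "characters") {
    // Assumption: one character per UTF-16 code unit.
    return (input.length / 1_000_000) * billing.perMillionChars;
  }
  if (usage === undefined) {
    throw new Error("token-billed models need provider-reported usage");
  }
  return (
    (usage.inputTokens / 1_000_000) * billing.inputPerMillion +
    (usage.outputTokens / 1_000_000) * billing.outputPerMillion
  );
}
```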

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | The speech model to use |
| input | string | required | The text to synthesize into speech |
| voice | string | model-dependent | A prebuilt voice. Defaults to Kore (Gemini) or alloy (OpenAI) |
| response_format | string | model-dependent | Audio format. OpenAI: mp3 (default), opus, aac, flac, wav, pcm. Gemini: wav (default), pcm |
| instructions | string | none | Optional style/delivery directive prepended to the input (e.g. "Say cheerfully") |
| speed | number | none | Accepted for OpenAI compatibility, but not applied by Gemini speech models |
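The parameter table above maps naturally onto a typed request body. The sketch below is a client-side convenience, not part of the gateway: `SpeechFamily`, `SpeechRequest`, `FORMATS`, and `checkFormat` are hypothetical names, and the format lists are copied from the table. What the gateway itself does with an unlisted format is not documented here, so this check only fails early before a request is sent.

```typescript
// Hypothetical grouping of models by provider; not an API field.
type SpeechFamily = "gemini" | "openai";

// Request body fields, mirroring the parameter table.
interface SpeechRequest {
  model: string;
  input: string;
  voice?: string;
  response_format?: string;
  instructions?: string;
  speed?: number;
}

// Formats listed for each family in the parameter table.
const FORMATS: Record<SpeechFamily, readonly string[]> = {
  openai: ["mp3", "opus", "aac", "flac", "wav", "pcm"],
  gemini: ["wav", "pcm"],
};

// Return the request unchanged, or throw if its format is not listed
// for the given family. An omitted format is left to the model default.
function checkFormat(family: SpeechFamily, req: SpeechRequest): SpeechRequest {
  const format = req.response_format;
  if (format !== undefined && FORMATS[family].indexOf(format) === -1) {
    throw new Error(format + " is not listed for " + family + " speech models");
  }
  return req;
}
```

For instance, `checkFormat("gemini", { model, input, response_format: "mp3" })` would throw, while the same call with `"wav"` or with no format returns the request as-is.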

Gemini speech models return raw PCM audio. LLMGateway wraps it in a WAV container by default (response_format: "wav"), or returns the raw 16-bit little-endian PCM at 24 kHz when response_format: "pcm" is requested. Other formats such as mp3 are only available on the OpenAI models, which return the audio already encoded in the requested format.
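When raw PCM is requested, the bytes carry no header, so most players cannot open them directly. The following is a minimal sketch of wrapping such samples in a standard 44-byte RIFF/WAVE header on the client. It is not the gateway's implementation; `pcmToWav` is a hypothetical helper, and mono output is an assumption (the 16-bit and 24 kHz values come from the paragraph above).

```typescript
// Wrap raw little-endian PCM samples in a 44-byte RIFF/WAVE header.
// Assumption: mono audio. Sample rate and bit depth default to the
// 24 kHz, 16-bit values stated in the text.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  // Write an ASCII chunk tag at the given byte offset.
  const ascii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  ascii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  ascii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44);
  return wav;
}
```

Passing the body of a response_format: "pcm" request through `pcmToWav` should give bytes that can be saved as a .wav file, provided the mono assumption holds for the model in use.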

curl

curl -X POST "https://llm-gw.agenzo.com/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Kore"
  }' \
  --output speech.wav

OpenAI SDK

The endpoint works with the standard OpenAI client library; point the base URL at LLMGateway.

import OpenAI from "openai";
import { writeFileSync } from "fs";

// Point the standard OpenAI client at the LLMGateway base URL.
const openai = new OpenAI({
	apiKey: process.env.LLM_GATEWAY_API_KEY,
	baseURL: "https://llm-gw.agenzo.com/v1",
});

// Gemini speech models return WAV by default, so response_format is omitted.
const response = await openai.audio.speech.create({
	model: "gemini-2.5-flash-preview-tts",
	voice: "Kore",
	input: "Hello, welcome to LLM Gateway!",
});

// The response body is the complete audio file; write it to disk.
const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("speech.wav", buffer);

Streaming

Streaming speech responses (chunked audio or stream_format: "sse") are not supported yet. The endpoint always returns the complete audio file in a single response, so low-latency, play-as-you-go output is not available.

Voices

Gemini exposes 30 prebuilt voices. A few common ones: Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede. When voice is omitted on a Gemini model, Kore is used.

OpenAI voices include alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, and verse. When voice is omitted on an OpenAI model, alloy is used.
