Speech Generation
Generate speech (text-to-speech) using Gemini and OpenAI models through the OpenAI-compatible audio API
LLMGateway supports text-to-speech (TTS) through the OpenAI-compatible
/v1/audio/speech endpoint, powered by Google Gemini and OpenAI speech
models.
Available Models
Browse all speech generation models, with up-to-date pricing, on the models page.
Billing varies by model family. Some models are billed on token usage reported by the provider (input text tokens and output audio tokens), while others are billed on input character count (those return audio bytes without usage data). See the models page for each model's exact pricing.
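The two billing modes can be sketched as simple arithmetic. This is a rough illustration only: the helper names, the usage field names, and the rate parameters are hypothetical, and how the gateway actually counts characters is not stated here. Real rates come from the models page.

```typescript
// Hypothetical helpers for rough cost estimates; not part of LLMGateway or the OpenAI SDK.
// All rates are caller-supplied placeholders. Take real values from the models page.

interface TokenUsage {
  inputTextTokens: number; // hypothetical field name
  outputAudioTokens: number; // hypothetical field name
}

interface TokenRates {
  inputPerMillionTokens: number;
  outputPerMillionTokens: number;
}

// Token-billed models: cost follows the usage reported by the provider.
function estimateTokenBilledCost(usage: TokenUsage, rates: TokenRates): number {
  return (
    (usage.inputTextTokens * rates.inputPerMillionTokens +
      usage.outputAudioTokens * rates.outputPerMillionTokens) /
    1e6
  );
}

// Character-billed models: cost follows the length of the input text.
// Assumption: one UTF-16 code unit counts as one character; the gateway may count differently.
function estimateCharacterBilledCost(input: string, ratePerMillionChars: number): number {
  return (input.length * ratePerMillionChars) / 1e6;
}
```

Whether text passed in instructions counts toward the billed characters is not covered above, so treat any estimate as approximate.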
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | The speech model to use |
| input | string | required | The text to synthesize into speech |
| voice | string | model-dependent | A prebuilt voice. Defaults to Kore (Gemini) or alloy (OpenAI) |
| response_format | string | model-dependent | Audio format. OpenAI: mp3 (default), opus, aac, flac, wav, pcm. Gemini: wav (default), pcm |
| instructions | string | none | Optional style or delivery directive prepended to the input (e.g. "Say cheerfully") |
| speed | number | none | Accepted for OpenAI compatibility, but not applied by Gemini speech models |
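The format rules in the table can be checked on the client before a request is sent. The sketch below is one way to do that, not the gateway's own validation. `buildSpeechRequest` and `modelFamily` are hypothetical helpers, and detecting the family from a "gemini" prefix in the model ID is an assumption.

```typescript
// Formats listed in the parameter table, keyed by model family.
const SUPPORTED_FORMATS: { [family: string]: string[] } = {
  openai: ["mp3", "opus", "aac", "flac", "wav", "pcm"],
  gemini: ["wav", "pcm"],
};

interface SpeechRequest {
  model: string;
  input: string;
  voice?: string;
  response_format?: string;
  instructions?: string;
  speed?: number;
}

// Assumption: Gemini model IDs start with "gemini"; anything else is treated as OpenAI.
function modelFamily(model: string): "gemini" | "openai" {
  return model.toLowerCase().indexOf("gemini") === 0 ? "gemini" : "openai";
}

// Hypothetical helper: returns a copy of the request, or throws if a required
// field is missing or the requested format is not listed for the model family.
function buildSpeechRequest(req: SpeechRequest): SpeechRequest {
  if (!req.model) throw new Error("model is required");
  if (!req.input) throw new Error("input is required");
  const family = modelFamily(req.model);
  const formats = SUPPORTED_FORMATS[family];
  if (req.response_format !== undefined && formats.indexOf(req.response_format) === -1) {
    throw new Error(
      "response_format '" + req.response_format + "' is not listed for " + family +
        " models (expected one of: " + formats.join(", ") + ")"
    );
  }
  return { ...req };
}
```

The returned object can be passed as the JSON body of a POST to /v1/audio/speech. Optional fields that are left out fall back to the defaults in the table.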
Gemini speech models return raw PCM audio. LLMGateway wraps it in a WAV
container by default (response_format: "wav"), or returns the raw 16-bit
little-endian PCM at 24 kHz when response_format: "pcm" is requested.
Other formats such as mp3 are only available on the OpenAI models, which
return the audio already encoded in the requested format.
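If you request response_format: "pcm" and later need a playable file, the WAV wrapping described above can be reproduced on the client. The following is a minimal sketch, not the gateway's implementation. The 16-bit little-endian samples and 24 kHz rate come from the text above. Mono audio is an assumption, and `pcmToWav` and `pcmDurationSeconds` are hypothetical helper names.

```typescript
// Wrap raw 16-bit little-endian PCM in a minimal 44-byte RIFF/WAVE header.
// Assumption: mono audio (channels = 1); the channel count is not documented here.
function pcmToWav(pcm: Uint8Array, sampleRate = 24000, channels = 1): Uint8Array {
  const bytesPerSample = 2; // 16-bit samples
  const blockAlign = channels * bytesPerSample;
  const byteRate = sampleRate * blockAlign;

  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44);
  return wav;
}

// Duration of a raw PCM buffer, under the same format assumptions.
function pcmDurationSeconds(byteLength: number, sampleRate = 24000, channels = 1): number {
  return byteLength / (sampleRate * channels * 2);
}
```

Keeping the raw PCM and adding the header only when needed can be convenient when the samples feed an audio pipeline directly.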
curl
curl -X POST "https://llm-gw.agenzo.com/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Kore"
  }' \
  --output speech.wav
OpenAI SDK
The endpoint works with the standard OpenAI client library: just point the base URL at LLMGateway.
import OpenAI from "openai";
import { writeFileSync } from "fs";

// Point the standard OpenAI client at the LLMGateway base URL.
const openai = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://llm-gw.agenzo.com/v1",
});

// Request speech; Gemini models return WAV by default.
const response = await openai.audio.speech.create({
  model: "gemini-2.5-flash-preview-tts",
  voice: "Kore",
  input: "Hello, welcome to LLM Gateway!",
});

// The complete audio arrives in a single response; write it to disk.
const buffer = Buffer.from(await response.arrayBuffer());
writeFileSync("speech.wav", buffer);
Streaming
Streaming speech responses (chunked audio or stream_format: "sse") are not
supported yet. The endpoint always returns the complete audio file in a single
response, so there is no low-latency, play-as-you-go output for now.
Voices
Gemini exposes 30 prebuilt voices. A few common ones:
Kore, Puck, Zephyr, Charon, Fenrir, Leda, Orus, Aoede. When
voice is omitted on a Gemini model, Kore is used.
OpenAI voices include alloy, ash, ballad, coral, echo, fable,
nova, onyx, sage, shimmer, and verse. When voice is omitted on an
OpenAI model, alloy is used.