Realtime voice, speech, and transcription now supported on AI Gateway

AI Gateway now supports voice and audio models. You can build realtime voice agents, generate speech from text, and transcribe audio to text. These models get the same observability, spend controls, and bring-your-own-key support as text, image, and video models in AI Gateway, with no markup or platform fees. These capabilities are in beta and available through AI SDK 7.

With realtime support, a single model takes audio in and sends audio out, so a user can talk and hear a reply in near real time instead of waiting on a chain of separate models.
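To make the contrast concrete, here is a minimal sketch of the two shapes. It assumes a hypothetical `Stage` type for a single model call and hypothetical `transcribe`, `respond`, and `speak` stages. None of these are AI Gateway or AI SDK APIs, and a real realtime model streams audio continuously rather than returning one buffer.

```typescript
// Hypothetical stage signature: one model call turns one input into one output.
type Stage<I, O> = (input: I) => Promise<O>;

// A chained voice pipeline (hypothetical stages): transcribe, generate a reply, synthesize it.
// In this sketch each stage waits for the previous one, so their latencies add up.
async function chainedReply(
  audio: Uint8Array,
  transcribe: Stage<Uint8Array, string>,
  respond: Stage<string, string>,
  speak: Stage<string, Uint8Array>,
): Promise<Uint8Array> {
  const text = await transcribe(audio);
  const reply = await respond(text);
  return speak(reply);
}

// A realtime model collapses the three stages into a single audio-in, audio-out call.
async function realtimeReply(
  audio: Uint8Array,
  model: Stage<Uint8Array, Uint8Array>,
): Promise<Uint8Array> {
  return model(audio);
}
```

The chained version has three sequential hand-offs between models; the realtime version has one, which is where the latency saving described above comes from.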

Capability	What it does
Realtime voice agents	The model listens to the user, works out a response, and speaks it back in a live, low-latency conversation. It can call your tools mid-conversation to look something up or take an action. The `useRealtime` hook handles microphone capture and playback.
Text to speech	Generate spoken audio from text, with a selectable voice and output format such as MP3. Use it for voiceovers, audio versions of written content, and spoken responses.
Speech to text	Transcribe recordings to text from a file buffer, a base64 string, or a URL. Use it for voice notes and other recordings.
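Speech to text accepts audio in three forms. The sketch below models them as a tagged union so calling code knows which form it is holding. `AudioInput` and `toAudioInput` are hypothetical names for illustration; this is not how the AI SDK classifies input internally.

```typescript
// Hypothetical type for the three transcription inputs listed in the table.
type AudioInput =
  | { kind: 'buffer'; bytes: Uint8Array }
  | { kind: 'base64'; data: string }
  | { kind: 'url'; url: URL };

// Standard base64: groups of four characters, with optional '=' padding at the end.
const BASE64 = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

// Hypothetical helper: tag raw input with the form it takes.
// Heuristic only: a short plain word such as "abcd" is also valid base64.
function toAudioInput(input: Uint8Array | string): AudioInput {
  // Node's Buffer is a Uint8Array subclass, so file buffers land here too.
  if (input instanceof Uint8Array) return { kind: 'buffer', bytes: input };
  if (/^https?:\/\//i.test(input)) return { kind: 'url', url: new URL(input) };
  if (input.length > 0 && BASE64.test(input)) return { kind: 'base64', data: input };
  throw new Error('Expected audio bytes, an http(s) URL, or a base64 string');
}
```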

Two ways to get started:

Write code. Follow the realtime example below, or the realtime quickstart, to add a voice agent to your app.
Use the playground. Talk to a realtime model in your browser in the AI Gateway Playground, with no code required.

Realtime example

A voice agent has two pieces: a server route that mints a short-lived token, so your API key never reaches the client, and a browser component that connects with it.

Add the token route:

app/api/realtime/token/route.ts

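```ts
import { gateway } from '@ai-sdk/gateway';

// Mint a short-lived realtime token on the server so the API key never reaches the browser.
export async function POST() {
  const { token, url } = await gateway.experimental_realtime.getToken({
    model: 'openai/gpt-realtime-2',
  });
  // This example passes an empty tools list to the client.
  return Response.json({ token, url, tools: [] });
}
```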
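The token route responds with JSON of the form `{ token, url, tools }`. If you call it without the hook, for example from a custom client, a minimal sketch like the one below can validate the body first. `RealtimeTokenResponse`, `parseTokenResponse`, and `fetchRealtimeToken` are hypothetical names, and `tools` is left loosely typed because its element format is not described here.

```typescript
// Shape returned by the token route: { token, url, tools }.
interface RealtimeTokenResponse {
  token: string;
  url: string;
  tools: unknown[]; // element format not specified in this post
}

// Hypothetical helper: reject malformed bodies before using them.
function parseTokenResponse(body: unknown): RealtimeTokenResponse {
  if (typeof body !== 'object' || body === null) throw new Error('Token response is not an object');
  const { token, url, tools } = body as Record<string, unknown>;
  if (typeof token !== 'string' || token.length === 0) throw new Error('Missing token');
  if (typeof url !== 'string' || url.length === 0) throw new Error('Missing url');
  if (!Array.isArray(tools)) throw new Error('tools must be an array');
  return { token, url, tools };
}

// Hypothetical browser usage: the route exports a POST handler, so request it with POST.
async function fetchRealtimeToken(endpoint = '/api/realtime/token'): Promise<RealtimeTokenResponse> {
  const res = await fetch(endpoint, { method: 'POST' });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  return parseTokenResponse(await res.json());
}
```

Because the token is short-lived, a client would typically request a fresh one per session rather than caching it.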

Then connect from the browser. The `useRealtime` hook fetches that route and manages the WebSocket connection, microphone capture, and audio playback:

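```tsx
'use client';
import { experimental_useRealtime as useRealtime } from '@ai-sdk/react';
import { gateway } from '@ai-sdk/gateway';
// Example client component; the name VoiceAgent is illustrative.
export function VoiceAgent() {
  const { status, connect, startAudioCapture } = useRealtime({
    model: gateway.experimental_realtime('openai/gpt-realtime-2'),
    api: { token: '/api/realtime/token' },
    sessionConfig: { voice: 'alloy', turnDetection: { type: 'server-vad' } },
  });
  // Call connect(), then startAudioCapture(stream) to start talking.
  // The stream is assumed here to be a microphone MediaStream from getUserMedia.
  async function start() {
    await connect();
    startAudioCapture(await navigator.mediaDevices.getUserMedia({ audio: true }));
  }
  return <button onClick={start}>Talk (status: {String(status)})</button>;
}
```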
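The `turnDetection: { type: 'server-vad' }` setting leaves voice activity detection (deciding when the user has finished speaking) to the server. As a rough illustration of the idea only, and not the server's actual algorithm, an energy-based detector marks the end of a turn once speech has been heard and several quiet frames follow. `rms`, `detectTurnEnd`, and the `threshold` and `silenceFrames` defaults are hypothetical.

```typescript
// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Returns the index of the frame where the user's turn ends, or -1 if it has not ended.
// A turn ends once speech was heard and then `silenceFrames` consecutive quiet frames follow.
// threshold and silenceFrames are illustrative values, not server settings.
function detectTurnEnd(frames: Float32Array[], threshold = 0.02, silenceFrames = 3): number {
  let heardSpeech = false;
  let quiet = 0;
  for (let i = 0; i < frames.length; i++) {
    if (rms(frames[i]) >= threshold) {
      heardSpeech = true; // speech resets the silence counter
      quiet = 0;
    } else if (heardSpeech && ++quiet >= silenceFrames) {
      return i; // enough trailing silence: the turn is over
    }
  }
  return -1;
}
```

Requiring several quiet frames in a row keeps a brief pause mid-sentence from ending the turn early.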

Playground

You can also try audio models without writing any code. Open the models page, click into a model, and interact with it right in the browser:

Talk to a realtime model to hold a voice conversation
Send text to a speech model and hear it read aloud
Speak to a transcription model and have it transcribe your words

For more information on realtime voice, speech, and transcription models on AI Gateway, see the documentation. To see every supported realtime voice, speech, and transcription model on AI Gateway, check the full model list.
