The AI Gateway for Developers

One endpoint, every model, no markup. Built-in routing, failover, and observability across text, image, video, and audio.

TypeScript

import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-5.5',
  prompt: 'Why is the sky blue?'
})

0% markup on inference

Pay provider prices, with no platform fee.

Routing, billing, and observability in one place

Use a single API key and dashboard to access models, track spend, and keep workloads resilient.

One API key, hundreds of models

Unified billing and observability across your entire AI stack, with text, image, and video models.

Built-in failovers, better uptime

Automatic fallbacks during provider outages so your app stays up even when a model goes down.
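
To make the fallback idea concrete, here is a minimal sketch of ordered failover in TypeScript. It illustrates the general pattern only and is not AI Gateway's implementation; the `Provider` type and `withFailover` function are hypothetical names introduced for this example.

```typescript
// Hypothetical provider shape for illustration; not an AI Gateway type.
type Provider<T> = { name: string; call: () => Promise<T> };

// Try each provider in order and return the first successful result.
// If every provider fails, throw one error that lists each failure.
async function withFailover<T>(
  providers: Provider<T>[],
): Promise<{ provider: string; value: T }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, value: await p.call() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

A caller would pass providers in priority order, so a request only reaches the second entry when the first one throws.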

No markup on provider list prices

Pay exactly what providers charge. No platform fees, including when you bring your own key.

“Moving to the gateway is just so ergonomic. We get references to model names, and rely on Vercel to do the correct implementations and handle the edge cases.”

Rob, co-founder of Zo Computer

How Zo Computer improved AI reliability 20×

20× improvement in AI reliability after switching.

30s to adopt a new model, down from an hour of code.

25% improvement in average latency on model calls.

Production-ready, from request one.

Security, spend controls, and compatibility built for teams running real traffic.

Security and compliance

Route only to Zero Data Retention (ZDR) providers. No training on data, no prompt logging. Configure custom rules per request or team-wide.

Track everything

Track tokens, requests, spend, and performance across your AI stack. Cap spend with per-key budgets and per-user quotas.

Drop-in compatible

Use the API formats you already have. Migrate existing apps to AI Gateway in minutes, no SDK rewrites required.

Security and compliance

Route only to ZDR providers, with no training on your data and no prompt logging. Configurable per request or team-wide.

Zero Data Retention

Enforced on every request across your team. Routes only to providers under a ZDR agreement.

No training on your data

Route only to providers that will not train on customer data, configurable per request.

Provider allowlist

Restrict your team to approved providers. Enforced on every request, no code changes.
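
One way to picture how an allowlist and a ZDR requirement narrow the routing pool is the sketch below. It is an assumption-laden illustration, not the gateway's actual policy engine; `ProviderInfo`, `Policy`, and `eligibleProviders` are hypothetical names.

```typescript
// Hypothetical shapes for illustration; not AI Gateway types.
type ProviderInfo = { name: string; zeroDataRetention: boolean };
type Policy = { allowlist?: string[]; requireZdr: boolean };

// Return the names of providers a request may be routed to under the policy.
function eligibleProviders(candidates: ProviderInfo[], policy: Policy): string[] {
  return candidates
    // Without an allowlist, every provider passes this step.
    .filter((p) => !policy.allowlist || policy.allowlist.includes(p.name))
    // When ZDR is required, drop providers without a ZDR agreement.
    .filter((p) => !policy.requireZdr || p.zeroDataRetention)
    .map((p) => p.name);
}
```

Because the filter runs before routing, application code does not need to change when the policy does.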

See everything. Set controls.

Track usage, spend, and performance across your AI stack in real time. Set per-key budgets and per-user quotas to cap spend, and tag every request so finance and engineering see the same picture.

See everything.

Dashboard observability

Usage, spend, requests, TTFT, and token counts at team, API key, and project scope.

Custom Reporting API

Tag requests by user, customer, feature, or environment. Slice spend in the dashboard or pull into your own systems.
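
If you pull tagged usage into your own systems, slicing spend by a tag is a small aggregation. The sketch below assumes a simplified record shape; `UsageRecord` and `spendByTag` are hypothetical and do not reflect the Custom Reporting API's real response format.

```typescript
// Hypothetical record shape for illustration only.
type UsageRecord = { tags: Record<string, string>; costUsd: number };

// Sum cost per value of one tag key (for example "customer" or "feature").
function spendByTag(records: UsageRecord[], tagKey: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    // Requests missing the tag are grouped under a placeholder bucket.
    const value = record.tags[tagKey] ?? '(untagged)';
    totals.set(value, (totals.get(value) ?? 0) + record.costUsd);
  }
  return totals;
}
```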

Set controls.

Per-user spending limits

Cap spend for individual users on shared keys. No separate API key required.

API key management

Create, rotate, and revoke keys, and set per-key budgets that stop requests at the limit. All from the dashboard or CLI.
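
A per-key budget that stops requests at the limit can be thought of as a check made before each charge. The sketch below shows that idea under simplifying assumptions (in-memory state, a known cost per request); `BudgetGuard` is a hypothetical class, not part of AI Gateway.

```typescript
// Hypothetical in-memory budget tracker for illustration only.
class BudgetGuard {
  private readonly limitUsd: number;
  private readonly spent = new Map<string, number>();

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Record the cost and return true if the key stays within its budget;
  // return false and record nothing if the charge would exceed the limit.
  tryCharge(keyId: string, costUsd: number): boolean {
    const current = this.spent.get(keyId) ?? 0;
    if (current + costUsd > this.limitUsd) return false;
    this.spent.set(keyId, current + costUsd);
    return true;
  }
}
```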

Short-lived OIDC tokens

Authenticate with OIDC tokens that expire automatically. No static credentials to leak or rotate.

Drop-in compatible

Use the API formats you already have. Move existing apps to AI Gateway with a base URL swap.

The AI SDK is the open-source AI toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.

Learn more about the AI SDK

Seamless migration

Point your existing OpenAI or Anthropic SDK at AI Gateway. Same calls, no rewrites.
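
As a rough sketch of what the swap involves, the TypeScript below builds (but does not send) an OpenAI-style chat completions request aimed at the gateway. The base URL and the `buildChatRequest` helper are assumptions for illustration; confirm the endpoint in the AI Gateway docs before relying on it.

```typescript
// Assumed OpenAI-compatible base URL for AI Gateway; verify against the docs.
const GATEWAY_BASE_URL = 'https://ai-gateway.vercel.sh/v1';

// Hypothetical helper: build a chat completions request for any base URL.
// Only the base URL and API key differ between a direct and a gateway call.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): Request {
  return new Request(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }] }),
  });
}

// Placeholder key; in practice read it from AI_GATEWAY_API_KEY.
const gatewayRequest = buildChatRequest(
  GATEWAY_BASE_URL,
  'your_ai_gateway_api_key',
  'openai/gpt-5.5',
  'Why is the sky blue?',
);
```

Passing `gatewayRequest` to `fetch` would send it. With an official SDK, the equivalent change is usually the client's base URL option.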

Bring your own keys

Bring your own provider keys with no platform fee. Existing commitments flow through.

Models on day zero

Immediate launch partner with major labs. New models work the minute they ship.

29 June: xAI Grok audio models now available on Vercel AI Gateway
29 June: Realtime voice, speech, and transcription now supported on AI Gateway
24 June: GLM 5.2 Fast via Wafer now available on AI Gateway

Every modality in one place.

Text, image, video, realtime, speech, transcription, embeddings, and reranking through one endpoint.

Text

Access the latest from every major model lab and provider. Power your AI features all through a single endpoint.

OpenAI

Image

Generate and edit with the latest image models, no extra setup.

Google

Video

New

Ship production-ready video across a wide range of models from a single prompt.

xAI

Realtime

New

Build voice and live multimodal experiences with realtime models through a single endpoint.

OpenAI

Speech

New

Turn text into natural, expressive speech with the latest text-to-speech models, all through one endpoint.

ElevenLabs

Transcription

New

Transcribe audio to text with leading speech-to-text models, no extra setup.

Deepgram

Embeddings

Vector embeddings for search, retrieval, and RAG pipelines, with every major provider available through one endpoint.

OpenAI
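
Once you have embedding vectors, search and retrieval typically come down to comparing them. The sketch below ranks stored vectors by cosine similarity to a query vector. It assumes the vectors were already produced by an embedding model; `cosineSimilarity` and `nearest` are illustrative helpers, not gateway APIs.

```typescript
// Cosine similarity of two equal-length vectors (NaN if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents from most to least similar to the query vector.
function nearest(
  query: number[],
  docs: { id: string; vector: number[] }[],
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}
```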

Reranking

Improve retrieval relevance by reordering results before they hit your model.

Cohere
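
The reordering step itself is simple once a reranking model has assigned relevance scores. The sketch below assumes those scores are already available; `ScoredPassage` and `rerank` are hypothetical names, and the scoring model is out of scope here.

```typescript
// Hypothetical shape: a retrieved passage plus a model-assigned relevance score.
type ScoredPassage = { text: string; relevance: number };

// Keep the topN most relevant passages, highest score first.
// The input array is copied so the caller's order is left untouched.
function rerank(candidates: ScoredPassage[], topN: number): string[] {
  return [...candidates]
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topN)
    .map((c) => c.text);
}
```

Only the passages returned by `rerank` would then be placed in the model's prompt.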

Works with the tools your team already uses.

Route the most popular AI coding agents through AI Gateway with a base URL change. Get unified observability and spend tracking across every tool, no matter who built it.

Claude Code

Anthropic’s coding agent. Route it through AI Gateway’s Anthropic-compatible endpoint for observability, spend tracking, and failover. Works with Claude Code Max too.

OpenAI Codex

OpenAI’s coding agent. Route it through AI Gateway’s Responses API for observability, spend tracking, and retries. Usage joins the rest of your AI spend.

OpenCode

Open-source terminal coding agent with native AI Gateway support. Connect once for observability, spend tracking, and failover, then switch between any model on the fly.

Blackbox AI

Terminal CLI for AI code generation and debugging. Route it through AI Gateway for observability, spend tracking, and failover across every model in the catalog.

Cline

Autonomous coding agent for VS Code. Select Vercel AI Gateway as the provider for observability, spend tracking, and failover, with detailed token and cache metrics.

Grok Build

xAI’s terminal coding agent. Point it at AI Gateway with two environment variables for observability, spend tracking, and failover. The model picker pulls the full catalog.

See all supported coding agents

Get Started

This quickstart walks you through making your first text generation request with AI Gateway.

Set Up Your Project

Create a new directory and initialize a Node.js project.

Terminal

mkdir ai-text-demo
cd ai-text-demo
pnpm init

Install Dependencies

Install the AI SDK and development dependencies.

Terminal (npm)

npm install ai dotenv @types/node tsx typescript

Terminal (yarn)

yarn add ai dotenv @types/node tsx typescript

Terminal (pnpm)

pnpm add ai dotenv @types/node tsx typescript

Terminal (bun)

bun add ai dotenv @types/node tsx typescript

Set Up Your API Key

Go to the AI Gateway API Keys page in your Vercel dashboard and click Create Key to generate a new API Key.

Create a .env.local file and save your API Key.

.env.local

AI_GATEWAY_API_KEY=your_ai_gateway_api_key

Instead of using an API Key, you can use OIDC tokens to authenticate your requests.

Create and Run Your Script

Create the index.ts file.

index.ts

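import { streamText } from 'ai';
import { config } from 'dotenv';

// Load the key from .env.local (dotenv reads only .env by default).
config({ path: '.env.local' });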

async function main() {
  const result = streamText({
    model: 'openai/gpt-5.5',
    prompt: 'Invent a new holiday and describe its traditions.',
  });

  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }

  console.log();
  console.log('Token usage:', await result.usage);
  console.log('Finish reason:', await result.finishReason);
}

main().catch(console.error);

Run your script.

Terminal

pnpm tsx index.ts

You should see the AI model’s response stream to your terminal.

Frequently Asked Questions

How is AI Gateway priced?

We offer tokens at list price from the upstream providers with no markup, including when you bring your own keys. Certain capabilities are available at higher plan tiers and metered separately. See the pricing page for details.

What's the difference between using AI Gateway and going direct to each provider?

AI Gateway gives you one integration, automatic failover, unified spend tracking, and one invoice across every major provider. Going direct means signing N contracts and stitching together N billing dashboards.

Will AI Gateway work with our existing AI stack?

Almost certainly. AI Gateway supports the AI SDK, OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and an OpenResponses-compatible endpoint. Migrating is typically a base URL swap with no other code changes.

Which modalities does AI Gateway support?

Text, image, video, realtime, speech, transcription, embeddings, and reranking, all through the same endpoint. Browse the full model catalog for specific models and providers.

How does AI Gateway handle our enterprise security and compliance requirements?

AI Gateway supports Zero Data Retention routing, a no-training guarantee, and team-wide provider allowlists. See the security overview for full details.

What observability does AI Gateway provide out of the box?

A dashboard with usage, spend, request volume, TTFT, and token counts, broken down by model, provider, and project. For deeper analysis, the Custom Reporting API lets you pull the same data into your own tools.

Can we use our existing provider contracts and committed spend?

Yes, through BYOK. Bring your own keys for almost every supported provider and your existing commitments flow through. We try BYOK first and only fall back to system credentials on failure.

Do I pay per request or get invoiced?

AI Gateway uses pre-purchased credits by default. Top up in the dashboard and usage is drawn down per request. Enterprise customers can switch to a single consolidated invoice from Vercel covering every provider in their routing pool. Reach out to sales for details on invoicing.
