The AI Gateway for Developers

One endpoint, every model, no markup. Built-in routing, failover, and observability across text, image, video, and audio.

import { streamText } from 'ai'

const result = streamText({
  model: 'openai/gpt-5.5',
  prompt: 'Why is the sky blue?'
})
0% markup on inference

Pay provider prices, with no platform fee.

Routing, billing, and observability in one place

Use a single API key and dashboard to access models, track spend, and keep workloads resilient.

One API key, hundreds of models

Unified billing and observability across your entire AI stack, with text, image, and video models.

Diagram of multiple AI models connected by lines to a single unified endpoint, representing seamless access to different models through one gateway.

Built-in failovers, better uptime

Automatic fallbacks during provider outages so your app stays up even when a model goes down.
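
One way to picture the fallback behavior is as a loop that tries providers in priority order until one succeeds. The sketch below is conceptual only and is not AI Gateway's implementation; `callWithFailover`, `ProviderCall`, and the provider names are hypothetical.

```typescript
// Conceptual sketch of ordered failover. All names here are hypothetical.
type ProviderCall<T> = { name: string; call: () => Promise<T> };

async function callWithFailover<T>(
  providers: ProviderCall<T>[],
): Promise<{ provider: string; value: T }> {
  const errors: string[] = [];
  for (const { name, call } of providers) {
    try {
      // Return as soon as one provider answers successfully.
      return { provider: name, value: await call() };
    } catch (err) {
      // Record the failure and move on to the next provider in the list.
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

With the gateway, this loop runs on the server side, so application code makes a single call and does not need its own retry list.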

Graphic showing several API and service icons converging into one central AI Gateway node, symbolizing simplified API management and centralized billing.

No markup on provider list prices

Pay exactly what providers charge. No platform fees, including when you bring your own key.

Circular flowchart with AI provider icons linked in a ring to a central triangle, illustrating automatic failover and high availability during service outages.
Moving to the gateway is just so ergonomic. We get references to model names, and rely on Vercel to do the correct implementations and handle the edge cases.

Rob, co-founder of Zo Computer

How Zo Computer improved AI reliability 20×

  • 20× improvement in AI reliability after switching.
  • 30s to adopt a new model, down from an hour of code.
  • 25% improvement in average latency on model calls.

Production-ready, from request one.

Security, spend controls, and compatibility built for teams running real traffic.

Security and compliance

Route only to ZDR providers. No training on data, no prompt logging. Configure custom rules per request or team-wide.

Track everything

Track tokens, requests, spend, and performance across your AI stack. Cap spend with per-key budgets and per-user quotas.

Drop-in compatible

Use the API formats you already have. Migrate existing apps to AI Gateway in minutes, no SDK rewrites required.

Security and compliance

Route only to ZDR providers, no training or prompt logging, configurable per request or team-wide.

Zero Data Retention

Enforced on every request across your team. Routes only to providers under a ZDR agreement.

No training on your data

Route only to providers that will not train on customer data, configurable per request.

Provider allowlist

Restrict your team to approved providers. Enforced on every request, no code changes.
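
The three controls above amount to filtering the set of providers a request may be routed to. As a rough illustration, the sketch below applies a ZDR requirement, a no-training requirement, and an allowlist to a provider list. The `Provider` and `Policy` shapes and the provider names are hypothetical and do not reflect AI Gateway's configuration format.

```typescript
// Hypothetical provider metadata and policy; field names are illustrative only.
interface Provider {
  name: string;
  zeroDataRetention: boolean;
  trainsOnData: boolean;
}

interface Policy {
  requireZdr: boolean;
  disallowTraining: boolean;
  allowlist?: string[]; // undefined means every provider is allowed
}

// Keeps only the providers that satisfy every rule in the policy.
function eligibleProviders(providers: Provider[], policy: Policy): Provider[] {
  return providers.filter(
    (p) =>
      (!policy.requireZdr || p.zeroDataRetention) &&
      (!policy.disallowTraining || !p.trainsOnData) &&
      (policy.allowlist === undefined || policy.allowlist.includes(p.name)),
  );
}
```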

See everything. Set controls.

Track usage, spend, and performance across your AI stack in real time. Set per-key budgets and per-user quotas to cap spend, and tag every request so finance and engineering see the same picture.
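
To make the budget and tagging idea concrete, here is a minimal in-memory model of a per-key budget with per-tag spend. It is a sketch under stated assumptions: the gateway enforces budgets on its side, and `KeyBudget` with its methods is a hypothetical name, not part of any SDK.

```typescript
// Hypothetical model of a per-key budget with tagged spend.
// Amounts are integer cents to avoid floating-point drift.
class KeyBudget {
  private spentCents = 0;
  private readonly limitCents: number;
  private readonly spendByTag = new Map<string, number>();

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  // Records the charge and returns true only if it fits in the remaining budget.
  tryCharge(costCents: number, tag: string): boolean {
    if (this.spentCents + costCents > this.limitCents) return false;
    this.spentCents += costCents;
    this.spendByTag.set(tag, (this.spendByTag.get(tag) ?? 0) + costCents);
    return true;
  }

  remainingCents(): number {
    return this.limitCents - this.spentCents;
  }

  spentForTag(tag: string): number {
    return this.spendByTag.get(tag) ?? 0;
  }
}
```

A rejected charge leaves both the total and the per-tag figures unchanged, so reports built from the tags always add up to the spend that was actually allowed.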

Drop-in compatible

Use the API formats you already have. Move existing apps to AI Gateway with a base URL swap.

AI SDK

The open-source AI toolkit designed to help developers build AI-powered applications and agents with React, Next.js, Vue, Svelte, Node.js, and more.

Seamless migration

Point your existing OpenAI or Anthropic SDK at AI Gateway. Same calls, no rewrites.
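
The sketch below illustrates what a base URL swap means for an OpenAI Chat Completions style request: the request shape stays the same and only the base URL, API key, and model identifier differ. The gateway base URL, the placeholder keys, and the direct model name `gpt-5.5` are assumptions for illustration; confirm the actual values in the AI Gateway documentation. `buildChatRequest` is a hypothetical helper, not an SDK function.

```typescript
// Assumed base URLs; verify the gateway URL against the AI Gateway docs.
const OPENAI_BASE_URL = 'https://api.openai.com/v1';
const GATEWAY_BASE_URL = 'https://ai-gateway.vercel.sh/v1'; // assumption

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper that builds a Chat Completions request for any base URL.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): ChatRequest {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }] }),
    },
  };
}

// Same call shape in both cases; only the base URL, key, and model name change.
const direct = buildChatRequest(OPENAI_BASE_URL, 'OPENAI_KEY_PLACEHOLDER', 'gpt-5.5', 'Why is the sky blue?');
const viaGateway = buildChatRequest(GATEWAY_BASE_URL, 'GATEWAY_KEY_PLACEHOLDER', 'openai/gpt-5.5', 'Why is the sky blue?');
```

Either object can be passed to `fetch(request.url, request.init)`. When using an official SDK instead, the equivalent change is setting its base URL option to the gateway endpoint.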

Bring your own keys

Bring your own provider keys with no platform fee. Existing commitments flow through.

Models on day zero

AI Gateway is a launch partner with major labs, so new models work the minute they ship.

Every modality in one place.

Text, image, video, realtime, speech, transcription, embeddings, and reranking through one endpoint.

Text
Access the latest from every major model lab and provider. Power your AI features all through a single endpoint.

OpenAI

Image
Generate and edit with the latest image models, no extra setup.

Google

Video
New
Ship production-ready video across a wide range of models from a single prompt.

xAI

Realtime
New
Build voice and live multimodal experiences with realtime models through a single endpoint.

OpenAI

Speech
New
Turn text into natural, expressive speech with the latest text-to-speech models, all through one endpoint.

ElevenLabs

Transcription
New
Transcribe audio to text with leading speech-to-text models, no extra setup.

Deepgram

Embeddings
Vector embeddings for search, retrieval, and RAG pipelines, with every major provider available through one endpoint.

OpenAI

Sample embedding output (truncated): [0.234, -0.198, 0.567, 0.012, -0.823, 0.445, -0.671, 0.298, ...]
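
In search and RAG pipelines, embedding vectors are typically compared with cosine similarity, where values near 1 indicate similar meaning. The following is a minimal sketch of that comparison; `cosineSimilarity` is a local helper written for illustration, not a gateway or SDK call.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

To retrieve documents, embed the query once, compute its similarity against each stored document vector, and keep the highest-scoring entries.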

Reranking
Improve retrieval relevance by reordering results before they hit your model.

Cohere

  1. Cache invalidation guide
  2. How streaming works
  3. Routing edge cases
  4. Provider failover patterns
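
Reranking takes an initial list of retrieved documents and reorders it by relevance to the query before the top results are passed to a model. The sketch below shows only the reordering step. A real reranking model produces the relevance scores; here `overlapScore` is a hypothetical stand-in that counts shared words, and the query is an invented example.

```typescript
interface Ranked {
  index: number; // position in the original list
  document: string;
  score: number;
}

// Hypothetical stand-in for a reranking model: counts query words found in the document.
function overlapScore(query: string, document: string): number {
  const words = new Set(document.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && words.has(w)).length;
}

// Scores every document, sorts by descending score, and keeps the top N.
function rerank(
  query: string,
  documents: string[],
  topN: number = documents.length,
  score: (q: string, d: string) => number = overlapScore,
): Ranked[] {
  return documents
    .map((document, index) => ({ index, document, score: score(query, document) }))
    .sort((a, b) => b.score - a.score || a.index - b.index) // ties keep original order
    .slice(0, topN);
}
```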

Works with the tools your team already uses.

Route the most popular AI coding agents through AI Gateway with a base URL change. Get unified observability and spend tracking across every tool, no matter who built it.

Claude Code

Anthropic’s coding agent. Route it through AI Gateway’s Anthropic-compatible endpoint for observability, spend tracking, and failover. Works with Claude Code Max too.

OpenAI Codex

OpenAI’s coding agent. Route it through AI Gateway’s Responses API for observability, spend tracking, and retries. Usage joins the rest of your AI spend.

OpenCode

Open-source terminal coding agent with native AI Gateway support. Connect once for observability, spend tracking, and failover, then switch between any model on the fly.

Blackbox AI

Terminal CLI for AI code generation and debugging. Route it through AI Gateway for observability, spend tracking, and failover across every model in the catalog.

Cline

Autonomous coding agent for VS Code. Select Vercel AI Gateway as the provider for observability, spend tracking, and failover, with detailed token and cache metrics.

Grok Build

xAI’s terminal coding agent. Point it at AI Gateway with two environment variables for observability, spend tracking, and failover. The model picker pulls the full catalog.

Get Started

This quickstart walks you through making your first text generation request with AI Gateway.

Set Up Your Project

Create a new directory and initialize a Node.js project.
Terminal
mkdir ai-text-demo
cd ai-text-demo
pnpm init

Install Dependencies

Install the AI SDK and development dependencies.
Terminal
pnpm add ai dotenv @types/node tsx typescript

Set Up Your API Key

Go to the AI Gateway API Keys page in your Vercel dashboard and click Create Key to generate a new API Key.

Create a .env.local file and save your API Key.

.env.local
AI_GATEWAY_API_KEY=your_ai_gateway_api_key
Instead of using an API Key, you can use OIDC tokens to authenticate your requests.

Create and Run Your Script

Create the index.ts file.
index.ts
import { streamText } from 'ai';
import dotenv from 'dotenv';

// Load AI_GATEWAY_API_KEY from .env.local (dotenv reads only .env by default).
dotenv.config({ path: '.env.local' });

async function main() {
  const result = streamText({
    model: 'openai/gpt-5.5',
    prompt: 'Invent a new holiday and describe its traditions.',
  });

  // Print the response as it streams in.
  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }

  console.log();
  console.log('Token usage:', await result.usage);
  console.log('Finish reason:', await result.finishReason);
}

main().catch(console.error);

Run your script.

Terminal
pnpm tsx index.ts

You should see the AI model’s response stream to your terminal.

Frequently Asked Questions

How is AI Gateway priced?

We offer tokens at list price from the upstream providers with no markup, including when you bring your own keys. Certain capabilities are available at higher plan tiers and metered separately. See the pricing page for details.

What's the difference between using AI Gateway and going direct to each provider?

AI Gateway gives you one integration, automatic failover, unified spend tracking, and one invoice across every major provider. Going direct means signing N contracts and stitching together N billing dashboards.

Will AI Gateway work with our existing AI stack?

Almost certainly. AI Gateway supports the AI SDK, OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, and an OpenResponses-compatible endpoint. Migrating is typically a base URL swap with no code changes.

Which modalities does AI Gateway support?

Text, image, video, realtime, speech, transcription, embeddings, and reranking, all through the same endpoint. Browse the full model catalog for specific models and providers.

How does AI Gateway handle our enterprise security and compliance requirements?

AI Gateway supports Zero Data Retention routing, a no-training guarantee, and team-wide provider allowlists. See the security overview for full details.

What observability does AI Gateway provide out of the box?

A dashboard with usage, spend, request volume, TTFT, and token counts, broken down by model, provider, and project. For deeper analysis, the Custom Reporting API lets you pull the same data into your own tools.
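
TTFT (time to first token) is the delay between sending a request and receiving the first piece of streamed output. For readers who want to see how such a figure is derived, here is a minimal client-side sketch that times any async text stream. `measureStream` and its injectable `now` clock are hypothetical helpers for illustration; the dashboard's own measurement method is not described here.

```typescript
// Measures time to first chunk and total duration of any async text stream.
async function measureStream(
  stream: AsyncIterable<string>,
  now: () => number = Date.now, // injectable clock, added here for testability
): Promise<{ ttftMs: number | null; totalMs: number; text: string }> {
  const start = now();
  let ttftMs: number | null = null;
  let text = '';
  for await (const chunk of stream) {
    // The first chunk to arrive defines TTFT; later chunks only extend the text.
    if (ttftMs === null) ttftMs = now() - start;
    text += chunk;
  }
  return { ttftMs, totalMs: now() - start, text };
}
```

Any async iterable of strings can be passed in, so a streamed model response can be timed the same way as the fake stream used in a unit test. An empty stream yields a `ttftMs` of `null`.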

Can we use our existing provider contracts and committed spend?

Yes, through BYOK. Bring your own keys for almost every supported provider and your existing commitments flow through. We try BYOK first and only fall back to system credentials on failure.

Do I pay per request or get invoiced?

AI Gateway uses pre-purchased credits by default. Top up in the dashboard and usage is drawn down per request. Enterprise customers can switch to a single consolidated invoice from Vercel covering every provider in their routing pool. Reach out to sales for details on invoicing.
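
As a worked example of per-request draw-down, the sketch below computes a request's cost from token counts and subtracts it from a credit balance. The prices are hypothetical per-million-token figures chosen for easy arithmetic, and `requestCostUsd` and `drawDown` are illustrative helpers rather than gateway APIs. How the gateway treats an insufficient balance is not covered here; the sketch simply throws.

```typescript
// Hypothetical per-million-token prices in USD; real prices come from the provider's list.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// With no markup, cost is just tokens times the provider's list price.
function requestCostUsd(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMTok +
    (outputTokens / 1_000_000) * price.outputPerMTok
  );
}

// Subtracts one request's cost from the remaining credit balance.
function drawDown(creditsUsd: number, costUsd: number): number {
  if (costUsd > creditsUsd) throw new Error('Insufficient credits');
  return creditsUsd - costUsd;
}
```

For example, at hypothetical prices of 2 USD per million input tokens and 8 USD per million output tokens, a request with 1,000,000 input tokens and 500,000 output tokens costs 2 + 4 = 6 USD, leaving 4 USD from a 10 USD balance.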