Skip to content
Dashboard

GLM 4.7

GLM 4.7 is Z.ai's model released December 22, 2025 with major improvements in coding, tool usage, and multi-step reasoning. It uses a more natural conversational tone and shows improved frontend development results.

ReasoningTool UseImplicit Caching
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'zai/glm-4.7',
prompt: 'Why is the sky blue?'
})

Playground

Try out GLM 4.7 by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

zai logo
zai logo

Ask GLM 4.7 anything to try it out.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
ZDR
No Training
Release Date
Z.ai
200K
5.8s
196tps
$0.60/M$2.20/M
Read:$0.11/M
Write:—
——
+1
12/22/2025
Novita AI
205K
1.0s
88tps
$0.60/M$2.20/M
Read:$0.1/M
Write:—
——
+1
12/22/2025
DeepInfra
203K
0.5s
31tps
$0.43/M$1.75/M
Read:$0.08/M
Write:—
——
+1
12/22/2025
Cerebras
131K
0.3s
526tps
$2.25/M$2.75/M
Read:$2.25/M
Write:—
——
+1
12/22/2025
Amazon Bedrock
200K
1.4s
57tps
$0.60/M$2.20/M——
12/22/2025
Baseten
200K
0.2s
155tps
$0.60/M$2.20/M
Read:$0.12/M
Write:—
——
+1
12/22/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

Model
Context
Latency
Throughput
Input
Output
Cache
Web Search
Per Query
Capabilities
Providers
ZDR
No Training
Release Date
1M
1.4s
143tps
$3.00/M$10.25/M
Read:$0.5/M
Write:—
——
+1
wafer logo
06/23/2026
1M
0.8s
152tps
$0.95/M$3.00/M
Read:$0.18/M
Write:—
——
+1
baseten logo
deepinfra logo
fireworks logo
+3
06/16/2026
205K
0.4s
165tps
$1.30/M$4.30/M
Read:$0.26/M
Write:—
——
+1
baseten logo
deepinfra logo
fireworks logo
+3
04/07/2026
203K
4.4s
51tps
$1.20/M$4.00/M
Read:$0.24/M
Write:—
——
+1
zai logo
03/15/2026
203K
0.5s
84tps
$0.80/M$2.56/M
Read:$0.16/M
Write:—
——
+1
baseten logo
bedrock logo
deepinfra logo
+3
02/12/2026
200K
1.4s
152tps
$0.06/M$0.40/M
Read:$0.01/M
Write:—
——
+1
zai logo
01/19/2026

About GLM 4.7

GLM 4.7 was released December 22, 2025 as a capability upgrade in Z.ai's model lineup. It brings major improvements in coding, tool usage, and multi-step reasoning, with added focus on complex agentic tasks that require sustained planning across multiple steps.

The model introduces a more natural conversational tone compared to earlier GLM generations. This improves the experience in chat-based applications and interactive coding assistants. GLM 4.7 shows improved frontend development results for teams building UI components, web applications, and design-to-code workflows.

GLM 4.7 operates within a context window of 204.8K tokens and is available through AI Gateway with the standard unified API, built-in observability, and intelligent provider routing. It represents the full-scale offering in the 4.7 generation, complemented by GLM-4.7-Flash and GLM-4.7-FlashX for speed-optimized workloads.

What To Consider When Choosing a Provider

  • Configuration: GLM 4.7 targets frontend tasks. If your workload involves generating React components, CSS, or converting designs to code, benchmark it against general-purpose alternatives you already use.
  • Configuration: The 4.7 generation includes three tiers. GLM 4.7 provides maximum capability, GLM-4.7-Flash offers speed optimization, and GLM-4.7-FlashX provides the fastest inference. Choose based on your latency-capability tradeoff.
  • Configuration: Multi-step reasoning updates make GLM 4.7 more reliable for agentic pipelines. Test it against your existing agent benchmarks to quantify the improvement over GLM-4.6.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GLM 4.7

Best For

  • Frontend development: HTML, CSS, React components, and design-to-code conversion benefit from the targeted improvements
  • Complex agentic tasks: Multi-step reasoning, tool usage, and sustained planning across extended interactions
  • Interactive coding assistants: The natural conversational tone improves developer experience
  • Full-stack code generation: Improvements in both coding capability and tool usage apply across the stack
  • Production applications: The highest capability tier in the GLM-4.7 generation, paired with AI Gateway observability

Consider Alternatives When

  • Latency-driven workloads: GLM-4.7-Flash or GLM-4.7-FlashX provides faster inference at reduced capability
  • Vision capabilities needed: Evaluate GLM-4.6V or GLM-4.5V for multimodal visual input
  • Advanced reasoning beyond 4.7: GLM-5 introduces multiple thinking modes and improved long-range planning
  • Cost-efficiency priority: The flash variants in the 4.7 generation or GLM-4.5-Air may be more economical when peak capability is not essential

Conclusion

GLM 4.7 advances Z.ai's model lineup with targeted improvements in the areas that matter most for modern development workflows: coding, tool usage, multi-step reasoning, and frontend generation. As the full-scale 4.7 model, it sets the capability ceiling for the generation while GLM-4.7-Flash and FlashX variants serve speed-sensitive workloads.