Model limit probe

Z.ai: GLM 4.7 Flash | concurrency limit

The concurrency limit of Z.ai: GLM 4.7 Flash decides whether a key can be placed on a production route. TestKey evaluates the model ID (z-ai/glm-4.7-flash), the provider (Zhipu AI (GLM)), catalog context, headers returned for a real key, 429 errors, and region signals together.

Model: z-ai/glm-4.7-flash (Z.ai: GLM 4.7 Flash)
Provider: Zhipu AI (GLM), 13 models in catalog
Limit dimension: concurrency limit
Visible signal: real key probe required
Context window: 202,752

Read-only check. Detection data is deleted after 5 minutes.

Why this limit matters

The concurrency limit determines whether a key can be placed on a production route, so TestKey does not judge it from any single signal: it weighs the model ID, provider, catalog context, real-key headers, 429 errors, and region signals together.

How to prove it

Start with the visible signal (real key probe required), then read response headers and error bodies using read-only requests.
The concurrency limit must be bound to model ID z-ai/glm-4.7-flash; limits measured for another model at the same provider cannot be reused.
Limit dimension: concurrency limit · Visible signal: real key probe required · Context window: 202,752
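
As a rough illustration of the two points above, the sketch below summarizes a batch of read-only probe responses and stores the finding under the exact model ID. It is not TestKey's implementation: `ProbeResult`, `LimitFinding`, `summarize_probe`, and `lookup` are hypothetical names, and the only header it reads is the standard HTTP `Retry-After`; any provider-specific rate-limit headers would need to be confirmed against real responses.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ProbeResult:
    """One read-only response observed during a probe (hypothetical shape)."""
    status: int
    headers: Mapping[str, str]


@dataclass(frozen=True)
class LimitFinding:
    """What a probe batch showed for one specific model ID."""
    model_id: str
    throttled: bool                 # at least one 429 was seen
    not_throttled: int              # responses that were not 429
    retry_after_s: Optional[float]  # largest numeric Retry-After seen, if any


def summarize_probe(model_id: str, results: Sequence[ProbeResult]) -> LimitFinding:
    """Summarize 429 errors and Retry-After headers from a probe batch."""
    throttled = [r for r in results if r.status == 429]
    retry_after: Optional[float] = None
    for r in throttled:
        for name, value in r.headers.items():
            if name.lower() != "retry-after":  # header names are case-insensitive
                continue
            try:
                seconds = float(value)
            except ValueError:
                continue  # HTTP-date form is not parsed in this sketch
            retry_after = seconds if retry_after is None else max(retry_after, seconds)
    return LimitFinding(
        model_id=model_id,
        throttled=bool(throttled),
        not_throttled=len(results) - len(throttled),
        retry_after_s=retry_after,
    )


def lookup(findings: Mapping[str, LimitFinding], model_id: str) -> Optional[LimitFinding]:
    """Return the finding for this exact model ID only; never borrow a sibling's."""
    return findings.get(model_id)
```

Keying the store by the full model ID means a lookup for any other model at the same provider returns nothing, which forces a fresh probe instead of silently reusing a limit.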

Operator action

The Z.ai: GLM 4.7 Flash concurrency limit is more than a number. Operators should translate it into route throttling, sale tags, headroom alerts, fallback model suggestions, and price protection.
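
One way to turn a measured limit into route throttling and a headroom alert is sketched below. It is an illustration under stated assumptions: `RouteThrottle` and its `alert_ratio` parameter are hypothetical, and the limit passed in would come from a real key probe, since the page does not publish a number.

```python
import threading


class RouteThrottle:
    """Caps in-flight requests for one model ID and flags low headroom (sketch)."""

    def __init__(self, model_id: str, limit: int, alert_ratio: float = 0.8) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.model_id = model_id
        self.limit = limit
        self.alert_ratio = alert_ratio  # assumed threshold, tune per route
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; False means the caller should queue or use a fallback."""
        with self._lock:
            if self._in_flight >= self.limit:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Free a slot once the request has finished."""
        with self._lock:
            if self._in_flight == 0:
                raise RuntimeError("release() called without a matching acquire")
            self._in_flight -= 1

    def headroom(self) -> int:
        """Slots still available under the limit."""
        with self._lock:
            return self.limit - self._in_flight

    def headroom_alert(self) -> bool:
        """True once in-flight requests reach alert_ratio of the limit."""
        with self._lock:
            return self._in_flight >= self.alert_ratio * self.limit
```

A router could call `try_acquire()` before each request, send the request to a fallback model when it returns `False`, and raise an operator alert whenever `headroom_alert()` becomes true.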
