Automatic model fallback when a provider errors before the first token

29 May 2026 v1.11.0 feature medium impact

routing
providers

If a model errors before it produces any reply content, the gateway now automatically retries the request against the next permitted model in the group’s policy instead of surfacing the error to the user. The reply is annotated with the model that ultimately served it, so the switch is visible in both the chat transcript and the audit logs.
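The retry rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the gateway’s implementation: `stream_with_fallback`, `ProviderError`, `Reply`, and the `call_model` callback are hypothetical names, and a provider stream is assumed to be an iterator of text chunks. Only an error raised while waiting for the first chunk triggers a retry; once a chunk exists, errors propagate unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator


class ProviderError(Exception):
    """Hypothetical error raised by a model provider."""


@dataclass
class Reply:
    served_by: str                                       # model that produced the reply
    chunks: list[str] = field(default_factory=list)      # streamed text chunks
    attempted: list[str] = field(default_factory=list)   # models tried, in order


def stream_with_fallback(
    candidates: list[str],
    call_model: Callable[[str], Iterator[str]],
) -> Reply:
    """Try candidates in order; fall back only if a model fails before its first chunk."""
    if not candidates:
        raise ValueError("no permitted models to try")
    attempted: list[str] = []
    last_error: ProviderError | None = None
    for model in candidates:
        attempted.append(model)
        try:
            stream = iter(call_model(model))
            first = next(stream)  # a failure here means nothing has reached the user
        except StopIteration:
            return Reply(served_by=model, attempted=attempted)  # empty but successful reply
        except ProviderError as err:
            last_error = err
            continue  # pre-reply error: move on to the next permitted model
        # Content has been emitted; a mid-stream error propagates and is not retried.
        chunks = [first, *stream]
        return Reply(served_by=model, chunks=chunks, attempted=attempted)
    raise last_error  # every candidate failed before its first chunk


# Usage: model-a fails before producing any content, model-b answers.
def fake_provider(model: str) -> Iterator[str]:
    if model == "model-a":
        raise ProviderError("upstream 503")
    yield "Hello"
    yield ", world"


reply = stream_with_fallback(["model-a", "model-b"], fake_provider)
print(reply.served_by, "".join(reply.chunks))  # model-b Hello, world
```

Peeking at the first chunk before committing to a model is what separates a retryable pre-reply failure from a mid-stream one. The `served_by` field stands in for the annotation that records which model answered.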

Fallback is limited to models the user is already permitted to use under the active LLM group policy, so it never widens access. The quality, cost, and latency routing strategies rank fallback targets with the same ordering they would use for a fresh request.
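One way to picture target selection is the sketch below, which assumes each model carries hypothetical `quality`, `cost_per_mtok`, and `latency_ms` attributes and that each strategy maps to a simple sort key. The gateway’s real scoring is not described in this note; the sketch only illustrates the two stated properties: candidates come from the permitted set alone, and they are ranked exactly as a fresh request would rank them. `ModelInfo`, `STRATEGY_KEYS`, and `fallback_order` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Collection, Iterable


@dataclass(frozen=True)
class ModelInfo:
    name: str
    quality: float        # hypothetical quality score, higher is better
    cost_per_mtok: float  # hypothetical price per million tokens, lower is better
    latency_ms: float     # hypothetical recent median latency, lower is better


# Hypothetical sort keys per routing strategy; smaller keys rank first.
STRATEGY_KEYS: dict[str, Callable[[ModelInfo], float]] = {
    "quality": lambda m: -m.quality,
    "cost": lambda m: m.cost_per_mtok,
    "latency": lambda m: m.latency_ms,
}


def fallback_order(
    models: Iterable[ModelInfo],
    permitted: Collection[str],
    strategy: str,
    exclude: Collection[str] = (),
) -> list[str]:
    """Rank fallback targets from the permitted set, using the fresh-request ordering."""
    key = STRATEGY_KEYS[strategy]
    eligible = [m for m in models if m.name in permitted and m.name not in exclude]
    return [m.name for m in sorted(eligible, key=key)]


models = [
    ModelInfo("model-a", quality=0.90, cost_per_mtok=15.0, latency_ms=800),
    ModelInfo("model-b", quality=0.80, cost_per_mtok=3.0, latency_ms=400),
    ModelInfo("model-c", quality=0.70, cost_per_mtok=0.5, latency_ms=250),
    ModelInfo("model-d", quality=0.95, cost_per_mtok=30.0, latency_ms=1200),
]
permitted = {"model-a", "model-b", "model-c"}  # model-d is not allowed by the policy

# model-a failed before its first token; rank the remaining permitted models.
print(fallback_order(models, permitted, "quality", exclude={"model-a"}))
# ['model-b', 'model-c']
```

Note that `model-d` never appears even though it has the highest quality score, because the policy does not permit it. Python’s `sorted` is stable, so ties keep the order in which models are listed, which keeps the fallback choice deterministic.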

No action required. Fallback is enabled by default for groups with two or more permitted models and engages only on provider errors that occur before the first token. Streams that have already emitted tokens are not retried.