Routing
CC-Relay supports multiple routing strategies to distribute requests across providers. This page explains each strategy and how to configure it.
Overview
Routing determines how cc-relay chooses which provider handles each request. The right strategy depends on your priorities: availability, cost, latency, or load distribution.
| Strategy | Config Value | Description | Use Case |
|---|---|---|---|
| Round-Robin | round_robin | Sequential rotation through providers | Even distribution |
| Weighted Round-Robin | weighted_round_robin | Proportional distribution by weight | Capacity-based distribution |
| Shuffle | shuffle | Fair random (“dealing cards”) | Randomized load balancing |
| Failover | failover (default) | Priority-based with automatic retry | High availability |
| Model-Based | model_based | Route by model name prefix | Multi-model deployments |
Configuration
Configure routing in your config file:

    routing:
      # Strategy: round_robin, weighted_round_robin, shuffle, failover (default), model_based
      strategy: failover
      # Timeout for failover attempts in milliseconds (default: 5000)
      failover_timeout: 5000
      # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
      debug: false
      # Model-based routing configuration (only used when strategy: model_based)
      model_mapping:
        claude-opus: anthropic    # claude-opus-* models → anthropic provider
        claude-sonnet: anthropic  # claude-sonnet-* models → anthropic provider
        glm-4: zai                # glm-4* models → zai provider
        qwen: ollama              # qwen* models → ollama provider
      # Default provider when no model mapping matches
      default_provider: anthropic

    [routing]
    # Strategy: round_robin, weighted_round_robin, shuffle, failover (default), model_based
    strategy = "failover"
    # Timeout for failover attempts in milliseconds (default: 5000)
    failover_timeout = 5000
    # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
    debug = false
    # Default provider when no model mapping matches
    default_provider = "anthropic"

    # Model-based routing configuration (only used when strategy: model_based)
    [routing.model_mapping]
    claude-opus = "anthropic"    # claude-opus-* models → anthropic provider
    claude-sonnet = "anthropic"  # claude-sonnet-* models → anthropic provider
    glm-4 = "zai"                # glm-4* models → zai provider
    qwen = "ollama"              # qwen* models → ollama provider

Default: If strategy is not specified, cc-relay uses failover as the safest option.
Strategies
Round-Robin
Sequential distribution using an atomic counter. Each provider receives one request before any provider receives a second.

    routing:
      strategy: round_robin

    [routing]
    strategy = "round_robin"

How it works:
- Request 1 → Provider A
- Request 2 → Provider B
- Request 3 → Provider C
- Request 4 → Provider A (cycle repeats)
Best for: Equal distribution across providers with similar capacity.
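The rotation above can be sketched in a few lines. This is a minimal illustration rather than cc-relay's actual implementation: the `RoundRobin` class and its `pick` method are hypothetical names, and a lock-guarded counter stands in for the atomic counter.

```python
import itertools
import threading


class RoundRobin:
    """Rotate through providers in a fixed order (illustrative sketch)."""

    def __init__(self, providers):
        self._providers = list(providers)
        self._counter = itertools.count()  # stand-in for an atomic counter
        self._lock = threading.Lock()

    def pick(self):
        # Take the next counter value under a lock so concurrent callers
        # never observe the same value twice.
        with self._lock:
            n = next(self._counter)
        return self._providers[n % len(self._providers)]


rr = RoundRobin(["A", "B", "C"])
picks = [rr.pick() for _ in range(4)]  # A, B, C, then A again
```

The modulo over a monotonically increasing counter is what makes the cycle repeat after the last provider.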
Weighted Round-Robin
Distributes requests proportionally based on provider weights. Uses the Nginx smooth weighted round-robin algorithm for even distribution.

    routing:
      strategy: weighted_round_robin
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
            weight: 3  # Receives 3x more requests
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
            weight: 1  # Receives 1x requests

    [routing]
    strategy = "weighted_round_robin"

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"
    weight = 3  # Receives 3x more requests

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"
    weight = 1  # Receives 1x requests

How it works:
With weights 3:1, out of every 4 requests:
- 3 requests → anthropic
- 1 request → zai
Default weight: 1 (if not specified)
Best for: Distributing load based on provider capacity, rate limits, or cost allocation.
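To make the 3:1 split concrete, here is a small sketch of the smooth weighted round-robin idea: each pick adds every provider's weight to its running score, selects the highest score, then subtracts the total weight from the winner. The class name and structure are assumptions for illustration, not cc-relay's code, and the sketch omits locking.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round-robin over named providers (illustrative sketch)."""

    def __init__(self, weights):
        # weights: provider name -> configured weight
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Every provider gains its weight on each round.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # The provider with the highest running score wins; ties go to the
        # provider listed first.
        chosen = max(self._current, key=self._current.get)
        # The winner pays back the total so others catch up over time.
        self._current[chosen] -= self._total
        return chosen


wrr = SmoothWeightedRoundRobin({"anthropic": 3, "zai": 1})
sequence = [wrr.pick() for _ in range(4)]
```

With these weights the four picks contain three `anthropic` and one `zai`, and the `zai` pick lands inside the cycle instead of trailing a burst of three.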
Shuffle
Fair random distribution using the Fisher-Yates “dealing cards” pattern. Everyone gets one card before anyone gets a second.

    routing:
      strategy: shuffle

    [routing]
    strategy = "shuffle"

How it works:
- All providers start in a “deck”
- Random provider selected and removed from deck
- When deck empty, reshuffle all providers
- Guarantees fair distribution over time
Best for: Randomized load balancing while ensuring fairness.
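The deck behaviour can be illustrated with a short sketch. The `ShuffleDeck` class is a hypothetical name, and the sketch shuffles the whole deck on refill and deals from the end, which gives the same distribution as drawing a random card each time.

```python
import random


class ShuffleDeck:
    """Deal providers in random order, one full round at a time (sketch)."""

    def __init__(self, providers, rng=None):
        self._providers = list(providers)
        self._rng = rng or random.Random()
        self._deck = []

    def pick(self):
        if not self._deck:
            # Deck is empty: put every provider back and reshuffle.
            self._deck = self._providers[:]
            self._rng.shuffle(self._deck)
        # Deal one card; it stays out of the deck until the next reshuffle.
        return self._deck.pop()


deck = ShuffleDeck(["A", "B", "C"], rng=random.Random(0))
first_round = [deck.pick() for _ in range(3)]
second_round = [deck.pick() for _ in range(3)]
```

Each round contains every provider exactly once, in an order that varies from round to round.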
Failover
Tries providers in priority order. If the highest-priority provider fails, cc-relay races the remaining providers in parallel and returns the fastest successful response. This is the default strategy.

    routing:
      strategy: failover
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
            priority: 2  # Tried first (higher = higher priority)
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
            priority: 1  # Fallback

    [routing]
    strategy = "failover"

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"
    priority = 2  # Tried first (higher = higher priority)

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"
    priority = 1  # Fallback

How it works:
- Try highest priority provider first
- If it fails (see Failover Triggers), launch parallel requests to all remaining providers
- Return first successful response, cancel others
- Respects failover_timeout for total operation time
Default priority: 1 (if not specified)
Best for: High availability with automatic fallback.
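The flow above can be approximated with a thread pool. This is a rough sketch under several assumptions: `failover`, `ProviderError`, and the `send` callable are hypothetical names, the deadline is only applied to the parallel race, and Python threads cannot be interrupted, so "cancel" here only drops work that has not started yet.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class ProviderError(Exception):
    """Hypothetical error standing in for a retryable provider failure."""


def failover(providers, send, timeout_s=5.0):
    """providers: list of (name, priority); send: callable(name) -> response."""
    deadline = time.monotonic() + timeout_s
    ordered = sorted(providers, key=lambda p: p[1], reverse=True)
    primary = ordered[0][0]
    rest = [name for name, _ in ordered[1:]]
    try:
        return primary, send(primary)  # highest priority is tried first
    except ProviderError:
        pass
    if not rest:
        raise ProviderError("all providers failed")
    # Primary failed: race every remaining provider in parallel.
    pool = ThreadPoolExecutor(max_workers=len(rest))
    futures = {pool.submit(send, name): name for name in rest}
    try:
        remaining = max(0.0, deadline - time.monotonic())
        # as_completed raises a timeout error if the deadline passes first.
        for fut in as_completed(futures, timeout=remaining):
            if fut.exception() is None:
                return futures[fut], fut.result()  # first success wins
        raise ProviderError("all providers failed")
    finally:
        pool.shutdown(wait=False, cancel_futures=True)


def demo_send(name):
    # Simulated upstream: the primary is down, the fallback answers.
    if name == "anthropic":
        raise ProviderError("simulated 503")
    return f"response from {name}"


result = failover([("zai", 1), ("anthropic", 2)], demo_send)
```

In the demo, `anthropic` has the higher priority and is tried first, fails, and the response comes from `zai`.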
Model-Based
Routes requests to providers based on the model name in the request. Uses longest prefix matching for specificity.

    routing:
      strategy: model_based
      model_mapping:
        claude-opus: anthropic
        claude-sonnet: anthropic
        glm-4: zai
        qwen: ollama
        llama: ollama
      default_provider: anthropic
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
      - name: "ollama"
        type: "ollama"
        base_url: "http://localhost:11434"

    [routing]
    strategy = "model_based"
    default_provider = "anthropic"

    [routing.model_mapping]
    claude-opus = "anthropic"
    claude-sonnet = "anthropic"
    glm-4 = "zai"
    qwen = "ollama"
    llama = "ollama"

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"

    [[providers]]
    name = "ollama"
    type = "ollama"
    base_url = "http://localhost:11434"

How it works:
- Extract model name from request body (e.g., claude-opus-4)
- Find longest matching prefix in model_mapping (e.g., claude-opus)
- Route to mapped provider (e.g., anthropic)
- If no match, use default_provider
- If no default, route to any available provider
Prefix matching examples:
| Request Model | Mapping | Result |
|---|---|---|
| claude-opus-4 | claude-opus: anthropic | anthropic |
| claude-sonnet-4-20250514 | claude-sonnet: anthropic | anthropic |
| glm-4.7 | glm-4: zai | zai |
| qwen3:8b | qwen: ollama | ollama |
| unknown-model | (no match) | default_provider |
Best for: Multi-provider setups where different providers handle different model families.
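Longest prefix matching is easy to get subtly wrong, so a small sketch may help. The function name `route_by_model` is hypothetical, and choosing the first provider when there is neither a match nor a default is an assumption made for the example; the text only says "any available provider".

```python
def route_by_model(model, model_mapping, default_provider=None, providers=()):
    """Return the provider name for a model using longest-prefix matching."""
    matches = [prefix for prefix in model_mapping if model.startswith(prefix)]
    if matches:
        # The longest matching prefix is the most specific mapping.
        return model_mapping[max(matches, key=len)]
    if default_provider is not None:
        return default_provider
    # No mapping and no default: fall back to any available provider.
    # Picking the first one is an assumption for this sketch.
    return providers[0] if providers else None


mapping = {
    "claude-opus": "anthropic",
    "claude-sonnet": "anthropic",
    "glm-4": "zai",
    "qwen": "ollama",
}
routed = route_by_model("glm-4.7", mapping, default_provider="anthropic")
```

If a mapping contained both `claude` and `claude-opus`, a request for `claude-opus-4` would follow the `claude-opus` entry because it is the longer match.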
Debug Headers
When routing.debug: true, cc-relay adds diagnostic headers to responses:
| Header | Value | Description |
|---|---|---|
| X-CC-Relay-Strategy | Strategy name | Which routing strategy was used |
| X-CC-Relay-Provider | Provider name | Which provider handled the request |
Example response headers:

    X-CC-Relay-Strategy: failover
    X-CC-Relay-Provider: anthropic

Security Warning: Debug headers expose internal routing decisions. Use only in development or trusted environments. Never enable in production with untrusted clients.
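As a rough illustration of the debug switch, the sketch below adds the two headers only when debugging is enabled. The function `add_debug_headers` is a hypothetical helper, not part of cc-relay.

```python
def add_debug_headers(headers, strategy, provider, debug=False):
    """Return a copy of headers, with routing details only when debug is on."""
    out = dict(headers)
    if debug:
        out["X-CC-Relay-Strategy"] = strategy
        out["X-CC-Relay-Provider"] = provider
    return out


base = {"Content-Type": "application/json"}
with_debug = add_debug_headers(base, "failover", "anthropic", debug=True)
without_debug = add_debug_headers(base, "failover", "anthropic")
```

With `debug` left at its default of false, the response headers carry no routing information.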
Failover Triggers
The failover strategy triggers retry on specific error conditions:
| Trigger | Conditions | Description |
|---|---|---|
| Status Code | 429, 500, 502, 503, 504 | Rate limit or server errors |
| Timeout | context.DeadlineExceeded | Request timeout exceeded |
| Connection | net.Error | Network errors, DNS failures, connection refused |
Important: Client errors (4xx except 429) do not trigger failover. These indicate issues with the request itself, not the provider.
Status Codes Explained
| Code | Meaning | Failover? |
|---|---|---|
| 429 | Rate Limited | Yes - try another provider |
| 500 | Internal Server Error | Yes - server issue |
| 502 | Bad Gateway | Yes - upstream issue |
| 503 | Service Unavailable | Yes - temporarily down |
| 504 | Gateway Timeout | Yes - upstream timeout |
| 400 | Bad Request | No - fix the request |
| 401 | Unauthorized | No - fix authentication |
| 403 | Forbidden | No - permission issue |
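The trigger rules can be summarised as a single predicate. This sketch uses Python exception types as stand-ins for the Go errors named in the table (`TimeoutError` for `context.DeadlineExceeded`, `OSError` for `net.Error`); the function name `should_failover` is hypothetical.

```python
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_failover(status=None, error=None):
    """Decide whether a failed attempt should move on to another provider."""
    if error is not None:
        # Timeouts and network-level failures (DNS errors, connection
        # refused) are all OSError subclasses in Python.
        return isinstance(error, (TimeoutError, OSError))
    # Only rate limits and server errors trigger failover; other 4xx
    # responses point at the request itself.
    return status in RETRYABLE_STATUS


rate_limited = should_failover(status=429)
bad_request = should_failover(status=400)
```

A 401 or 403 returns false here: retrying the same request against another provider would not fix an authentication or permission problem.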
Examples
Simple Failover (Recommended for Most Users)
Use the default strategy with prioritized providers:

    routing:
      strategy: failover
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
            priority: 2
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
            priority: 1

    [routing]
    strategy = "failover"

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"
    priority = 2

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"
    priority = 1

Load Balanced with Weights
Distribute load based on provider capacity:

    routing:
      strategy: weighted_round_robin
    providers:
      - name: "primary"
        type: "anthropic"
        keys:
          - key: "${PRIMARY_KEY}"
            weight: 3  # 75% of traffic
      - name: "secondary"
        type: "anthropic"
        keys:
          - key: "${SECONDARY_KEY}"
            weight: 1  # 25% of traffic

    [routing]
    strategy = "weighted_round_robin"

    [[providers]]
    name = "primary"
    type = "anthropic"

    [[providers.keys]]
    key = "${PRIMARY_KEY}"
    weight = 3  # 75% of traffic

    [[providers]]
    name = "secondary"
    type = "anthropic"

    [[providers.keys]]
    key = "${SECONDARY_KEY}"
    weight = 1  # 25% of traffic

Development with Debug Headers
Enable debug headers for troubleshooting:

    routing:
      strategy: round_robin
      debug: true
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"

    [routing]
    strategy = "round_robin"
    debug = true

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"

High Availability with Fast Failover
Minimize failover latency:

    routing:
      strategy: failover
      failover_timeout: 3000  # 3 second timeout
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
            priority: 2
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
            priority: 1

    [routing]
    strategy = "failover"
    failover_timeout = 3000  # 3 second timeout

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"
    priority = 2

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"
    priority = 1

Multi-Model with Model-Based Routing
Route different model families to different providers:

    routing:
      strategy: model_based
      model_mapping:
        claude-opus: anthropic
        claude-sonnet: anthropic
        claude-haiku: anthropic
        glm-4: zai
        glm-3: zai
        qwen: ollama
        llama: ollama
      default_provider: anthropic
    providers:
      - name: "anthropic"
        type: "anthropic"
        keys:
          - key: "${ANTHROPIC_API_KEY}"
      - name: "zai"
        type: "zai"
        keys:
          - key: "${ZAI_API_KEY}"
      - name: "ollama"
        type: "ollama"
        base_url: "http://localhost:11434"

    [routing]
    strategy = "model_based"
    default_provider = "anthropic"

    [routing.model_mapping]
    claude-opus = "anthropic"
    claude-sonnet = "anthropic"
    claude-haiku = "anthropic"
    glm-4 = "zai"
    glm-3 = "zai"
    qwen = "ollama"
    llama = "ollama"

    [[providers]]
    name = "anthropic"
    type = "anthropic"

    [[providers.keys]]
    key = "${ANTHROPIC_API_KEY}"

    [[providers]]
    name = "zai"
    type = "zai"

    [[providers.keys]]
    key = "${ZAI_API_KEY}"

    [[providers]]
    name = "ollama"
    type = "ollama"
    base_url = "http://localhost:11434"

With this configuration:
- claude-opus-4 → anthropic
- glm-4.7 → zai
- qwen3:8b → ollama
- unknown-model → anthropic (default)
Provider Weight and Priority
Weight and priority are specified in the provider’s key configuration:

    providers:
      - name: "example"
        type: "anthropic"
        keys:
          - key: "${API_KEY}"
            weight: 3      # For weighted-round-robin (higher = more traffic)
            priority: 2    # For failover (higher = tried first)
            rpm_limit: 60  # Rate limit tracking

    [[providers]]
    name = "example"
    type = "anthropic"

    [[providers.keys]]
    key = "${API_KEY}"
    weight = 3      # For weighted-round-robin (higher = more traffic)
    priority = 2    # For failover (higher = tried first)
    rpm_limit = 60  # Rate limit tracking

Note: Weight and priority are read from the first key in the provider’s key list.
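The "first key" rule, combined with the defaults of 1 for both weight and priority, can be sketched as follows. The helper `routing_params` is a hypothetical name, and treating a provider with no keys as having default values is an assumption made for the example.

```python
def routing_params(provider):
    """Read weight and priority from the provider's first key (sketch)."""
    keys = provider.get("keys") or []
    # Only the first key is consulted; later keys are ignored for routing.
    first = keys[0] if keys else {}
    return {
        "weight": first.get("weight", 1),      # default weight: 1
        "priority": first.get("priority", 1),  # default priority: 1
    }


example = {
    "name": "example",
    "type": "anthropic",
    "keys": [
        {"key": "${API_KEY}", "weight": 3, "priority": 2, "rpm_limit": 60},
        {"key": "${OTHER_KEY}", "weight": 10},  # hypothetical second key
    ],
}
params = routing_params(example)
```

In this example the second key's weight of 10 has no effect on routing, because only the first key's values are read.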
Next Steps
- Configuration reference - Complete configuration options
- Architecture overview - How cc-relay works internally