Routing

CC-Relay supports several routing strategies for distributing requests across providers. This page explains each strategy and how to configure it.

Overview

Routing determines how cc-relay chooses which provider handles each request. The right strategy depends on your priorities: availability, cost, latency, or load distribution.

| Strategy | Config Value | Description | Use Case |
|---|---|---|---|
| Round-Robin | round_robin | Sequential rotation through providers | Even distribution |
| Weighted Round-Robin | weighted_round_robin | Proportional distribution by weight | Capacity-based distribution |
| Shuffle | shuffle | Fair random (“dealing cards”) | Randomized load balancing |
| Failover | failover (default) | Priority-based with automatic retry | High availability |
| Model-Based | model_based | Route by model name prefix | Multi-model deployments |

Configuration

Configure routing in your config file:

YAML:

routing:
  # Strategy: round_robin, weighted_round_robin, shuffle, failover (default), model_based
  strategy: failover

  # Timeout for failover attempts in milliseconds (default: 5000)
  failover_timeout: 5000

  # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
  debug: false

  # Model-based routing configuration (only used when strategy: model_based)
  model_mapping:
    claude-opus: anthropic    # claude-opus-* models → anthropic provider
    claude-sonnet: anthropic  # claude-sonnet-* models → anthropic provider
    glm-4: zai                # glm-4* models → zai provider
    qwen: ollama              # qwen* models → ollama provider

  # Default provider when no model mapping matches
  default_provider: anthropic

TOML:

[routing]
# Strategy: round_robin, weighted_round_robin, shuffle, failover (default), model_based
strategy = "failover"

# Timeout for failover attempts in milliseconds (default: 5000)
failover_timeout = 5000

# Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
debug = false

# Default provider when no model mapping matches
default_provider = "anthropic"

# Model-based routing configuration (only used when strategy: model_based)
[routing.model_mapping]
claude-opus = "anthropic"    # claude-opus-* models → anthropic provider
claude-sonnet = "anthropic"  # claude-sonnet-* models → anthropic provider
glm-4 = "zai"                # glm-4* models → zai provider
qwen = "ollama"              # qwen* models → ollama provider

Default: If strategy is not specified, cc-relay uses failover as the safest option.

Strategies

Round-Robin

Sequential distribution using an atomic counter. Each provider receives one request before any provider receives a second.

YAML:

routing:
  strategy: round_robin

TOML:

[routing]
strategy = "round_robin"

How it works:

  1. Request 1 → Provider A
  2. Request 2 → Provider B
  3. Request 3 → Provider C
  4. Request 4 → Provider A (cycle repeats)

Best for: Equal distribution across providers with similar capacity.
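
The rotation described above can be sketched in a few lines of Python. This is an illustrative model only, not cc-relay's actual implementation: the `RoundRobin` class and its `select` method are hypothetical names, and a lock stands in for the atomic counter.

```python
import itertools
import threading


class RoundRobin:
    """Hypothetical sketch: rotate through providers, one request each."""

    def __init__(self, providers):
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)
        self._counter = itertools.count()  # monotonically increasing request index
        self._lock = threading.Lock()      # stands in for an atomic increment

    def select(self):
        # Take the next request index, then wrap it around the provider list.
        with self._lock:
            index = next(self._counter)
        return self._providers[index % len(self._providers)]


router = RoundRobin(["A", "B", "C"])
picks = [router.select() for _ in range(4)]
print(picks)  # ['A', 'B', 'C', 'A']
```

The fourth pick wraps back to the first provider, matching step 4 above.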

Weighted Round-Robin

Distributes requests proportionally based on provider weights. Uses the Nginx smooth weighted round-robin algorithm for even distribution.

YAML:

routing:
  strategy: weighted_round_robin

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        weight: 3  # Receives 3x more requests

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"
        weight: 1  # Receives 1x requests

TOML:

[routing]
strategy = "weighted_round_robin"

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
weight = 3  # Receives 3x more requests

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"
weight = 1  # Receives 1x requests

How it works:

With weights 3:1, out of every 4 requests:

  • 3 requests → anthropic
  • 1 request → zai

Default weight: 1 (if not specified)

Best for: Distributing load based on provider capacity, rate limits, or cost allocation.
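
To show why the "smooth" variant spreads a heavy provider's turns out rather than sending them back to back, here is a small Python sketch of the smooth weighted round-robin idea as it is commonly described: each provider accumulates its weight on every selection, the provider with the highest running total is chosen, and the chosen provider's total is reduced by the sum of all weights. The class name and structure are hypothetical and are not taken from cc-relay's source.

```python
class SmoothWeightedRoundRobin:
    """Hypothetical sketch of smooth weighted round-robin selection."""

    def __init__(self, weights):
        # weights: mapping of provider name -> positive integer weight
        if not weights:
            raise ValueError("at least one provider is required")
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def select(self):
        # 1. Every provider gains its configured weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # 2. The provider with the highest running total wins
        #    (ties go to the provider listed first).
        chosen = max(self._current, key=self._current.get)
        # 3. The winner pays back the total weight, so others catch up.
        self._current[chosen] -= self._total
        return chosen


router = SmoothWeightedRoundRobin({"anthropic": 3, "zai": 1})
picks = [router.select() for _ in range(8)]
print(picks.count("anthropic"), picks.count("zai"))  # 6 2
```

Over any 4 consecutive requests in this sketch, anthropic is chosen 3 times and zai once, which is the 3:1 split described above. The sketch is not thread-safe; a real router would need to guard the running totals.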

Shuffle

Fair random distribution using the Fisher-Yates “dealing cards” pattern: every provider receives one request before any provider receives a second.

YAML:

routing:
  strategy: shuffle

TOML:

[routing]
strategy = "shuffle"

How it works:

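  1. All providers start in a “deck”
  2. For each request, a random provider is drawn and removed from the deck
  3. When the deck is empty, all providers are reshuffled into a new deck
  4. Each provider is therefore picked exactly once per pass through the deck, which keeps distribution fair over time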

Best for: Randomized load balancing while ensuring fairness.
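
The deck behavior can be modeled in Python as follows. This is a sketch under assumptions: `ShuffleDeck` is a hypothetical name, and shuffling the whole deck up front with `random.shuffle` (itself a Fisher-Yates shuffle) and then dealing from the end is used here as an equivalent of drawing a random card each time.

```python
import random


class ShuffleDeck:
    """Hypothetical sketch: deal providers like cards, reshuffle when empty."""

    def __init__(self, providers, rng=None):
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)
        self._rng = rng or random.Random()
        self._deck = []  # starts empty, so the first select() shuffles

    def select(self):
        if not self._deck:
            # Deck exhausted: put every provider back and shuffle.
            self._deck = self._providers[:]
            self._rng.shuffle(self._deck)
        return self._deck.pop()


deck = ShuffleDeck(["anthropic", "zai", "ollama"])
first_pass = [deck.select() for _ in range(3)]
second_pass = [deck.select() for _ in range(3)]
print(sorted(first_pass) == sorted(second_pass))  # True
```

The order inside each pass is random, but each pass always contains every provider exactly once.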

Failover

Tries providers in priority order. If the highest-priority provider fails, cc-relay races the remaining providers in parallel and returns the fastest successful response. This is the default strategy.

YAML:

routing:
  strategy: failover

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        priority: 2  # Tried first (higher = higher priority)

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"
        priority: 1  # Fallback

TOML:

[routing]
strategy = "failover"

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
priority = 2  # Tried first (higher = higher priority)

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1  # Fallback

How it works:

  1. Try highest priority provider first
  2. If it fails (see Failover Triggers), launch parallel requests to all remaining providers
  3. Return first successful response, cancel others
  4. Respects failover_timeout for total operation time

Default priority: 1 (if not specified)

Best for: High availability with automatic fallback.
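
The try-then-race flow above can be sketched with `asyncio`. Everything named here is an assumption made for illustration: `RetryableError` stands for any failure listed under Failover Triggers, `send` is a caller-supplied coroutine that performs the request, and applying the timeout to the whole operation (initial attempt included) is one reading of "total operation time". It is not cc-relay's actual code.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical marker for a failure that should trigger failover."""


async def failover(providers, send, timeout_s=5.0):
    """providers: names sorted by priority, highest first.
    send: hypothetical coroutine function, send(name) -> response."""

    async def attempt():
        primary, *rest = providers
        try:
            return primary, await send(primary)
        except RetryableError:
            if not rest:
                raise
        # Primary failed: race every remaining provider in parallel.
        tasks = {asyncio.create_task(send(name)): name for name in rest}
        pending = set(tasks)
        try:
            while pending:
                done, pending = await asyncio.wait(
                    pending, return_when=asyncio.FIRST_COMPLETED
                )
                for task in done:
                    if task.exception() is None:
                        return tasks[task], task.result()
            raise RetryableError("all providers failed")
        finally:
            # Cancel whatever is still running once a winner is known.
            for task in tasks:
                task.cancel()

    # Bound the whole operation, like failover_timeout (seconds here).
    return await asyncio.wait_for(attempt(), timeout=timeout_s)


async def demo():
    async def send(name):
        if name == "anthropic":
            raise RetryableError("503 from primary")
        await asyncio.sleep(0.05 if name == "zai" else 0.2)
        return f"response from {name}"

    return await failover(["anthropic", "zai", "ollama"], send)


winner, response = asyncio.run(demo())
print(winner)  # zai
```

In the demo the primary fails immediately, the two fallbacks start together, and the faster one wins while the slower request is cancelled.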

Model-Based

Routes requests to providers based on the model name in the request. Uses longest prefix matching for specificity.

YAML:

routing:
  strategy: model_based

  model_mapping:
    claude-opus: anthropic
    claude-sonnet: anthropic
    glm-4: zai
    qwen: ollama
    llama: ollama

  default_provider: anthropic

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"

  - name: "ollama"
    type: "ollama"
    base_url: "http://localhost:11434"

TOML:

[routing]
strategy = "model_based"
default_provider = "anthropic"

[routing.model_mapping]
claude-opus = "anthropic"
claude-sonnet = "anthropic"
glm-4 = "zai"
qwen = "ollama"
llama = "ollama"

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"

[[providers]]
name = "ollama"
type = "ollama"
base_url = "http://localhost:11434"

How it works:

  1. Extract model name from request body (e.g., claude-opus-4)
  2. Find longest matching prefix in model_mapping (e.g., claude-opus)
  3. Route to mapped provider (e.g., anthropic)
  4. If no match, use default_provider
  5. If no default, route to any available provider
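
The lookup steps above amount to a longest-prefix match with two fallbacks. A minimal Python sketch follows; the function name `route_by_model` and its parameters are hypothetical, and picking the first available provider in step 5 is an assumption, since the text only says "any available provider".

```python
def route_by_model(model, model_mapping, default_provider=None, available=()):
    """Hypothetical sketch of longest-prefix model routing."""
    # Steps 1-3: collect every configured prefix the model name starts with,
    # then prefer the longest (most specific) one.
    matches = [prefix for prefix in model_mapping if model.startswith(prefix)]
    if matches:
        return model_mapping[max(matches, key=len)]
    # Step 4: no prefix matched, fall back to the default provider.
    if default_provider is not None:
        return default_provider
    # Step 5: no default configured (assumption: take the first available).
    if available:
        return available[0]
    raise LookupError(f"no provider available for model {model!r}")


mapping = {
    "claude-opus": "anthropic",
    "claude-sonnet": "anthropic",
    "glm-4": "zai",
    "qwen": "ollama",
}
print(route_by_model("claude-opus-4", mapping, "anthropic"))  # anthropic
print(route_by_model("qwen3:8b", mapping, "anthropic"))       # ollama
print(route_by_model("unknown-model", mapping, "anthropic"))  # anthropic
```

Choosing the longest prefix means a more specific entry such as `glm-4` would win over a shorter, overlapping entry such as `glm` if both were configured.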

Prefix matching examples:

| Request Model | Mapping | Result |
|---|---|---|
| claude-opus-4 | claude-opus: anthropic | anthropic |
| claude-sonnet-4-20250514 | claude-sonnet: anthropic | anthropic |
| glm-4.7 | glm-4: zai | zai |
| qwen3:8b | qwen: ollama | ollama |
| unknown-model | (no match) | default_provider |

Best for: Multi-provider setups where different providers handle different model families.

Debug Headers

When routing.debug: true, cc-relay adds diagnostic headers to responses:

| Header | Value | Description |
|---|---|---|
| X-CC-Relay-Strategy | Strategy name | Which routing strategy was used |
| X-CC-Relay-Provider | Provider name | Which provider handled the request |

Example response headers:

X-CC-Relay-Strategy: failover
X-CC-Relay-Provider: anthropic

Security Warning: Debug headers expose internal routing decisions. Use only in development or trusted environments. Never enable in production with untrusted clients.

Failover Triggers

The failover strategy triggers retry on specific error conditions:

| Trigger | Conditions | Description |
|---|---|---|
| Status Code | 429, 500, 502, 503, 504 | Rate limit or server errors |
| Timeout | context.DeadlineExceeded | Request timeout exceeded |
| Connection | net.Error | Network errors, DNS failures, connection refused |

Important: Client errors (4xx except 429) do not trigger failover. These indicate issues with the request itself, not the provider.

Status Codes Explained

| Code | Meaning | Failover? |
|---|---|---|
| 429 | Rate Limited | Yes - try another provider |
| 500 | Internal Server Error | Yes - server issue |
| 502 | Bad Gateway | Yes - upstream issue |
| 503 | Service Unavailable | Yes - temporarily down |
| 504 | Gateway Timeout | Yes - upstream timeout |
| 400 | Bad Request | No - fix the request |
| 401 | Unauthorized | No - fix authentication |
| 403 | Forbidden | No - permission issue |
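
The trigger rules can be condensed into a single predicate. The Python sketch below is illustrative only: `should_failover` is a hypothetical name, and Python's `TimeoutError` and `OSError` are used as rough analogues of Go's `context.DeadlineExceeded` and `net.Error`, which is an assumption about how the mapping would look outside Go.

```python
# Status codes that indicate a provider-side problem (from the table above).
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def should_failover(status=None, error=None):
    """Hypothetical sketch: decide whether a failed attempt triggers failover."""
    if error is not None:
        # Timeouts and network-level failures (DNS, connection refused)
        # are provider-side problems, so another provider may succeed.
        # Note: TimeoutError is itself a subclass of OSError in Python.
        return isinstance(error, (TimeoutError, OSError))
    # Other 4xx codes describe a bad request and would fail everywhere.
    return status in RETRYABLE_STATUS


print(should_failover(status=429))                       # True
print(should_failover(status=401))                       # False
print(should_failover(error=ConnectionRefusedError()))   # True
```

Keeping the retryable codes in one set makes the rule easy to audit against the table: anything not listed, including a successful 200, does not trigger failover.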

Examples

Simple Failover (Recommended for Most Users)

Use the default strategy with prioritized providers:

YAML:

routing:
  strategy: failover

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        priority: 2

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"
        priority: 1

TOML:

[routing]
strategy = "failover"

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
priority = 2

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1

Load Balanced with Weights

Distribute load based on provider capacity:

YAML:

routing:
  strategy: weighted_round_robin

providers:
  - name: "primary"
    type: "anthropic"
    keys:
      - key: "${PRIMARY_KEY}"
        weight: 3  # 75% of traffic

  - name: "secondary"
    type: "anthropic"
    keys:
      - key: "${SECONDARY_KEY}"
        weight: 1  # 25% of traffic

TOML:

[routing]
strategy = "weighted_round_robin"

[[providers]]
name = "primary"
type = "anthropic"

[[providers.keys]]
key = "${PRIMARY_KEY}"
weight = 3  # 75% of traffic

[[providers]]
name = "secondary"
type = "anthropic"

[[providers.keys]]
key = "${SECONDARY_KEY}"
weight = 1  # 25% of traffic

Development with Debug Headers

Enable debug headers for troubleshooting:

YAML:

routing:
  strategy: round_robin
  debug: true

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"

TOML:

[routing]
strategy = "round_robin"
debug = true

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"

High Availability with Fast Failover

Minimize failover latency:

YAML:

routing:
  strategy: failover
  failover_timeout: 3000  # 3 second timeout

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        priority: 2

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"
        priority: 1

TOML:

[routing]
strategy = "failover"
failover_timeout = 3000  # 3 second timeout

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
priority = 2

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1

Multi-Model with Model-Based Routing

Route different model families to different providers:

YAML:

routing:
  strategy: model_based

  model_mapping:
    claude-opus: anthropic
    claude-sonnet: anthropic
    claude-haiku: anthropic
    glm-4: zai
    glm-3: zai
    qwen: ollama
    llama: ollama

  default_provider: anthropic

providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"

  - name: "zai"
    type: "zai"
    keys:
      - key: "${ZAI_API_KEY}"

  - name: "ollama"
    type: "ollama"
    base_url: "http://localhost:11434"

TOML:

[routing]
strategy = "model_based"
default_provider = "anthropic"

[routing.model_mapping]
claude-opus = "anthropic"
claude-sonnet = "anthropic"
claude-haiku = "anthropic"
glm-4 = "zai"
glm-3 = "zai"
qwen = "ollama"
llama = "ollama"

[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[[providers]]
name = "zai"
type = "zai"

[[providers.keys]]
key = "${ZAI_API_KEY}"

[[providers]]
name = "ollama"
type = "ollama"
base_url = "http://localhost:11434"

With this configuration:

  • claude-opus-4 → anthropic
  • glm-4.7 → zai
  • qwen3:8b → ollama
  • unknown-model → anthropic (default)

Provider Weight and Priority

Weight and priority are specified in the provider’s key configuration:

YAML:

providers:
  - name: "example"
    type: "anthropic"
    keys:
      - key: "${API_KEY}"
        weight: 3      # For weighted-round-robin (higher = more traffic)
        priority: 2    # For failover (higher = tried first)
        rpm_limit: 60  # Rate limit tracking

TOML:

[[providers]]
name = "example"
type = "anthropic"

[[providers.keys]]
key = "${API_KEY}"
weight = 3      # For weighted-round-robin (higher = more traffic)
priority = 2    # For failover (higher = tried first)
rpm_limit = 60  # Rate limit tracking

Note: Weight and priority are read from the first key in the provider’s key list.
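
To make the "first key" rule concrete, the following Python sketch reads weight and priority from a provider entry shaped like the configuration above. The helper names `routing_params` and `failover_order` are hypothetical, and falling back to the documented defaults of 1 for a provider with no keys at all is an assumption.

```python
def routing_params(provider):
    """Hypothetical sketch: return (weight, priority) from the first key."""
    keys = provider.get("keys") or []
    first = keys[0] if keys else {}  # assumption: no keys means defaults
    return first.get("weight", 1), first.get("priority", 1)


def failover_order(providers):
    """Provider names sorted so that the highest priority is tried first."""
    ranked = sorted(providers, key=lambda p: routing_params(p)[1], reverse=True)
    return [p["name"] for p in ranked]


providers = [
    {"name": "zai", "keys": [{"key": "k1", "priority": 1}]},
    {"name": "anthropic", "keys": [{"key": "k2", "weight": 3, "priority": 2},
                                   {"key": "k3", "weight": 9, "priority": 9}]},
    {"name": "ollama"},  # no keys: defaults apply in this sketch
]
print(routing_params(providers[1]))  # (3, 2)
print(failover_order(providers))     # ['anthropic', 'zai', 'ollama']
```

The second key on the anthropic entry carries a larger weight and priority, but only the first key's values are used, as the note above states.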

Next Steps