Configuration

CC-Relay is configured via YAML or TOML files. This guide covers all configuration options.

Configuration File Location

Default locations (checked in order):

  1. ./config.yaml or ./config.toml (current directory)
  2. ~/.config/cc-relay/config.yaml or ~/.config/cc-relay/config.toml
  3. Path specified via --config flag

The format is automatically detected from the file extension (.yaml, .yml, or .toml).
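As a rough illustration of extension-based detection, here is a minimal Python sketch. CC-Relay itself is a Go program, and `detect_format` is a hypothetical helper, not part of its API. The sketch assumes matching is case-insensitive and that unknown extensions are rejected; the text above does not specify either behavior.

```python
from pathlib import Path

# Extensions named in the text above, mapped to the format they select.
FORMATS = {".yaml": "yaml", ".yml": "yaml", ".toml": "toml"}


def detect_format(path: str) -> str:
    """Return "yaml" or "toml" based on the file extension.

    Assumption: matching is case-insensitive and unknown extensions raise
    an error; the real loader may behave differently.
    """
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported config extension: {suffix!r}") from None
```

For example, `detect_format("./config.toml")` selects the TOML parser, while both `.yaml` and `.yml` select YAML.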

Generate a default config with:

cc-relay config init

Environment Variable Expansion

CC-Relay supports environment variable expansion using ${VAR_NAME} syntax in both YAML and TOML formats:

# config.yaml
providers:
- name: "anthropic"
  type: "anthropic"
  keys:
    - key: "${ANTHROPIC_API_KEY}"  # Expanded at load time

# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"  # Expanded at load time
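The substitution can be pictured with a short Python sketch. This is not CC-Relay's implementation: `expand_env` is a hypothetical helper, the set of characters allowed in a variable name is an assumption, and so is the choice to expand unset variables to an empty string (the text above does not say what happens in that case).

```python
import re
from typing import Mapping

# Matches ${VAR_NAME}; the allowed name characters are an assumption.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env: Mapping[str, str]) -> str:
    """Replace each ${VAR_NAME} in text with env[VAR_NAME].

    Assumption: unset variables expand to an empty string.
    """
    return _VAR.sub(lambda m: env.get(m.group(1), ""), text)
```

Passing `os.environ` as `env` expands against the real process environment; passing a plain dict makes the behavior easy to check in isolation.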

Complete Configuration Reference

# config.yaml
# ==========================================================================
# Server Configuration
# ==========================================================================
server:
  # Address to listen on
  listen: "127.0.0.1:8787"

  # Request timeout in milliseconds (default: 600000 = 10 minutes)
  timeout_ms: 600000

  # Maximum concurrent requests (0 = unlimited)
  max_concurrent: 0

  # Enable HTTP/2 for better performance
  enable_http2: true

  # Authentication configuration
  auth:
    # Require specific API key for proxy access
    api_key: "${PROXY_API_KEY}"

    # Allow Claude Code subscription Bearer tokens
    allow_subscription: true

    # Specific Bearer token to validate (optional)
    bearer_secret: "${BEARER_SECRET}"

# ==========================================================================
# Provider Configurations
# ==========================================================================
providers:
# Anthropic Direct API
- name: "anthropic"
  type: "anthropic"
  enabled: true
  base_url: "https://api.anthropic.com"  # Optional, uses default

  keys:
    - key: "${ANTHROPIC_API_KEY}"
      rpm_limit: 60       # Requests per minute
      tpm_limit: 100000   # Tokens per minute

  # Optional: Specify available models
  models:
    - "claude-sonnet-4-5-20250514"
    - "claude-opus-4-5-20250514"
    - "claude-haiku-3-5-20241022"

# Z.AI / Zhipu GLM
- name: "zai"
  type: "zai"
  enabled: true
  base_url: "https://api.z.ai/api/anthropic"

  keys:
    - key: "${ZAI_API_KEY}"

  # Map Claude model names to Z.AI models
  model_mapping:
    "claude-sonnet-4-5-20250514": "GLM-4.7"
    "claude-haiku-3-5-20241022": "GLM-4.5-Air"

  # Optional: Specify available models
  models:
    - "GLM-4.7"
    - "GLM-4.5-Air"
    - "GLM-4-Plus"

# ==========================================================================
# Logging Configuration
# ==========================================================================
logging:
  # Log level: debug, info, warn, error
  level: "info"

  # Log format: json, text
  format: "text"

  # Enable colored output (for text format)
  pretty: true

  # Granular debug options
  debug_options:
    log_request_body: false
    log_response_headers: false
    log_tls_metrics: false
    max_body_log_size: 1000

# ==========================================================================
# Cache Configuration
# ==========================================================================
cache:
  # Cache mode: single, ha, disabled
  mode: single

  # Single mode (Ristretto) configuration
  ristretto:
    num_counters: 1000000  # 10x expected max items
    max_cost: 104857600    # 100 MB
    buffer_items: 64       # Admission buffer size

  # HA mode (Olric) configuration
  olric:
    embedded: true                 # Run embedded Olric node
    bind_addr: "0.0.0.0:3320"      # Olric client port
    dmap_name: "cc-relay"          # Distributed map name
    environment: lan               # local, lan, or wan
    peers:                         # Memberlist addresses (bind_addr + 2)
      - "other-node:3322"
    replica_count: 2               # Copies per key
    read_quorum: 1                 # Min reads for success
    write_quorum: 1                # Min writes for success
    member_count_quorum: 2         # Min cluster members
    leave_timeout: 5s              # Leave broadcast duration

# ==========================================================================
# Routing Configuration
# ==========================================================================
routing:
  # Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
  strategy: failover

  # Timeout for failover attempts in milliseconds (default: 5000)
  failover_timeout: 5000

  # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
  debug: false

# config.toml
# ==========================================================================
# Server Configuration
# ==========================================================================
[server]
# Address to listen on
listen = "127.0.0.1:8787"

# Request timeout in milliseconds (default: 600000 = 10 minutes)
timeout_ms = 600000

# Maximum concurrent requests (0 = unlimited)
max_concurrent = 0

# Enable HTTP/2 for better performance
enable_http2 = true

# Authentication configuration
[server.auth]
# Require specific API key for proxy access
api_key = "${PROXY_API_KEY}"

# Allow Claude Code subscription Bearer tokens
allow_subscription = true

# Specific Bearer token to validate (optional)
bearer_secret = "${BEARER_SECRET}"

# ==========================================================================
# Provider Configurations
# ==========================================================================

# Anthropic Direct API
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com"  # Optional, uses default

# Optional: Specify available models
# (provider-level field: must appear before any [[providers.keys]] table)
models = [
  "claude-sonnet-4-5-20250514",
  "claude-opus-4-5-20250514",
  "claude-haiku-3-5-20241022"
]

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60       # Requests per minute
tpm_limit = 100000   # Tokens per minute

# Z.AI / Zhipu GLM
[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic"

# Optional: Specify available models
# (provider-level field: must appear before any sub-table of this provider)
models = [
  "GLM-4.7",
  "GLM-4.5-Air",
  "GLM-4-Plus"
]

[[providers.keys]]
key = "${ZAI_API_KEY}"

# Map Claude model names to Z.AI models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"

# ==========================================================================
# Logging Configuration
# ==========================================================================
[logging]
# Log level: debug, info, warn, error
level = "info"

# Log format: json, text
format = "text"

# Enable colored output (for text format)
pretty = true

# Granular debug options
[logging.debug_options]
log_request_body = false
log_response_headers = false
log_tls_metrics = false
max_body_log_size = 1000

# ==========================================================================
# Cache Configuration
# ==========================================================================
[cache]
# Cache mode: single, ha, disabled
mode = "single"

# Single mode (Ristretto) configuration
[cache.ristretto]
num_counters = 1000000  # 10x expected max items
max_cost = 104857600    # 100 MB
buffer_items = 64       # Admission buffer size

# HA mode (Olric) configuration
[cache.olric]
embedded = true                 # Run embedded Olric node
bind_addr = "0.0.0.0:3320"      # Olric client port
dmap_name = "cc-relay"          # Distributed map name
environment = "lan"             # local, lan, or wan
peers = ["other-node:3322"]     # Memberlist addresses (bind_addr + 2)
replica_count = 2               # Copies per key
read_quorum = 1                 # Min reads for success
write_quorum = 1                # Min writes for success
member_count_quorum = 2         # Min cluster members
leave_timeout = "5s"            # Leave broadcast duration

# ==========================================================================
# Routing Configuration
# ==========================================================================
[routing]
# Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
strategy = "failover"

# Timeout for failover attempts in milliseconds (default: 5000)
failover_timeout = 5000

# Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
debug = false

Server Configuration

Listen Address

The listen field specifies where the proxy listens for incoming requests:

# config.yaml
server:
  listen: "127.0.0.1:8787"  # Local only (recommended)
  # listen: "0.0.0.0:8787"  # All interfaces (use with caution)

# config.toml
[server]
listen = "127.0.0.1:8787"  # Local only (recommended)
# listen = "0.0.0.0:8787"  # All interfaces (use with caution)

Authentication

CC-Relay supports multiple authentication methods:

API Key Authentication

Require clients to provide a specific API key:

# config.yaml
server:
  auth:
    api_key: "${PROXY_API_KEY}"

# config.toml
[server.auth]
api_key = "${PROXY_API_KEY}"

Clients must include the header: x-api-key: <your-proxy-key>

Claude Code Subscription Passthrough

Allow Claude Code subscription users to connect:

# config.yaml
server:
  auth:
    allow_subscription: true

# config.toml
[server.auth]
allow_subscription = true

This accepts Authorization: Bearer tokens from Claude Code.

Combined Authentication

Allow both API key and subscription authentication:

# config.yaml
server:
  auth:
    api_key: "${PROXY_API_KEY}"
    allow_subscription: true

# config.toml
[server.auth]
api_key = "${PROXY_API_KEY}"
allow_subscription = true

No Authentication

To disable authentication (not recommended for production):

# config.yaml
server:
  auth: {}
# Or simply omit the auth section

# config.toml
# Simply omit the [server.auth] section
# or define an empty section:
[server.auth]

HTTP/2 Support

Enable HTTP/2 for better performance with concurrent requests:

# config.yaml
server:
  enable_http2: true

# config.toml
[server]
enable_http2 = true

Transparent Authentication

cc-relay automatically detects how to handle authentication based on what the client sends:

How It Works

Client Sends                   cc-relay Behavior             Use Case
Authorization: Bearer <token>  Forward unchanged             Claude Code subscription users
x-api-key: <key>               Forward unchanged             Direct API key users
No auth headers                Use configured provider keys  Enterprise/team deployments

Claude Code Subscription Users

If you have a Claude Code subscription (Max/Team/Enterprise plan), you can use cc-relay as a transparent proxy:

# Set cc-relay as your API endpoint
export ANTHROPIC_BASE_URL="http://localhost:8787"

# Your subscription token flows through unchanged
# ANTHROPIC_AUTH_TOKEN is already set by Claude Code
claude

No API key required - cc-relay forwards your subscription token to Anthropic.

Enterprise/Team Deployments

For centralized API key management, have clients send no auth headers; cc-relay then uses its configured provider keys:

# config.yaml
providers:
- name: anthropic
  type: anthropic
  base_url: https://api.anthropic.com
  enabled: true
  keys:
    - key: ${ANTHROPIC_API_KEY}
      rpm_limit: 50

# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"
base_url = "https://api.anthropic.com"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 50

# Client has no auth - uses configured keys
export ANTHROPIC_BASE_URL="http://localhost:8787"
unset ANTHROPIC_AUTH_TOKEN
unset ANTHROPIC_API_KEY
claude

Mixed Mode

You can run both modes simultaneously:

  • Subscription users: Their auth flows through (no key pool overhead)
  • Team users: Use configured keys with rate limit pooling

Rate limiting and key pooling only apply when using configured keys, not client-provided auth.

Key Points

  1. Auto-detection: No configuration needed - behavior determined by client headers
  2. Subscription passthrough: Authorization: Bearer forwarded unchanged
  3. Fallback keys: Used only when client has no auth
  4. Key pool efficiency: Only tracks usage of YOUR keys, not client subscriptions
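The decision described in this section can be sketched in a few lines of Python. This is an illustration of the table in "How It Works", not cc-relay's actual code: `resolve_upstream_auth` is a hypothetical name, the precedence when a client sends both headers is an assumption (Bearer first), and so is sending the fallback key upstream as `x-api-key`.

```python
def resolve_upstream_auth(headers: dict, configured_key: str) -> dict:
    """Return the auth header to send upstream for one request."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    bearer = lowered.get("authorization", "")
    if bearer.startswith("Bearer "):
        # Subscription passthrough: forward the client's token unchanged.
        return {"Authorization": bearer}
    if "x-api-key" in lowered:
        # Direct API key user: forward the client's key unchanged.
        return {"x-api-key": lowered["x-api-key"]}
    # No client auth: fall back to a key from the configured pool.
    return {"x-api-key": configured_key}
```

Only the last branch touches the key pool, which is why rate limiting and pooling apply only to requests that arrive without client auth.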

Provider Configuration

Provider Types

CC-Relay currently supports two provider types:

Type       Description           Default Base URL
anthropic  Anthropic Direct API  https://api.anthropic.com
zai        Z.AI / Zhipu GLM      https://api.z.ai/api/anthropic

Anthropic Provider

# config.yaml
providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true
  base_url: "https://api.anthropic.com"  # Optional

  keys:
    - key: "${ANTHROPIC_API_KEY}"
      rpm_limit: 60
      tpm_limit: 100000

  models:
    - "claude-sonnet-4-5-20250514"
    - "claude-opus-4-5-20250514"
    - "claude-haiku-3-5-20241022"

# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com"  # Optional

# Provider-level field: must appear before any [[providers.keys]] table
models = [
  "claude-sonnet-4-5-20250514",
  "claude-opus-4-5-20250514",
  "claude-haiku-3-5-20241022"
]

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60
tpm_limit = 100000

Z.AI Provider

Z.AI offers Anthropic-compatible APIs with GLM models at lower cost:

# config.yaml
providers:
- name: "zai"
  type: "zai"
  enabled: true
  base_url: "https://api.z.ai/api/anthropic"

  keys:
    - key: "${ZAI_API_KEY}"

  model_mapping:
    "claude-sonnet-4-5-20250514": "GLM-4.7"
    "claude-haiku-3-5-20241022": "GLM-4.5-Air"

  models:
    - "GLM-4.7"
    - "GLM-4.5-Air"
    - "GLM-4-Plus"

# config.toml
[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic"

# Provider-level field: must appear before any sub-table of this provider
models = [
  "GLM-4.7",
  "GLM-4.5-Air",
  "GLM-4-Plus"
]

[[providers.keys]]
key = "${ZAI_API_KEY}"

[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"

Multiple API Keys

Pool multiple API keys for higher throughput:

# config.yaml
providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true

  keys:
    - key: "${ANTHROPIC_API_KEY_1}"
      rpm_limit: 60
      tpm_limit: 100000
    - key: "${ANTHROPIC_API_KEY_2}"
      rpm_limit: 60
      tpm_limit: 100000
    - key: "${ANTHROPIC_API_KEY_3}"
      rpm_limit: 60
      tpm_limit: 100000

# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_1}"
rpm_limit = 60
tpm_limit = 100000

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_2}"
rpm_limit = 60
tpm_limit = 100000

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_3}"
rpm_limit = 60
tpm_limit = 100000
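To see what pooling buys, the following Python sketch adds up the per-key limits from the example above. It assumes each key's limits are enforced independently, so the pool's ceiling is the sum of the individual limits; `pool_capacity` is a hypothetical helper, and the throughput actually achieved depends on how cc-relay selects keys.

```python
# Per-key limits copied from the three-key example above.
keys = [
    {"rpm_limit": 60, "tpm_limit": 100_000},
    {"rpm_limit": 60, "tpm_limit": 100_000},
    {"rpm_limit": 60, "tpm_limit": 100_000},
]


def pool_capacity(keys):
    """Return (requests per minute, tokens per minute) summed over all keys."""
    total_rpm = sum(k["rpm_limit"] for k in keys)
    total_tpm = sum(k["tpm_limit"] for k in keys)
    return total_rpm, total_tpm
```

Under that assumption, three keys at 60 RPM and 100,000 TPM each give the pool an upper bound of 180 RPM and 300,000 TPM.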

Custom Base URL

Override the default API endpoint:

# config.yaml
providers:
- name: "anthropic-custom"
  type: "anthropic"
  base_url: "https://custom-endpoint.example.com"

# config.toml
[[providers]]
name = "anthropic-custom"
type = "anthropic"
base_url = "https://custom-endpoint.example.com"

Logging Configuration

Log Levels

Level  Description
debug  Verbose output for development
info   Normal operation messages
warn   Warning messages
error  Error messages only

Log Format

# config.yaml
logging:
  format: "text"   # Human-readable (default)
  # format: "json" # Machine-readable, for log aggregation

# config.toml
[logging]
format = "text"   # Human-readable (default)
# format = "json" # Machine-readable, for log aggregation

Debug Options

Fine-grained control over debug logging:

# config.yaml
logging:
  level: "debug"
  debug_options:
    log_request_body: true      # Log request bodies (redacted)
    log_response_headers: true  # Log response headers
    log_tls_metrics: true       # Log TLS connection info
    max_body_log_size: 1000     # Max bytes to log from bodies

# config.toml
[logging]
level = "debug"

[logging.debug_options]
log_request_body = true      # Log request bodies (redacted)
log_response_headers = true  # Log response headers
log_tls_metrics = true       # Log TLS connection info
max_body_log_size = 1000     # Max bytes to log from bodies

Cache Configuration

CC-Relay provides a unified caching layer with multiple backend options for different deployment scenarios.

Cache Modes

Mode      Backend    Use Case
single    Ristretto  Single-instance deployments, high performance
ha        Olric      Multi-instance deployments, shared state
disabled  Noop       No caching, passthrough

Single Mode (Ristretto)

Ristretto is a high-performance, concurrent in-memory cache. This is the default mode for single-instance deployments.

# config.yaml
cache:
  mode: single
  ristretto:
    num_counters: 1000000  # 10x expected max items
    max_cost: 104857600    # 100 MB
    buffer_items: 64       # Admission buffer size

# config.toml
[cache]
mode = "single"

[cache.ristretto]
num_counters = 1000000  # 10x expected max items
max_cost = 104857600    # 100 MB
buffer_items = 64       # Admission buffer size

Field         Type   Default               Description
num_counters  int64  1,000,000             Number of 4-bit access counters. Recommended: 10x expected max items.
max_cost      int64  104,857,600 (100 MB)  Maximum memory in bytes the cache can hold.
buffer_items  int64  64                    Number of keys per Get buffer. Controls admission buffer size.
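The defaults follow from two simple rules: ten counters per expected item, and a memory budget expressed in bytes. A small Python sketch of that arithmetic follows; `ristretto_settings` is a hypothetical helper for sizing your own values, not something cc-relay provides.

```python
def ristretto_settings(expected_items: int, max_memory_mb: int) -> dict:
    """Derive cache.ristretto values from an item estimate and a memory budget."""
    return {
        # Recommended: 10x the expected maximum number of cached items.
        "num_counters": 10 * expected_items,
        # max_cost is in bytes: 1 MB = 1024 * 1024 bytes.
        "max_cost": max_memory_mb * 1024 * 1024,
        # Default from the table above.
        "buffer_items": 64,
    }
```

With an estimate of 100,000 items and a 100 MB budget, this reproduces the defaults: 1,000,000 counters and a `max_cost` of 104,857,600 bytes.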

HA Mode (Olric) - Embedded

For multi-instance deployments requiring shared cache state, use embedded Olric mode where each cc-relay instance runs an Olric node.

# config.yaml
cache:
  mode: ha
  olric:
    embedded: true
    bind_addr: "0.0.0.0:3320"
    dmap_name: "cc-relay"
    environment: lan
    peers:
      - "other-node:3322"  # Memberlist port = bind_addr + 2
    replica_count: 2
    read_quorum: 1
    write_quorum: 1
    member_count_quorum: 2
    leave_timeout: 5s

# config.toml
[cache]
mode = "ha"

[cache.olric]
embedded = true
bind_addr = "0.0.0.0:3320"
dmap_name = "cc-relay"
environment = "lan"
peers = ["other-node:3322"]  # Memberlist port = bind_addr + 2
replica_count = 2
read_quorum = 1
write_quorum = 1
member_count_quorum = 2
leave_timeout = "5s"

Field                Type      Default     Description
embedded             bool      false       Run embedded Olric node (true) vs. connect to external cluster (false).
bind_addr            string    required    Address for Olric client connections (e.g., "0.0.0.0:3320").
dmap_name            string    "cc-relay"  Name of the distributed map. All nodes must use the same name.
environment          string    "local"     Memberlist preset: "local", "lan", or "wan".
peers                []string  -           Memberlist addresses for peer discovery. Uses port bind_addr + 2.
replica_count        int       1           Number of copies per key. 1 = no replication.
read_quorum          int       1           Minimum successful reads for response.
write_quorum         int       1           Minimum successful writes for response.
member_count_quorum  int32     1           Minimum cluster members required to operate.
leave_timeout        duration  5s          Time to broadcast leave message before shutdown.

Important: Olric uses two ports - the bind_addr port for client connections and bind_addr + 2 for memberlist gossip. Ensure both ports are open in your firewall.
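When filling in `peers`, the memberlist address of each node is its `bind_addr` with the port increased by 2. A minimal Python sketch of that derivation, using a hypothetical `memberlist_addr` helper:

```python
def memberlist_addr(bind_addr: str, offset: int = 2) -> str:
    """Return the memberlist (gossip) address for an Olric bind address.

    The port offset of 2 comes from the note above.
    """
    # Split on the last colon so bracketed IPv6 hosts keep their colons.
    host, _, port = bind_addr.rpartition(":")
    return f"{host}:{int(port) + offset}"
```

A node bound to `other-node:3320` is therefore listed in its peers' configuration as `other-node:3322`, and both 3320 and 3322 need to be reachable.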

HA Mode (Olric) - Client Mode

Connect to an external Olric cluster instead of running embedded nodes:

# config.yaml
cache:
  mode: ha
  olric:
    embedded: false
    addresses:
      - "olric-node-1:3320"
      - "olric-node-2:3320"
    dmap_name: "cc-relay"

# config.toml
[cache]
mode = "ha"

[cache.olric]
embedded = false
addresses = ["olric-node-1:3320", "olric-node-2:3320"]
dmap_name = "cc-relay"

Field      Type      Description
embedded   bool      Set to false for client mode.
addresses  []string  External Olric cluster addresses.
dmap_name  string    Distributed map name (must match cluster configuration).

Disabled Mode

Disable caching entirely for debugging or when caching is handled elsewhere:

# config.yaml
cache:
  mode: disabled

# config.toml
[cache]
mode = "disabled"

For detailed cache configuration including cache key conventions, cache busting strategies, HA clustering guides, and troubleshooting, see the Cache System documentation.

Routing Configuration

CC-Relay supports multiple routing strategies for distributing requests across providers.

# config.yaml
# ==========================================================================
# Routing Configuration
# ==========================================================================
routing:
  # Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
  strategy: failover

  # Timeout for failover attempts in milliseconds (default: 5000)
  failover_timeout: 5000

  # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
  debug: false

# config.toml
# ==========================================================================
# Routing Configuration
# ==========================================================================
[routing]
# Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
strategy = "failover"

# Timeout for failover attempts in milliseconds (default: 5000)
failover_timeout = 5000

# Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
debug = false

Routing Strategies

Strategy              Description
failover              Try providers in priority order, fall back on failure (default)
round_robin           Sequential rotation through providers
weighted_round_robin  Distribute proportionally by weight
shuffle               Fair random distribution

Provider Weight and Priority

Weight and priority are configured in the provider’s first key:

# config.yaml
providers:
- name: "anthropic"
  type: "anthropic"
  keys:
    - key: "${ANTHROPIC_API_KEY}"
      weight: 3      # For weighted_round_robin (higher = more traffic)
      priority: 2    # For failover (higher = tried first)

# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
weight = 3      # For weighted_round_robin (higher = more traffic)
priority = 2    # For failover (higher = tried first)
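To make the effect of `weight` and `priority` concrete, here is a simplified Python sketch. It is not cc-relay's routing code: the function names are hypothetical, the second provider's weight and priority are made-up illustration values, and a real weighted round-robin may interleave providers more smoothly than this naive expansion.

```python
# Weight/priority for "anthropic" come from the example above; the "zai"
# values are hypothetical, chosen only for illustration.
providers = [
    {"name": "anthropic", "weight": 3, "priority": 2},
    {"name": "zai", "weight": 1, "priority": 1},
]


def failover_order(providers):
    """Order providers for failover: higher priority is tried first."""
    # sorted() is stable, so providers with equal priority keep config order.
    ranked = sorted(providers, key=lambda p: p["priority"], reverse=True)
    return [p["name"] for p in ranked]


def weighted_cycle(providers):
    """Return one full rotation in which each provider appears `weight` times."""
    return [p["name"] for p in providers for _ in range(p["weight"])]
```

With these values, failover tries `anthropic` before `zai`, and one weighted rotation sends three of every four requests to `anthropic`.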

For detailed routing configuration including strategy explanations, debug headers, and failover triggers, see the Routing documentation.

Example Configurations

Minimal Single Provider

# config.yaml
server:
  listen: "127.0.0.1:8787"

providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true
  keys:
    - key: "${ANTHROPIC_API_KEY}"

# config.toml
[server]
listen = "127.0.0.1:8787"

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

Multi-Provider Setup

# config.yaml
server:
  listen: "127.0.0.1:8787"
  auth:
    allow_subscription: true

providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true
  keys:
    - key: "${ANTHROPIC_API_KEY}"

- name: "zai"
  type: "zai"
  enabled: true
  keys:
    - key: "${ZAI_API_KEY}"
  model_mapping:
    "claude-sonnet-4-5-20250514": "GLM-4.7"

logging:
  level: "info"
  format: "text"

# config.toml
[server]
listen = "127.0.0.1:8787"

[server.auth]
allow_subscription = true

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[[providers]]
name = "zai"
type = "zai"
enabled = true

[[providers.keys]]
key = "${ZAI_API_KEY}"

[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"

[logging]
level = "info"
format = "text"

Development with Debug Logging

# config.yaml
server:
  listen: "127.0.0.1:8787"

providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true
  keys:
    - key: "${ANTHROPIC_API_KEY}"

logging:
  level: "debug"
  format: "text"
  pretty: true
  debug_options:
    log_request_body: true
    log_response_headers: true
    log_tls_metrics: true

# config.toml
[server]
listen = "127.0.0.1:8787"

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[logging]
level = "debug"
format = "text"
pretty = true

[logging.debug_options]
log_request_body = true
log_response_headers = true
log_tls_metrics = true

Validating Configuration

Validate your configuration file:

cc-relay config validate

Tip: Always validate configuration changes before deploying. Hot-reload will reject invalid configurations, but validation catches errors before they reach production.

Hot Reloading

CC-Relay automatically detects and applies configuration changes without requiring a restart. This enables zero-downtime configuration updates.

How It Works

CC-Relay uses fsnotify to monitor the config file for changes:

  1. File watching: The parent directory is monitored to properly detect atomic writes (temp file + rename pattern used by most editors)
  2. Debouncing: Multiple rapid file events are coalesced with a 100ms debounce delay to handle editor save behavior
  3. Atomic swap: New configuration is loaded and swapped atomically using Go’s sync/atomic.Pointer
  4. In-flight preservation: Requests in progress continue with the old configuration; new requests use the updated configuration
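The debounce and swap steps above can be sketched in Python. This is an illustration only: CC-Relay implements them in Go with fsnotify and sync/atomic.Pointer, and the names `debounce` and `ConfigHolder` are hypothetical. The debounce is modeled as a pure function over event timestamps so its behavior is easy to follow.

```python
import threading

DEBOUNCE_MS = 100  # debounce delay stated in the list above


def debounce(event_times_ms, delay_ms=DEBOUNCE_MS):
    """Return the times at which a trailing-edge debouncer would fire.

    A reload fires delay_ms after an event only if no newer event
    arrived in the meantime.
    """
    fires = []
    pending = None  # most recent event that has not triggered a reload yet
    for t in sorted(event_times_ms):
        if pending is not None and t - pending >= delay_ms:
            fires.append(pending + delay_ms)
        pending = t
    if pending is not None:
        fires.append(pending + delay_ms)
    return fires


class ConfigHolder:
    """Holds the active config and swaps it as a whole."""

    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = config

    def load(self):
        # A request takes one snapshot and keeps using it until it finishes.
        with self._lock:
            return self._config

    def reload(self, new_config, validate):
        # Invalid configs are rejected; the previous config stays active.
        if not validate(new_config):
            return False
        with self._lock:
            self._config = new_config
        return True
```

Three rapid saves at 0, 20, and 50 ms collapse into a single reload at 150 ms. A request that called `load()` before the swap keeps its old snapshot, while later requests see the new configuration.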

Events That Trigger Reload

Event                        Triggers Reload
File write                   Yes
File create (atomic rename)  Yes
File chmod                   No (ignored)
Other file in directory      No (ignored)
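Because the parent directory is watched, each event has to be filtered before it triggers a reload. The table can be read as the following Python predicate; it is a sketch with a hypothetical `should_reload` helper and made-up operation names, not fsnotify's actual event types.

```python
import os

# Operations that trigger a reload, per the table above.
RELOAD_OPS = {"write", "create"}


def should_reload(event_path: str, op: str, config_path: str) -> bool:
    """Return True if a directory event should trigger a config reload."""
    # Events for other files in the watched directory are ignored.
    same_file = os.path.abspath(event_path) == os.path.abspath(config_path)
    # chmod and any other operation on the config file are ignored too.
    return same_file and op in RELOAD_OPS
```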

Logging

When hot-reload occurs, you’ll see log messages:

INF config file reloaded path=/path/to/config.yaml
INF config hot-reloaded successfully

If the new configuration is invalid:

ERR failed to reload config path=/path/to/config.yaml error="validation error"

Invalid configurations are rejected and the proxy continues with the previous valid configuration.

Limitations

  • Listen address: Changing server.listen requires a restart
  • gRPC address: Changing grpc.listen requires a restart

Configuration options that can be hot-reloaded:

  • Logging level and format
  • Routing strategy, failover timeout, weights, and priorities
  • Provider enable/disable, base URL, and model mapping
  • Keypool strategy, key weights, and per-key limits
  • Max concurrent requests and max body size
  • Health check intervals and circuit breaker thresholds

Hot-Reload Guarantees

  • New requests use the latest configuration after reload completes.
  • In-flight requests continue with the previous configuration.
  • Reload applies atomically to routing/provider/keypool state.
  • Invalid configs are rejected and the previous config remains active.

Next Steps