# Configuration
CC-Relay is configured via YAML or TOML files. This guide covers all configuration options.
## Configuration File Location

Default locations (checked in order):

1. `./config.yaml` or `./config.toml` (current directory)
2. `~/.config/cc-relay/config.yaml` or `~/.config/cc-relay/config.toml`
3. Path specified via the `--config` flag

The format is automatically detected from the file extension (`.yaml`, `.yml`, or `.toml`).

Generate a default config with:

```bash
cc-relay config init
```

## Environment Variable Expansion
CC-Relay supports environment variable expansion using `${VAR_NAME}` syntax in both YAML and TOML formats.
```yaml
providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"  # Expanded at load time
```

```toml
[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"  # Expanded at load time
```

## Complete Configuration Reference
```yaml
# ==========================================================================
# Server Configuration
# ==========================================================================
server:
  # Address to listen on
  listen: "127.0.0.1:8787"
  # Request timeout in milliseconds (default: 600000 = 10 minutes)
  timeout_ms: 600000
  # Maximum concurrent requests (0 = unlimited)
  max_concurrent: 0
  # Enable HTTP/2 for better performance
  enable_http2: true

  # Authentication configuration
  auth:
    # Require specific API key for proxy access
    api_key: "${PROXY_API_KEY}"
    # Allow Claude Code subscription Bearer tokens
    allow_subscription: true
    # Specific Bearer token to validate (optional)
    bearer_secret: "${BEARER_SECRET}"

# ==========================================================================
# Provider Configurations
# ==========================================================================
providers:
  # Anthropic Direct API
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    base_url: "https://api.anthropic.com"  # Optional, uses default
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        rpm_limit: 60      # Requests per minute
        tpm_limit: 100000  # Tokens per minute
    # Optional: Specify available models
    models:
      - "claude-sonnet-4-5-20250514"
      - "claude-opus-4-5-20250514"
      - "claude-haiku-3-5-20241022"

  # Z.AI / Zhipu GLM
  - name: "zai"
    type: "zai"
    enabled: true
    base_url: "https://api.z.ai/api/anthropic"
    keys:
      - key: "${ZAI_API_KEY}"
    # Map Claude model names to Z.AI models
    model_mapping:
      "claude-sonnet-4-5-20250514": "GLM-4.7"
      "claude-haiku-3-5-20241022": "GLM-4.5-Air"
    # Optional: Specify available models
    models:
      - "GLM-4.7"
      - "GLM-4.5-Air"
      - "GLM-4-Plus"

# ==========================================================================
# Logging Configuration
# ==========================================================================
logging:
  # Log level: debug, info, warn, error
  level: "info"
  # Log format: json, text
  format: "text"
  # Enable colored output (for text format)
  pretty: true
  # Granular debug options
  debug_options:
    log_request_body: false
    log_response_headers: false
    log_tls_metrics: false
    max_body_log_size: 1000

# ==========================================================================
# Cache Configuration
# ==========================================================================
cache:
  # Cache mode: single, ha, disabled
  mode: single

  # Single mode (Ristretto) configuration
  ristretto:
    num_counters: 1000000  # 10x expected max items
    max_cost: 104857600    # 100 MB
    buffer_items: 64       # Admission buffer size

  # HA mode (Olric) configuration
  olric:
    embedded: true             # Run embedded Olric node
    bind_addr: "0.0.0.0:3320"  # Olric client port
    dmap_name: "cc-relay"      # Distributed map name
    environment: lan           # local, lan, or wan
    peers:                     # Memberlist addresses (bind_addr + 2)
      - "other-node:3322"
    replica_count: 2           # Copies per key
    read_quorum: 1             # Min reads for success
    write_quorum: 1            # Min writes for success
    member_count_quorum: 2     # Min cluster members
    leave_timeout: 5s          # Leave broadcast duration

# ==========================================================================
# Routing Configuration
# ==========================================================================
routing:
  # Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
  strategy: failover
  # Timeout for failover attempts in milliseconds (default: 5000)
  failover_timeout: 5000
  # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
  debug: false
```

```toml
# ==========================================================================
# Server Configuration
# ==========================================================================
[server]
# Address to listen on
listen = "127.0.0.1:8787"
# Request timeout in milliseconds (default: 600000 = 10 minutes)
timeout_ms = 600000
# Maximum concurrent requests (0 = unlimited)
max_concurrent = 0
# Enable HTTP/2 for better performance
enable_http2 = true

# Authentication configuration
[server.auth]
# Require specific API key for proxy access
api_key = "${PROXY_API_KEY}"
# Allow Claude Code subscription Bearer tokens
allow_subscription = true
# Specific Bearer token to validate (optional)
bearer_secret = "${BEARER_SECRET}"

# ==========================================================================
# Provider Configurations
# ==========================================================================
# Anthropic Direct API
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com"  # Optional, uses default
# Optional: Specify available models
models = [
  "claude-sonnet-4-5-20250514",
  "claude-opus-4-5-20250514",
  "claude-haiku-3-5-20241022"
]

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60      # Requests per minute
tpm_limit = 100000  # Tokens per minute

# Z.AI / Zhipu GLM
[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic"
# Optional: Specify available models
models = [
  "GLM-4.7",
  "GLM-4.5-Air",
  "GLM-4-Plus"
]

# Map Claude model names to Z.AI models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"

[[providers.keys]]
key = "${ZAI_API_KEY}"

# ==========================================================================
# Logging Configuration
# ==========================================================================
[logging]
# Log level: debug, info, warn, error
level = "info"
# Log format: json, text
format = "text"
# Enable colored output (for text format)
pretty = true

# Granular debug options
[logging.debug_options]
log_request_body = false
log_response_headers = false
log_tls_metrics = false
max_body_log_size = 1000

# ==========================================================================
# Cache Configuration
# ==========================================================================
[cache]
# Cache mode: single, ha, disabled
mode = "single"

# Single mode (Ristretto) configuration
[cache.ristretto]
num_counters = 1000000  # 10x expected max items
max_cost = 104857600    # 100 MB
buffer_items = 64       # Admission buffer size

# HA mode (Olric) configuration
[cache.olric]
embedded = true              # Run embedded Olric node
bind_addr = "0.0.0.0:3320"   # Olric client port
dmap_name = "cc-relay"       # Distributed map name
environment = "lan"          # local, lan, or wan
peers = ["other-node:3322"]  # Memberlist addresses (bind_addr + 2)
replica_count = 2            # Copies per key
read_quorum = 1              # Min reads for success
write_quorum = 1             # Min writes for success
member_count_quorum = 2      # Min cluster members
leave_timeout = "5s"         # Leave broadcast duration

# ==========================================================================
# Routing Configuration
# ==========================================================================
[routing]
# Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
strategy = "failover"
# Timeout for failover attempts in milliseconds (default: 5000)
failover_timeout = 5000
# Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
debug = false
```

## Server Configuration
### Listen Address
The `listen` field specifies where the proxy listens for incoming requests:
```yaml
server:
  listen: "127.0.0.1:8787"   # Local only (recommended)
  # listen: "0.0.0.0:8787"   # All interfaces (use with caution)
```

```toml
[server]
listen = "127.0.0.1:8787"   # Local only (recommended)
# listen = "0.0.0.0:8787"   # All interfaces (use with caution)
```

### Authentication
CC-Relay supports multiple authentication methods:
#### API Key Authentication
Require clients to provide a specific API key:
```yaml
server:
  auth:
    api_key: "${PROXY_API_KEY}"
```

```toml
[server.auth]
api_key = "${PROXY_API_KEY}"
```

Clients must include the header `x-api-key: <your-proxy-key>`.
#### Claude Code Subscription Passthrough
Allow Claude Code subscription users to connect:
```yaml
server:
  auth:
    allow_subscription: true
```

```toml
[server.auth]
allow_subscription = true
```

This accepts `Authorization: Bearer` tokens from Claude Code.
#### Combined Authentication
Allow both API key and subscription authentication:
```yaml
server:
  auth:
    api_key: "${PROXY_API_KEY}"
    allow_subscription: true
```

```toml
[server.auth]
api_key = "${PROXY_API_KEY}"
allow_subscription = true
```

#### No Authentication
To disable authentication (not recommended for production):
```yaml
server:
  auth: {}
  # Or simply omit the auth section
```

```toml
# Simply omit the [server.auth] section
# or define an empty section:
[server.auth]
```

### HTTP/2 Support
Enable HTTP/2 for better performance with concurrent requests:
```yaml
server:
  enable_http2: true
```

```toml
[server]
enable_http2 = true
```

## Transparent Authentication
cc-relay automatically detects how to handle authentication based on what the client sends:
### How It Works
| Client Sends | cc-relay Behavior | Use Case |
|---|---|---|
| `Authorization: Bearer <token>` | Forward unchanged | Claude Code subscription users |
| `x-api-key: <key>` | Forward unchanged | Direct API key users |
| No auth headers | Use configured provider keys | Enterprise/team deployments |
### Claude Code Subscription Users
If you have a Claude Code subscription (Max/Team/Enterprise plan), you can use cc-relay as a transparent proxy:
```bash
# Set cc-relay as your API endpoint
export ANTHROPIC_BASE_URL="http://localhost:8787"

# Your subscription token flows through unchanged
# ANTHROPIC_AUTH_TOKEN is already set by Claude Code
claude
```

No API key required - cc-relay forwards your subscription token to Anthropic.
### Enterprise/Team Deployments
For centralized API key management, have clients send no auth headers - cc-relay then uses its configured keys:
```yaml
# config.yaml
providers:
  - name: anthropic
    type: anthropic
    base_url: https://api.anthropic.com
    enabled: true
    keys:
      - key: ${ANTHROPIC_API_KEY}
        rpm_limit: 50
```

```toml
# config.toml
[[providers]]
name = "anthropic"
type = "anthropic"
base_url = "https://api.anthropic.com"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 50
```

```bash
# Client has no auth - uses configured keys
export ANTHROPIC_BASE_URL="http://localhost:8787"
unset ANTHROPIC_AUTH_TOKEN
unset ANTHROPIC_API_KEY
claude
```

### Mixed Mode
You can run both modes simultaneously:
- Subscription users: Their auth flows through (no key pool overhead)
- Team users: Use configured keys with rate limit pooling
Rate limiting and key pooling only apply when using configured keys, not client-provided auth.
### Key Points
- Auto-detection: No configuration needed - behavior is determined by client headers
- Subscription passthrough: `Authorization: Bearer` headers are forwarded unchanged
- Fallback keys: Used only when the client sends no auth
- Key pool efficiency: Only tracks usage of YOUR keys, not client subscriptions
## Provider Configuration
### Provider Types
CC-Relay currently supports two provider types:
| Type | Description | Default Base URL |
|---|---|---|
| `anthropic` | Anthropic Direct API | `https://api.anthropic.com` |
| `zai` | Z.AI / Zhipu GLM | `https://api.z.ai/api/anthropic` |
### Anthropic Provider
```yaml
providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    base_url: "https://api.anthropic.com"  # Optional
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        rpm_limit: 60
        tpm_limit: 100000
    models:
      - "claude-sonnet-4-5-20250514"
      - "claude-opus-4-5-20250514"
      - "claude-haiku-3-5-20241022"
```

```toml
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com"  # Optional
models = [
  "claude-sonnet-4-5-20250514",
  "claude-opus-4-5-20250514",
  "claude-haiku-3-5-20241022"
]

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60
tpm_limit = 100000
```

### Z.AI Provider
Z.AI offers Anthropic-compatible APIs with GLM models at lower cost. The `model_mapping` table translates incoming Claude model names to their GLM equivalents.
```yaml
providers:
  - name: "zai"
    type: "zai"
    enabled: true
    base_url: "https://api.z.ai/api/anthropic"
    keys:
      - key: "${ZAI_API_KEY}"
    model_mapping:
      "claude-sonnet-4-5-20250514": "GLM-4.7"
      "claude-haiku-3-5-20241022": "GLM-4.5-Air"
    models:
      - "GLM-4.7"
      - "GLM-4.5-Air"
      - "GLM-4-Plus"
```

```toml
[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic"
models = [
  "GLM-4.7",
  "GLM-4.5-Air",
  "GLM-4-Plus"
]

# Map Claude model names to Z.AI models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"

[[providers.keys]]
key = "${ZAI_API_KEY}"
```

### Multiple API Keys
Pool multiple API keys for higher throughput:
```yaml
providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    keys:
      - key: "${ANTHROPIC_API_KEY_1}"
        rpm_limit: 60
        tpm_limit: 100000
      - key: "${ANTHROPIC_API_KEY_2}"
        rpm_limit: 60
        tpm_limit: 100000
      - key: "${ANTHROPIC_API_KEY_3}"
        rpm_limit: 60
        tpm_limit: 100000
```

```toml
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_1}"
rpm_limit = 60
tpm_limit = 100000

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_2}"
rpm_limit = 60
tpm_limit = 100000

[[providers.keys]]
key = "${ANTHROPIC_API_KEY_3}"
rpm_limit = 60
tpm_limit = 100000
```

### Custom Base URL
Override the default API endpoint:
```yaml
providers:
  - name: "anthropic-custom"
    type: "anthropic"
    base_url: "https://custom-endpoint.example.com"
```

```toml
[[providers]]
name = "anthropic-custom"
type = "anthropic"
base_url = "https://custom-endpoint.example.com"
```

## Logging Configuration
### Log Levels
| Level | Description |
|---|---|
| `debug` | Verbose output for development |
| `info` | Normal operation messages |
| `warn` | Warning messages |
| `error` | Error messages only |
### Log Format
```yaml
logging:
  format: "text"   # Human-readable (default)
  # format: "json" # Machine-readable, for log aggregation
```

```toml
[logging]
format = "text"   # Human-readable (default)
# format = "json" # Machine-readable, for log aggregation
```

### Debug Options
Fine-grained control over debug logging:
```yaml
logging:
  level: "debug"
  debug_options:
    log_request_body: true      # Log request bodies (redacted)
    log_response_headers: true  # Log response headers
    log_tls_metrics: true       # Log TLS connection info
    max_body_log_size: 1000     # Max bytes to log from bodies
```

```toml
[logging]
level = "debug"

[logging.debug_options]
log_request_body = true      # Log request bodies (redacted)
log_response_headers = true  # Log response headers
log_tls_metrics = true       # Log TLS connection info
max_body_log_size = 1000     # Max bytes to log from bodies
```

## Cache Configuration
CC-Relay provides a unified caching layer with multiple backend options for different deployment scenarios.
### Cache Modes
| Mode | Backend | Use Case |
|---|---|---|
| `single` | Ristretto | Single-instance deployments, high performance |
| `ha` | Olric | Multi-instance deployments, shared state |
| `disabled` | Noop | No caching, passthrough |
### Single Mode (Ristretto)
Ristretto is a high-performance, concurrent in-memory cache. This is the default mode for single-instance deployments.
```yaml
cache:
  mode: single
  ristretto:
    num_counters: 1000000  # 10x expected max items
    max_cost: 104857600    # 100 MB
    buffer_items: 64       # Admission buffer size
```

```toml
[cache]
mode = "single"

[cache.ristretto]
num_counters = 1000000  # 10x expected max items
max_cost = 104857600    # 100 MB
buffer_items = 64       # Admission buffer size
```

| Field | Type | Default | Description |
|---|---|---|---|
| `num_counters` | int64 | 1,000,000 | Number of 4-bit access counters. Recommended: 10x expected max items. |
| `max_cost` | int64 | 104,857,600 (100 MB) | Maximum memory in bytes the cache can hold. |
| `buffer_items` | int64 | 64 | Number of keys per Get buffer. Controls admission buffer size. |
### HA Mode (Olric) - Embedded
For multi-instance deployments requiring shared cache state, use embedded Olric mode where each cc-relay instance runs an Olric node.
```yaml
cache:
  mode: ha
  olric:
    embedded: true
    bind_addr: "0.0.0.0:3320"
    dmap_name: "cc-relay"
    environment: lan
    peers:
      - "other-node:3322"  # Memberlist port = bind_addr + 2
    replica_count: 2
    read_quorum: 1
    write_quorum: 1
    member_count_quorum: 2
    leave_timeout: 5s
```

```toml
[cache]
mode = "ha"

[cache.olric]
embedded = true
bind_addr = "0.0.0.0:3320"
dmap_name = "cc-relay"
environment = "lan"
peers = ["other-node:3322"]  # Memberlist port = bind_addr + 2
replica_count = 2
read_quorum = 1
write_quorum = 1
member_count_quorum = 2
leave_timeout = "5s"
```

| Field | Type | Default | Description |
|---|---|---|---|
| `embedded` | bool | false | Run an embedded Olric node (true) vs. connect to an external cluster (false). |
| `bind_addr` | string | required | Address for Olric client connections (e.g., "0.0.0.0:3320"). |
| `dmap_name` | string | "cc-relay" | Name of the distributed map. All nodes must use the same name. |
| `environment` | string | "local" | Memberlist preset: "local", "lan", or "wan". |
| `peers` | []string | - | Memberlist addresses for peer discovery. Uses port bind_addr + 2. |
| `replica_count` | int | 1 | Number of copies per key. 1 = no replication. |
| `read_quorum` | int | 1 | Minimum successful reads for a response. |
| `write_quorum` | int | 1 | Minimum successful writes for a response. |
| `member_count_quorum` | int32 | 1 | Minimum cluster members required to operate. |
| `leave_timeout` | duration | 5s | Time to broadcast the leave message before shutdown. |
Important: Olric uses two ports - the `bind_addr` port for client connections and `bind_addr + 2` for memberlist gossip. Ensure both ports are open in your firewall.
### HA Mode (Olric) - Client Mode
Connect to an external Olric cluster instead of running embedded nodes:
```yaml
cache:
  mode: ha
  olric:
    embedded: false
    addresses:
      - "olric-node-1:3320"
      - "olric-node-2:3320"
    dmap_name: "cc-relay"
```

```toml
[cache]
mode = "ha"

[cache.olric]
embedded = false
addresses = ["olric-node-1:3320", "olric-node-2:3320"]
dmap_name = "cc-relay"
```

| Field | Type | Description |
|---|---|---|
| `embedded` | bool | Set to false for client mode. |
| `addresses` | []string | External Olric cluster addresses. |
| `dmap_name` | string | Distributed map name (must match cluster configuration). |
### Disabled Mode
Disable caching entirely for debugging or when caching is handled elsewhere:
```yaml
cache:
  mode: disabled
```

```toml
[cache]
mode = "disabled"
```

For detailed cache configuration including cache key conventions, cache busting strategies, HA clustering guides, and troubleshooting, see the Cache System documentation.
## Routing Configuration
CC-Relay supports multiple routing strategies for distributing requests across providers.
```yaml
routing:
  # Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
  strategy: failover
  # Timeout for failover attempts in milliseconds (default: 5000)
  failover_timeout: 5000
  # Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
  debug: false
```

```toml
[routing]
# Strategy: round_robin, weighted_round_robin, shuffle, failover (default)
strategy = "failover"
# Timeout for failover attempts in milliseconds (default: 5000)
failover_timeout = 5000
# Enable debug headers (X-CC-Relay-Strategy, X-CC-Relay-Provider)
debug = false
```

### Routing Strategies
| Strategy | Description |
|---|---|
| `failover` | Try providers in priority order, fall back on failure (default) |
| `round_robin` | Sequential rotation through providers |
| `weighted_round_robin` | Distribute proportionally by weight |
| `shuffle` | Fair random distribution |
### Provider Weight and Priority
Weight and priority are configured in the provider’s first key:
```yaml
providers:
  - name: "anthropic"
    type: "anthropic"
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        weight: 3    # For weighted_round_robin (higher = more traffic)
        priority: 2  # For failover (higher = tried first)
```

```toml
[[providers]]
name = "anthropic"
type = "anthropic"

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
weight = 3    # For weighted_round_robin (higher = more traffic)
priority = 2  # For failover (higher = tried first)
```

For detailed routing configuration including strategy explanations, debug headers, and failover triggers, see the Routing documentation.
## Example Configurations
### Minimal Single Provider
```yaml
server:
  listen: "127.0.0.1:8787"

providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    keys:
      - key: "${ANTHROPIC_API_KEY}"
```

```toml
[server]
listen = "127.0.0.1:8787"

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
```

### Multi-Provider Setup
```yaml
server:
  listen: "127.0.0.1:8787"
  auth:
    allow_subscription: true

providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    keys:
      - key: "${ANTHROPIC_API_KEY}"
  - name: "zai"
    type: "zai"
    enabled: true
    keys:
      - key: "${ZAI_API_KEY}"
    model_mapping:
      "claude-sonnet-4-5-20250514": "GLM-4.7"

logging:
  level: "info"
  format: "text"
```

```toml
[server]
listen = "127.0.0.1:8787"

[server.auth]
allow_subscription = true

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[[providers]]
name = "zai"
type = "zai"
enabled = true

[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"

[[providers.keys]]
key = "${ZAI_API_KEY}"

[logging]
level = "info"
format = "text"
```

### Development with Debug Logging
```yaml
server:
  listen: "127.0.0.1:8787"

providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    keys:
      - key: "${ANTHROPIC_API_KEY}"

logging:
  level: "debug"
  format: "text"
  pretty: true
  debug_options:
    log_request_body: true
    log_response_headers: true
    log_tls_metrics: true
```

```toml
[server]
listen = "127.0.0.1:8787"

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"

[logging]
level = "debug"
format = "text"
pretty = true

[logging.debug_options]
log_request_body = true
log_response_headers = true
log_tls_metrics = true
```

## Validating Configuration
Validate your configuration file:

```bash
cc-relay config validate
```

Tip: Always validate configuration changes before deploying. Hot-reload will reject invalid configurations, but validation catches errors before they reach production.
## Hot Reloading
CC-Relay automatically detects and applies configuration changes without requiring a restart. This enables zero-downtime configuration updates.
### How It Works
CC-Relay uses `fsnotify` to monitor the config file for changes:
- File watching: The parent directory is monitored to properly detect atomic writes (the temp file + rename pattern used by most editors)
- Debouncing: Multiple rapid file events are coalesced with a 100ms debounce delay to handle editor save behavior
- Atomic swap: New configuration is loaded and swapped atomically using Go's `sync/atomic.Pointer`
- In-flight preservation: Requests in progress continue with the old configuration; new requests use the updated configuration
### Events That Trigger Reload
| Event | Triggers Reload |
|---|---|
| File write | Yes |
| File create (atomic rename) | Yes |
| File chmod | No (ignored) |
| Other file in directory | No (ignored) |
### Logging
When hot-reload occurs, you’ll see log messages:
```
INF config file reloaded path=/path/to/config.yaml
INF config hot-reloaded successfully
```

If the new configuration is invalid:

```
ERR failed to reload config path=/path/to/config.yaml error="validation error"
```

Invalid configurations are rejected and the proxy continues with the previous valid configuration.
### Limitations
- Listen address: Changing `server.listen` requires a restart
- gRPC address: Changing `grpc.listen` requires a restart
Configuration options that can be hot-reloaded:
- Logging level and format
- Routing strategy, failover timeout, weights, and priorities
- Provider enable/disable, base URL, and model mapping
- Keypool strategy, key weights, and per-key limits
- Max concurrent requests and max body size
- Health check intervals and circuit breaker thresholds
### Hot-reload guarantees
- New requests use the latest configuration after reload completes.
- In-flight requests continue with the previous configuration.
- Reload applies atomically to routing/provider/keypool state.
- Invalid configs are rejected and the previous config remains active.
## Next Steps
- Routing strategies - Provider selection and failover
- Understanding the architecture
- API reference