Providers
CC-Relay supports multiple LLM providers through a unified interface. This page explains how to configure each provider.
Overview
CC-Relay acts as a proxy between Claude Code and various LLM backends. All providers expose an Anthropic-compatible Messages API, enabling seamless switching between providers.
| Provider | Type | Description | Cost |
|---|---|---|---|
| Anthropic | anthropic | Direct Anthropic API access | Standard Anthropic pricing |
| Z.AI | zai | Zhipu AI GLM models, Anthropic-compatible | ~1/7 of Anthropic pricing |
| MiniMax | minimax | MiniMax models, Anthropic-compatible | MiniMax pricing |
| Ollama | ollama | Local LLM inference | Free (local compute) |
| AWS Bedrock | bedrock | Claude via AWS with SigV4 auth | AWS Bedrock pricing |
| Azure AI Foundry | azure | Claude via Azure MAAS | Azure AI pricing |
| Google Vertex AI | vertex | Claude via Google Cloud | Vertex AI pricing |
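Whatever the backend, clients talk to the relay through the same Messages API. A minimal sketch, assuming cc-relay is listening on localhost:8787 (the address used in the Transparent Auth example below) and that client authentication follows your `server.auth` settings:

```bash
# Send an Anthropic-style Messages request to the local relay;
# cc-relay forwards it to whichever provider routing selects.
curl -X POST http://localhost:8787/v1/messages \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":32,"messages":[{"role":"user","content":"Hi"}]}'
```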
Anthropic Provider
The Anthropic provider connects directly to Anthropic’s API. This is the default provider for full Claude model access.
Configuration
YAML:

```yaml
providers:
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    base_url: "https://api.anthropic.com"  # Optional, uses default
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        rpm_limit: 60      # Requests per minute
        tpm_limit: 100000  # Tokens per minute
        priority: 2        # Higher = tried first in failover
    models:
      - "claude-sonnet-4-5-20250514"
      - "claude-opus-4-5-20250514"
      - "claude-haiku-3-5-20241022"
```

TOML:

```toml
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com" # Optional, uses default
models = [
  "claude-sonnet-4-5-20250514",
  "claude-opus-4-5-20250514",
  "claude-haiku-3-5-20241022"
]

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60      # Requests per minute
tpm_limit = 100000  # Tokens per minute
priority = 2        # Higher = tried first in failover
```

API Key Setup
- Create an account at console.anthropic.com
- Navigate to Settings > API Keys
- Create a new API key
- Store in environment variable:
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
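To confirm the key works before routing through cc-relay, you can call the Anthropic API directly (a quick smoke test; the model name is one of the examples from the configuration above):

```bash
curl -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-3-5-20241022","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
```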
Transparent Auth Support
The Anthropic provider supports transparent authentication for Claude Code subscription users. When enabled, cc-relay forwards your subscription token unchanged:
YAML:

```yaml
server:
  auth:
    allow_subscription: true
```

TOML:

```toml
[server.auth]
allow_subscription = true
```

```bash
# Your subscription token flows through unchanged
export ANTHROPIC_BASE_URL="http://localhost:8787"
claude
```

See Transparent Authentication for details.
Z.AI Provider
Z.AI (Zhipu AI) offers GLM models through an Anthropic-compatible API. This provides significant cost savings (~1/7 of Anthropic pricing) while maintaining API compatibility.
Configuration
YAML:

```yaml
providers:
  - name: "zai"
    type: "zai"
    enabled: true
    base_url: "https://api.z.ai/api/anthropic"  # Optional, uses default
    keys:
      - key: "${ZAI_API_KEY}"
        priority: 1  # Lower priority than Anthropic for failover
    # Map Claude model names to Z.AI models
    model_mapping:
      "claude-sonnet-4-5-20250514": "GLM-4.7"
      "claude-sonnet-4-5": "GLM-4.7"
      "claude-haiku-3-5-20241022": "GLM-4.5-Air"
      "claude-haiku-3-5": "GLM-4.5-Air"
    models:
      - "GLM-4.7"
      - "GLM-4.5-Air"
      - "GLM-4-Plus"
```

TOML:

```toml
[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic" # Optional, uses default
models = [
  "GLM-4.7",
  "GLM-4.5-Air",
  "GLM-4-Plus"
]

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1 # Lower priority than Anthropic for failover

# Map Claude model names to Z.AI models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"
"claude-haiku-3-5" = "GLM-4.5-Air"
```

API Key Setup
- Create an account at z.ai/model-api
- Navigate to API Keys section
- Create a new API key
- Store in environment variable:
```bash
export ZAI_API_KEY="..."
```
Get 10% off: Use this invite link when subscribing — both you and the referrer get 10% off.
Model Mapping
Model mapping translates Anthropic model names to Z.AI equivalents. When Claude Code requests claude-sonnet-4-5-20250514, cc-relay automatically routes to GLM-4.7:
YAML:

```yaml
model_mapping:
  # Claude Sonnet -> GLM-4.7 (flagship model)
  "claude-sonnet-4-5-20250514": "GLM-4.7"
  "claude-sonnet-4-5": "GLM-4.7"
  # Claude Haiku -> GLM-4.5-Air (fast, economical)
  "claude-haiku-3-5-20241022": "GLM-4.5-Air"
  "claude-haiku-3-5": "GLM-4.5-Air"
```

TOML:

```toml
[model_mapping]
# Claude Sonnet -> GLM-4.7 (flagship model)
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"
# Claude Haiku -> GLM-4.5-Air (fast, economical)
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"
"claude-haiku-3-5" = "GLM-4.5-Air"
```

Cost Comparison
| Model | Anthropic (per 1M tokens) | Z.AI Equivalent | Z.AI Cost |
|---|---|---|---|
| claude-sonnet-4-5 | $3 input / $15 output | GLM-4.7 | ~$0.43 / $2.14 |
| claude-haiku-3-5 | $0.25 input / $1.25 output | GLM-4.5-Air | ~$0.04 / $0.18 |
Prices are approximate and subject to change.
Ollama Provider
Ollama enables local LLM inference through an Anthropic-compatible API (available since Ollama v0.14). Run models locally for privacy, zero API costs, and offline operation.
Configuration
YAML:

```yaml
providers:
  - name: "ollama"
    type: "ollama"
    enabled: true
    base_url: "http://localhost:11434"  # Optional, uses default
    keys:
      - key: "ollama"  # Ollama accepts but ignores API keys
        priority: 0    # Lowest priority for failover
    # Map Claude model names to local Ollama models
    model_mapping:
      "claude-sonnet-4-5-20250514": "qwen3:32b"
      "claude-sonnet-4-5": "qwen3:32b"
      "claude-haiku-3-5-20241022": "qwen3:8b"
      "claude-haiku-3-5": "qwen3:8b"
    models:
      - "qwen3:32b"
      - "qwen3:8b"
      - "codestral:latest"
```

TOML:

```toml
[[providers]]
name = "ollama"
type = "ollama"
enabled = true
base_url = "http://localhost:11434" # Optional, uses default
models = [
  "qwen3:32b",
  "qwen3:8b",
  "codestral:latest"
]

[[providers.keys]]
key = "ollama" # Ollama accepts but ignores API keys
priority = 0   # Lowest priority for failover

# Map Claude model names to local Ollama models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "qwen3:32b"
"claude-sonnet-4-5" = "qwen3:32b"
"claude-haiku-3-5-20241022" = "qwen3:8b"
"claude-haiku-3-5" = "qwen3:8b"
```

Ollama Setup
- Install Ollama from ollama.com
- Pull models you want to use:
```bash
ollama pull qwen3:32b
ollama pull qwen3:8b
ollama pull codestral:latest
```

- Start Ollama (runs automatically on install; a quick connectivity check follows below)
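Before pointing cc-relay at Ollama, it is worth confirming the server is reachable and the models were pulled:

```bash
# Ollama answers on port 11434 by default
curl http://localhost:11434/api/version

# The pulled models should appear in this list
ollama list
```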
Recommended Models
For Claude Code workflows, choose models with at least 32K context:
| Model | Context | Size | Best For |
|---|---|---|---|
| `qwen3:32b` | 128K | 32B params | General coding, complex reasoning |
| `qwen3:8b` | 128K | 8B params | Fast iteration, simpler tasks |
| `codestral:latest` | 32K | 22B params | Code generation, specialized coding |
| `llama3.2:3b` | 128K | 3B params | Very fast, basic tasks |
Feature Limitations
Ollama’s Anthropic compatibility is partial. Some features are not supported:
| Feature | Supported | Notes |
|---|---|---|
| Streaming (SSE) | Yes | Same event sequence as Anthropic |
| Tool calling | Yes | Same format as Anthropic |
| Extended thinking | Partial | budget_tokens accepted but not enforced |
| Prompt caching | No | cache_control blocks ignored |
| PDF input | No | Not supported |
| Image URLs | No | Base64 encoding only |
| Token counting | No | /v1/messages/count_tokens not available |
| `tool_choice` | No | Cannot force specific tool usage |
Docker Networking
When running cc-relay in Docker but Ollama on the host:
YAML:

```yaml
providers:
  - name: "ollama"
    type: "ollama"
    # Use Docker's host gateway instead of localhost
    base_url: "http://host.docker.internal:11434"
```

TOML:

```toml
[[providers]]
name = "ollama"
type = "ollama"
# Use Docker's host gateway instead of localhost
base_url = "http://host.docker.internal:11434"
```

Alternatively, run cc-relay with `--network host`:
```bash
docker run --network host cc-relay
```
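On Linux, `host.docker.internal` is not defined by default. One option is to map it to the host gateway at run time (the image name here is illustrative):

```bash
# Make host.docker.internal resolve inside the container (Docker 20.10+)
docker run --add-host=host.docker.internal:host-gateway cc-relay
```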
AWS Bedrock Provider
AWS Bedrock provides Claude access through Amazon Web Services with enterprise-grade security and SigV4 authentication.
Configuration
YAML:

```yaml
providers:
  - name: "bedrock"
    type: "bedrock"
    enabled: true
    # AWS region (required)
    aws_region: "us-east-1"
    # Explicit AWS credentials (optional)
    # If not set, uses AWS SDK default credential chain:
    #   1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    #   2. Shared credentials file (~/.aws/credentials)
    #   3. IAM role (EC2, ECS, Lambda)
    aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
    aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    # Map Claude model names to Bedrock model IDs
    model_mapping:
      "claude-sonnet-4-5-20250514": "anthropic.claude-sonnet-4-5-20250514-v1:0"
      "claude-sonnet-4-5": "anthropic.claude-sonnet-4-5-20250514-v1:0"
      "claude-haiku-3-5-20241022": "anthropic.claude-haiku-3-5-20241022-v1:0"
    keys:
      - key: "bedrock-internal"  # Internal key for cc-relay auth
```

TOML:

```toml
[[providers]]
name = "bedrock"
type = "bedrock"
enabled = true
# AWS region (required)
aws_region = "us-east-1"
# Explicit AWS credentials (optional)
# If not set, uses AWS SDK default credential chain:
#   1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
#   2. Shared credentials file (~/.aws/credentials)
#   3. IAM role (EC2, ECS, Lambda)
aws_access_key_id = "${AWS_ACCESS_KEY_ID}"
aws_secret_access_key = "${AWS_SECRET_ACCESS_KEY}"

# Map Claude model names to Bedrock model IDs
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "anthropic.claude-sonnet-4-5-20250514-v1:0"
"claude-sonnet-4-5" = "anthropic.claude-sonnet-4-5-20250514-v1:0"
"claude-haiku-3-5-20241022" = "anthropic.claude-haiku-3-5-20241022-v1:0"

[[providers.keys]]
key = "bedrock-internal" # Internal key for cc-relay auth
```

AWS Setup
- Enable Bedrock Access: In AWS Console, navigate to Bedrock > Model access and enable Claude models
- Configure Credentials: Use one of these methods (a quick access check follows this list):
  - Environment Variables:

    ```bash
    export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
    ```

  - AWS CLI:

    ```bash
    aws configure
    ```

  - IAM Role: Attach a Bedrock access policy to the EC2/ECS/Lambda role
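To sanity-check that your credentials reach Bedrock and that Claude models are enabled in your region, listing the available foundation models is a quick test:

```bash
# Claude entries in the output confirm model access is enabled
aws bedrock list-foundation-models --region us-east-1 \
  --query "modelSummaries[?contains(modelId, 'anthropic')].modelId"
```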
Bedrock Model IDs
Note: Model IDs change frequently as AWS Bedrock adds new Claude versions. Verify the current list in AWS Bedrock model access documentation before deploying.
Bedrock uses a specific model ID format: `anthropic.{model}-v{version}:{minor}`
| Claude Model | Bedrock Model ID |
|---|---|
| claude-sonnet-4-5-20250514 | anthropic.claude-sonnet-4-5-20250514-v1:0 |
| claude-opus-4-5-20250514 | anthropic.claude-opus-4-5-20250514-v1:0 |
| claude-haiku-3-5-20241022 | anthropic.claude-haiku-3-5-20241022-v1:0 |
Event Stream Conversion
Bedrock returns responses in AWS Event Stream format. CC-Relay automatically converts this to SSE format for Claude Code compatibility. No additional configuration is needed.
Azure AI Foundry Provider
Azure AI Foundry provides Claude access through Microsoft Azure with enterprise Azure integration.
Configuration
YAML:

```yaml
providers:
  - name: "azure"
    type: "azure"
    enabled: true
    # Your Azure resource name (appears in URL: {name}.services.ai.azure.com)
    azure_resource_name: "my-azure-resource"
    # Azure API version (default: 2024-06-01)
    azure_api_version: "2024-06-01"
    # Azure uses x-api-key authentication (Anthropic-compatible)
    keys:
      - key: "${AZURE_API_KEY}"
    # Map Claude model names to Azure deployment names
    model_mapping:
      "claude-sonnet-4-5-20250514": "claude-sonnet-4-5"
      "claude-sonnet-4-5": "claude-sonnet-4-5"
      "claude-haiku-3-5": "claude-haiku-3-5"
```

TOML:

```toml
[[providers]]
name = "azure"
type = "azure"
enabled = true
# Your Azure resource name (appears in URL: {name}.services.ai.azure.com)
azure_resource_name = "my-azure-resource"
# Azure API version (default: 2024-06-01)
azure_api_version = "2024-06-01"

# Azure uses x-api-key authentication (Anthropic-compatible)
[[providers.keys]]
key = "${AZURE_API_KEY}"

# Map Claude model names to Azure deployment names
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "claude-sonnet-4-5"
"claude-sonnet-4-5" = "claude-sonnet-4-5"
"claude-haiku-3-5" = "claude-haiku-3-5"
```

Azure Setup
- Create Azure AI Resource: In Azure Portal, create an Azure AI Foundry resource
- Deploy Claude Model: Deploy a Claude model in your AI Foundry workspace
- Get API Key: Copy the API key from Keys and Endpoint section
- Note Resource Name: Your URL is `https://{resource_name}.services.ai.azure.com`
Deployment Names
Azure uses deployment names as model identifiers. Create deployments in Azure AI Foundry, then map them:
YAML:

```yaml
model_mapping:
  "claude-sonnet-4-5": "my-sonnet-deployment"  # Your deployment name
```

TOML:

```toml
[model_mapping]
"claude-sonnet-4-5" = "my-sonnet-deployment" # Your deployment name
```

Google Vertex AI Provider
Vertex AI provides Claude access through Google Cloud with seamless GCP integration.
Configuration
YAML:

```yaml
providers:
  - name: "vertex"
    type: "vertex"
    enabled: true
    # Google Cloud project ID (required)
    gcp_project_id: "${GOOGLE_CLOUD_PROJECT}"
    # Google Cloud region (required)
    gcp_region: "us-east5"
    # Map Claude model names to Vertex AI model IDs
    model_mapping:
      "claude-sonnet-4-5-20250514": "claude-sonnet-4-5@20250514"
      "claude-sonnet-4-5": "claude-sonnet-4-5@20250514"
      "claude-haiku-3-5-20241022": "claude-haiku-3-5@20241022"
    keys:
      - key: "vertex-internal"  # Internal key for cc-relay auth
```

TOML:

```toml
[[providers]]
name = "vertex"
type = "vertex"
enabled = true
# Google Cloud project ID (required)
gcp_project_id = "${GOOGLE_CLOUD_PROJECT}"
# Google Cloud region (required)
gcp_region = "us-east5"

# Map Claude model names to Vertex AI model IDs
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "claude-sonnet-4-5@20250514"
"claude-sonnet-4-5" = "claude-sonnet-4-5@20250514"
"claude-haiku-3-5-20241022" = "claude-haiku-3-5@20241022"

[[providers.keys]]
key = "vertex-internal" # Internal key for cc-relay auth
```

GCP Setup
- Enable Vertex AI API: In GCP Console, enable the Vertex AI API
- Request Claude Access: Request access to Claude models through Vertex AI Model Garden
- Configure Authentication: Use one of these methods (see the ADC check after this list):
  - Application Default Credentials:

    ```bash
    gcloud auth application-default login
    ```

  - Service Account: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable
  - GCE/GKE: Uses the attached service account automatically
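To confirm Application Default Credentials are in place before starting cc-relay, printing an access token is a quick check:

```bash
# Fails with a descriptive error if ADC is not configured
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"
```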
Vertex AI Model IDs
Vertex AI uses the `{model}@{version}` format:
| Claude Model | Vertex AI Model ID |
|---|---|
| claude-sonnet-4-5-20250514 | claude-sonnet-4-5@20250514 |
| claude-opus-4-5-20250514 | claude-opus-4-5@20250514 |
| claude-haiku-3-5-20241022 | claude-haiku-3-5@20241022 |
Regions
Available regions for Claude on Vertex AI (check Google Cloud documentation for the complete current list):
- `us-east5` (default)
- `us-central1`
- `europe-west1`
MiniMax Provider
MiniMax offers large language models through an Anthropic-compatible API. MiniMax provides competitive pricing with high-quality models suitable for coding tasks.
Configuration
YAML:

```yaml
providers:
  - name: "minimax"
    type: "minimax"
    enabled: true
    base_url: "https://api.minimax.io/anthropic"  # Optional, uses default
    keys:
      - key: "${MINIMAX_API_KEY}"
        priority: 1  # Lower priority than Anthropic for failover
    # Map Claude model names to MiniMax models
    model_mapping:
      "claude-opus-4-6": "MiniMax-M2.5"
      "claude-sonnet-4-5-20250514": "MiniMax-M2.5-highspeed"
      "claude-sonnet-4-5": "MiniMax-M2.5-highspeed"
      "claude-haiku-4-5-20251001": "MiniMax-M2.1-highspeed"
      "claude-haiku-4-5": "MiniMax-M2.1-highspeed"
    models:
      - "MiniMax-M2.5"
      - "MiniMax-M2.5-highspeed"
      - "MiniMax-M2.1"
      - "MiniMax-M2.1-highspeed"
      - "MiniMax-M2"
```

TOML:

```toml
[[providers]]
name = "minimax"
type = "minimax"
enabled = true
base_url = "https://api.minimax.io/anthropic" # Optional, uses default
models = [
  "MiniMax-M2.5",
  "MiniMax-M2.5-highspeed",
  "MiniMax-M2.1",
  "MiniMax-M2.1-highspeed",
  "MiniMax-M2"
]

[[providers.keys]]
key = "${MINIMAX_API_KEY}"
priority = 1 # Lower priority than Anthropic for failover

# Map Claude model names to MiniMax models
[providers.model_mapping]
"claude-opus-4-6" = "MiniMax-M2.5"
"claude-sonnet-4-5-20250514" = "MiniMax-M2.5-highspeed"
"claude-sonnet-4-5" = "MiniMax-M2.5-highspeed"
"claude-haiku-4-5-20251001" = "MiniMax-M2.1-highspeed"
"claude-haiku-4-5" = "MiniMax-M2.1-highspeed"
```

API Key Setup
- Create an account at minimax.io
- Navigate to the API Keys section
- Create a new API key
- Store in environment variable:
```bash
export MINIMAX_API_KEY="..."
```
Authentication
MiniMax uses Bearer token authentication instead of the x-api-key header used by Anthropic. CC-Relay handles this automatically — no additional configuration is needed.
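For a direct smoke test outside the relay, a request analogous to the Z.AI example in the troubleshooting section might look like this. The `/v1/messages` path is assumed by analogy with the other Anthropic-compatible providers; check MiniMax's documentation for the exact route:

```bash
curl -X POST https://api.minimax.io/anthropic/v1/messages \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"MiniMax-M2.5","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
```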
Available Models
| Model | Description |
|---|---|
| `MiniMax-M2.5` | Flagship model, best quality |
| `MiniMax-M2.5-highspeed` | Fast variant of M2.5 |
| `MiniMax-M2.1` | Previous generation |
| `MiniMax-M2.1-highspeed` | Fast variant of M2.1 |
| `MiniMax-M2` | Base model |
Model Mapping
Model mapping translates Anthropic model names to MiniMax equivalents:
YAML:

```yaml
model_mapping:
  # Claude Opus -> MiniMax-M2.5 (flagship)
  "claude-opus-4-6": "MiniMax-M2.5"
  # Claude Sonnet -> MiniMax-M2.5-highspeed (fast, high quality)
  "claude-sonnet-4-5-20250514": "MiniMax-M2.5-highspeed"
  "claude-sonnet-4-5": "MiniMax-M2.5-highspeed"
  # Claude Haiku -> MiniMax-M2.1-highspeed (fast, economical)
  "claude-haiku-4-5-20251001": "MiniMax-M2.1-highspeed"
  "claude-haiku-4-5": "MiniMax-M2.1-highspeed"
```

TOML:

```toml
[model_mapping]
# Claude Opus -> MiniMax-M2.5 (flagship)
"claude-opus-4-6" = "MiniMax-M2.5"
# Claude Sonnet -> MiniMax-M2.5-highspeed (fast, high quality)
"claude-sonnet-4-5-20250514" = "MiniMax-M2.5-highspeed"
"claude-sonnet-4-5" = "MiniMax-M2.5-highspeed"
# Claude Haiku -> MiniMax-M2.1-highspeed (fast, economical)
"claude-haiku-4-5-20251001" = "MiniMax-M2.1-highspeed"
"claude-haiku-4-5" = "MiniMax-M2.1-highspeed"
```

Cloud Provider Comparison
| Feature | Bedrock | Azure | Vertex AI |
|---|---|---|---|
| Authentication | SigV4 (AWS) | API Key | OAuth2 (GCP) |
| Streaming Format | Event Stream | SSE | SSE |
| Body Transform | Yes | No | Yes |
| Model in URL | Yes | No | Yes |
| Enterprise SSO | AWS IAM | Entra ID | GCP IAM |
| Regions | US, EU, APAC | Global | US, EU |
Model Mapping
The `model_mapping` field translates incoming model names to provider-specific models:
YAML:

```yaml
providers:
  - name: "zai"
    type: "zai"
    model_mapping:
      # Format: "incoming-model": "provider-model"
      "claude-sonnet-4-5-20250514": "GLM-4.7"
      "claude-sonnet-4-5": "GLM-4.7"
```

TOML:

```toml
[[providers]]
name = "zai"
type = "zai"

[providers.model_mapping]
# Format: "incoming-model" = "provider-model"
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"
```

When Claude Code sends:

```
{"model": "claude-sonnet-4-5-20250514", ...}
```

CC-Relay routes to Z.AI with:

```
{"model": "GLM-4.7", ...}
```

Mapping Tips
- Include version suffixes: Map both `claude-sonnet-4-5` and `claude-sonnet-4-5-20250514`
- Consider context length: Match models with similar capabilities
- Test quality: Verify output quality matches your needs
Multi-Provider Setup
Configure multiple providers for failover, cost optimization, or load distribution:
YAML:

```yaml
providers:
  # Primary: Anthropic (highest quality)
  - name: "anthropic"
    type: "anthropic"
    enabled: true
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        priority: 2  # Tried first
  # Secondary: Z.AI (cost-effective)
  - name: "zai"
    type: "zai"
    enabled: true
    keys:
      - key: "${ZAI_API_KEY}"
        priority: 1  # Fallback
  # Tertiary: Ollama (local, free)
  - name: "ollama"
    type: "ollama"
    enabled: true
    keys:
      - key: "ollama"
        priority: 0  # Last resort

routing:
  strategy: failover  # Try providers in priority order
```

TOML:

```toml
# Primary: Anthropic (highest quality)
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
priority = 2 # Tried first

# Secondary: Z.AI (cost-effective)
[[providers]]
name = "zai"
type = "zai"
enabled = true

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1 # Fallback

# Tertiary: Ollama (local, free)
[[providers]]
name = "ollama"
type = "ollama"
enabled = true

[[providers.keys]]
key = "ollama"
priority = 0 # Last resort

[routing]
strategy = "failover" # Try providers in priority order
```

With this configuration:
- Requests go to Anthropic first (priority 2)
- If Anthropic fails (429, 5xx), try Z.AI (priority 1)
- If Z.AI fails, try Ollama (priority 0)
See Routing Strategies for more options.
Troubleshooting
Connection Refused (Ollama)
Symptom: connection refused when connecting to Ollama
Causes:
- Ollama not running
- Wrong port
- Docker networking issue
Solutions:
```bash
# Check if Ollama is running
ollama list

# Verify port
curl http://localhost:11434/api/version
```

```yaml
# For Docker, use host gateway
base_url: "http://host.docker.internal:11434"
```

Authentication Failed (Z.AI)
Symptom: 401 Unauthorized from Z.AI
Causes:
- Invalid API key
- Environment variable not set
- Key not activated
Solutions:
```bash
# Verify environment variable is set
echo $ZAI_API_KEY

# Test key directly
curl -X POST https://api.z.ai/api/anthropic/v1/messages \
  -H "x-api-key: $ZAI_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"GLM-4.7","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'
```

Model Not Found
Symptom: model not found errors
Causes:
- Model not configured in `models` list
- Missing `model_mapping` entry
- Model not installed (Ollama)
Solutions:
YAML:

```yaml
# Ensure model is listed
models:
  - "GLM-4.7"

# Ensure mapping exists
model_mapping:
  "claude-sonnet-4-5": "GLM-4.7"
```

TOML:

```toml
# Ensure model is listed
models = ["GLM-4.7"]

# Ensure mapping exists
[model_mapping]
"claude-sonnet-4-5" = "GLM-4.7"
```

For Ollama, verify the model is installed:

```bash
ollama list
ollama pull qwen3:32b
```

Slow Response (Ollama)
Symptom: Very slow responses from Ollama
Causes:
- Model too large for hardware
- GPU not being used
- Insufficient RAM
Solutions:
- Use a smaller model (`qwen3:8b` instead of `qwen3:32b`)
- Verify GPU is enabled:

```bash
ollama run qwen3:8b --verbose
```

- Check memory usage during inference
Next Steps
- Configuration Reference - Complete configuration options
- Routing Strategies - Provider selection and failover
- Health Monitoring - Circuit breakers and health checks