Providers

CC-Relay supports multiple LLM providers through a unified interface. This page explains how to configure each provider.

Overview

CC-Relay acts as a proxy between Claude Code and various LLM backends. All providers expose an Anthropic-compatible Messages API, enabling seamless switching between providers.

| Provider | Type | Description | Cost |
|---|---|---|---|
| Anthropic | anthropic | Direct Anthropic API access | Standard Anthropic pricing |
| Z.AI | zai | Zhipu AI GLM models, Anthropic-compatible | ~1/7 of Anthropic pricing |
| MiniMax | minimax | MiniMax models, Anthropic-compatible | MiniMax pricing |
| Ollama | ollama | Local LLM inference | Free (local compute) |
| AWS Bedrock | bedrock | Claude via AWS with SigV4 auth | AWS Bedrock pricing |
| Azure AI Foundry | azure | Claude via Azure MAAS | Azure AI pricing |
| Google Vertex AI | vertex | Claude via Google Cloud | Vertex AI pricing |

Anthropic Provider

The Anthropic provider connects directly to Anthropic’s API. This is the default provider for full Claude model access.

Configuration

providers:
- name: "anthropic"
  type: "anthropic"
  enabled: true
  base_url: "https://api.anthropic.com"  # Optional, uses default

  keys:
    - key: "${ANTHROPIC_API_KEY}"
      rpm_limit: 60        # Requests per minute
      tpm_limit: 100000    # Tokens per minute
      priority: 2          # Higher = tried first in failover

  models:
    - "claude-sonnet-4-5-20250514"
    - "claude-opus-4-5-20250514"
    - "claude-haiku-3-5-20241022"
TOML equivalent:

[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true
base_url = "https://api.anthropic.com"  # Optional, uses default

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
rpm_limit = 60        # Requests per minute
tpm_limit = 100000    # Tokens per minute
priority = 2          # Higher = tried first in failover

models = [
"claude-sonnet-4-5-20250514",
"claude-opus-4-5-20250514",
"claude-haiku-3-5-20241022"
]

API Key Setup

  1. Create an account at console.anthropic.com
  2. Navigate to Settings > API Keys
  3. Create a new API key
  4. Store in environment variable: export ANTHROPIC_API_KEY="sk-ant-..."

Transparent Auth Support

The Anthropic provider supports transparent authentication for Claude Code subscription users. When enabled, cc-relay forwards your subscription token unchanged:

server:
  auth:
    allow_subscription: true

TOML equivalent:

[server.auth]
allow_subscription = true

Then point Claude Code at the relay; your subscription token flows through unchanged:

export ANTHROPIC_BASE_URL="http://localhost:8787"
claude

See Transparent Authentication for details.

Z.AI Provider

Z.AI (Zhipu AI) offers GLM models through an Anthropic-compatible API. This provides significant cost savings (~1/7 of Anthropic pricing) while maintaining API compatibility.

Configuration

providers:
- name: "zai"
  type: "zai"
  enabled: true
  base_url: "https://api.z.ai/api/anthropic"  # Optional, uses default

  keys:
    - key: "${ZAI_API_KEY}"
      priority: 1  # Lower priority than Anthropic for failover

  # Map Claude model names to Z.AI models
  model_mapping:
    "claude-sonnet-4-5-20250514": "GLM-4.7"
    "claude-sonnet-4-5": "GLM-4.7"
    "claude-haiku-3-5-20241022": "GLM-4.5-Air"
    "claude-haiku-3-5": "GLM-4.5-Air"

  models:
    - "GLM-4.7"
    - "GLM-4.5-Air"
    - "GLM-4-Plus"
TOML equivalent:

[[providers]]
name = "zai"
type = "zai"
enabled = true
base_url = "https://api.z.ai/api/anthropic"  # Optional, uses default

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1  # Lower priority than Anthropic for failover

# Map Claude model names to Z.AI models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"
"claude-haiku-3-5" = "GLM-4.5-Air"

models = [
"GLM-4.7",
"GLM-4.5-Air",
"GLM-4-Plus"
]

API Key Setup

  1. Create an account at z.ai/model-api
  2. Navigate to API Keys section
  3. Create a new API key
  4. Store in environment variable: export ZAI_API_KEY="..."

Discount: subscribing through this invite link gives both you and the referrer 10% off.

Model Mapping

Model mapping translates Anthropic model names to Z.AI equivalents. When Claude Code requests claude-sonnet-4-5-20250514, cc-relay automatically routes to GLM-4.7:

model_mapping:
  # Claude Sonnet -> GLM-4.7 (flagship model)
  "claude-sonnet-4-5-20250514": "GLM-4.7"
  "claude-sonnet-4-5": "GLM-4.7"

  # Claude Haiku -> GLM-4.5-Air (fast, economical)
  "claude-haiku-3-5-20241022": "GLM-4.5-Air"
  "claude-haiku-3-5": "GLM-4.5-Air"
TOML equivalent:

[model_mapping]
# Claude Sonnet -> GLM-4.7 (flagship model)
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"

# Claude Haiku -> GLM-4.5-Air (fast, economical)
"claude-haiku-3-5-20241022" = "GLM-4.5-Air"
"claude-haiku-3-5" = "GLM-4.5-Air"

Cost Comparison

| Model | Anthropic (per 1M tokens) | Z.AI Equivalent | Z.AI Cost |
|---|---|---|---|
| claude-sonnet-4-5 | $3 input / $15 output | GLM-4.7 | ~$0.43 / $2.14 |
| claude-haiku-3-5 | $0.25 input / $1.25 output | GLM-4.5-Air | ~$0.04 / $0.18 |

Prices are approximate and subject to change.
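At the listed (approximate) prices, the savings are easy to quantify. For example, a hypothetical workload of 1M input and 200K output tokens on the Sonnet tier:

```python
# Approximate per-1M-token prices from the table above (subject to change)
anthropic = {"in": 3.00, "out": 15.00}
zai       = {"in": 0.43, "out": 2.14}

tokens_in, tokens_out = 1_000_000, 200_000

def cost(price: dict) -> float:
    """Blended cost in dollars for this workload."""
    return price["in"] * tokens_in / 1e6 + price["out"] * tokens_out / 1e6

print(f"Anthropic: ${cost(anthropic):.2f}, Z.AI: ${cost(zai):.2f}, "
      f"ratio: {cost(anthropic) / cost(zai):.1f}x")
```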

Ollama Provider

Ollama enables local LLM inference through an Anthropic-compatible API (available since Ollama v0.14). Run models locally for privacy, zero API costs, and offline operation.

Configuration

providers:
- name: "ollama"
  type: "ollama"
  enabled: true
  base_url: "http://localhost:11434"  # Optional, uses default

  keys:
    - key: "ollama"  # Ollama accepts but ignores API keys
      priority: 0    # Lowest priority for failover

  # Map Claude model names to local Ollama models
  model_mapping:
    "claude-sonnet-4-5-20250514": "qwen3:32b"
    "claude-sonnet-4-5": "qwen3:32b"
    "claude-haiku-3-5-20241022": "qwen3:8b"
    "claude-haiku-3-5": "qwen3:8b"

  models:
    - "qwen3:32b"
    - "qwen3:8b"
    - "codestral:latest"
TOML equivalent:

[[providers]]
name = "ollama"
type = "ollama"
enabled = true
base_url = "http://localhost:11434"  # Optional, uses default

[[providers.keys]]
key = "ollama"  # Ollama accepts but ignores API keys
priority = 0    # Lowest priority for failover

# Map Claude model names to local Ollama models
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "qwen3:32b"
"claude-sonnet-4-5" = "qwen3:32b"
"claude-haiku-3-5-20241022" = "qwen3:8b"
"claude-haiku-3-5" = "qwen3:8b"

models = [
"qwen3:32b",
"qwen3:8b",
"codestral:latest"
]

Ollama Setup

  1. Install Ollama from ollama.com
  2. Pull models you want to use:
    ollama pull qwen3:32b
    ollama pull qwen3:8b
    ollama pull codestral:latest
  3. Start Ollama (runs automatically on install)

Recommended Models

For Claude Code workflows, choose models with at least 32K context:

| Model | Context | Size | Best For |
|---|---|---|---|
| qwen3:32b | 128K | 32B params | General coding, complex reasoning |
| qwen3:8b | 128K | 8B params | Fast iteration, simpler tasks |
| codestral:latest | 32K | 22B params | Code generation, specialized coding |
| llama3.2:3b | 128K | 3B params | Very fast, basic tasks |

Feature Limitations

Ollama’s Anthropic compatibility is partial. Some features are not supported:

| Feature | Supported | Notes |
|---|---|---|
| Streaming (SSE) | Yes | Same event sequence as Anthropic |
| Tool calling | Yes | Same format as Anthropic |
| Extended thinking | Partial | budget_tokens accepted but not enforced |
| Prompt caching | No | cache_control blocks ignored |
| PDF input | No | Not supported |
| Image URLs | No | Base64 encoding only |
| Token counting | No | /v1/messages/count_tokens not available |
| tool_choice | No | Cannot force specific tool usage |
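Because cache_control markers are silently ignored, requests that rely on prompt caching simply pay full price on Ollama. If you want to normalize payloads yourself before sending, a hypothetical helper (the function name and approach are illustrative, not part of cc-relay) could strip those markers:

```python
def strip_cache_control(body: dict) -> dict:
    """Remove cache_control markers from an Anthropic-style request body.
    Ollama ignores them anyway, so removing them keeps payloads comparable."""
    for msg in body.get("messages", []):
        content = msg.get("content")
        if isinstance(content, list):      # structured content blocks
            for block in content:
                block.pop("cache_control", None)
    return body

body = {"messages": [{"role": "user",
                      "content": [{"type": "text", "text": "Hi",
                                   "cache_control": {"type": "ephemeral"}}]}]}
cleaned = strip_cache_control(body)
assert "cache_control" not in cleaned["messages"][0]["content"][0]
```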

Docker Networking

When running cc-relay in Docker but Ollama on the host:

providers:
- name: "ollama"
  type: "ollama"
  # Use Docker's host gateway instead of localhost
  base_url: "http://host.docker.internal:11434"
TOML equivalent:

[[providers]]
name = "ollama"
type = "ollama"
# Use Docker's host gateway instead of localhost
base_url = "http://host.docker.internal:11434"

Alternatively, run cc-relay with --network host:

docker run --network host cc-relay

AWS Bedrock Provider

AWS Bedrock provides Claude access through Amazon Web Services with enterprise-grade security and SigV4 authentication.

Configuration

providers:
- name: "bedrock"
  type: "bedrock"
  enabled: true

  # AWS region (required)
  aws_region: "us-east-1"

  # Explicit AWS credentials (optional)
  # If not set, uses AWS SDK default credential chain:
  # 1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  # 2. Shared credentials file (~/.aws/credentials)
  # 3. IAM role (EC2, ECS, Lambda)
  aws_access_key_id: "${AWS_ACCESS_KEY_ID}"
  aws_secret_access_key: "${AWS_SECRET_ACCESS_KEY}"

  # Map Claude model names to Bedrock model IDs
  model_mapping:
    "claude-sonnet-4-5-20250514": "anthropic.claude-sonnet-4-5-20250514-v1:0"
    "claude-sonnet-4-5": "anthropic.claude-sonnet-4-5-20250514-v1:0"
    "claude-haiku-3-5-20241022": "anthropic.claude-haiku-3-5-20241022-v1:0"

  keys:
    - key: "bedrock-internal"  # Internal key for cc-relay auth
TOML equivalent:

[[providers]]
name = "bedrock"
type = "bedrock"
enabled = true

# AWS region (required)
aws_region = "us-east-1"

# Explicit AWS credentials (optional)
# If not set, uses AWS SDK default credential chain:
# 1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# 2. Shared credentials file (~/.aws/credentials)
# 3. IAM role (EC2, ECS, Lambda)
aws_access_key_id = "${AWS_ACCESS_KEY_ID}"
aws_secret_access_key = "${AWS_SECRET_ACCESS_KEY}"

# Map Claude model names to Bedrock model IDs
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "anthropic.claude-sonnet-4-5-20250514-v1:0"
"claude-sonnet-4-5" = "anthropic.claude-sonnet-4-5-20250514-v1:0"
"claude-haiku-3-5-20241022" = "anthropic.claude-haiku-3-5-20241022-v1:0"

[[providers.keys]]
key = "bedrock-internal"  # Internal key for cc-relay auth

AWS Setup

  1. Enable Bedrock Access: In AWS Console, navigate to Bedrock > Model access and enable Claude models
  2. Configure Credentials: Use one of these methods:
    • Environment Variables: export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=...
    • AWS CLI: aws configure
    • IAM Role: Attach Bedrock access policy to EC2/ECS/Lambda role

Bedrock Model IDs

Note: Model IDs change frequently as AWS Bedrock adds new Claude versions. Verify the current list in AWS Bedrock model access documentation before deploying.

Bedrock uses a specific model ID format: anthropic.{model}-v{version}:{minor}

| Claude Model | Bedrock Model ID |
|---|---|
| claude-sonnet-4-5-20250514 | anthropic.claude-sonnet-4-5-20250514-v1:0 |
| claude-opus-4-5-20250514 | anthropic.claude-opus-4-5-20250514-v1:0 |
| claude-haiku-3-5-20241022 | anthropic.claude-haiku-3-5-20241022-v1:0 |
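Given the anthropic.{model}-v{version}:{minor} format, the v1:0 mappings above can be generated rather than typed by hand. A sketch (always verify generated IDs against the AWS documentation, since versions vary per model):

```python
def to_bedrock_id(model: str, version: str = "1", minor: str = "0") -> str:
    """Build a Bedrock model ID from an Anthropic model name
    using the anthropic.{model}-v{version}:{minor} convention."""
    return f"anthropic.{model}-v{version}:{minor}"

assert to_bedrock_id("claude-sonnet-4-5-20250514") == \
    "anthropic.claude-sonnet-4-5-20250514-v1:0"
```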

Event Stream Conversion

Bedrock returns responses in AWS Event Stream format. CC-Relay automatically converts this to SSE format for Claude Code compatibility. No additional configuration is needed.
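Conceptually, each event-stream chunk carries a base64-encoded JSON payload that maps onto one SSE frame. This sketch shows the shape of the conversion (the {"bytes": ...} chunk structure here is illustrative; cc-relay handles the real framing for you):

```python
import base64
import json

def chunk_to_sse(chunk: dict) -> str:
    """Decode one event-stream chunk into an Anthropic-style SSE frame."""
    payload = json.loads(base64.b64decode(chunk["bytes"]))
    return f"event: {payload['type']}\ndata: {json.dumps(payload)}\n\n"

raw = base64.b64encode(json.dumps({"type": "message_stop"}).encode())
assert chunk_to_sse({"bytes": raw}).startswith("event: message_stop\n")
```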

Azure AI Foundry Provider

Azure AI Foundry provides Claude access through Microsoft Azure with enterprise Azure integration.

Configuration

providers:
- name: "azure"
  type: "azure"
  enabled: true

  # Your Azure resource name (appears in URL: {name}.services.ai.azure.com)
  azure_resource_name: "my-azure-resource"

  # Azure API version (default: 2024-06-01)
  azure_api_version: "2024-06-01"

  # Azure uses x-api-key authentication (Anthropic-compatible)
  keys:
    - key: "${AZURE_API_KEY}"

  # Map Claude model names to Azure deployment names
  model_mapping:
    "claude-sonnet-4-5-20250514": "claude-sonnet-4-5"
    "claude-sonnet-4-5": "claude-sonnet-4-5"
    "claude-haiku-3-5": "claude-haiku-3-5"
TOML equivalent:

[[providers]]
name = "azure"
type = "azure"
enabled = true

# Your Azure resource name (appears in URL: {name}.services.ai.azure.com)
azure_resource_name = "my-azure-resource"

# Azure API version (default: 2024-06-01)
azure_api_version = "2024-06-01"

# Azure uses x-api-key authentication (Anthropic-compatible)
[[providers.keys]]
key = "${AZURE_API_KEY}"

# Map Claude model names to Azure deployment names
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "claude-sonnet-4-5"
"claude-sonnet-4-5" = "claude-sonnet-4-5"
"claude-haiku-3-5" = "claude-haiku-3-5"

Azure Setup

  1. Create Azure AI Resource: In Azure Portal, create an Azure AI Foundry resource
  2. Deploy Claude Model: Deploy a Claude model in your AI Foundry workspace
  3. Get API Key: Copy the API key from Keys and Endpoint section
  4. Note Resource Name: Your URL is https://{resource_name}.services.ai.azure.com

Deployment Names

Azure uses deployment names as model identifiers. Create deployments in Azure AI Foundry, then map them:

model_mapping:
  "claude-sonnet-4-5": "my-sonnet-deployment"  # Your deployment name

TOML equivalent:

[model_mapping]
"claude-sonnet-4-5" = "my-sonnet-deployment"  # Your deployment name

Google Vertex AI Provider

Vertex AI provides Claude access through Google Cloud with seamless GCP integration.

Configuration

providers:
- name: "vertex"
  type: "vertex"
  enabled: true

  # Google Cloud project ID (required)
  gcp_project_id: "${GOOGLE_CLOUD_PROJECT}"

  # Google Cloud region (required)
  gcp_region: "us-east5"

  # Map Claude model names to Vertex AI model IDs
  model_mapping:
    "claude-sonnet-4-5-20250514": "claude-sonnet-4-5@20250514"
    "claude-sonnet-4-5": "claude-sonnet-4-5@20250514"
    "claude-haiku-3-5-20241022": "claude-haiku-3-5@20241022"

  keys:
    - key: "vertex-internal"  # Internal key for cc-relay auth
TOML equivalent:

[[providers]]
name = "vertex"
type = "vertex"
enabled = true

# Google Cloud project ID (required)
gcp_project_id = "${GOOGLE_CLOUD_PROJECT}"

# Google Cloud region (required)
gcp_region = "us-east5"

# Map Claude model names to Vertex AI model IDs
[providers.model_mapping]
"claude-sonnet-4-5-20250514" = "claude-sonnet-4-5@20250514"
"claude-sonnet-4-5" = "claude-sonnet-4-5@20250514"
"claude-haiku-3-5-20241022" = "claude-haiku-3-5@20241022"

[[providers.keys]]
key = "vertex-internal"  # Internal key for cc-relay auth

GCP Setup

  1. Enable Vertex AI API: In GCP Console, enable the Vertex AI API
  2. Request Claude Access: Request access to Claude models through Vertex AI Model Garden
  3. Configure Authentication: Use one of these methods:
    • Application Default Credentials: gcloud auth application-default login
    • Service Account: Set GOOGLE_APPLICATION_CREDENTIALS environment variable
    • GCE/GKE: Uses attached service account automatically

Vertex AI Model IDs

Vertex AI uses {model}@{version} format:

| Claude Model | Vertex AI Model ID |
|---|---|
| claude-sonnet-4-5-20250514 | claude-sonnet-4-5@20250514 |
| claude-opus-4-5-20250514 | claude-opus-4-5@20250514 |
| claude-haiku-3-5-20241022 | claude-haiku-3-5@20241022 |
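For dated Anthropic model names, the Vertex ID is just the name with the trailing date moved behind an @. A small sketch of that rewrite (only valid for names that end in a date; undated aliases still need an explicit mapping entry):

```python
def to_vertex_id(model: str) -> str:
    """Convert a dated Anthropic model name to Vertex AI's {model}@{version} form."""
    name, date = model.rsplit("-", 1)  # split off the trailing YYYYMMDD suffix
    return f"{name}@{date}"

assert to_vertex_id("claude-sonnet-4-5-20250514") == "claude-sonnet-4-5@20250514"
```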

Regions

Available regions for Claude on Vertex AI (check Google Cloud documentation for the complete current list):

  • us-east5 (default)
  • us-central1
  • europe-west1

MiniMax Provider

MiniMax offers large language models through an Anthropic-compatible API. MiniMax provides competitive pricing with high-quality models suitable for coding tasks.

Configuration

providers:
- name: "minimax"
  type: "minimax"
  enabled: true
  base_url: "https://api.minimax.io/anthropic"  # Optional, uses default

  keys:
    - key: "${MINIMAX_API_KEY}"
      priority: 1  # Lower priority than Anthropic for failover

  # Map Claude model names to MiniMax models
  model_mapping:
    "claude-opus-4-6": "MiniMax-M2.5"
    "claude-sonnet-4-5-20250514": "MiniMax-M2.5-highspeed"
    "claude-sonnet-4-5": "MiniMax-M2.5-highspeed"
    "claude-haiku-4-5-20251001": "MiniMax-M2.1-highspeed"
    "claude-haiku-4-5": "MiniMax-M2.1-highspeed"

  models:
    - "MiniMax-M2.5"
    - "MiniMax-M2.5-highspeed"
    - "MiniMax-M2.1"
    - "MiniMax-M2.1-highspeed"
    - "MiniMax-M2"
TOML equivalent:

[[providers]]
name = "minimax"
type = "minimax"
enabled = true
base_url = "https://api.minimax.io/anthropic"  # Optional, uses default

[[providers.keys]]
key = "${MINIMAX_API_KEY}"
priority = 1  # Lower priority than Anthropic for failover

# Map Claude model names to MiniMax models
[providers.model_mapping]
"claude-opus-4-6" = "MiniMax-M2.5"
"claude-sonnet-4-5-20250514" = "MiniMax-M2.5-highspeed"
"claude-sonnet-4-5" = "MiniMax-M2.5-highspeed"
"claude-haiku-4-5-20251001" = "MiniMax-M2.1-highspeed"
"claude-haiku-4-5" = "MiniMax-M2.1-highspeed"

models = [
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
"MiniMax-M2.1-highspeed",
"MiniMax-M2"
]

API Key Setup

  1. Create an account at minimax.io
  2. Navigate to the API Keys section
  3. Create a new API key
  4. Store in environment variable: export MINIMAX_API_KEY="..."

Authentication

MiniMax uses Bearer token authentication instead of the x-api-key header used by Anthropic. CC-Relay handles this automatically — no additional configuration is needed.
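The difference amounts to swapping one request header. A sketch of the header shapes involved (illustrative; cc-relay performs this translation internally):

```python
def auth_headers(provider: str, key: str) -> dict:
    """Headers each upstream expects for authentication."""
    if provider == "minimax":
        return {"Authorization": f"Bearer {key}"}          # Bearer token style
    return {"x-api-key": key,                              # Anthropic style
            "anthropic-version": "2023-06-01"}

assert auth_headers("minimax", "k")["Authorization"] == "Bearer k"
assert auth_headers("anthropic", "k")["x-api-key"] == "k"
```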

Available Models

| Model | Description |
|---|---|
| MiniMax-M2.5 | Flagship model, best quality |
| MiniMax-M2.5-highspeed | Fast variant of M2.5 |
| MiniMax-M2.1 | Previous generation |
| MiniMax-M2.1-highspeed | Fast variant of M2.1 |
| MiniMax-M2 | Base model |

Model Mapping

Model mapping translates Anthropic model names to MiniMax equivalents:

model_mapping:
  # Claude Opus -> MiniMax-M2.5 (flagship)
  "claude-opus-4-6": "MiniMax-M2.5"

  # Claude Sonnet -> MiniMax-M2.5-highspeed (fast, high quality)
  "claude-sonnet-4-5-20250514": "MiniMax-M2.5-highspeed"
  "claude-sonnet-4-5": "MiniMax-M2.5-highspeed"

  # Claude Haiku -> MiniMax-M2.1-highspeed (fast, economical)
  "claude-haiku-4-5-20251001": "MiniMax-M2.1-highspeed"
  "claude-haiku-4-5": "MiniMax-M2.1-highspeed"
TOML equivalent:

[model_mapping]
# Claude Opus -> MiniMax-M2.5 (flagship)
"claude-opus-4-6" = "MiniMax-M2.5"

# Claude Sonnet -> MiniMax-M2.5-highspeed (fast, high quality)
"claude-sonnet-4-5-20250514" = "MiniMax-M2.5-highspeed"
"claude-sonnet-4-5" = "MiniMax-M2.5-highspeed"

# Claude Haiku -> MiniMax-M2.1-highspeed (fast, economical)
"claude-haiku-4-5-20251001" = "MiniMax-M2.1-highspeed"
"claude-haiku-4-5" = "MiniMax-M2.1-highspeed"

Cloud Provider Comparison

| Feature | Bedrock | Azure | Vertex AI |
|---|---|---|---|
| Authentication | SigV4 (AWS) | API Key | OAuth2 (GCP) |
| Streaming Format | Event Stream | SSE | SSE |
| Body Transform | Yes | No | Yes |
| Model in URL | Yes | No | Yes |
| Enterprise SSO | AWS IAM | Entra ID | GCP IAM |
| Regions | US, EU, APAC | Global | US, EU |

Model Mapping

The model_mapping field translates incoming model names to provider-specific models:

providers:
- name: "zai"
  type: "zai"
  model_mapping:
    # Format: "incoming-model": "provider-model"
    "claude-sonnet-4-5-20250514": "GLM-4.7"
    "claude-sonnet-4-5": "GLM-4.7"
TOML equivalent:

[[providers]]
name = "zai"
type = "zai"

[providers.model_mapping]
# Format: "incoming-model" = "provider-model"
"claude-sonnet-4-5-20250514" = "GLM-4.7"
"claude-sonnet-4-5" = "GLM-4.7"

When Claude Code sends:

{"model": "claude-sonnet-4-5-20250514", ...}

CC-Relay routes to Z.AI with:

{"model": "GLM-4.7", ...}

Mapping Tips

  1. Include version suffixes: Map both claude-sonnet-4-5 and claude-sonnet-4-5-20250514
  2. Consider context length: Match models with similar capabilities
  3. Test quality: Verify output quality matches your needs

Multi-Provider Setup

Configure multiple providers for failover, cost optimization, or load distribution:

providers:
# Primary: Anthropic (highest quality)
- name: "anthropic"
  type: "anthropic"
  enabled: true
  keys:
    - key: "${ANTHROPIC_API_KEY}"
      priority: 2  # Tried first

# Secondary: Z.AI (cost-effective)
- name: "zai"
  type: "zai"
  enabled: true
  keys:
    - key: "${ZAI_API_KEY}"
      priority: 1  # Fallback

# Tertiary: Ollama (local, free)
- name: "ollama"
  type: "ollama"
  enabled: true
  keys:
    - key: "ollama"
      priority: 0  # Last resort

routing:
  strategy: failover  # Try providers in priority order
TOML equivalent:

# Primary: Anthropic (highest quality)
[[providers]]
name = "anthropic"
type = "anthropic"
enabled = true

[[providers.keys]]
key = "${ANTHROPIC_API_KEY}"
priority = 2  # Tried first

# Secondary: Z.AI (cost-effective)
[[providers]]
name = "zai"
type = "zai"
enabled = true

[[providers.keys]]
key = "${ZAI_API_KEY}"
priority = 1  # Fallback

# Tertiary: Ollama (local, free)
[[providers]]
name = "ollama"
type = "ollama"
enabled = true

[[providers.keys]]
key = "ollama"
priority = 0  # Last resort

[routing]
strategy = "failover"  # Try providers in priority order

With this configuration:

  1. Requests go to Anthropic first (priority 2)
  2. If Anthropic fails (429, 5xx), try Z.AI (priority 1)
  3. If Z.AI fails, try Ollama (priority 0)
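The failover loop can be sketched as follows. Everything here is illustrative (the provider objects, the exact set of retryable status codes, and the error handling are hypothetical, not cc-relay internals):

```python
def failover(providers, request):
    """Try providers in descending priority; fall through on retryable errors."""
    errors = {}
    for p in sorted(providers, key=lambda p: p.priority, reverse=True):
        status, body = p.send(request)
        if status == 200:
            return body
        if status == 429 or status >= 500:   # rate limit / server error: retryable
            errors[p.name] = status
            continue
        raise RuntimeError(f"{p.name} returned {status}")  # other 4xx: don't retry
    raise RuntimeError(f"all providers failed: {errors}")

class Stub:
    """Test double standing in for a configured provider."""
    def __init__(self, name, priority, status):
        self.name, self.priority, self.status = name, priority, status
    def send(self, request):
        return self.status, f"{self.name}-reply"

chain = [Stub("anthropic", 2, 429), Stub("zai", 1, 200), Stub("ollama", 0, 200)]
assert failover(chain, {}) == "zai-reply"  # anthropic rate-limited, zai answers
```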

See Routing Strategies for more options.

Troubleshooting

Connection Refused (Ollama)

Symptom: connection refused when connecting to Ollama

Causes:

  • Ollama not running
  • Wrong port
  • Docker networking issue

Solutions:

# Check if Ollama is running
ollama list

# Verify port
curl http://localhost:11434/api/version

# For Docker, use host gateway
base_url: "http://host.docker.internal:11434"

Authentication Failed (Z.AI)

Symptom: 401 Unauthorized from Z.AI

Causes:

  • Invalid API key
  • Environment variable not set
  • Key not activated

Solutions:

# Verify environment variable is set
echo $ZAI_API_KEY

# Test key directly
curl -X POST https://api.z.ai/api/anthropic/v1/messages \
  -H "x-api-key: $ZAI_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"GLM-4.7","max_tokens":10,"messages":[{"role":"user","content":"Hi"}]}'

Model Not Found

Symptom: model not found errors

Causes:

  • Model not configured in models list
  • Missing model_mapping entry
  • Model not installed (Ollama)

Solutions:

# Ensure model is listed
models:
- "GLM-4.7"

# Ensure mapping exists
model_mapping:
  "claude-sonnet-4-5": "GLM-4.7"

TOML equivalent:

# Ensure model is listed
models = ["GLM-4.7"]

# Ensure mapping exists
[model_mapping]
"claude-sonnet-4-5" = "GLM-4.7"

For Ollama, verify model is installed:

ollama list
ollama pull qwen3:32b

Slow Response (Ollama)

Symptom: Very slow responses from Ollama

Causes:

  • Model too large for hardware
  • GPU not being used
  • Insufficient RAM

Solutions:

  • Use smaller model (qwen3:8b instead of qwen3:32b)
  • Verify GPU is enabled: ollama run qwen3:8b --verbose
  • Check memory usage during inference

Next Steps