By default, workers use the Control Plane’s centralized LLM gateway. You can override this to run your own local LiteLLM proxy for:
  • Custom providers: AWS Bedrock, Azure OpenAI, Ollama, or 100+ other providers
  • Cost control: Use your own API keys and budgets
  • Observability: Track usage with Langfuse
  • Network isolation: Keep LLM traffic in your infrastructure
  • Offline operation: Run local models (Ollama) with no internet

How It Works

The CLI starts a local LiteLLM proxy process that intercepts all LLM requests from the worker:
  1. CLI starts proxy on 127.0.0.1:<auto-port>
  2. Worker environment gets LITELLM_API_BASE pointing to local proxy
  3. All LLM requests go through your local proxy instead of Control Plane
  4. Proxy forwards to your configured provider (AWS/Azure/Ollama/etc.)
No code changes are needed; the proxy is OpenAI-compatible.
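Because the proxy exposes an OpenAI-compatible API, you can sanity-check it with a plain HTTP request once the worker is running. A minimal sketch, assuming the proxy came up on port 4000 and your config defines a model named llama3 (both are placeholders; read the real port from the proxy logs):
# Hypothetical port and model name
curl -s http://127.0.0.1:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "ping"}]}'
# Add an "Authorization: Bearer <key>" header if your proxy is configured with a master key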

Supported Providers

LiteLLM supports 100+ providers. The most commonly used are:
Cloud Providers:
  • AWS Bedrock (Claude, Llama, Mistral)
  • Azure OpenAI (GPT-4, GPT-3.5)
  • GCP Vertex AI (Gemini, PaLM)
  • OpenAI, Anthropic, Cohere, Mistral AI
Local/Self-Hosted:
  • Ollama (Llama, Mistral, CodeLlama)
  • vLLM, LocalAI, LM Studio
  • Hugging Face TGI
See full list: https://docs.litellm.ai/docs/providers

Configuration Methods

You can configure the custom LLM gateway in four ways (in priority order), with the Control Plane gateway as the default fallback:

1. CLI Flags (Highest Priority)

Override everything with command-line flags:
kubiya worker start \
  --queue-id=my-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file=/path/to/litellm_config.yaml

2. Environment Variables

Set once, use everywhere:
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE=/path/to/litellm_config.yaml

# Worker automatically uses local proxy
kubiya worker start --queue-id=my-queue --type=local

3. Context Configuration

For persistent configuration, add a litellm-proxy block to your current context in ~/.kubiya/config:
apiVersion: v1
kind: Config
current-context: default

contexts:
  - name: default
    context:
      api-url: https://control-plane.kubiya.ai
      # Add LiteLLM proxy configuration here
      litellm-proxy:
        enabled: true
        config-file: /Users/myuser/.kubiya/litellm_production.yaml

# ... rest of your config (users, organizations)
Once configured, starting any worker automatically uses this LiteLLM configuration. See Configuration File for complete details.
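For example, with the context above active, the standard start command picks up the LiteLLM proxy settings with no extra flags:
# Context-driven: no --enable-local-proxy or --proxy-config-file needed
kubiya worker start --queue-id=my-queue --type=local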

4. Control Plane Queue Settings

Configure via Composer UI for shared team configuration.

5. Control Plane Gateway (Default Fallback)

If nothing is configured, workers use the centralized Control Plane gateway.

Provider Configuration Examples

AWS Bedrock (Claude)

Create litellm_bedrock.yaml:
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME

  - model_name: bedrock-claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  - model_name: bedrock-llama3-70b
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
AWS Bedrock Authentication: Supports IAM roles, access keys, session tokens, profiles, and more. See AWS Bedrock Provider Docs for all auth methods.
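For instance, on EC2/ECS you can drop static keys and let the AWS SDK's default credential chain (an attached IAM role, or a named profile) authenticate. A hedged sketch; litellm_bedrock_iam.yaml is an assumed file name, and aws_profile_name should be confirmed against the Bedrock provider docs for your LiteLLM version:
cat > litellm_bedrock_iam.yaml <<'EOF'
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_region_name: us-east-1            # no keys: falls back to the IAM role
  - model_name: bedrock-claude-3-5-sonnet-profile
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_profile_name: my-bedrock-profile  # named profile from ~/.aws/credentials
      aws_region_name: us-east-1
EOF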

Ollama (Local Models)

Run completely offline with open-source models:
Create litellm_ollama.json:
{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "codellama",
      "litellm_params": {
        "model": "ollama/codellama",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "mistral",
      "litellm_params": {
        "model": "ollama/mistral",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "llama3-70b",
      "litellm_params": {
        "model": "ollama/llama3:70b",
        "api_base": "http://localhost:11434"
      }
    }
  ]
}
Ollama Benefits: Perfect for development, privacy-sensitive workloads, air-gapped environments, or cost-free experimentation. See Ollama Provider Docs.
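The proxy only routes requests, so each model referenced in the config must already exist locally. With the Ollama server running (ollama serve, or the desktop app), pull the models named in litellm_ollama.json:
ollama pull llama3
ollama pull codellama
ollama pull mistral
ollama pull llama3:70b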

Azure OpenAI

Create litellm_azure.yaml:
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  - model_name: gpt-4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  - model_name: gpt-35-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true

Multi-Provider (Fallback & Load Balancing)

Configure multiple providers for reliability and cost optimization:
model_list:
  # Primary: AWS Bedrock (cost-effective)
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  # Fallback: Azure OpenAI
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY

  # Fast/cheap tasks: Groq
  - model_name: llama3-groq
    litellm_params:
      model: groq/llama3-70b-8192
      api_key: os.environ/GROQ_API_KEY

  # Development: Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # Code tasks: OpenAI
  - model_name: gpt-4-code
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
  # Optional: per-team defaults (e.g., callbacks)
  default_team_settings:
    - team_id: default
      success_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
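Listing models only makes them available; automatic failover between them is declared separately. A minimal, self-contained sketch using LiteLLM's fallbacks setting (litellm_fallback.yaml is a hypothetical file name; verify the exact syntax against the LiteLLM router/fallbacks docs for your version):
cat > litellm_fallback.yaml <<'EOF'
model_list:
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_region_name: us-east-1
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY

litellm_settings:
  num_retries: 3
  # Failed claude-3-opus calls are retried on gpt-4
  fallbacks:
    - claude-3-opus: ["gpt-4"]
EOF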

LiteLLM Configuration Reference

Model List Structure

Each model in model_list requires:
model_name (string, required)
User-facing model name used in requests (e.g., "gpt-4", "claude-3-opus")

litellm_params (object, required)
Parameters passed to LiteLLM:
  • model: Provider-specific model identifier (e.g., "azure/gpt-4", "bedrock/claude-3-opus")
  • api_key: API key (use os.environ/VAR_NAME to read from an environment variable)
  • api_base: API endpoint URL (for Azure, self-hosted, etc.)
  • Provider-specific params (region, version, etc.)

LiteLLM Settings

litellm_settings (object)
Global settings for the proxy:
  • success_callback: List of callbacks on success (e.g., ["langfuse"])
  • failure_callback: List of callbacks on failure
  • drop_params: Drop extra params not supported by the provider
  • num_retries: Number of retry attempts on failure
  • timeout: Request timeout in seconds

Environment Variables

environment_variables (object)
Environment variables for the proxy process (API keys, Langfuse config, etc.)

Complete Schema Example

model_list:
  - model_name: "user-facing-name"
    litellm_params:
      model: "provider/model-id"
      api_key: "os.environ/API_KEY_VAR"
      api_base: "https://api.provider.com"
      # Provider-specific params
      api_version: "2024-01-01"
      aws_region_name: "us-east-1"
      # ... other params

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true
  num_retries: 3
  timeout: 600

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-..."
  LANGFUSE_SECRET_KEY: "sk-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
  # Add any provider credentials
  AZURE_API_KEY: "your-key"
  OPENAI_API_KEY: "your-key"

Observability with Langfuse

Track LLM usage, costs, and performance by adding Langfuse to your config:
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
This gives you:
  • Token usage and cost per model
  • Request latency and success rates
  • Error tracking
  • Per-user analytics
View metrics at https://cloud.langfuse.com (sign up at langfuse.com)

Advanced Features

Rate Limiting

litellm_settings:
  rpm: 60  # Requests per minute
  tpm: 100000  # Tokens per minute

Caching

litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379

Load Balancing

router_settings:
  routing_strategy: "least-busy"  # or "latency-based", "usage-based"
  num_retries: 3
  timeout: 600

Custom Metadata

litellm_settings:
  metadata:
    environment: "production"
    team: "engineering"
    project: "agent-workflows"

Troubleshooting

Proxy Won’t Start

# Check if litellm is installed in venv
~/.kubiya/workers/<queue-id>/venv/bin/litellm --version

# View proxy logs
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log

# Verify config syntax
cat litellm_config.yaml | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)"

Worker Not Using Local Proxy

# Check worker environment
ps aux | grep worker.py
# Look for LITELLM_API_BASE in process env

# Verify configuration priority
# CLI flags > Env vars > Context > Control Plane

# Check worker logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep -i litellm

Authentication Errors

# Check environment variables are set
echo $AWS_ACCESS_KEY_ID
echo $AZURE_API_KEY

# Verify keys in proxy config
cat litellm_config.yaml | grep api_key

# Test provider connection
curl -v https://bedrock-runtime.us-east-1.amazonaws.com

Model Not Found

# Check model name in config
cat litellm_config.yaml | grep model_name

# Verify model ID is correct for provider
# Example: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0

# Check provider documentation
# https://docs.litellm.ai/docs/providers/<provider-name>

Performance Issues

# Monitor proxy resource usage
ps aux | grep litellm

# Check for rate limiting
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log | grep -i "rate limit"

# Increase timeout
# Add to litellm_settings:
#   timeout: 1200  # 20 minutes

Best Practices

Security:
  • Use environment variables for API keys (never hardcode)
  • Restrict file permissions: chmod 600 ~/.kubiya/litellm_*.yaml
  • Rotate credentials regularly
  • Use IAM roles when possible (AWS Bedrock on EC2/ECS)
Cost Optimization:
  • Use cheaper models for simple tasks
  • Enable caching to avoid duplicate requests
  • Set rate limits to prevent runaway costs
  • Monitor usage with Langfuse
Reliability:
  • Configure fallback models across providers
  • Set appropriate timeouts
  • Enable retries for transient failures
  • Monitor proxy logs
Development Workflow:
  1. Start with Ollama for local dev (free, fast)
  2. Test with staging before production
  3. Enable Langfuse early
  4. Use context configuration for environment switching

Examples by Use Case

Development Environment (Local Ollama)

Quick setup for local development:
# 1. Install and start Ollama
ollama serve

# 2. Add a dev context to ~/.kubiya/config (merge it into the existing contexts: list)
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: dev
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-json: |
          {
            "model_list": [
              {"model_name": "llama3", "litellm_params": {"model": "ollama/llama3", "api_base": "http://localhost:11434"}}
            ]
          }
EOF

# 3. Switch to dev context and start worker
kubiya config use-context dev
kubiya worker start --queue-id=dev-queue --type=local

Staging Environment (Azure OpenAI)

Create separate config file for staging:
# 1. Create staging LiteLLM config
cat > ~/.kubiya/litellm_staging.yaml <<EOF
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-staging
      api_base: https://staging-instance.openai.azure.com
      api_key: os.environ/AZURE_STAGING_API_KEY
      api_version: "2024-02-15-preview"
EOF

# 2. Add a staging context to ~/.kubiya/config (merge it into the existing contexts: list)
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: staging
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_staging.yaml
EOF

# 3. Use it
kubiya config use-context staging
kubiya worker start --queue-id=staging-queue --type=local

Production Environment (AWS Bedrock)

Production setup with Langfuse observability:
# 1. Create production LiteLLM config
cat > ~/.kubiya/litellm_production.yaml <<EOF
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
EOF

# 2. Add a production context to ~/.kubiya/config (merge it into the existing contexts: list)
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: production
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_production.yaml
EOF

# 3. Set AWS credentials and start
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
kubiya config use-context production
kubiya worker start --queue-id=prod-queue --type=local --daemon