By default, workers use the Control Plane’s centralized LLM gateway. You can override this to run your own local LiteLLM proxy for:
- Custom providers: AWS Bedrock, Azure OpenAI, Ollama, or 100+ other providers
- Cost control: Use your own API keys and budgets
- Observability: Track usage with Langfuse
- Network isolation: Keep LLM traffic in your infrastructure
- Offline operation: Run local models (Ollama) with no internet
How It Works
The CLI starts a local LiteLLM proxy process that intercepts all LLM requests from the worker:
- CLI starts the proxy on 127.0.0.1:<auto-port>
- The worker environment gets LITELLM_API_BASE pointing to the local proxy
- All LLM requests go through your local proxy instead of the Control Plane
- The proxy forwards them to your configured provider (AWS/Azure/Ollama/etc.)
No code changes needed - the proxy is OpenAI-compatible.
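Because the proxy is OpenAI-compatible, you can sanity-check it with a plain HTTP request once a worker is running. This is a minimal sketch: the port is auto-assigned (check the proxy log for the actual value), and "llama3" is a placeholder for whichever model_name you configured.

# Replace <auto-port> with the port from the proxy log and "llama3" with a model_name from your config
curl -s http://127.0.0.1:<auto-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "ping"}]}'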
Supported Providers
LiteLLM supports 100+ providers. Most commonly used:
Cloud Providers:
- AWS Bedrock (Claude, Llama, Mistral)
- Azure OpenAI (GPT-4, GPT-3.5)
- GCP Vertex AI (Gemini, PaLM)
- OpenAI, Anthropic, Cohere, Mistral AI
Local/Self-Hosted:
- Ollama (Llama, Mistral, CodeLlama)
- vLLM, LocalAI, LM Studio
- Hugging Face TGI
See full list: https://docs.litellm.ai/docs/providers
Configuration Methods
You can configure the custom LLM gateway in four ways. Settings are resolved in the following priority order:
1. CLI Flags (Highest Priority)
Override everything with command-line flags:
kubiya worker start \
--queue-id=my-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=/path/to/litellm_config.yaml
Or pass the configuration inline as JSON:
kubiya worker start \
--queue-id=my-queue \
--type=local \
--enable-local-proxy \
--proxy-config-json='{"model_list":[{"model_name":"gpt-4","litellm_params":{"model":"azure/gpt-4","api_key":"env:AZURE_API_KEY"}}]}'
2. Environment Variables
Set once, use everywhere:
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE=/path/to/litellm_config.yaml
# Worker automatically uses local proxy
kubiya worker start --queue-id=my-queue --type=local
3. Context Configuration
Persistent configuration in ~/.kubiya/config - just add litellm-proxy to your current context:
apiVersion: v1
kind: Config
current-context: default
contexts:
  - name: default
    context:
      api-url: https://control-plane.kubiya.ai
      # Add LiteLLM proxy configuration here
      litellm-proxy:
        enabled: true
        config-file: /Users/myuser/.kubiya/litellm_production.yaml
# ... rest of your config (users, organizations)
Once configured, starting any worker automatically uses this LiteLLM configuration. See Configuration File for complete details.
4. Control Plane Queue Settings
Configure via Composer UI for shared team configuration.
5. Control Plane Gateway (Default Fallback)
If nothing is configured, workers use the centralized Control Plane gateway.
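As a concrete illustration of the priority order, the CLI flag below wins over the exported environment variable for that one worker; both config file paths are hypothetical.

# Environment variables point at a default config...
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE="$HOME/.kubiya/litellm_default.yaml"
# ...but the CLI flag overrides it for this worker only
kubiya worker start \
  --queue-id=my-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file="$HOME/.kubiya/litellm_override.yaml"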
Provider Configuration Examples
AWS Bedrock (Claude)
Create litellm_bedrock.yaml:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
  - model_name: bedrock-claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: bedrock-llama3-70b
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
# Set AWS credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"
# Start worker with Bedrock
kubiya worker start \
--queue-id=bedrock-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_bedrock.yaml
For EC2/ECS with IAM roles:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      # No credentials needed - uses IAM role
      aws_region_name: us-east-1
For assumed roles:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_role_name: arn:aws:iam::123456789:role/BedrockAccessRole
      aws_session_name: kubiya-worker-session
      aws_region_name: us-east-1
AWS Bedrock Authentication: Supports IAM roles, access keys, session tokens, profiles, and more. See AWS Bedrock Provider Docs for all auth methods.
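Before routing traffic through the proxy, it can help to confirm the credentials actually have Bedrock access. The commands below use the standard AWS CLI (installed separately), not the Kubiya CLI.

# Confirm which identity the proxy will authenticate as
aws sts get-caller-identity
# List the Anthropic model IDs available in this account/region
aws bedrock list-foundation-models \
  --region us-east-1 \
  --by-provider anthropic \
  --query "modelSummaries[].modelId"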
Ollama (Local Models)
Run completely offline with open-source models:
Create litellm_ollama.json:

{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "codellama",
      "litellm_params": {
        "model": "ollama/codellama",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "mistral",
      "litellm_params": {
        "model": "ollama/mistral",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "llama3-70b",
      "litellm_params": {
        "model": "ollama/llama3:70b",
        "api_base": "http://localhost:11434"
      }
    }
  ]
}
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull models
ollama pull llama3
ollama pull codellama
ollama pull mistral
# Start Ollama server
ollama serve
# In another terminal, start worker
kubiya worker start \
--queue-id=ollama-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_ollama.json
For a remote Ollama server:

{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://ollama-server.company.internal:11434"
      }
    }
  ]
}
Ollama Benefits: Perfect for development, privacy-sensitive workloads, air-gapped environments, or cost-free experimentation. See Ollama Provider Docs.
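If requests through the proxy fail, verify Ollama itself is reachable before debugging LiteLLM. These are standard Ollama commands and endpoints, independent of the Kubiya CLI; adjust the host for a remote server.

# List models Ollama has pulled
ollama list
# Same check over HTTP (works for remote servers too)
curl -s http://localhost:11434/api/tags
# Quick end-to-end generation test
ollama run llama3 "Say hello in one word"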
Azure OpenAI
Create litellm_azure.yaml:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"
  - model_name: gpt-4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"
  - model_name: gpt-35-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
export AZURE_API_KEY="your-azure-key"
kubiya worker start \
--queue-id=azure-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_azure.yaml
Use different Azure deployments:

model_list:
  # Production deployment
  - model_name: gpt-4-prod
    litellm_params:
      model: azure/gpt-4-prod-deployment
      api_base: https://prod-instance.openai.azure.com
      api_key: os.environ/AZURE_PROD_API_KEY
      api_version: "2024-02-15-preview"
  # Development deployment
  - model_name: gpt-4-dev
    litellm_params:
      model: azure/gpt-4-dev-deployment
      api_base: https://dev-instance.openai.azure.com
      api_key: os.environ/AZURE_DEV_API_KEY
      api_version: "2024-02-15-preview"
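To rule out Azure-side issues, you can call the deployment directly with the Azure OpenAI REST API before routing through the proxy. Substitute your own instance and deployment names, and keep the api-version in sync with your config.

curl -s "https://your-instance.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_API_KEY" \
  -d '{"messages": [{"role": "user", "content": "ping"}]}'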
Multi-Provider (Fallback & Load Balancing)
Configure multiple providers for reliability and cost optimization:
model_list:
  # Primary: AWS Bedrock (cost-effective)
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  # Fallback: Azure OpenAI
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
  # Fast/cheap tasks: Groq
  - model_name: llama3-groq
    litellm_params:
      model: groq/llama3-70b-8192
      api_key: os.environ/GROQ_API_KEY
  # Development: Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
  # Code tasks: OpenAI
  - model_name: gpt-4-code
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true

# Optional: per-team default callback settings
default_team_settings:
  - team_id: default
    success_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
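If you want the proxy to fail over between these entries automatically rather than relying on client-side retries, LiteLLM's router supports explicit fallback mappings. Treat the snippet below as a sketch to verify against the LiteLLM docs for your version; it reuses the model_name values defined above.

router_settings:
  # If claude-3-opus fails, retry the same request against gpt-4, then llama3-groq
  fallbacks:
    - claude-3-opus: ["gpt-4", "llama3-groq"]
  num_retries: 2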
LiteLLM Configuration Reference
Model List Structure
Each model in model_list requires:
- model_name: User-facing model name used in requests (e.g., "gpt-4", "claude-3-opus")
- litellm_params: Parameters passed to LiteLLM:
  - model: Provider-specific model identifier (e.g., "azure/gpt-4", "bedrock/claude-3-opus")
  - api_key: API key (use os.environ/VAR_NAME for environment variables)
  - api_base: API endpoint URL (for Azure, self-hosted, etc.)
  - Provider-specific params (region, version, etc.)
LiteLLM Settings
Global settings for the proxy:
- success_callback: List of callbacks on success (e.g., ["langfuse"])
- failure_callback: List of callbacks on failure
- drop_params: Drop extra params not supported by the provider
- num_retries: Number of retry attempts on failure
- timeout: Request timeout in seconds
Environment Variables
Environment variables for the proxy process (API keys, Langfuse config, etc.)
Complete Schema Example
model_list:
  - model_name: "user-facing-name"
    litellm_params:
      model: "provider/model-id"
      api_key: "os.environ/API_KEY_VAR"
      api_base: "https://api.provider.com"
      # Provider-specific params
      api_version: "2024-01-01"
      aws_region_name: "us-east-1"
      # ... other params

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true
  num_retries: 3
  timeout: 600

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-..."
  LANGFUSE_SECRET_KEY: "sk-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
  # Add any provider credentials
  AZURE_API_KEY: "your-key"
  OPENAI_API_KEY: "your-key"
Observability with Langfuse
Track LLM usage, costs, and performance by adding Langfuse to your config:
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
This gives you:
- Token usage and cost per model
- Request latency and success rates
- Error tracking
- Per-user analytics
View metrics at https://cloud.langfuse.com (sign up at langfuse.com)
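The Langfuse keys are read by the proxy process, so if you export them in the shell instead of (or in addition to) environment_variables, confirm they are actually set before starting the worker:

# Check that Langfuse credentials are present in the shell environment
env | grep -E '^LANGFUSE_(PUBLIC_KEY|SECRET_KEY|HOST)='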
Advanced Features
Rate Limiting
litellm_settings:
  rpm: 60        # Requests per minute
  tpm: 100000    # Tokens per minute
Caching
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379
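Redis caching assumes a reachable Redis instance; a quick ping from the worker host rules out connectivity problems before you start reading proxy errors.

# Verify Redis is reachable (expects "PONG")
redis-cli -h localhost -p 6379 ping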
Load Balancing
router_settings:
  routing_strategy: "least-busy"  # or "latency-based", "usage-based"
  num_retries: 3
  timeout: 600
Custom Metadata
litellm_settings:
  metadata:
    environment: "production"
    team: "engineering"
    project: "agent-workflows"
Troubleshooting
Proxy Won’t Start
# Check if litellm is installed in venv
~/.kubiya/workers/<queue-id>/venv/bin/litellm --version
# View proxy logs
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log
# Verify config syntax
cat litellm_config.yaml | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)"
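If the binary runs but the worker still cannot connect, check whether a proxy process is actually listening; the exact port is auto-assigned and recorded in the proxy log. This assumes the proxy process name contains "litellm".

# Show listening litellm processes and their ports (macOS/Linux)
lsof -nP -iTCP -sTCP:LISTEN | grep -i litellm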
Worker Not Using Local Proxy
# Check worker environment
ps aux | grep worker.py
# Look for LITELLM_API_BASE in process env
# Verify configuration priority
# CLI flags > Env vars > Context > Control Plane
# Check worker logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep -i litellm
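On Linux you can also inspect the running worker's environment directly to confirm LITELLM_API_BASE was injected; take <pid> from the ps output above.

# Linux only: dump the worker process environment and look for the proxy address
tr '\0' '\n' < /proc/<pid>/environ | grep LITELLM_API_BASE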
Authentication Errors
# Check environment variables are set
echo $AWS_ACCESS_KEY_ID
echo $AZURE_API_KEY
# Verify keys in proxy config
cat litellm_config.yaml | grep api_key
# Test provider connection
curl -v https://bedrock-runtime.us-east-1.amazonaws.com
Model Not Found
# Check model name in config
cat litellm_config.yaml | grep model_name
# Verify model ID is correct for provider
# Example: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
# Check provider documentation
# https://docs.litellm.ai/docs/providers/<provider-name>
Slow Responses or Timeouts
# Monitor proxy resource usage
ps aux | grep litellm
# Check for rate limiting
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log | grep -i "rate limit"
# Increase the request timeout by adding to litellm_settings:
# timeout: 1200  # 20 minutes
Best Practices
Security:
- Use environment variables for API keys (never hardcode)
- Restrict file permissions: chmod 600 ~/.kubiya/litellm_*.yaml
- Rotate credentials regularly
- Use IAM roles when possible (AWS Bedrock on EC2/ECS)
Cost Optimization:
- Use cheaper models for simple tasks
- Enable caching to avoid duplicate requests
- Set rate limits to prevent runaway costs
- Monitor usage with Langfuse
Reliability:
- Configure fallback models across providers
- Set appropriate timeouts
- Enable retries for transient failures
- Monitor proxy logs
Development Workflow:
- Start with Ollama for local dev (free, fast)
- Test with staging before production
- Enable Langfuse early
- Use context configuration for environment switching
Examples by Use Case
Development Environment (Local Ollama)
Quick setup for local development:
# 1. Install and start Ollama
ollama serve
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: dev
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-json: |
          {
            "model_list": [
              {"model_name": "llama3", "litellm_params": {"model": "ollama/llama3", "api_base": "http://localhost:11434"}}
            ]
          }
EOF
# 3. Switch to dev context and start worker
kubiya config use-context dev
kubiya worker start --queue-id=dev-queue --type=local
Staging Environment (Azure OpenAI)
Create separate config file for staging:
# 1. Create staging LiteLLM config
cat > ~/.kubiya/litellm_staging.yaml <<EOF
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-staging
      api_base: https://staging-instance.openai.azure.com
      api_key: os.environ/AZURE_STAGING_API_KEY
      api_version: "2024-02-15-preview"
EOF
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: staging
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_staging.yaml
EOF
# 3. Use it
kubiya config use-context staging
kubiya worker start --queue-id=staging-queue --type=local
Production Environment (AWS Bedrock)
Production setup with Langfuse observability:
# 1. Create production LiteLLM config
cat > ~/.kubiya/litellm_production.yaml <<EOF
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
EOF
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: production
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_production.yaml
EOF
# 3. Set AWS credentials and start
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
kubiya config use-context production
kubiya worker start --queue-id=prod-queue --type=local --daemon
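With --daemon the worker runs in the background, so the log files are the quickest way to confirm the proxy came up and Bedrock traffic is flowing; the paths follow the worker directory layout shown in Troubleshooting.

# Confirm the local proxy started and is receiving requests
tail -f ~/.kubiya/workers/prod-queue/litellm_proxy.log
# Watch the worker itself for LiteLLM-related lines
tail -f ~/.kubiya/workers/prod-queue/logs/worker.log | grep -i litellm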