By default, workers use the Control Plane’s centralized LLM gateway. You can override this to run your own local LiteLLM proxy for:
- Custom providers: AWS Bedrock, Azure OpenAI, Ollama, or 100+ other providers
- Cost control: Use your own API keys and budgets
- Observability: Track usage with Langfuse
- Network isolation: Keep LLM traffic in your infrastructure
- Offline operation: Run local models (Ollama) with no internet
How It Works
The CLI starts a local LiteLLM proxy process that intercepts all LLM requests from the worker:
- CLI starts the proxy on 127.0.0.1:<auto-port>
- The worker environment gets LITELLM_API_BASE pointing to the local proxy
- All LLM requests go through your local proxy instead of the Control Plane
- The proxy forwards them to your configured provider (AWS/Azure/Ollama/etc.)
No code changes needed - the proxy is OpenAI-compatible.
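Because the proxy is OpenAI-compatible, you can sanity-check it with a plain HTTP request once a worker is running. This is a minimal sketch: the port is auto-assigned (check the proxy log for the actual value), and "llama3" is a placeholder for whichever model_name you configured.

# Replace <auto-port> with the port from the proxy log and "llama3" with a model_name from your config
curl -s http://127.0.0.1:<auto-port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "ping"}]}'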
Supported Providers
LiteLLM supports 100+ providers. Most commonly used:
Cloud Providers:
- AWS Bedrock (Claude, Llama, Mistral)
- Azure OpenAI (GPT-4, GPT-3.5)
- GCP Vertex AI (Gemini, PaLM)
- OpenAI, Anthropic, Cohere, Mistral AI
Local/Self-Hosted:
- Ollama (Llama, Mistral, CodeLlama)
- vLLM, LocalAI, LM Studio
- Hugging Face TGI
See full list: https://docs.litellm.ai/docs/providers
Configuration Methods
You can configure the custom LLM gateway in four ways. Settings are resolved in the following priority order:
1. CLI Flags (Highest Priority)
Override everything with command-line flags:
kubiya worker start \
--queue-id=my-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=/path/to/litellm_config.yaml
Or pass the configuration inline as JSON:
kubiya worker start \
--queue-id=my-queue \
--type=local \
--enable-local-proxy \
--proxy-config-json='{"model_list":[{"model_name":"gpt-4","litellm_params":{"model":"azure/gpt-4","api_key":"env:AZURE_API_KEY"}}]}'
2. Environment Variables
Set once, use everywhere:
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE=/path/to/litellm_config.yaml
# Worker automatically uses local proxy
kubiya worker start --queue-id=my-queue --type=local
3. Context Configuration
Persistent configuration in ~/.kubiya/config - just add litellm-proxy to your current context:
apiVersion: v1
kind: Config
current-context: default
contexts:
  - name: default
    context:
      api-url: https://control-plane.kubiya.ai
      # Add LiteLLM proxy configuration here
      litellm-proxy:
        enabled: true
        config-file: /Users/myuser/.kubiya/litellm_production.yaml
# ... rest of your config (users, organizations)
Once configured, starting any worker automatically uses this LiteLLM configuration. See Configuration File for complete details.
4. Control Plane Queue Settings
Configure via Composer UI for shared team configuration.
5. Control Plane Gateway (Default Fallback)
If nothing is configured, workers use the centralized Control Plane gateway.
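As a concrete illustration of the priority order, the CLI flag below wins over the exported environment variable for that one worker; both config file paths are hypothetical.

# Environment variables point at a default config...
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE="$HOME/.kubiya/litellm_default.yaml"
# ...but the CLI flag overrides it for this worker only
kubiya worker start \
  --queue-id=my-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file="$HOME/.kubiya/litellm_override.yaml"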
Provider Configuration Examples
AWS Bedrock (Claude)
Create litellm_bedrock.yaml:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
  - model_name: bedrock-claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  - model_name: bedrock-llama3-70b
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
# Set AWS credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"
# Start worker with Bedrock
kubiya worker start \
--queue-id=bedrock-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_bedrock.yaml
For EC2/ECS with IAM roles:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      # No credentials needed - uses IAM role
      aws_region_name: us-east-1
For assumed roles:

model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_role_name: arn:aws:iam::123456789:role/BedrockAccessRole
      aws_session_name: kubiya-worker-session
      aws_region_name: us-east-1
AWS Bedrock Authentication: Supports IAM roles, access keys, session tokens, profiles, and more. See AWS Bedrock Provider Docs for all auth methods.
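Before routing traffic through the proxy, it can help to confirm the credentials actually have Bedrock access. The commands below use the standard AWS CLI (installed separately), not the Kubiya CLI.

# Confirm which identity the proxy will authenticate as
aws sts get-caller-identity
# List the Anthropic model IDs available in this account/region
aws bedrock list-foundation-models \
  --region us-east-1 \
  --by-provider anthropic \
  --query "modelSummaries[].modelId"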
Ollama (Local Models)
Run completely offline with open-source models:
Create litellm_ollama.json:

{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "codellama",
      "litellm_params": {
        "model": "ollama/codellama",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "mistral",
      "litellm_params": {
        "model": "ollama/mistral",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "llama3-70b",
      "litellm_params": {
        "model": "ollama/llama3:70b",
        "api_base": "http://localhost:11434"
      }
    }
  ]
}
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull models
ollama pull llama3
ollama pull codellama
ollama pull mistral
# Start Ollama server
ollama serve
# In another terminal, start worker
kubiya worker start \
--queue-id=ollama-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_ollama.json
For a remote Ollama server:

{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://ollama-server.company.internal:11434"
      }
    }
  ]
}
Ollama Benefits: Perfect for development, privacy-sensitive workloads, air-gapped environments, or cost-free experimentation. See Ollama Provider Docs.
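If requests through the proxy fail, verify Ollama itself is reachable before debugging LiteLLM. These are standard Ollama commands and endpoints, independent of the Kubiya CLI; adjust the host for a remote server.

# List models Ollama has pulled
ollama list
# Same check over HTTP (works for remote servers too)
curl -s http://localhost:11434/api/tags
# Quick end-to-end generation test
ollama run llama3 "Say hello in one word"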
Azure OpenAI
Create litellm_azure.yaml:

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"
  - model_name: gpt-4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"
  - model_name: gpt-35-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
export AZURE_API_KEY="your-azure-key"
kubiya worker start \
--queue-id=azure-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_azure.yaml
Use different Azure deployments:

model_list:
  # Production deployment
  - model_name: gpt-4-prod
    litellm_params:
      model: azure/gpt-4-prod-deployment
      api_base: https://prod-instance.openai.azure.com
      api_key: os.environ/AZURE_PROD_API_KEY
      api_version: "2024-02-15-preview"
  # Development deployment
  - model_name: gpt-4-dev
    litellm_params:
      model: azure/gpt-4-dev-deployment
      api_base: https://dev-instance.openai.azure.com
      api_key: os.environ/AZURE_DEV_API_KEY
      api_version: "2024-02-15-preview"
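To rule out Azure-side issues, you can call the deployment directly with the Azure OpenAI REST API before routing through the proxy. Substitute your own instance and deployment names, and keep the api-version in sync with your config.

curl -s "https://your-instance.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_API_KEY" \
  -d '{"messages": [{"role": "user", "content": "ping"}]}'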
Multi-Provider (Fallback & Load Balancing)
Configure multiple providers for reliability and cost optimization:
model_list:
  # Primary: AWS Bedrock (cost-effective)
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
  # Fallback: Azure OpenAI
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
  # Fast/cheap tasks: Groq
  - model_name: llama3-groq
    litellm_params:
      model: groq/llama3-70b-8192
      api_key: os.environ/GROQ_API_KEY
  # Development: Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
  # Code tasks: OpenAI
  - model_name: gpt-4-code
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true

# Optional: per-team default callback settings
default_team_settings:
  - team_id: default
    success_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
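If you want the proxy to fail over between these entries automatically rather than relying on client-side retries, LiteLLM's router supports explicit fallback mappings. Treat the snippet below as a sketch to verify against the LiteLLM docs for your version; it reuses the model_name values defined above.

router_settings:
  # If claude-3-opus fails, retry the same request against gpt-4, then llama3-groq
  fallbacks:
    - claude-3-opus: ["gpt-4", "llama3-groq"]
  num_retries: 2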
LiteLLM Configuration Reference
Model List Structure
Each model in model_list requires:
- model_name: User-facing model name used in requests (e.g., "gpt-4", "claude-3-opus")
- litellm_params: Parameters passed to LiteLLM:
  - model: Provider-specific model identifier (e.g., "azure/gpt-4", "bedrock/claude-3-opus")
  - api_key: API key (use os.environ/VAR_NAME for environment variables)
  - api_base: API endpoint URL (for Azure, self-hosted, etc.)
  - Provider-specific params (region, version, etc.)
LiteLLM Settings
Global settings for the proxy:
- success_callback: List of callbacks on success (e.g., ["langfuse"])
- failure_callback: List of callbacks on failure
- drop_params: Drop extra params not supported by the provider
- num_retries: Number of retry attempts on failure
- timeout: Request timeout in seconds
Environment Variables
Environment variables for the proxy process (API keys, Langfuse config, etc.)
Complete Schema Example
model_list:
  - model_name: "user-facing-name"
    litellm_params:
      model: "provider/model-id"
      api_key: "os.environ/API_KEY_VAR"
      api_base: "https://api.provider.com"
      # Provider-specific params
      api_version: "2024-01-01"
      aws_region_name: "us-east-1"
      # ... other params

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true
  num_retries: 3
  timeout: 600

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-..."
  LANGFUSE_SECRET_KEY: "sk-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
  # Add any provider credentials
  AZURE_API_KEY: "your-key"
  OPENAI_API_KEY: "your-key"
Observability with Langfuse
Track LLM usage, costs, and performance by adding Langfuse to your config:
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
This gives you:
- Token usage and cost per model
- Request latency and success rates
- Error tracking
- Per-user analytics
View metrics at https://cloud.langfuse.com (sign up at langfuse.com)
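The Langfuse keys are read by the proxy process, so if you export them in the shell instead of (or in addition to) environment_variables, confirm they are actually set before starting the worker:

# Check that Langfuse credentials are present in the shell environment
env | grep -E '^LANGFUSE_(PUBLIC_KEY|SECRET_KEY|HOST)='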
Advanced Features
Rate Limiting
litellm_settings:
  rpm: 60        # Requests per minute
  tpm: 100000    # Tokens per minute
Caching
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379
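Redis caching assumes a reachable Redis instance; a quick ping from the worker host rules out connectivity problems before you start reading proxy errors.

# Verify Redis is reachable (expects "PONG")
redis-cli -h localhost -p 6379 ping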
Load Balancing
router_settings:
  routing_strategy: "least-busy"  # or "latency-based", "usage-based"
  num_retries: 3
  timeout: 600
Custom Metadata
litellm_settings:
  metadata:
    environment: "production"
    team: "engineering"
    project: "agent-workflows"
Troubleshooting
Proxy Won’t Start
# Check if litellm is installed in venv
~/.kubiya/workers/<queue-id>/venv/bin/litellm --version
# View proxy logs
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log
# Verify config syntax
cat litellm_config.yaml | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)"
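If the binary runs but the worker still cannot connect, check whether a proxy process is actually listening; the exact port is auto-assigned and recorded in the proxy log. This assumes the proxy process name contains "litellm".

# Show listening litellm processes and their ports (macOS/Linux)
lsof -nP -iTCP -sTCP:LISTEN | grep -i litellm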
Worker Not Using Local Proxy
# Check worker environment
ps aux | grep worker.py
# Look for LITELLM_API_BASE in process env
# Verify configuration priority
# CLI flags > Env vars > Context > Control Plane
# Check worker logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep -i litellm
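On Linux you can also inspect the running worker's environment directly to confirm LITELLM_API_BASE was injected; take <pid> from the ps output above.

# Linux only: dump the worker process environment and look for the proxy address
tr '\0' '\n' < /proc/<pid>/environ | grep LITELLM_API_BASE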
Authentication Errors
# Check environment variables are set
echo $AWS_ACCESS_KEY_ID
echo $AZURE_API_KEY
# Verify keys in proxy config
cat litellm_config.yaml | grep api_key
# Test provider connection
curl -v https://bedrock-runtime.us-east-1.amazonaws.com
Model Not Found
# Check model name in config
cat litellm_config.yaml | grep model_name
# Verify model ID is correct for provider
# Example: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
# Check provider documentation
# https://docs.litellm.ai/docs/providers/<provider-name>
Slow Responses or Timeouts
# Monitor proxy resource usage
ps aux | grep litellm
# Check for rate limiting
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log | grep -i "rate limit"
# Increase the request timeout by adding to litellm_settings:
# timeout: 1200  # 20 minutes
Best Practices
Security:
- Use environment variables for API keys (never hardcode)
- Restrict file permissions: chmod 600 ~/.kubiya/litellm_*.yaml
- Rotate credentials regularly
- Use IAM roles when possible (AWS Bedrock on EC2/ECS)
Cost Optimization:
- Use cheaper models for simple tasks
- Enable caching to avoid duplicate requests
- Set rate limits to prevent runaway costs
- Monitor usage with Langfuse
Reliability:
- Configure fallback models across providers
- Set appropriate timeouts
- Enable retries for transient failures
- Monitor proxy logs
Development Workflow:
- Start with Ollama for local dev (free, fast)
- Test with staging before production
- Enable Langfuse early
- Use context configuration for environment switching
Examples by Use Case
Development Environment (Local Ollama)
Quick setup for local development:
# 1. Install and start Ollama
ollama serve
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: dev
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-json: |
          {
            "model_list": [
              {"model_name": "llama3", "litellm_params": {"model": "ollama/llama3", "api_base": "http://localhost:11434"}}
            ]
          }
EOF
# 3. Switch to dev context and start worker
kubiya config use-context dev
kubiya worker start --queue-id=dev-queue --type=local
Staging Environment (Azure OpenAI)
Create separate config file for staging:
# 1. Create staging LiteLLM config
cat > ~/.kubiya/litellm_staging.yaml <<EOF
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-staging
      api_base: https://staging-instance.openai.azure.com
      api_key: os.environ/AZURE_STAGING_API_KEY
      api_version: "2024-02-15-preview"
EOF
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: staging
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_staging.yaml
EOF
# 3. Use it
kubiya config use-context staging
kubiya worker start --queue-id=staging-queue --type=local
Production Environment (AWS Bedrock)
Production setup with Langfuse observability:
# 1. Create production LiteLLM config
cat > ~/.kubiya/litellm_production.yaml <<EOF
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
EOF
# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: production
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_production.yaml
EOF
# 3. Set AWS credentials and start
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
kubiya config use-context production
kubiya worker start --queue-id=prod-queue --type=local --daemon
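With --daemon the worker runs in the background, so the log files are the quickest way to confirm the proxy came up and Bedrock traffic is flowing; the paths follow the worker directory layout shown in Troubleshooting.

# Confirm the local proxy started and is receiving requests
tail -f ~/.kubiya/workers/prod-queue/litellm_proxy.log
# Watch the worker itself for LiteLLM-related lines
tail -f ~/.kubiya/workers/prod-queue/logs/worker.log | grep -i litellm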