# Custom LLM Gateway

> Route LLM requests through your own providers (AWS Bedrock, Azure, Ollama, etc.) instead of the default Control Plane gateway

By default, workers use the Control Plane's centralized LLM gateway. You can override this and run your own local LiteLLM proxy for:

* **Custom providers**: AWS Bedrock, Azure OpenAI, Ollama, or 100+ other providers
* **Cost control**: Use your own API keys and budgets
* **Observability**: Track usage with Langfuse
* **Network isolation**: Keep LLM traffic in your infrastructure
* **Offline operation**: Run local models (Ollama) with no internet

## How It Works

The CLI starts a local LiteLLM proxy process that intercepts all LLM requests from the worker:

1. CLI starts proxy on `127.0.0.1:`
2. Worker environment gets `LITELLM_API_BASE` pointing to the local proxy
3. All LLM requests go through your local proxy instead of the Control Plane
4. Proxy forwards to your configured provider (AWS/Azure/Ollama/etc.)

No code changes are needed because the proxy is OpenAI-compatible.

## Supported Providers

LiteLLM supports **100+ providers**. The most commonly used:

**Cloud Providers:**

* AWS Bedrock (Claude, Llama, Mistral)
* Azure OpenAI (GPT-4, GPT-3.5)
* GCP Vertex AI (Gemini, PaLM)
* OpenAI, Anthropic, Cohere, Mistral AI

**Local/Self-Hosted:**

* Ollama (Llama, Mistral, CodeLlama)
* vLLM, LocalAI, LM Studio
* Hugging Face TGI

**See full list:** [https://docs.litellm.ai/docs/providers](https://docs.litellm.ai/docs/providers)

## Configuration Methods

You can configure the custom LLM gateway in **4 ways**, listed here in priority order. If none of them is set, workers fall back to the Control Plane gateway.

### 1. CLI Flags (Highest Priority)

Override everything with command-line flags.

With a config file:

```bash
kubiya worker start \
  --queue-id=my-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file=/path/to/litellm_config.yaml
```

With inline JSON:

```bash
kubiya worker start \
  --queue-id=my-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-json='{"model_list":[{"model_name":"gpt-4","litellm_params":{"model":"azure/gpt-4","api_key":"os.environ/AZURE_API_KEY"}}]}'
```

### 2. Environment Variables

Set once, use everywhere:

```bash
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE=/path/to/litellm_config.yaml

# Worker automatically uses local proxy
kubiya worker start --queue-id=my-queue --type=local
```

### 3. Context Configuration

Persistent configuration lives in `~/.kubiya/config`. Add `litellm-proxy` to your current context:

```yaml
apiVersion: v1
kind: Config
current-context: default
contexts:
  - name: default
    context:
      api-url: https://control-plane.kubiya.ai
      # Add LiteLLM proxy configuration here
      litellm-proxy:
        enabled: true
        config-file: /Users/myuser/.kubiya/litellm_production.yaml
# ... rest of your config (users, organizations)
```

Once configured, starting any worker automatically uses this LiteLLM configuration. See [Configuration File](/cli/configuration-file) for complete details.

### 4. Control Plane Queue Settings

Configure via Composer UI for shared team configuration.

### 5. Control Plane Gateway (Default Fallback)

If nothing is configured, workers use the centralized Control Plane gateway.
## Provider Configuration Examples

### AWS Bedrock (Claude)

**Create `litellm_bedrock.yaml`:**

```yaml
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME

  - model_name: bedrock-claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  - model_name: bedrock-llama3-70b
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
```

```bash
# Set AWS credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"

# Start worker with Bedrock
kubiya worker start \
  --queue-id=bedrock-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file=./litellm_bedrock.yaml
```

**For EC2/ECS with IAM roles:**

```yaml
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      # No credentials needed - uses IAM role
      aws_region_name: us-east-1
```

**For assumed roles:**

```yaml
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_role_name: arn:aws:iam::123456789012:role/BedrockAccessRole
      aws_session_name: kubiya-worker-session
      aws_region_name: us-east-1
```

**AWS Bedrock Authentication**: Supports IAM roles, access keys, session tokens, profiles, and more. See [AWS Bedrock Provider Docs](https://docs.litellm.ai/docs/providers/bedrock) for all auth methods.
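The Bedrock configs above pull credentials from the environment with `os.environ/VAR_NAME` references, so a missing export only surfaces later as an authentication error. As a rough pre-flight check, the sketch below scans config text for those references and reports any that are unset. The helper name `missing_env_refs` is hypothetical, and the scan is a plain regular expression over the text rather than a real YAML parse.

```python
import os
import re
from typing import List, Mapping

# Matches references such as os.environ/AWS_ACCESS_KEY_ID
ENV_REF = re.compile(r"os\.environ/([A-Za-z_][A-Za-z0-9_]*)")


def missing_env_refs(config_text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted names of referenced variables that are unset or empty."""
    referenced = set(ENV_REF.findall(config_text))
    return sorted(name for name in referenced if not env.get(name))


sample = """
model_list:
  - model_name: bedrock-claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
"""

# Pass an explicit mapping here so the example does not depend on your shell.
missing = missing_env_refs(sample, {"AWS_ACCESS_KEY_ID": "your-access-key"})
print(missing)
```

To check a real file, read it with `open("litellm_bedrock.yaml").read()` and call `missing_env_refs` without the second argument so it consults the current process environment.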
### Ollama (Local Models)

Run completely offline with open-source models:

**Create `litellm_ollama.json`:**

```json
{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "codellama",
      "litellm_params": {
        "model": "ollama/codellama",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "mistral",
      "litellm_params": {
        "model": "ollama/mistral",
        "api_base": "http://localhost:11434"
      }
    },
    {
      "model_name": "llama3-70b",
      "litellm_params": {
        "model": "ollama/llama3:70b",
        "api_base": "http://localhost:11434"
      }
    }
  ]
}
```

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3
ollama pull codellama
ollama pull mistral

# Start Ollama server
ollama serve

# In another terminal, start worker
kubiya worker start \
  --queue-id=ollama-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file=./litellm_ollama.json
```

**For remote Ollama server:**

```json
{
  "model_list": [
    {
      "model_name": "llama3",
      "litellm_params": {
        "model": "ollama/llama3",
        "api_base": "http://ollama-server.company.internal:11434"
      }
    }
  ]
}
```

**Ollama Benefits**: Perfect for development, privacy-sensitive workloads, air-gapped environments, or cost-free experimentation. See [Ollama Provider Docs](https://docs.litellm.ai/docs/providers/ollama).
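Every Ollama entry above has the same shape and differs only in the model name and tag, so the JSON can be generated instead of written by hand. The sketch below shows one way to do that; the helper name `ollama_model_list` is hypothetical, and the output only mirrors the structure of the examples above.

```python
import json
from typing import Any, Dict


def ollama_model_list(
    models: Dict[str, str],
    api_base: str = "http://localhost:11434",
) -> Dict[str, Any]:
    """Build a LiteLLM-style config from {user-facing name: Ollama tag} pairs."""
    return {
        "model_list": [
            {
                "model_name": name,
                "litellm_params": {
                    "model": f"ollama/{tag}",  # provider prefix plus Ollama tag
                    "api_base": api_base,
                },
            }
            for name, tag in models.items()
        ]
    }


config = ollama_model_list({"llama3": "llama3", "llama3-70b": "llama3:70b"})
print(json.dumps(config, indent=2))
```

Passing `api_base="http://ollama-server.company.internal:11434"` would produce the remote-server variant, and the printed JSON can be saved as `litellm_ollama.json`.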
### Azure OpenAI

**Create `litellm_azure.yaml`:**

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  - model_name: gpt-4-turbo
    litellm_params:
      model: azure/gpt-4-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

  - model_name: gpt-35-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-02-15-preview"

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
```

```bash
export AZURE_API_KEY="your-azure-key"

kubiya worker start \
  --queue-id=azure-queue \
  --type=local \
  --enable-local-proxy \
  --proxy-config-file=./litellm_azure.yaml
```

**Use different Azure deployments:**

```yaml
model_list:
  # Production deployment
  - model_name: gpt-4-prod
    litellm_params:
      model: azure/gpt-4-prod-deployment
      api_base: https://prod-instance.openai.azure.com
      api_key: os.environ/AZURE_PROD_API_KEY
      api_version: "2024-02-15-preview"

  # Development deployment
  - model_name: gpt-4-dev
    litellm_params:
      model: azure/gpt-4-dev-deployment
      api_base: https://dev-instance.openai.azure.com
      api_key: os.environ/AZURE_DEV_API_KEY
      api_version: "2024-02-15-preview"
```

### Multi-Provider (Fallback & Load Balancing)

Configure multiple providers for reliability and cost optimization:

```yaml
model_list:
  # Primary: AWS Bedrock (cost-effective)
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  # Fallback: Azure OpenAI
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY

  # Fast/cheap tasks: Groq
  - model_name: llama3-groq
    litellm_params:
      model: groq/llama3-70b-8192
      api_key: os.environ/GROQ_API_KEY

  # Development: Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # Code tasks: OpenAI
  - model_name: gpt-4-code
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
  # Optional: per-team logging settings
  default_team_settings:
    - team_id: default
      success_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
```

## LiteLLM Configuration Reference

### Model List Structure

Each model in `model_list` requires:

* `model_name`: User-facing model name used in requests (e.g., "gpt-4", "claude-3-opus")
* `litellm_params`: Parameters passed to LiteLLM:
  * `model`: Provider-specific model identifier (e.g., "azure/gpt-4", "bedrock/anthropic.claude-3-opus-20240229-v1:0")
  * `api_key`: API key (use `os.environ/VAR_NAME` for environment variables)
  * `api_base`: API endpoint URL (for Azure, self-hosted, etc.)
  * Provider-specific params (region, version, etc.)

### LiteLLM Settings

Global settings for the proxy:

* `success_callback`: List of callbacks on success (e.g., `["langfuse"]`)
* `failure_callback`: List of callbacks on failure
* `drop_params`: Drop extra params not supported by the provider
* `num_retries`: Number of retry attempts on failure
* `timeout`: Request timeout in seconds

### Environment Variables

The `environment_variables` block sets environment variables for the proxy process (API keys, Langfuse config, etc.).

### Complete Schema Example

```yaml
model_list:
  - model_name: "user-facing-name"
    litellm_params:
      model: "provider/model-id"
      api_key: "os.environ/API_KEY_VAR"
      api_base: "https://api.provider.com"
      # Provider-specific params
      api_version: "2024-01-01"
      aws_region_name: "us-east-1"
      # ... other params

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true
  num_retries: 3
  timeout: 600

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-..."
  LANGFUSE_SECRET_KEY: "sk-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
  # Add any provider credentials
  AZURE_API_KEY: "your-key"
  OPENAI_API_KEY: "your-key"
```

## Observability with Langfuse

Track LLM usage, costs, and performance by adding Langfuse to your config:

```yaml
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
```

This gives you:

* Token usage and cost per model
* Request latency and success rates
* Error tracking
* Per-user analytics

View metrics at [https://cloud.langfuse.com](https://cloud.langfuse.com) (sign up at [langfuse.com](https://langfuse.com)).

## Advanced Features

### Rate Limiting

```yaml
litellm_settings:
  rpm: 60      # Requests per minute
  tpm: 100000  # Tokens per minute
```

### Caching

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379
```

### Load Balancing

```yaml
router_settings:
  routing_strategy: "least-busy"  # or "latency-based-routing", "usage-based-routing"
  num_retries: 3
  timeout: 600
```

### Custom Metadata

```yaml
litellm_settings:
  metadata:
    environment: "production"
    team: "engineering"
    project: "agent-workflows"
```

## Troubleshooting

### Proxy Won't Start

```bash
# Check if litellm is installed in venv
~/.kubiya/workers//venv/bin/litellm --version

# View proxy logs
tail -f ~/.kubiya/workers//litellm_proxy.log

# Verify config syntax
cat litellm_config.yaml | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)"
```

### Worker Not Using Local Proxy

```bash
# Check worker environment
ps aux | grep worker.py
# Look for LITELLM_API_BASE in process env

# Verify configuration priority
# CLI flags > Env vars > Context > Control Plane

# Check worker logs
tail -f ~/.kubiya/workers//logs/worker.log | grep -i litellm
```

### Authentication Errors

```bash
# Check environment variables are set
echo $AWS_ACCESS_KEY_ID
echo $AZURE_API_KEY

# Verify keys in proxy config
cat litellm_config.yaml | grep api_key

# Test provider connection
curl -v https://bedrock-runtime.us-east-1.amazonaws.com
```

### Model Not Found

```bash
# Check model name in config
cat litellm_config.yaml | grep model_name

# Verify model ID is correct for provider
# Example: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0

# Check provider documentation
# https://docs.litellm.ai/docs/providers/
```

### Performance Issues

```bash
# Monitor proxy resource usage
ps aux | grep litellm

# Check for rate limiting
tail -f ~/.kubiya/workers//litellm_proxy.log | grep -i "rate limit"

# Increase timeout
# Add to litellm_settings:
#   timeout: 1200  # 20 minutes
```

## Best Practices

**Security:**

* Use environment variables for API keys (never hardcode)
* Restrict file permissions: `chmod 600 ~/.kubiya/litellm_*.yaml`
* Rotate credentials regularly
* Use IAM roles when possible (AWS Bedrock on EC2/ECS)

**Cost Optimization:**

* Use cheaper models for simple tasks
* Enable caching to avoid duplicate requests
* Set rate limits to prevent runaway costs
* Monitor usage with Langfuse

**Reliability:**

* Configure fallback models across providers
* Set appropriate timeouts
* Enable retries for transient failures
* Monitor proxy logs

**Development Workflow:**

1. Start with Ollama for local dev (free, fast)
2. Test with staging before production
3. Enable Langfuse early
4. Use context configuration for environment switching

## Examples by Use Case

### Development Environment (Local Ollama)

**Quick setup for local development:**

```bash
# 1. Install and start Ollama
ollama serve

# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config < ~/.kubiya/litellm_staging.yaml <> ~/.kubiya/config < ~/.kubiya/litellm_production.yaml <> ~/.kubiya/config <