> ## Documentation Index
> Fetch the complete documentation index at: https://docs.kubiya.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Custom LLM Gateway

> Route LLM requests through your own providers (AWS Bedrock, Azure, Ollama, etc.) instead of the default Control Plane gateway

By default, workers use the Control Plane's centralized LLM gateway. You can override this to run your own local LiteLLM proxy for:

* **Custom providers**: AWS Bedrock, Azure OpenAI, Ollama, or 100+ other providers
* **Cost control**: Use your own API keys and budgets
* **Observability**: Track usage with Langfuse
* **Network isolation**: Keep LLM traffic in your infrastructure
* **Offline operation**: Run local models (Ollama) with no internet

## How It Works

The CLI starts a local LiteLLM proxy process that intercepts all LLM requests from the worker:

1. CLI starts proxy on `127.0.0.1:<auto-port>`
2. Worker environment gets `LITELLM_API_BASE` pointing to local proxy
3. All LLM requests go through your local proxy instead of Control Plane
4. Proxy forwards to your configured provider (AWS/Azure/Ollama/etc.)

No code changes needed - the proxy is OpenAI-compatible.

## Supported Providers

LiteLLM supports **100+ providers**. Most commonly used:

**Cloud Providers:**

* AWS Bedrock (Claude, Llama, Mistral)
* Azure OpenAI (GPT-4, GPT-3.5)
* GCP Vertex AI (Gemini, PaLM)
* OpenAI, Anthropic, Cohere, Mistral AI

**Local/Self-Hosted:**

* Ollama (Llama, Mistral, CodeLlama)
* vLLM, LocalAI, LM Studio
* Hugging Face TGI

**See full list:** [https://docs.litellm.ai/docs/providers](https://docs.litellm.ai/docs/providers)

## Configuration Methods

You can configure the custom LLM gateway in **4 ways** (in priority order):

### 1. CLI Flags (Highest Priority)

Override everything with command-line flags:

<Tabs>
  <Tab title="Config File">
    ```bash theme={null}
    kubiya worker start \
      --queue-id=my-queue \
      --type=local \
      --enable-local-proxy \
      --proxy-config-file=/path/to/litellm_config.yaml
    ```
  </Tab>

  <Tab title="Inline JSON">
    ```bash theme={null}
    kubiya worker start \
      --queue-id=my-queue \
      --type=local \
      --enable-local-proxy \
      --proxy-config-json='{"model_list":[{"model_name":"gpt-4","litellm_params":{"model":"azure/gpt-4","api_key":"env:AZURE_API_KEY"}}]}'
    ```
  </Tab>
</Tabs>

### 2. Environment Variables

Set once, use everywhere:

```bash theme={null}
export KUBIYA_ENABLE_LOCAL_PROXY=true
export KUBIYA_PROXY_CONFIG_FILE=/path/to/litellm_config.yaml

# Worker automatically uses local proxy
kubiya worker start --queue-id=my-queue --type=local
```

### 3. Context Configuration

Persistent configuration in `~/.kubiya/config` - just add `litellm-proxy` to your current context:

```yaml theme={null}
apiVersion: v1
kind: Config
current-context: default

contexts:
  - name: default
    context:
      api-url: https://control-plane.kubiya.ai
      # Add LiteLLM proxy configuration here
      litellm-proxy:
        enabled: true
        config-file: /Users/myuser/.kubiya/litellm_production.yaml

# ... rest of your config (users, organizations)
```

Once configured, starting any worker automatically uses this LiteLLM configuration. See [Configuration File](/cli/configuration-file) for complete details.

### 4. Control Plane Queue Settings

Configure via Composer UI for shared team configuration.

### 5. Control Plane Gateway (Default Fallback)

If nothing is configured, workers use the centralized Control Plane gateway.

## Provider Configuration Examples

### AWS Bedrock (Claude)

<Tabs>
  <Tab title="Configuration File">
    **Create `litellm_bedrock.yaml`:**

    ```yaml theme={null}
    model_list:
      - model_name: bedrock-claude-3-5-sonnet
        litellm_params:
          model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
          aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
          aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
          aws_region_name: os.environ/AWS_REGION_NAME

      - model_name: bedrock-claude-3-opus
        litellm_params:
          model: bedrock/anthropic.claude-3-opus-20240229-v1:0
          aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
          aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
          aws_region_name: us-east-1

      - model_name: bedrock-llama3-70b
        litellm_params:
          model: bedrock/meta.llama3-70b-instruct-v1:0
          aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
          aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
          aws_region_name: us-west-2

    litellm_settings:
      success_callback: ["langfuse"]
      failure_callback: ["langfuse"]
      drop_params: true

    environment_variables:
      LANGFUSE_PUBLIC_KEY: "pk-lf-..."
      LANGFUSE_SECRET_KEY: "sk-lf-..."
      LANGFUSE_HOST: "https://cloud.langfuse.com"
    ```
  </Tab>

  <Tab title="Start Worker">
    ```bash theme={null}
    # Set AWS credentials
    export AWS_ACCESS_KEY_ID="your-access-key"
    export AWS_SECRET_ACCESS_KEY="your-secret-key"
    export AWS_REGION_NAME="us-east-1"

    # Start worker with Bedrock
    kubiya worker start \
      --queue-id=bedrock-queue \
      --type=local \
      --enable-local-proxy \
      --proxy-config-file=./litellm_bedrock.yaml
    ```
  </Tab>

  <Tab title="IAM Roles">
    **For EC2/ECS with IAM roles:**

    ```yaml theme={null}
    model_list:
      - model_name: bedrock-claude-3-5-sonnet
        litellm_params:
          model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
          # No credentials needed - uses IAM role
          aws_region_name: us-east-1
    ```

    **For assumed roles:**

    ```yaml theme={null}
    model_list:
      - model_name: bedrock-claude-3-5-sonnet
        litellm_params:
          model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
          aws_role_name: arn:aws:iam::123456789:role/BedrockAccessRole
          aws_session_name: kubiya-worker-session
          aws_region_name: us-east-1
    ```
  </Tab>
</Tabs>

<Info>
  **AWS Bedrock Authentication**: Supports IAM roles, access keys, session tokens, profiles, and more. See [AWS Bedrock Provider Docs](https://docs.litellm.ai/docs/providers/bedrock) for all auth methods.
</Info>

### Ollama (Local Models)

Run completely offline with open-source models:

<Tabs>
  <Tab title="Configuration File">
    **Create `litellm_ollama.json`:**

    ```json theme={null}
    {
      "model_list": [
        {
          "model_name": "llama3",
          "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://localhost:11434"
          }
        },
        {
          "model_name": "codellama",
          "litellm_params": {
            "model": "ollama/codellama",
            "api_base": "http://localhost:11434"
          }
        },
        {
          "model_name": "mistral",
          "litellm_params": {
            "model": "ollama/mistral",
            "api_base": "http://localhost:11434"
          }
        },
        {
          "model_name": "llama3-70b",
          "litellm_params": {
            "model": "ollama/llama3:70b",
            "api_base": "http://localhost:11434"
          }
        }
      ]
    }
    ```
  </Tab>

  <Tab title="Setup Ollama">
    ```bash theme={null}
    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull models
    ollama pull llama3
    ollama pull codellama
    ollama pull mistral

    # Start Ollama server
    ollama serve

    # In another terminal, start worker
    kubiya worker start \
      --queue-id=ollama-queue \
      --type=local \
      --enable-local-proxy \
      --proxy-config-file=./litellm_ollama.json
    ```
  </Tab>

  <Tab title="Remote Ollama">
    **For remote Ollama server:**

    ```json theme={null}
    {
      "model_list": [
        {
          "model_name": "llama3",
          "litellm_params": {
            "model": "ollama/llama3",
            "api_base": "http://ollama-server.company.internal:11434"
          }
        }
      ]
    }
    ```
  </Tab>
</Tabs>

<Info>
  **Ollama Benefits**: Perfect for development, privacy-sensitive workloads, air-gapped environments, or cost-free experimentation. See [Ollama Provider Docs](https://docs.litellm.ai/docs/providers/ollama).
</Info>

### Azure OpenAI

<Tabs>
  <Tab title="Configuration File">
    **Create `litellm_azure.yaml`:**

    ```yaml theme={null}
    model_list:
      - model_name: gpt-4
        litellm_params:
          model: azure/gpt-4
          api_base: https://your-instance.openai.azure.com
          api_key: os.environ/AZURE_API_KEY
          api_version: "2024-02-15-preview"

      - model_name: gpt-4-turbo
        litellm_params:
          model: azure/gpt-4-turbo
          api_base: https://your-instance.openai.azure.com
          api_key: os.environ/AZURE_API_KEY
          api_version: "2024-02-15-preview"

      - model_name: gpt-35-turbo
        litellm_params:
          model: azure/gpt-35-turbo
          api_base: https://your-instance.openai.azure.com
          api_key: os.environ/AZURE_API_KEY
          api_version: "2024-02-15-preview"

    litellm_settings:
      success_callback: ["langfuse"]
      drop_params: true
    ```
  </Tab>

  <Tab title="Start Worker">
    ```bash theme={null}
    export AZURE_API_KEY="your-azure-key"

    kubiya worker start \
      --queue-id=azure-queue \
      --type=local \
      --enable-local-proxy \
      --proxy-config-file=./litellm_azure.yaml
    ```
  </Tab>

  <Tab title="Multiple Deployments">
    **Use different Azure deployments:**

    ```yaml theme={null}
    model_list:
      # Production deployment
      - model_name: gpt-4-prod
        litellm_params:
          model: azure/gpt-4-prod-deployment
          api_base: https://prod-instance.openai.azure.com
          api_key: os.environ/AZURE_PROD_API_KEY
          api_version: "2024-02-15-preview"

      # Development deployment
      - model_name: gpt-4-dev
        litellm_params:
          model: azure/gpt-4-dev-deployment
          api_base: https://dev-instance.openai.azure.com
          api_key: os.environ/AZURE_DEV_API_KEY
          api_version: "2024-02-15-preview"
    ```
  </Tab>
</Tabs>

### Multi-Provider (Fallback & Load Balancing)

Configure multiple providers for reliability and cost optimization:

```yaml theme={null}
model_list:
  # Primary: AWS Bedrock (cost-effective)
  - model_name: claude-3-opus
    litellm_params:
      model: bedrock/anthropic.claude-3-opus-20240229-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

  # Fallback: Azure OpenAI
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_base: https://your-instance.openai.azure.com
      api_key: os.environ/AZURE_API_KEY

  # Fast/cheap tasks: Groq
  - model_name: llama3-groq
    litellm_params:
      model: groq/llama3-70b-8192
      api_key: os.environ/GROQ_API_KEY

  # Development: Local Ollama
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # Code tasks: OpenAI
  - model_name: gpt-4-code
    litellm_params:
      model: gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]
  drop_params: true
  # Optional: Set default model
  default_team_settings:
    - team_id: default
      success_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
```

## LiteLLM Configuration Reference

### Model List Structure

Each model in `model_list` requires:

<ParamField path="model_name" type="string" required>
  User-facing model name used in requests (e.g., "gpt-4", "claude-3-opus")
</ParamField>

<ParamField path="litellm_params" type="object" required>
  Parameters passed to LiteLLM:

  * `model`: Provider-specific model identifier (e.g., "azure/gpt-4", "bedrock/claude-3-opus")
  * `api_key`: API key (use `os.environ/VAR_NAME` for environment variables)
  * `api_base`: API endpoint URL (for Azure, self-hosted, etc.)
  * Provider-specific params (region, version, etc.)
</ParamField>

### LiteLLM Settings

<ParamField path="litellm_settings" type="object">
  Global settings for the proxy:

  * `success_callback`: List of callbacks on success (e.g., `["langfuse"]`)
  * `failure_callback`: List of callbacks on failure
  * `drop_params`: Drop extra params not supported by provider
  * `num_retries`: Number of retry attempts on failure
  * `timeout`: Request timeout in seconds
</ParamField>

### Environment Variables

<ParamField path="environment_variables" type="object">
  Environment variables for the proxy process (API keys, Langfuse config, etc.)
</ParamField>

### Complete Schema Example

```yaml theme={null}
model_list:
  - model_name: "user-facing-name"
    litellm_params:
      model: "provider/model-id"
      api_key: "os.environ/API_KEY_VAR"
      api_base: "https://api.provider.com"
      # Provider-specific params
      api_version: "2024-01-01"
      aws_region_name: "us-east-1"
      # ... other params

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]
  drop_params: true
  num_retries: 3
  timeout: 600

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-..."
  LANGFUSE_SECRET_KEY: "sk-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
  # Add any provider credentials
  AZURE_API_KEY: "your-key"
  OPENAI_API_KEY: "your-key"
```

## Observability with Langfuse

Track LLM usage, costs, and performance by adding Langfuse to your config:

```yaml theme={null}
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
```

This gives you:

* Token usage and cost per model
* Request latency and success rates
* Error tracking
* Per-user analytics

View metrics at [https://cloud.langfuse.com](https://cloud.langfuse.com) (sign up at [langfuse.com](https://langfuse.com))

## Advanced Features

### Rate Limiting

```yaml theme={null}
litellm_settings:
  rpm: 60  # Requests per minute
  tpm: 100000  # Tokens per minute
```

### Caching

```yaml theme={null}
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379
```

### Load Balancing

```yaml theme={null}
router_settings:
  routing_strategy: "least-busy"  # or "latency-based", "usage-based"
  num_retries: 3
  timeout: 600
```

### Custom Metadata

```yaml theme={null}
litellm_settings:
  metadata:
    environment: "production"
    team: "engineering"
    project: "agent-workflows"
```

## Troubleshooting

### Proxy Won't Start

```bash theme={null}
# Check if litellm is installed in venv
~/.kubiya/workers/<queue-id>/venv/bin/litellm --version

# View proxy logs
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log

# Verify config syntax
cat litellm_config.yaml | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)"
```

### Worker Not Using Local Proxy

```bash theme={null}
# Check worker environment
ps aux | grep worker.py
# Look for LITELLM_API_BASE in process env

# Verify configuration priority
# CLI flags > Env vars > Context > Control Plane

# Check worker logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep -i litellm
```

### Authentication Errors

```bash theme={null}
# Check environment variables are set
echo $AWS_ACCESS_KEY_ID
echo $AZURE_API_KEY

# Verify keys in proxy config
cat litellm_config.yaml | grep api_key

# Test provider connection
curl -v https://bedrock-runtime.us-east-1.amazonaws.com
```

### Model Not Found

```bash theme={null}
# Check model name in config
cat litellm_config.yaml | grep model_name

# Verify model ID is correct for provider
# Example: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0

# Check provider documentation
# https://docs.litellm.ai/docs/providers/<provider-name>
```

### Performance Issues

```bash theme={null}
# Monitor proxy resource usage
ps aux | grep litellm

# Check for rate limiting
tail -f ~/.kubiya/workers/<queue-id>/litellm_proxy.log | grep -i "rate limit"

# Increase timeout
# Add to litellm_settings:
#   timeout: 1200  # 20 minutes
```

## Best Practices

**Security:**

* Use environment variables for API keys (never hardcode)
* Restrict file permissions: `chmod 600 ~/.kubiya/litellm_*.yaml`
* Rotate credentials regularly
* Use IAM roles when possible (AWS Bedrock on EC2/ECS)

**Cost Optimization:**

* Use cheaper models for simple tasks
* Enable caching to avoid duplicate requests
* Set rate limits to prevent runaway costs
* Monitor usage with Langfuse

**Reliability:**

* Configure fallback models across providers
* Set appropriate timeouts
* Enable retries for transient failures
* Monitor proxy logs

**Development Workflow:**

1. Start with Ollama for local dev (free, fast)
2. Test with staging before production
3. Enable Langfuse early
4. Use context configuration for environment switching

## Examples by Use Case

### Development Environment (Local Ollama)

**Quick setup for local development:**

```bash theme={null}
# 1. Install and start Ollama
ollama serve

# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: dev
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-json: |
          {
            "model_list": [
              {"model_name": "llama3", "litellm_params": {"model": "ollama/llama3", "api_base": "http://localhost:11434"}}
            ]
          }
EOF

# 3. Switch to dev context and start worker
kubiya config use-context dev
kubiya worker start --queue-id=dev-queue --type=local
```

### Staging Environment (Azure OpenAI)

**Create separate config file for staging:**

```bash theme={null}
# 1. Create staging LiteLLM config
cat > ~/.kubiya/litellm_staging.yaml <<EOF
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-staging
      api_base: https://staging-instance.openai.azure.com
      api_key: os.environ/AZURE_STAGING_API_KEY
      api_version: "2024-02-15-preview"
EOF

# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: staging
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_staging.yaml
EOF

# 3. Use it
kubiya config use-context staging
kubiya worker start --queue-id=staging-queue --type=local
```

### Production Environment (AWS Bedrock)

**Production setup with Langfuse observability:**

```bash theme={null}
# 1. Create production LiteLLM config
cat > ~/.kubiya/litellm_production.yaml <<EOF
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

environment_variables:
  LANGFUSE_PUBLIC_KEY: "pk-lf-..."
  LANGFUSE_SECRET_KEY: "sk-lf-..."
  LANGFUSE_HOST: "https://cloud.langfuse.com"
EOF

# 2. Add to ~/.kubiya/config
cat >> ~/.kubiya/config <<EOF
contexts:
  - name: production
    context:
      api-url: https://control-plane.kubiya.ai
      litellm-proxy:
        enabled: true
        config-file: ~/.kubiya/litellm_production.yaml
EOF

# 3. Set AWS credentials and start
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
kubiya config use-context production
kubiya worker start --queue-id=prod-queue --type=local --daemon
```

## Related Documentation

* [Configuration File](/cli/configuration-file) - Persistent LLM gateway config
* [Environment Variables](/cli/environment-variables) - LiteLLM proxy env vars
* [Worker Management](/cli/workers) - Deploy workers
* [LiteLLM Providers](https://docs.litellm.ai/docs/providers) - All 100+ providers
* [AWS Bedrock](https://docs.litellm.ai/docs/providers/bedrock) - Bedrock setup
* [Ollama](https://docs.litellm.ai/docs/providers/ollama) - Local models
