Kubiya Workers are Temporal-based execution engines that process AI agent workflows with enterprise-grade reliability and scalability.
**Run on Any Compute Environment**: Workers can run on nearly any infrastructure: your local machine (Mac, Linux, Windows), Kubernetes clusters, Docker containers, VMs (EC2, GCE, Azure), or bare-metal servers. This flexibility lets you deploy them wherever your infrastructure lives.
This guide covers worker deployment, configuration, and management.
What is a Worker?
Workers are distributed execution engines that:
- **Poll Task Queues**: Listen for workflow and activity tasks from Temporal
- **Execute Agent Workflows**: Run AI agents with tools and integrations
- **Report Health**: Send heartbeats to the Control Plane
- **Stream Events**: Provide real-time execution updates
- **Handle Failures**: Automatically retry failed tasks with exponential backoff
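Temporal drives the retry schedule itself, but the shape of exponential backoff is easy to see in a few lines. The sketch below is purely illustrative; the parameter names are ours, not Kubiya's or Temporal's:

```python
import random

def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5, jitter=0.0):
    """Yield retry delays: base * factor**n, capped at max_delay."""
    for n in range(attempts):
        delay = min(base * (factor ** n), max_delay)
        # Optional jitter spreads simultaneous retries apart
        yield delay + random.uniform(0, jitter)

# Five attempts with the defaults back off as 1s, 2s, 4s, 8s, 16s
print(list(backoff_delays()))
```

In practice you would tune the cap and jitter to your workload; Temporal's retry policies expose equivalent knobs per activity.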
Quick Start
Start Your First Worker
# 1. Configure API Key
export KUBIYA_API_KEY="your-api-key"
# 2. Start a local worker
kubiya worker start --queue-id=my-queue --type=local
# 3. Monitor logs (in another terminal)
tail -f ~/.kubiya/workers/my-queue/logs/worker.log
Deployment Modes
- **Local Mode**: Development and testing with a Python virtual environment
- **Daemon Mode**: Production deployment running as a background process
- **Docker Mode**: Containerized deployment with isolation
- **Kubernetes Mode**: Scalable multi-replica deployment with auto-scaling
Ephemeral vs Persistent Workers
Kubiya supports two worker lifecycle models to match different use cases:
Ephemeral Workers
Perfect for CI/CD pipelines and single-task automation
Ephemeral workers are automatically provisioned to execute a single task and then terminate. They’re ideal for:
- **CI/CD Pipelines**: Automatically execute deployment tasks in GitHub Actions, GitLab CI, or Jenkins without managing infrastructure
- **Automation Scripts**: Run one-off automation tasks triggered by webhooks, cron jobs, or API calls
- **Development & Testing**: Test agent workflows locally without setting up persistent infrastructure
- **Cost Optimization**: Pay only for actual execution time, with no idle workers consuming resources
How Ephemeral Workers Work:
# Option 1: On-Demand Workers (Kubiya Cloud)
kubiya exec "Deploy my app to production"
# Control Plane provisions worker → Executes task → Auto-cleanup
# Option 2: Local Ephemeral Workers
kubiya exec --local "Run integration tests" --yes
# CLI creates temp queue → Starts local worker → Executes → Cleanup
Key Benefits:
✅ **Zero Infrastructure Management**: No need to pre-provision or maintain workers
✅ **Automatic Cleanup**: Workers and queues are automatically removed after execution
✅ **Perfect for CI/CD**: Integrate directly into pipelines without worker setup
✅ **Cost-Effective**: Only runs when needed, no idle resource costs
✅ **Isolated Execution**: Each task gets a fresh, clean worker environment
Use Ephemeral Workers When:
Running tasks in CI/CD pipelines
Executing occasional automation workflows
Testing and development
One-off operations that don’t require persistent infrastructure
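In a CI pipeline, an ephemeral worker needs nothing more than the API key and a single CLI call. A hypothetical GitHub Actions job might look like this (the install step and package name are placeholders; use your preferred CLI installation method):

```yaml
# .github/workflows/integration-tests.yml (illustrative)
name: integration-tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Kubiya CLI
        run: |
          # Placeholder: install the kubiya CLI with your preferred method
          pip install kubiya-cli
      - name: Run tests via ephemeral worker
        env:
          KUBIYA_API_KEY: ${{ secrets.KUBIYA_API_KEY }}
        run: kubiya exec --local "Run integration tests" --yes
```

The CLI creates a temporary queue, starts a local worker, executes the task, and cleans up, so the job leaves nothing behind.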
Persistent Workers
Best for high-frequency tasks and production workloads
Persistent workers run continuously, polling for tasks 24/7. They’re ideal for:
- **High-Frequency Operations**: Responding to frequent execution requests with low latency
- **Real-Time Monitoring**: Continuously processing alerts and monitoring tasks
- **Custom Infrastructure**: Running on your specific infrastructure with custom network access
- **Stateful Workflows**: Maintaining connection pools, caches, and long-running processes
Deployment Options:
# Local development
kubiya worker start --queue-id=my-queue --type=local
# Production daemon
kubiya worker start --queue-id=prod-queue --type=local --daemon
# Docker container
kubiya worker start --queue-id=prod-queue --type=docker
# Kubernetes deployment (scalable)
kubiya worker install --queue-id=prod-queue --replicas 3
Key Benefits:
✅ **Low Latency**: Workers are always ready to process tasks immediately
✅ **Run Anywhere**: Deploy on your own infrastructure (local, cloud, on-prem)
✅ **Custom Configuration**: Full control over resources, network, and dependencies
✅ **Connection Pooling**: Maintain persistent connections to databases and services
Persistent Workers Require:
- Infrastructure to run the worker process
- Ongoing resource costs (compute, memory, network)
- Monitoring and maintenance
- Network connectivity to the Control Plane and Temporal
Comparison
| Feature | Ephemeral Workers | Persistent Workers |
|---|---|---|
| Infrastructure | Zero setup required | Requires deployment |
| Lifecycle | Single-task execution | Continuous operation |
| Best For | CI/CD, automation, testing | High-frequency, production |
| Startup Time | On-demand provision | Always ready |
| Cost Model | Pay per execution | Ongoing infrastructure costs |
| Use Cases | Pipelines, webhooks, cron | Monitoring, chatbots, APIs |
| Deployment | Automatic | Manual configuration |
| Cleanup | Automatic | Manual |
| Network Access | Kubiya Cloud or Local | Your infrastructure |
**Starting Point**: Use ephemeral workers for CI/CD and automation. Add persistent workers when you need lower latency, custom infrastructure, or high-frequency execution.
Worker Architecture
Core Components
Communication Flow
1. **Registration**: Worker registers with the Control Plane and receives configuration
2. **Temporal Connection**: Connects to Temporal Cloud using provided credentials
3. **Task Polling**: Continuously polls the assigned queue for new tasks
4. **Task Execution**: Executes agent workflows and activities
5. **Event Streaming**: Sends real-time events to the Control Plane
6. **Health Reporting**: Periodic heartbeats with metrics and status
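The polling-to-heartbeat cycle above can be sketched as a simple loop. This is a conceptual model only, not the actual Kubiya worker implementation; the class and method names are invented for illustration:

```python
class WorkerLoop:
    """Conceptual poll/execute/heartbeat cycle (not the real Kubiya worker)."""

    def __init__(self, heartbeat_interval=30):
        self.heartbeat_interval = heartbeat_interval
        self.events = []

    def run(self, tasks, task_duration=10):
        now, last_heartbeat = 0, 0
        for task in tasks:
            # 1. Poll the assigned queue for the next task
            self.events.append(("poll", task))
            # 2. Execute the agent workflow or activity
            self.events.append(("execute", task))
            now += task_duration  # simulated execution time in seconds
            # 3. Heartbeat to the Control Plane when the interval elapses
            if now - last_heartbeat >= self.heartbeat_interval:
                self.events.append(("heartbeat", now))
                last_heartbeat = now
        return self.events

loop = WorkerLoop(heartbeat_interval=30)
print(loop.run(["task-a", "task-b", "task-c", "task-d"]))
```

The real worker interleaves polling, execution, and heartbeating concurrently; the point here is only the ordering of responsibilities.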
Deployment Modes
Local Mode
Best for development and testing.
# Start local worker
kubiya worker start --queue-id=dev-queue --type=local
# With custom environment
export LOG_LEVEL=DEBUG
export HEARTBEAT_INTERVAL=15
kubiya worker start --queue-id=dev-queue --type=local
Features:
- Automatic Python virtual environment setup
- Foreground process with live logging
- Quick iteration and debugging
- Dependencies auto-installed
Directory Structure:
~/.kubiya/workers/dev-queue/
├── venv/ # Python virtual environment
├── logs/
│ └── worker.log # Execution logs
├── worker.py # Worker implementation
└── requirements.txt # Python dependencies
Daemon Mode
Production deployment as a background process.
# Start daemon
kubiya worker start --queue-id=prod-queue --type=local --daemon
# Or use shorthand
kubiya worker start --queue-id=prod-queue --type=local -d
# Check status
cat ~/.kubiya/workers/prod-queue/daemon_info.txt
# View logs
tail -f ~/.kubiya/workers/prod-queue/logs/worker.log
# Stop daemon
pkill -f "worker.py.*prod-queue"
Features:
- Runs in background
- Automatic restart on crash
- Log rotation (configurable)
- PID and status tracking
Configuration:
# Start with custom log settings (104857600 bytes = 100MB)
kubiya worker start \
  --queue-id=prod-queue \
  --type=local \
  --daemon \
  --max-log-size=104857600 \
  --max-log-backups=10
Docker Mode
Isolated containerized deployment.
# Start Docker worker via CLI
kubiya worker start --queue-id=docker-queue --type=docker
# Or run container directly
docker run -d \
--name kubiya-worker \
--restart unless-stopped \
-e KUBIYA_API_KEY="your-api-key" \
-e CONTROL_PLANE_URL="https://control-plane.kubiya.ai" \
-e QUEUE_ID="docker-queue" \
-e LOG_LEVEL="INFO" \
ghcr.io/kubiyabot/agent-worker:latest
# View logs
docker logs -f kubiya-worker
# Stop worker
docker stop kubiya-worker
docker rm kubiya-worker
Docker Compose:
# docker-compose.yml
version: '3.8'
services:
  kubiya-worker:
    image: ghcr.io/kubiyabot/agent-worker:latest
    container_name: kubiya-worker
    restart: unless-stopped
    environment:
      - KUBIYA_API_KEY=${KUBIYA_API_KEY}
      - CONTROL_PLANE_URL=https://control-plane.kubiya.ai
      - QUEUE_ID=docker-queue
      - LOG_LEVEL=INFO
      - HEARTBEAT_INTERVAL=30
    volumes:
      - ./logs:/root/.kubiya/workers/logs
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "10"
Start with:
docker-compose up -d
docker-compose logs -f
Kubernetes Mode
Scalable production deployment with high availability.
The manifests below cover a basic Deployment, horizontal autoscaling, and Prometheus monitoring.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubiya-worker
  namespace: kubiya
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubiya-worker
  template:
    metadata:
      labels:
        app: kubiya-worker
    spec:
      containers:
        - name: worker
          image: ghcr.io/kubiyabot/agent-worker:latest
          command: ["kubiya", "worker", "start"]
          args:
            - "--queue-id=$(QUEUE_ID)"
            - "--type=local"
          env:
            - name: KUBIYA_API_KEY
              valueFrom:
                secretKeyRef:
                  name: kubiya-secrets
                  key: api-key
            - name: CONTROL_PLANE_URL
              value: "https://control-plane.kubiya.ai"
            - name: QUEUE_ID
              value: "production-queue"
            - name: LOG_LEVEL
              value: "INFO"
            - name: WORKER_HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            exec:
              command: ["pgrep", "-f", "worker.py"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pgrep", "-f", "worker.py"]
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Secret
metadata:
  name: kubiya-secrets
  namespace: kubiya
type: Opaque
stringData:
  api-key: "your-api-key-here"
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kubiya-worker-hpa
  namespace: kubiya
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kubiya-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max
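The HPA's scaling decision follows the standard Kubernetes formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sanity check against the 70% CPU target configured above:

```python
from math import ceil

def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Kubernetes HPA scaling formula."""
    return ceil(current_replicas * current_utilization / target_utilization)

# 3 replicas at 95% CPU against a 70% target scale out to 5 replicas
print(desired_replicas(3, 95, 70))
```

The `behavior` block above then rate-limits how fast the controller may move toward that desired count.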
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubiya-worker-config
  namespace: kubiya
data:
  LOG_LEVEL: "INFO"
  HEARTBEAT_INTERVAL: "30"
  METRICS_ENABLED: "true"
  METRICS_PORT: "9090"
---
apiVersion: v1
kind: Service
metadata:
  name: kubiya-worker-metrics
  namespace: kubiya
  labels:
    app: kubiya-worker
spec:
  ports:
    - name: metrics
      port: 9090
      targetPort: 9090
  selector:
    app: kubiya-worker
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubiya-worker
  namespace: kubiya
spec:
  selector:
    matchLabels:
      app: kubiya-worker
  endpoints:
    - port: metrics
      interval: 30s
Deploy:
# Create namespace
kubectl create namespace kubiya
# Apply manifests
kubectl apply -f kubiya-worker.yaml
# Scale deployment
kubectl scale deployment kubiya-worker -n kubiya --replicas=5
# View logs
kubectl logs -f deployment/kubiya-worker -n kubiya
# Check status
kubectl get pods -n kubiya
kubectl describe deployment kubiya-worker -n kubiya
Configuration
Environment Variables
- `KUBIYA_API_KEY`: API authentication key for the Control Plane
- `CONTROL_PLANE_URL` (default `https://control-plane.kubiya.ai`): Control Plane base URL
- `CONTROL_PLANE_GATEWAY_URL`: Overrides the Control Plane URL (takes precedence)
- `QUEUE_ID`: Worker queue identifier (must match a queue in the Control Plane)
- Environment name for the worker
- `WORKER_HOSTNAME` (default: auto-detected): Custom worker hostname for identification
- `HEARTBEAT_INTERVAL`: Heartbeat interval in seconds (15-300)
- `LOG_LEVEL`: Logging level: DEBUG, INFO, WARN, or ERROR
- `MAX_CONCURRENT_ACTIVITIES`: Maximum concurrent activity executions
- `MAX_CONCURRENT_WORKFLOWS`: Maximum concurrent workflow executions
- `KUBIYA_MODEL`: Explicit model ID to override agent/team configuration; all LLM requests will use this model regardless of agent settings
Advanced Configuration
# Performance tuning
export MAX_CONCURRENT_ACTIVITIES=20
export MAX_CONCURRENT_WORKFLOWS=10
export ACTIVITY_TIMEOUT=600
# Custom control plane
export CONTROL_PLANE_GATEWAY_URL="https://cp.company.internal"
# Debug mode
export LOG_LEVEL=DEBUG
export KUBIYA_DEBUG=true
# Resource limits (Docker/K8s)
export MEMORY_LIMIT="2Gi"
export CPU_LIMIT="1000m"
# Start worker with configuration
kubiya worker start --queue-id=tuned-queue --type=local
Monitoring
Log Management
# View real-time logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log
# Search logs for errors
grep ERROR ~/.kubiya/workers/<queue-id>/logs/worker.log
# View last 100 lines
tail -n 100 ~/.kubiya/workers/<queue-id>/logs/worker.log
# Follow logs with filtering
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep "Task completed"
Health Checks
# Check if worker is running
ps aux | grep "worker.py.*<queue-id>"
# Check daemon status
cat ~/.kubiya/workers/<queue-id>/daemon_info.txt
# Test connectivity to Control Plane
curl https://control-plane.kubiya.ai/health
# Verify Temporal connection (in worker logs)
grep "Connected to Temporal" ~/.kubiya/workers/<queue-id>/logs/worker.log
Metrics
Workers report the following metrics:
- **Task Execution**: Success/failure counts, execution time
- **Resource Usage**: CPU, memory, network
- **Queue Status**: Pending tasks, poll rate
- **Health Status**: Heartbeat success, connectivity
# View metrics in logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep "metrics"
# Example output:
# [INFO] Metrics: tasks_completed=42, avg_duration=3.2s, memory_mb=512
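If you want to turn those log lines into structured data, a small parser is enough. The line format is taken from the example above; treat the exact shape as an assumption rather than a documented schema:

```python
import re

def parse_metrics_line(line):
    """Parse key=value pairs from a worker metrics log line."""
    match = re.search(r"Metrics:\s*(.*)", line)
    if not match:
        return {}
    pairs = {}
    for item in match.group(1).split(","):
        key, _, value = item.strip().partition("=")
        pairs[key] = value
    return pairs

line = "[INFO] Metrics: tasks_completed=42, avg_duration=3.2s, memory_mb=512"
print(parse_metrics_line(line))
# → {'tasks_completed': '42', 'avg_duration': '3.2s', 'memory_mb': '512'}
```

From here the values can feed a dashboard or a simple alert threshold.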
Troubleshooting
Worker Won’t Start
# Check Python version (requires 3.8+)
python3 --version
# Clear virtual environment
rm -rf ~/.kubiya/workers/<queue-id>/venv
# Start with debug logging
export LOG_LEVEL=DEBUG
kubiya worker start --queue-id=<queue-id> --type=local
# Check for port conflicts
lsof -i :7233 # Temporal port
Connection Issues
# Test Control Plane connectivity
curl -v https://control-plane.kubiya.ai/health
# Check API key
echo $KUBIYA_API_KEY
# Verify DNS resolution
nslookup control-plane.kubiya.ai
# Check firewall/proxy settings
echo $HTTP_PROXY
echo $HTTPS_PROXY
Worker Crashes
# Check crash logs
tail -n 500 ~/.kubiya/workers/<queue-id>/logs/worker.log | grep ERROR
# Increase memory limits (Docker/K8s)
# Edit deployment and set higher resource limits
# Enable auto-restart (daemon mode)
kubiya worker start --queue-id=<queue-id> --type=local --daemon
# Check for dependency issues
~/.kubiya/workers/<queue-id>/venv/bin/pip list
Task Execution Failures
# Check activity timeouts
export ACTIVITY_TIMEOUT=1200  # Increase to 20 minutes
# Verify skill availability
kubiya skill list
# Check agent configuration
kubiya agent get <agent-id>
# Review execution logs
tail -f ~/.kubiya/workers/<queue-id>/logs/worker.log | grep "Activity failed"
Best Practices
Production Deployment
1. **Use Daemon or Kubernetes Mode**: Never run production workers in the foreground
2. **Monitor Health**: Set up alerting for heartbeat failures
3. **Resource Limits**: Configure appropriate CPU/memory limits
4. **Log Rotation**: Enable log rotation to prevent disk fill
5. **Multiple Replicas**: Run at least 2-3 workers for high availability
Security
✅ Store API keys in secrets (Kubernetes Secrets, AWS Secrets Manager)
✅ Rotate keys regularly (at least quarterly)
✅ Use network policies to restrict traffic
✅ Enable TLS for all communications
✅ Monitor access logs for suspicious activity
Performance

# Tune concurrency based on workload
export MAX_CONCURRENT_ACTIVITIES=50  # For high-throughput
export MAX_CONCURRENT_WORKFLOWS=20
# Adjust heartbeat interval
export HEARTBEAT_INTERVAL=15  # More frequent for critical workers
# Optimize the Python environment: keep worker dependencies current
~/.kubiya/workers/<queue-id>/venv/bin/pip list --outdated
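When picking `MAX_CONCURRENT_ACTIVITIES`, a back-of-envelope calculation from the worker's memory budget can help. Both the heuristic and the per-activity memory figure below are our assumptions, not official Kubiya guidance:

```python
def max_concurrent_activities(available_mb, per_activity_mb=100, headroom=0.2):
    """Rough sizing: reserve `headroom` fraction of memory, divide the rest."""
    usable = available_mb * (1 - headroom)
    return max(1, int(usable // per_activity_mb))

# A worker limited to 2Gi (2048MB), assuming ~100MB per activity
print(max_concurrent_activities(2048))
```

Measure actual per-activity memory from your metrics before committing to a value.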
Scaling Strategy
You can scale vertically (bigger workers), horizontally (more replicas), or across multiple queues.
# Increase resources per worker
resources :
requests :
memory : "2Gi"
cpu : "1000m"
limits :
memory : "4Gi"
cpu : "2000m"
# Increase replica count
kubectl scale deployment kubiya-worker -n kubiya --replicas=10
# Or use HPA for auto-scaling
kubectl apply -f hpa.yaml
# Separate workers for different workloads
kubiya worker start --queue-id=high-priority-queue --type=local &
kubiya worker start --queue-id=low-priority-queue --type=local &
kubiya worker start --queue-id=batch-queue --type=local &
Custom LLM Gateway
Workers normally use the Control Plane’s LLM gateway. You can override this to route requests through your own providers (AWS Bedrock, Azure OpenAI, Ollama, etc.) for cost control, observability, or network isolation.
Quick example:
kubiya worker start \
--queue-id=my-queue \
--type=local \
--enable-local-proxy \
--proxy-config-file=./litellm_config.yaml
Example config for AWS Bedrock:
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1
See Custom LLM Gateway for complete documentation with all providers, configuration methods, and troubleshooting.
Worker Pool Web Interface
When you start a worker, it automatically launches a built-in web interface for monitoring, debugging, and testing executions in real-time.
Accessing the WebUI
The WebUI is available at http://127.0.0.1:8088 by default when starting a worker:
kubiya worker start --queue-id=my-queue --type=local
To use a custom port:
kubiya worker start --queue-id=my-queue --type=local --webui-port=9000
The WebUI provides real-time Server-Sent Events (SSE) streaming for live updates on worker status, executions, and logs.
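If you want to consume those SSE updates programmatically, the wire format is plain text. Below is a minimal parser; the event name and payload shown are hypothetical examples, not a documented Kubiya schema:

```python
def parse_sse(stream_text):
    """Minimal Server-Sent Events parser: yields (event, data) tuples."""
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates an event
            yield event, "\n".join(data_lines)
            event, data_lines = "message", []

sample = 'event: worker.status\ndata: {"status": "active"}\n\n'
print(list(parse_sse(sample)))
```

A real client would read the stream incrementally over HTTP instead of from a string, but the framing rules are the same.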
Overview Dashboard
The Overview page displays real-time worker metrics, control plane connection status, and recent activity.
Key features:
- **Worker Status**: Active workers, their PIDs, and uptime
- **Control Plane Connection**: Real-time connection status with latency
- **Recent Activity**: Latest executions and events
- **System Information**: Version, build, and OS details
Workers List
Monitor all active workers in your pool with detailed status information.
The workers view shows:
- Worker ID and hostname
- Process ID (PID)
- Current status and uptime
- Task queue assignment
Execution Playground
The Playground provides an interactive interface to test agent executions with real-time streaming output.
Features:
- **Agent Selection**: Choose from available agents
- **Working Directory**: Set the execution context
- **Real-time Streaming**: Watch agent reasoning, tool calls, and outputs live
- **Tool Call Visualization**: Collapsible cards showing tool inputs and outputs
During execution, you can see:
- The agent's reasoning process
- Tool calls with expandable input/output details
- Real-time output as the agent works
- Execution status and duration
LLM Proxy Control
Manage your local LiteLLM proxy settings when using --enable-local-proxy.
The proxy panel allows you to:
- View proxy status and health
- Monitor request routing
- Check configured model endpoints
Models
Browse and test available LLM models through your configured providers.
The models view displays:
- Available model providers (OpenAI, Anthropic, Bedrock, etc.)
- Model capabilities and pricing tiers
- Quick access to test models in the Playground
Additional Pages
The WebUI includes several other useful pages:
- **Environment**: View and manage environment variables
- **Doctor**: Diagnostic checks for common configuration issues
- **Control Plane**: Detailed control plane connection information
- **Logs**: Real-time log streaming and historical log search
- **Sessions**: View active and past execution sessions
Command Reference
Worker Start Command
The kubiya worker start command supports the following flags:
Required Flags
- `--queue-id`: Worker queue identifier that matches a queue configured in the Control Plane
- `--type`: Worker deployment type: `local` or `docker`
Optional Flags
- `--daemon` (shorthand: `-d`): Run worker as a background daemon process
- `--max-log-size` (integer, default `104857600`): Maximum log file size in bytes before rotation (daemon mode only)
- `--max-log-backups`: Number of rotated log files to keep (daemon mode only)
WebUI Flags
- `--webui-port` (default `8088`): Port for the built-in web interface. The WebUI provides real-time monitoring, an execution playground, and diagnostics.
Local LiteLLM Proxy Flags
- `--enable-local-proxy`: Enable a local LiteLLM proxy gateway alongside the worker. When enabled, the worker routes all LLM requests through the local proxy instead of the Control Plane gateway.
- `--proxy-config-file`: Path to a LiteLLM proxy configuration file (JSON or YAML). Requires `--enable-local-proxy`.
- Inline LiteLLM proxy configuration as a JSON string. Requires `--enable-local-proxy`.
Model Override Flag
Explicit model ID to override agent/team configuration. When set, all LLM requests use this model regardless of agent settings. Useful for testing, cost control, or debugging. Can also be set via the `KUBIYA_MODEL` environment variable.
Other Worker Commands
# Stop worker (daemon mode)
kubiya worker stop --queue-id=<id>
# View worker status
kubiya worker status --queue-id=<id>
# List all workers
kubiya worker list
# View logs
kubiya worker logs --queue-id=<id>
# Clear worker data
kubiya worker clean --queue-id=<id>
Next Steps