Workers are the compute engines that power Kubiya’s distributed execution. They run on your infrastructure—whether that’s a laptop, a server, or a Kubernetes cluster—and pull tasks from queues to execute agent workflows. The Kubiya CLI manages the entire worker lifecycle, from registration to health monitoring, making deployment and management straightforward.

What is a Worker?

A worker is a program that connects to Kubiya and executes tasks. Think of workers like delivery drivers waiting at a distribution center: when packages (tasks) arrive at the distribution center (task queue), available drivers (workers) pick them up and deliver them. Workers are designed for distributed execution, which means:
  • They run on your own machines, not in Kubiya’s cloud—giving you control over where your workloads execute
  • Multiple workers can share the workload for scalability—add more workers to handle more tasks in parallel
  • Workers poll task queues for work assignments—they actively check for new work and pull tasks when available
  • Each worker operates independently—if one worker goes down, others continue processing tasks
This architecture gives you the flexibility to scale execution across your infrastructure while maintaining security and control over where sensitive operations run.

Where Workers Run

One of the key strengths of workers is platform flexibility. You can deploy workers on virtually any compute infrastructure:

macOS

Perfect for local development and individual contributor machines. Developers can run workers on their MacBooks to test agent workflows in a local environment before deploying to production.

Windows

Ideal for desktop automation and Windows-specific tooling. If your agents need to interact with Windows applications or APIs, run workers on Windows servers or workstations.

Linux

The most common deployment target for production workloads. Run workers on Linux servers, VMs, or cloud instances (AWS EC2, Google Compute Engine, Azure VMs).

Kubernetes

Scalable production deployments with auto-scaling capabilities. Deploy workers as Kubernetes pods that automatically scale based on workload, with built-in health checks and rolling updates.

OpenShift

Enterprise Kubernetes distribution with additional security and governance features. Ideal for organizations with strict compliance requirements and existing OpenShift infrastructure.

This flexibility means you can:
  • Run workers locally during development
  • Deploy to production on Kubernetes for scale
  • Place workers inside private networks to reach internal systems
  • Use specialized hardware (GPU machines, high-memory servers) for specific workloads

Worker Lifecycle Management

The Kubiya CLI takes care of everything workers need to function—you don’t need to manually configure connections, manage credentials, or handle retries. Here’s what happens in a worker’s lifecycle:

1. Registration

When a worker starts, it authenticates with the Kubiya Control Plane using an API key. The Control Plane verifies the worker’s identity and sends back everything it needs: Temporal connection credentials, LLM gateway settings, and queue configuration.
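For example, registration can be as simple as supplying an API key and starting the worker. The KUBIYA_API_KEY environment variable name below is illustrative; check the Worker Management CLI Reference for the exact variable or flag your CLI version expects:
# Provide the API key the worker uses to authenticate (variable name is illustrative)
export KUBIYA_API_KEY="<your-api-key>"

# Start the worker; on registration it receives Temporal credentials,
# LLM gateway settings, and queue configuration from the Control Plane
kubiya worker start --queue-id=my-queue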

2. Polling

Once registered, the worker continuously checks its assigned task queue for new work. This is an active process—workers pull work when they’re ready, rather than having work pushed to them. This design prevents overloading workers and provides natural load balancing.

3. Execution

When a worker receives a task, it executes the agent workflow in an isolated environment. The worker streams execution logs and events in real time so you can monitor progress. If a task fails, the worker handles retry logic automatically.

4. Health Monitoring

Every 30 seconds (configurable), workers send heartbeats to the Control Plane reporting their status, how many tasks they’re processing, and system health metrics. This allows the Control Plane to route tasks only to healthy workers.

5. Shutdown

When a worker shuts down (planned or unplanned), it drains in-flight tasks gracefully whenever possible, ensuring tasks complete before the worker terminates.

The key takeaway: the Kubiya CLI handles registration, configuration, credentials, retries, and health monitoring automatically. You just start the worker; everything else is managed for you.

Worker Registration Flow

Here’s a visual representation of how workers register and begin processing tasks. The flow shows the complete lifecycle, from starting a worker with a single CLI command to continuous task execution and health monitoring.

Key Steps:
  1. CLI starts the worker with kubiya worker start --queue-id=<queue-id>
  2. Worker authenticates with the Control Plane using an API key
  3. Control Plane sends configuration including Task Queue ID, Temporal credentials, and LLM settings
  4. Worker connects to the Task Queue and begins polling for tasks
  5. Continuous execution loop where the worker pulls tasks, executes agent workflows, and reports results
  6. Health monitoring via periodic heartbeats every 30 seconds
Platform Support: Workers run on macOS, Windows, Linux, Kubernetes, and OpenShift, giving you the flexibility to deploy on any infrastructure.

Quick Start

Getting started with workers is straightforward. Here’s how to run your first worker:
# 1. Install the Kubiya CLI
brew install kubiya-cli

# 2. Authenticate with your Kubiya account
kubiya auth login

# 3. Start a worker (one command!)
kubiya worker start --queue-id=my-queue
That’s it! The CLI handles everything automatically:
  • Creates a Python virtual environment
  • Installs dependencies
  • Registers with the Control Plane
  • Connects to the task queue
  • Begins polling for tasks
  • Streams logs to your terminal
For production deployments or different configurations, see the deployment modes below.

Deployment Modes

Workers support multiple deployment modes to fit different use cases:

Local Mode

For development and testing. Runs in the foreground with live logging, perfect for debugging agent workflows during development. Use when: You’re developing agents locally and want to see real-time output.

Daemon Mode

For production deployments on a single server. Runs in the background as a daemon process with automatic restart on crashes and log rotation. Use when: You need a worker running continuously on a single machine without manual intervention.
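As a sketch, one way to keep a worker running unattended on a Linux server is a systemd unit wrapping the same kubiya worker start command from the Quick Start. This is illustrative rather than an official template; the CLI's own daemon flags and log handling are documented in the Worker Management CLI Reference:
# Illustrative systemd unit; adjust the binary path, user, and environment for your host
sudo tee /etc/systemd/system/kubiya-worker.service <<'EOF'
[Unit]
Description=Kubiya worker
After=network-online.target

[Service]
# API key variable name is illustrative; see the CLI reference for specifics
Environment=KUBIYA_API_KEY=<your-api-key>
ExecStart=/usr/local/bin/kubiya worker start --queue-id=prod-queue
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now kubiya-worker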

Docker Mode

For containerized deployments with complete environment isolation. Package workers as Docker containers for portable, reproducible deployments. Use when: You want container isolation or need to deploy workers across different environments with consistent behavior.
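A minimal sketch of running a worker in a container is shown below; the image name is a placeholder for whatever image packages the Kubiya CLI in your environment, and the API key variable name is illustrative:
# Placeholder image name; use an image that has the Kubiya CLI installed
docker run -d \
  --name kubiya-worker \
  --restart unless-stopped \
  -e KUBIYA_API_KEY="<your-api-key>" \
  your-registry/kubiya-worker:latest \
  kubiya worker start --queue-id=prod-queue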

Kubernetes Mode

For scalable production deployments with high availability. Deploy workers as Kubernetes pods with horizontal auto-scaling, health checks, and rolling updates. Use when: You need production-scale deployment with automatic scaling based on workload and enterprise-grade reliability.
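As a rough sketch, workers can run as a Kubernetes Deployment applied inline with kubectl. The image and Secret names below are placeholders; official manifests, probes, and autoscaling settings are covered in the Worker Management CLI Reference:
# Illustrative Deployment; image and Secret names are placeholders
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubiya-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubiya-worker
  template:
    metadata:
      labels:
        app: kubiya-worker
    spec:
      containers:
        - name: worker
          image: your-registry/kubiya-worker:latest   # placeholder image
          command: ["kubiya", "worker", "start", "--queue-id=prod-queue"]
          env:
            - name: KUBIYA_API_KEY                    # variable name is illustrative
              valueFrom:
                secretKeyRef:
                  name: kubiya-worker-secrets
                  key: api-key
EOF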
For detailed deployment guides, YAML configurations, environment variables, troubleshooting steps, and advanced patterns, see the Worker Management CLI Reference.

Workers vs Task Queues

It’s important to understand the relationship between workers and task queues:
Concept | What It Is | Analogy | Purpose
Task Queue | A waiting area for tasks that need execution | Restaurant order queue | Holds and distributes work
Worker | The execution engine that processes tasks | Kitchen staff preparing orders | Executes and completes work
The relationship: Task queues hold the work; workers do the work.
  • One queue, multiple workers: This is the standard pattern for horizontal scaling. Multiple workers attached to the same queue process tasks in parallel, increasing throughput.
  • One worker, multiple queues: A single worker can be attached to multiple queues if it has capacity, allowing it to handle different types of work.
  • Queue-based routing: Tasks are routed to specific queues based on environment (dev/staging/prod), priority (high/low), or other criteria. Workers attached to those queues process the relevant tasks.
Learn more about task queues and how they distribute work in the Task Queues documentation.

Common Patterns

Here are typical deployment patterns you’ll see in production:

Single Queue, Multiple Workers

The most common pattern for scaling throughput. Create one task queue and attach multiple workers to it. As workload increases, add more workers to handle the load. Kubernetes deployments can auto-scale workers based on CPU or custom metrics. Example: A production queue with 10 workers handling agent workflows. During peak hours, scale to 20 workers automatically.
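If the workers run as the Kubernetes Deployment sketched earlier, a HorizontalPodAutoscaler can handle that peak-hours scale-out; the numbers below mirror the example and are illustrative:
# Scale the illustrative kubiya-worker Deployment between 10 and 20 replicas based on CPU load
kubectl autoscale deployment kubiya-worker --min=10 --max=20 --cpu-percent=70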

Environment-Specific Queues

Separate queues for different environments to isolate workloads and prevent production issues from affecting development. Example: Three queues—dev-queue, staging-queue, prod-queue—each with its own workers running in the appropriate environment.
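In practice this just means starting each environment's workers against its own queue ID, for example:
# On a developer machine
kubiya worker start --queue-id=dev-queue

# On the staging host
kubiya worker start --queue-id=staging-queue

# On production infrastructure
kubiya worker start --queue-id=prod-queue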

Specialized Workers

Dedicate workers with specific hardware or configurations for particular workload types. Example:
  • GPU workers for machine learning inference tasks
  • High-memory workers for data processing pipelines
  • Workers with access to specific internal APIs for integration workflows

Hybrid Deployment

Combine different deployment modes for flexibility and cost optimization. Example:
  • Kubernetes workers for production scale and auto-scaling
  • Local workers for developers during feature development
  • Daemon mode workers on edge servers for low-latency regional processing

Next Steps