What is a Worker?
A worker is a program that connects to Kubiya and executes tasks. Think of workers like delivery drivers waiting at a distribution center: when packages (tasks) arrive at the distribution center (task queue), available drivers (workers) pick them up and deliver them.
Workers are designed for distributed execution, which means:
- They run on your own machines, not in Kubiya’s cloud, giving you control over where your workloads execute
- Multiple workers can share the workload for scalability: add more workers to handle more tasks in parallel (see the sketch after this list)
- Workers poll task queues for work assignments: they actively check for new work and pull tasks when available
- Each worker operates independently: if one worker goes down, others continue processing tasks
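As a concrete sketch of this model, the same start command documented later on this page can simply be launched several times against one queue. Running all three as background jobs on a single machine is purely illustrative; in practice each worker would typically run on its own host:
```bash
# Illustrative only: three workers polling the same queue in parallel.
# Each polls independently and pulls tasks as it has capacity.
kubiya worker start --queue-id=<queue-id> &
kubiya worker start --queue-id=<queue-id> &
kubiya worker start --queue-id=<queue-id> &
wait
```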
Where Workers Run
One of workers’ key strengths is platform flexibility. You can deploy workers on virtually any compute infrastructure:
macOS
Perfect for local development and individual contributor machines. Developers can run workers on their MacBooks to test agent workflows in a local environment before deploying to production.
Windows
Ideal for desktop automation and Windows-specific tooling. If your agents need to interact with Windows applications or APIs, run workers on Windows servers or workstations.
Linux
The most common deployment target for production workloads. Run workers on Linux servers, VMs, or cloud instances (AWS EC2, Google Compute Engine, Azure VMs).
Kubernetes
Scalable production deployments with auto-scaling capabilities. Deploy workers as Kubernetes pods that automatically scale based on workload, with built-in health checks and rolling updates.
OpenShift
Enterprise Kubernetes distribution with additional security and governance features. Ideal for organizations with strict compliance requirements and existing OpenShift infrastructure.
This flexibility means you can:
- Run workers locally during development
- Deploy to production on Kubernetes for scale
- Place workers inside private networks to reach internal systems
- Use specialized hardware (GPU machines, high-memory servers) for specific workloads
Worker Lifecycle Management
The Kubiya CLI takes care of everything workers need to function: you don’t need to manually configure connections, manage credentials, or handle retries. Here’s what happens in a worker’s lifecycle:
1. Registration
When a worker starts, it authenticates with the Kubiya Control Plane using an API key. The Control Plane verifies the worker’s identity and sends back everything it needs: Temporal connection credentials, LLM gateway settings, and queue configuration.
2. Polling
Once registered, the worker continuously checks its assigned task queue for new work. This is an active process: workers pull work when they’re ready, rather than having work pushed to them. This design prevents overloading workers and provides natural load balancing.
3. Execution
When a worker receives a task, it executes the agent workflow in an isolated environment. The worker streams execution logs and events in real time so you can monitor progress. If a task fails, the worker handles retry logic automatically.
4. Health Monitoring
Every 30 seconds (configurable), workers send heartbeats to the Control Plane reporting their status, how many tasks they’re processing, and system health metrics. This allows the Control Plane to route tasks only to healthy workers.
5. Shutdown
When a worker shuts down (planned or unplanned), it drains in-flight tasks gracefully whenever possible, ensuring tasks complete before the worker terminates (a sketch of a graceful stop follows this list).
The key takeaway: the Kubiya CLI handles registration, configuration, credentials, retries, and health monitoring automatically. You just start the worker; everything else is managed for you.
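As an illustration of the shutdown step, on Unix-like systems a planned stop usually means interrupting a foreground worker with Ctrl-C or signaling a background one. The snippet below assumes the worker traps SIGTERM and drains before exiting, which the graceful-drain behavior above implies but does not spell out:
```bash
# Hypothetical graceful stop: send SIGTERM to a running worker and let it
# drain in-flight tasks. The pgrep pattern is illustrative; match however
# you started the worker.
kill -TERM "$(pgrep -f 'kubiya worker start')"
```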
Worker Registration Flow
Here’s a visual representation of how workers register and begin processing tasks. This flow shows the complete lifecycle: from starting a worker with a single CLI command to continuous task execution and health monitoring.
Key Steps:
- CLI starts the worker with kubiya worker start --queue-id=<queue-id>
- Worker authenticates with the Control Plane using an API key
- Control Plane sends configuration including Task Queue ID, Temporal credentials, and LLM settings
- Worker connects to the Task Queue and begins polling for tasks
- Continuous execution loop where the worker pulls tasks, executes agent workflows, and reports results
- Health monitoring via periodic heartbeats every 30 seconds
Quick Start
Getting started with workers is straightforward. Here’s how to run your first worker:
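The command below is the same one shown in the registration flow above; replace <queue-id> with a queue ID from your Kubiya account:
```bash
# Start a worker attached to a task queue and begin polling for tasks.
kubiya worker start --queue-id=<queue-id>
```
This command:
- Creates a Python virtual environment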
- Installs dependencies
- Registers with the Control Plane
- Connects to the task queue
- Begins polling for tasks
- Streams logs to your terminal
Deployment Modes
Workers support multiple deployment modes to fit different use cases:
Local Mode
For development and testing. Runs in the foreground with live logging, perfect for debugging agent workflows during development. Use when: You’re developing agents locally and want to see real-time output.
Daemon Mode
For production deployments on a single server. Runs in the background as a daemon process with automatic restart on crashes and log rotation. Use when: You need a worker running continuously on a single machine without manual intervention.
Docker Mode
For containerized deployments with complete environment isolation. Package workers as Docker containers for portable, reproducible deployments. Use when: You want container isolation or need to deploy workers across different environments with consistent behavior.
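As a sketch of Docker mode, a container run might look like the following; the image name, entrypoint arguments, and environment variable are hypothetical placeholders, not official values (see the Worker Management CLI Reference for the supported workflow):
```bash
# Hypothetical: run a worker in a detached container.
# Image name, entrypoint args, and env var are illustrative only.
docker run -d \
  --name kubiya-worker \
  -e KUBIYA_API_KEY="$KUBIYA_API_KEY" \
  example.com/kubiya-worker:latest \
  worker start --queue-id=<queue-id>
```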
Kubernetes Mode
For scalable production deployments with high availability. Deploy workers as Kubernetes pods with horizontal auto-scaling, health checks, and rolling updates. Use when: You need production-scale deployment with automatic scaling based on workload and enterprise-grade reliability.
For detailed deployment guides, YAML configurations, environment variables, troubleshooting steps, and advanced patterns, see the Worker Management CLI Reference.
Workers vs Task Queues
It’s important to understand the relationship between workers and task queues:
| Concept | What It Is | Analogy | Purpose |
|---|---|---|---|
| Task Queue | A waiting area for tasks that need execution | Restaurant order queue | Holds and distributes work |
| Worker | The execution engine that processes tasks | Kitchen staff preparing orders | Executes and completes work |
- One queue, multiple workers: This is the standard pattern for horizontal scaling. Multiple workers attached to the same queue process tasks in parallel, increasing throughput.
- One worker, multiple queues: A single worker can be attached to multiple queues if it has capacity, allowing it to handle different types of work.
- Queue-based routing: Tasks are routed to specific queues based on environment (dev/staging/prod), priority (high/low), or other criteria. Workers attached to those queues process the relevant tasks.
Common Patterns
Here are typical deployment patterns you’ll see in production:
Single Queue, Multiple Workers
The most common pattern for scaling throughput. Create one task queue and attach multiple workers to it. As workload increases, add more workers to handle the load. Kubernetes deployments can auto-scale workers based on CPU or custom metrics.
Example: A production queue with 10 workers handling agent workflows. During peak hours, scale to 20 workers automatically.
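On Kubernetes, that peak-hours scale-out can be a single command (the Deployment name kubiya-worker is a hypothetical placeholder):
```bash
# Scale the worker Deployment from 10 to 20 replicas during peak hours.
kubectl scale deployment kubiya-worker --replicas=20
```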
Environment-Specific Queues
Separate queues for different environments to isolate workloads and prevent production issues from affecting development.
Example: Three queues (dev-queue, staging-queue, prod-queue), each with its own workers running in the appropriate environment.
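Reusing the start command from the Quick Start, environment-specific workers might be launched like this (passing the queue names above as queue IDs is an assumption for illustration):
```bash
# Each environment gets its own queue and its own workers.
kubiya worker start --queue-id=dev-queue      # on a developer machine
kubiya worker start --queue-id=staging-queue  # on a staging server
kubiya worker start --queue-id=prod-queue     # on production infrastructure
```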
Specialized Workers
Dedicate workers with specific hardware or configurations for particular workload types. Example:
- GPU workers for machine learning inference tasks
- High-memory workers for data processing pipelines
- Workers with access to specific internal APIs for integration workflows
Hybrid Deployment
Combine different deployment modes for flexibility and cost optimization. Example:
- Kubernetes workers for production scale and auto-scaling
- Local workers for developers during feature development
- Daemon mode workers on edge servers for low-latency regional processing
Next Steps
Task Queues
Understand how task queues distribute work to workers
Worker Management CLI
Complete technical guide with deployment configurations, troubleshooting, and advanced patterns
Environments
Configure execution environments that define worker context and capabilities
Control Plane Architecture
Deep dive into worker registration internals and system architecture