Kubiya Runners are the execution engine that orchestrates serverless tools on your infrastructure. They manage container lifecycles, enforce security policies, and provide the bridge between Kubiya’s AI-generated workflows and your actual systems.

Why Runners Matter for Production

Runners solve critical challenges for production automation:

Data Sovereignty

  • All execution happens on your infrastructure
  • Sensitive data never leaves your environment
  • Meet compliance requirements for regulated industries
  • Full control over data residency and processing

Security & Isolation

  • Network policies control tool access to systems
  • Resource limits prevent runaway processes
  • Security scanning of all container images
  • Audit logging of every operation

Performance & Reliability

  • Execute tools close to your data and services
  • Automatic retry and error recovery
  • Load balancing across multiple runner instances
  • Caching of frequently used tool images

[Image: Runner Selection Interface]

Runner Architecture

Core Components

Each runner manages the complete tool execution lifecycle (a conceptual sketch follows this list):
  • Pull and cache tool container images
  • Create isolated execution environments
  • Enforce resource limits and security policies
  • Collect logs and metrics from running containers
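
Conceptually, each execution environment resembles a short-lived, locked-down pod with a pinned image, resource limits, and no elevated privileges. The sketch below is illustrative only; the labels, image, and exact objects Kubiya creates are assumptions:

# Conceptual sketch of a single tool execution environment (illustrative only)
apiVersion: v1
kind: Pod
metadata:
  generateName: tool-exec-
  labels:
    app: kubiya-tool-execution   # hypothetical label
spec:
  restartPolicy: Never
  containers:
  - name: tool
    image: registry.example.com/tools/aws-cli:1.2.3   # pulled and cached by the runner
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false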

Deployment Options

Self-Hosted Runners

Deploy runners on your own infrastructure for maximum control:

Kubernetes

Native Kubernetes deployment with Helm charts

Docker Compose

Simple deployment for development and testing (see the Docker Compose sketch below)

VM/Bare Metal

Direct installation on Linux systems

Cloud Native

Optimized for AWS EKS, GCP GKE, Azure AKS
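
The Kubernetes path is detailed below. For the Docker Compose option, a minimal sketch could look like the following; the image name, environment variable, and volume path are illustrative assumptions, not published defaults:

# docker-compose.yml sketch for a single runner (illustrative values)
services:
  kubiya-runner:
    image: kubiya/runner:latest              # hypothetical image name
    restart: unless-stopped
    environment:
      KUBIYA_API_KEY: ${KUBIYA_API_KEY}      # hypothetical variable name
    volumes:
      - runner-cache:/var/lib/kubiya/cache   # hypothetical cache path
volumes:
  runner-cache: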

Kubernetes Deployment

# values.yaml for Kubiya Runner Helm chart
runner:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 2000m
      memory: 4Gi

security:
  networkPolicies: true
  podSecurityStandards: restricted
  runAsNonRoot: true

storage:
  cacheSize: 20Gi
  logsRetention: 30d

integrations:
  kubernetes:
    inClusterConfig: true
  aws:
    roleArn: "arn:aws:iam::123456789012:role/KubiyaRunner"

Benefits of Self-Hosted

Security

Complete control over network access, data processing, and credential handling

Compliance

Meet SOC2, HIPAA, PCI-DSS requirements with on-premises execution

Performance

Low latency access to internal systems and databases

Cost

No data transfer costs for large-scale operations

Hosted Runners

Use Kubiya’s managed infrastructure for quick setup:

Quick Start

No installation required - start automating immediately

Maintenance Free

Automatic updates, scaling, and monitoring

Global Reach

Runners available in multiple regions worldwide

Enterprise SLA

99.9% uptime guarantee with 24/7 support

Hosted runners are ideal for development and testing, but production workloads typically require self-hosted runners for security and compliance reasons.

Cross-Environment Orchestration

Runners enable seamless automation across different environments and clusters:

Multi-Cluster Workflows

Deploy applications across multiple Kubernetes clusters:
# Workflow that deploys to multiple environments
name: multi-environment-deployment
steps:
  - name: deploy-to-staging
    runner: staging-cluster-runner
    tool: kubernetes-deployer
    inputs:
      namespace: myapp-staging
      image: myapp:${BUILD_VERSION}
      
  - name: run-integration-tests
    runner: testing-runner
    tool: test-suite
    depends_on: [deploy-to-staging]
    
  - name: deploy-to-production
    runner: production-cluster-runner  
    tool: kubernetes-deployer
    inputs:
      namespace: myapp-prod
      image: myapp:${BUILD_VERSION}
    condition: ${run-integration-tests.status} == "success"

Cross-Cloud Operations

Orchestrate operations spanning multiple cloud providers:
# Disaster recovery workflow across clouds
name: cross-cloud-failover
steps:
  - name: backup-aws-data
    runner: aws-us-east-runner
    tool: aws-backup
    
  - name: restore-to-gcp
    runner: gcp-us-central-runner  
    tool: gcp-restore
    inputs:
      backup_location: ${backup-aws-data.backup_url}
    
  - name: update-dns
    runner: cloudflare-runner
    tool: dns-updater
    inputs:
      record: api.myapp.com
      target: ${restore-to-gcp.new_endpoint}

Intelligent Runner Selection

Kubiya automatically selects the best runner for each operation based on:
  • Proximity to target systems and data
  • Available resources and current load
  • Security policies and network access rules
  • Cost optimization preferences

[Image: Runner Configuration Interface]
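
The selection logic itself is managed by Kubiya; as a sketch of how such preferences might be expressed on a workflow step, the keys below are illustrative assumptions rather than documented fields:

# Illustrative runner-selection hints on a workflow step (hypothetical keys)
- name: query-production-database
  tool: postgres-query
  runner_selector:
    region: us-east-1            # prefer runners close to the data
    labels:
      network-zone: prod-db      # must satisfy network access rules
    max_cost_tier: standard      # cost optimization preference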

Security & Compliance

Network Security

Runners implement defense-in-depth networking:
# Network policy for runner security
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy  
metadata:
  name: kubiya-runner-policy
spec:
  podSelector:
    matchLabels:
      app: kubiya-runner
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: kubiya-control-plane
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: allowed-namespaces
  - to: []
    ports:
    - protocol: TCP
      port: 443  # HTTPS only

Resource Isolation

Each tool execution runs with strict resource controls:
# Resource limits for tool execution
limits:
  cpu: "2000m"
  memory: "4Gi"  
  ephemeral-storage: "10Gi"
  
security_context:
  runAsNonRoot: true
  runAsUser: 65534
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  
capabilities:
  drop:
  - ALL

Audit & Monitoring

Complete visibility into all runner operations via the Activity Center dashboard:
  • Execution logs: Every command, API call, and file access
  • Performance metrics: Resource usage, execution time, error rates
  • Security events: Failed authentication, policy violations, anomalies
  • Compliance reports: SOC2, GDPR, HIPAA compliance summaries
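
As a sketch, an individual execution record surfaces this kind of information; the field names below are illustrative, not the exact Activity Center schema:

# Illustrative audit record for one tool execution (hypothetical schema)
execution_id: 7f3a9c2e
tool: kubernetes-deployer
runner: production-cluster-runner
started_at: 2024-05-01T02:14:07Z
duration_seconds: 42
exit_status: success
resources:
  peak_cpu: 850m
  peak_memory: 1.2Gi
security_events: []              # policy violations would appear here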

Advanced Configuration

High Availability

Deploy runners with automatic failover:
# HA runner configuration
runner:
  replicas: 5
  antiAffinity: hard  # Spread across nodes
  
persistence:
  storageClass: fast-ssd
  replication: 3
  
loadBalancer:
  enabled: true
  sessionAffinity: ClientIP
  
backup:
  enabled: true
  schedule: "0 2 * * *"
  retention: 30d

Custom Resource Types

Define organization-specific resource types and policies:
# Custom resource definitions for your infrastructure
custom_resources:
  - name: microservice
    properties:
      team: string
      criticality: [low, medium, high, critical]
      data_classification: [public, internal, confidential, restricted]
    
policies:
  - name: critical_service_protection
    selector:
      criticality: critical
    rules:
      - require_approval: true
      - max_concurrent_operations: 1
      - rollback_required: true

Integration Plugins

Extend runner capabilities with custom plugins:
# Custom plugin for specialized monitoring
from kubiya_runner import Plugin

class CustomMonitoringPlugin(Plugin):
    # Helper methods such as start_monitoring, collect_metrics,
    # send_to_custom_system, and trigger_incident_response are
    # implemented by the plugin author; they are not provided by
    # the Plugin base class.
    def pre_execution(self, context):
        """Called before each tool execution"""
        self.start_monitoring(context.tool_name)

    def post_execution(self, context, result):
        """Called after each tool execution"""
        metrics = self.collect_metrics()
        self.send_to_custom_system(metrics)

    def on_error(self, context, error):
        """Called when tool execution fails"""
        self.trigger_incident_response(context, error)
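
How a plugin is loaded depends on your deployment. A sketch of enabling it through the Helm values shown earlier might look like this, where the plugins key and module path are assumptions:

# Illustrative plugin registration in values.yaml (hypothetical keys)
runner:
  plugins:
    - name: custom-monitoring
      module: plugins.custom_monitoring.CustomMonitoringPlugin
      config:
        metrics_endpoint: https://metrics.internal.example.com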

Performance Optimization

Image Caching

Runners aggressively cache tool images for fast startup:
  • Multi-layer caching: Share common base layers across tools
  • Predictive pre-pulling: Download images before they’re needed
  • Garbage collection: Automatically clean up unused images
  • Compression: Reduce storage and transfer overhead
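
Cache behavior is typically tuned alongside the storage settings shown earlier; the keys below are an illustrative extension of that configuration, not guaranteed option names:

# Illustrative cache tuning (hypothetical option names)
storage:
  cacheSize: 50Gi
cache:
  prePull:
    - kubernetes-deployer        # warm frequently used tool images
    - aws-cli
  garbageCollection:
    threshold: 80%               # start evicting at 80% cache usage
    interval: 1h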

Resource Scaling

Automatically scale runner capacity based on demand:
# Horizontal Pod Autoscaler for runners
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kubiya-runner-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kubiya-runner
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: active_executions
      target:
        type: AverageValue
        averageValue: "5"
Production Tip: Start with 3 runner replicas for high availability, then use metrics to determine optimal scaling parameters for your workload patterns.

What’s Next?

With runners managing tool execution, you need AI models to generate the intelligent workflows that determine what tools to run and when. Kubiya’s model-agnostic approach lets you choose the best AI for your use case.