Serverless Workflow Architecture

Kubiya uses a serverless architecture in which every workflow step runs in its own independent Docker container. This design gives each step full freedom in what it runs, lets capacity scale with demand, and makes the platform compatible with virtually any software.

Core Principles

1. Containerized Everything

🐳 Docker-Based Execution

Every single step in your workflow runs in its own Docker container, as the sketch after the list below illustrates:

  • Any Language: Python, JavaScript, Go, Rust, Java, C++, Ruby, etc.
  • Any Tool: Git, AWS CLI, Terraform, kubectl, npm, cargo, etc.
  • Any Library: Install and use any package or dependency
  • Any Version: Pin specific versions of languages and tools
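
For example, consecutive steps can use entirely different runtimes and tooling. A minimal sketch that follows the add_step pattern used later on this page; the image tags and code bodies are illustrative:

from kubiya_workflow_sdk import Workflow

workflow = Workflow(name="polyglot-demo", runner="kubiya-hosted")

# A Python step with a pinned interpreter version
workflow.add_step(
    name="analyze",
    image="python:3.11",
    code="print('anything pip-installable is available here')"
)

# A Node.js step in the same workflow
workflow.add_step(
    name="build-frontend",
    image="node:20",
    code="console.log('npm and the Node ecosystem are available here')"
)

# A step built around a CLI tool rather than a language runtime
workflow.add_step(
    name="plan-infra",
    image="hashicorp/terraform:1.7",
    code="terraform version"
)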

2. True Statelessness

Each workflow execution is completely independent:

from kubiya_workflow_sdk import Workflow

# This workflow runs in a fresh environment every time
workflow = Workflow(
    name="data-pipeline",
    runner="kubiya-hosted"
)

# Step 1 runs in container A
workflow.add_step(
    name="fetch-data",
    image="python:3.11",
    code="# This container is destroyed after execution"
)

# Step 2 runs in container B (completely separate)
workflow.add_step(
    name="process-data",
    image="python:3.11",
    code="# Fresh container, no state from Step 1"
)

3. Infinite Scalability

The serverless model means (see the sketch after this list):

  • No pre-provisioned resources
  • Automatic scaling based on demand
  • Pay only for actual execution time
  • Handle 1 or 1,000,000 workflows seamlessly
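
For example, the same workflow definition can be fanned out into many parallel executions without provisioning anything up front. A minimal sketch, assuming workflow.execute() blocks until a run completes, as in the ADK example later on this page:

from concurrent.futures import ThreadPoolExecutor

# Fan out 100 independent executions of the workflow defined above;
# each execution gets its own fresh containers and leaves nothing behind.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda _: workflow.execute(), range(100)))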

Comparison with Other Orchestration Platforms

Unlike traditional orchestrators, which schedule tasks onto pre-provisioned workers, Kubiya schedules every step as an on-demand container. There is no standing worker pool to size or patch, resources are allocated per step, and capacity grows and shrinks with the number of executions.

How It Works

1. Workflow Definition

You define workflows using the Python SDK, YAML, or AI generation:

from kubiya_workflow_sdk import Workflow, Step

workflow = Workflow(
    name="multi-tool-pipeline",
    runner="kubiya-hosted"
)

2. Container Specification

Each step specifies its container requirements:

# Use pre-built images
workflow.add_step(Step(
    name="python-analysis",
    image="python:3.11-slim",
    packages=["pandas", "numpy", "scikit-learn"],
    code="..."
))

# Or custom images
workflow.add_step(Step(
    name="custom-tool",
    image="myregistry/my-tool:latest",
    code="..."
))

3. Execution Flow

When a workflow runs, each step is scheduled as its own container: the image is pulled, the step's code executes, any artifacts it saves are persisted, and the container is destroyed before the next step starts in a fresh one.
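
A minimal sketch of triggering a run, assuming workflow.execute() blocks until all steps have finished, matching the ADK example later on this page:

# Execute the workflow defined above; each step gets its own fresh container.
result = workflow.execute()
print(result)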

4. Data Flow

Steps communicate through artifacts:

# Step 1: Generate data
workflow.add_step(Step(
    name="generate",
    code="""
    data = create_dataset()
    save_artifact('dataset.csv', data)
    """
))

# Step 2: Process data
workflow.add_step(Step(
    name="process",
    code="""
    data = load_artifact('dataset.csv')
    results = analyze(data)
    save_artifact('results.json', results)
    """
))

Real-World Examples

Multi-Language ETL Pipeline

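A hedged sketch of one way to build a multi-language ETL pipeline with the constructs introduced above (Step, image, packages, and the artifact helpers); the images, packages, queries, and helper functions are illustrative, not part of the SDK:

from kubiya_workflow_sdk import Workflow, Step

workflow = Workflow(name="etl-pipeline", runner="kubiya-hosted")

# Extract: a Python container with a Postgres driver (illustrative package list)
workflow.add_step(Step(
    name="extract",
    image="python:3.11-slim",
    packages=["psycopg2-binary", "pandas"],
    code="""
    rows = query_postgres('SELECT * FROM orders')  # hypothetical helper
    save_artifact('orders.csv', rows)
    """
))

# Transform: a separate container that picks the data up as an artifact
workflow.add_step(Step(
    name="transform",
    image="python:3.11-slim",
    packages=["pandas"],
    code="""
    orders = load_artifact('orders.csv')
    summary = aggregate(orders)  # hypothetical helper
    save_artifact('summary.json', summary)
    """
))

# Load: a Node.js container pushing results downstream
workflow.add_step(Step(
    name="load",
    image="node:20",
    code="// read summary.json and POST it to the reporting service"
))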

DevOps Automation

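A hedged sketch of a DevOps automation flow under the same per-step container model; the images and commands are illustrative, and credentials would come from the platform's secrets management rather than being hard-coded:

from kubiya_workflow_sdk import Workflow, Step

workflow = Workflow(name="devops-automation", runner="kubiya-hosted")

# Plan infrastructure changes with the Terraform CLI image
workflow.add_step(Step(
    name="terraform-plan",
    image="hashicorp/terraform:1.7",
    code="terraform init && terraform plan -out=tfplan"
))

# Roll out to Kubernetes with kubectl
workflow.add_step(Step(
    name="deploy",
    image="bitnami/kubectl:1.29",
    code="kubectl apply -f k8s/ && kubectl rollout status deploy/my-app"
))

# Notify the team from a lightweight Python container
workflow.add_step(Step(
    name="notify",
    image="python:3.11-slim",
    packages=["requests"],
    code="# POST a summary of the rollout to a Slack webhook"
))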

Advanced Features

Custom Base Images

Create specialized images for your workflows:

# Dockerfile
FROM python:3.11
RUN pip install pandas numpy scikit-learn tensorflow
RUN apt-get update && apt-get install -y graphviz
COPY models/ /opt/models/

Then reference the image in a step:

workflow.add_step(Step(
    name="ml-pipeline",
    image="myregistry/ml-base:latest",
    code="# Use pre-installed ML tools"
))

Resource Management

Control container resources:

workflow.add_step(Step(
    name="memory-intensive",
    image="python:3.11",
    resources={
        "memory": "8Gi",
        "cpu": "4",
        "gpu": "1"  # For ML workloads
    },
    code="# Run memory-intensive operations"
))

Network Isolation

Each container runs in isolation:

# Step 1: Start a service
workflow.add_step(Step(
    name="api-server",
    image="node:20",
    ports=[3000],
    code="# Start API server"
))

# Step 2: Different network namespace
workflow.add_step(Step(
    name="client",
    image="python:3.11",
    code="# Cannot access Step 1's ports directly"
))

AI-Powered Generation

With ADK orchestration, you can generate these complex workflows using natural language:

from kubiya_workflow_sdk import KubiyaWorkflow

# Generate entire workflow from description
workflow = KubiyaWorkflow.from_prompt(
    """
    Create a data pipeline that:
    1. Extracts data from PostgreSQL and MongoDB
    2. Transforms using Python pandas
    3. Runs ML predictions with TensorFlow
    4. Stores results in S3 and sends Slack notification
    """,
    runner="kubiya-hosted"
)

# ADK generates the complete containerized workflow
result = workflow.execute()

Benefits Summary

πŸš€ Any Software

Run any software, tool, or language in your workflows

πŸ“¦ Zero Dependencies

No need to pre-install anything; each container image ships with its own dependencies

πŸ”„ Perfect Isolation

Each step runs in isolation, with no state shared between steps

⚑ Instant Scale

From 1 to 1 million executions without infrastructure changes

Security & Compliance

The containerized architecture provides:

  • Process Isolation: Each step runs in its own namespace
  • Resource Limits: Prevent runaway processes
  • Network Policies: Control communication between steps
  • Audit Trails: Complete execution history
  • Secrets Management: Secure credential injection (see the sketch below)
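
Credentials can be injected into a step at runtime rather than baked into images or code. A hedged sketch: the secrets parameter below is hypothetical and stands in for whatever injection mechanism your Kubiya environment provides; the resources field reuses the form shown earlier.

# Sketch only: `secrets` is a hypothetical parameter illustrating credential
# injection; do not hard-code credentials in step code or images.
workflow.add_step(Step(
    name="publish-report",
    image="amazon/aws-cli:2.15.0",
    secrets=["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    resources={"memory": "512Mi", "cpu": "1"},
    code="aws s3 cp report.json s3://my-bucket/reports/"
))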

What’s Next?

Getting Started

Create your first containerized workflow

ADK Provider

Learn about AI-powered workflow generation

Examples

See real-world containerized workflows