# Architecture

> Control Plane system architecture, deployment models, and component interactions.

The Control Plane is the central orchestration platform for AI agent and team execution. It manages workflows, worker coordination, policy enforcement, and integrations.

It coordinates multi-tenant agent workloads, enforces security policies, and keeps state consistent across heterogeneous execution environments, exposing a single orchestration layer for workflow management, resource allocation, and integration coordination.

***

## Deployment Options

Choose between a fully managed SaaS deployment and a self-hosted deployment based on your requirements.

### Kubiya Hosted (SaaS)

Fully managed infrastructure with zero operational overhead. All components, scaling, security, and maintenance handled by Kubiya.

**Benefits:** Zero ops • Auto-scaling • Enterprise security • 99.9% uptime SLA • Continuous updates

### Self-Hosted

**Stack:** Docker Compose (local) • Kubernetes/Helm (production) • Temporal Server • PostgreSQL • Redis

**Benefits:** Full control • On-premises • Custom security • Air-gapped support

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#10b981','primaryTextColor':'#fff','primaryBorderColor':'#059669','lineColor':'#6366f1','secondaryColor':'#8b5cf6','tertiaryColor':'#f59e0b','fontSize':'16px'}}}%%
flowchart LR
    subgraph saas[" 🌐 Kubiya Hosted — Fully Managed "]
        direction TB
        sm1["<b>Serverless API</b><br/>Auto-scaling<br/>Global edge network"]
        sm2["<b>Workflow Engine</b><br/>High availability<br/>Managed orchestration"]
        sm3["<b>Multi-tenant Database</b><br/>Row-level security<br/>Automatic backups"]
        sm4["<b>Distributed Cache</b><br/>Low latency<br/>Redis protocol"]
    end

    subgraph self[" 🏢 Self-Hosted — Full Control "]
        direction TB
        sh1["<b>Kubernetes/Docker</b><br/>Container orchestration<br/>Your infrastructure"]
        sh2["<b>Temporal Server</b><br/>Workflow engine<br/>Local or cloud"]
        sh3["<b>PostgreSQL</b><br/>Relational database<br/>Your deployment"]
        sh4["<b>Redis</b><br/>Cache & pub/sub<br/>Standalone/cluster"]
    end

    classDef saasStyle fill:#7c3aed,stroke:#5b21b6,stroke-width:3px,color:#fff,rx:10,ry:10
    classDef selfStyle fill:#0891b2,stroke:#0e7490,stroke-width:3px,color:#fff,rx:10,ry:10

    class sm1,sm2,sm3,sm4 saasStyle
    class sh1,sh2,sh3,sh4 selfStyle
```

***

## High-Level Architecture

Distributed, multi-tenant architecture for AI agent orchestration.

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart TB
    subgraph clients[" 👥 Clients "]
        U["<b>Users / API Clients</b><br/>REST & WebSocket"]
    end

    subgraph core[" 🎯 Control Plane Core "]
        API["<b>Control Plane API</b><br/>FastAPI • Multi-tenant"]
    end

    subgraph security[" 🔐 Security Layer "]
        Auth["<b>Authentication</b><br/>JWT + API Keys<br/>Token Validation"]
        OPA["<b>Policy Enforcement</b><br/>Open Policy Agent<br/>Access Control"]
    end

    subgraph data[" 💾 Data Layer "]
        DB[("<b>PostgreSQL</b><br/>Row-Level Security<br/>Multi-tenancy")]
        Redis[("<b>Redis</b><br/>Cache + Pub/Sub<br/>Session State")]
    end

    subgraph orchestration[" ⚙️ Orchestration "]
        T["<b>Temporal</b><br/>Workflow Engine<br/>Task Distribution"]
        TQ["<b>Task Queues</b><br/>Per-Organization<br/>Load Balancing"]
    end

    subgraph execution[" 🚀 Execution Layer "]
        W["<b>Workers</b><br/>Distributed Execution<br/>Multi-runtime"]
        RT["<b>Runtimes</b><br/>Agno • Claude Code<br/>MCP Integration"]
    end

    subgraph external[" 🌐 External Services "]
        LLM["<b>LiteLLM Gateway</b><br/>Multi-provider Routing<br/>Model Management"]
        CG["<b>Context Graph</b><br/>Knowledge API<br/>Organizational Context"]
    end

    U -->|"HTTPS/WSS"| API
    API -->|"Validate"| Auth
    API -->|"Enforce"| OPA
    API <-->|"CRUD"| DB
    API <-->|"Cache"| Redis
    API -->|"Submit Workflow"| T
    API <-->|"Context"| CG

    T -->|"Route to Queue"| TQ
    TQ -->|"Poll Tasks"| W
    W -->|"Heartbeat"| Redis
    W -->|"Execute"| RT
    RT -->|"LLM Calls"| LLM
    RT <-->|"Sessions"| DB

    classDef clientStyle fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    classDef coreStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:3px,color:#fff
    classDef securityStyle fill:#ef4444,stroke:#b91c1c,stroke-width:2px,color:#fff
    classDef dataStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef orchStyle fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    classDef execStyle fill:#06b6d4,stroke:#0891b2,stroke-width:2px,color:#fff
    classDef extStyle fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff

    class U clientStyle
    class API coreStyle
    class Auth,OPA securityStyle
    class DB,Redis dataStyle
    class T,TQ orchStyle
    class W,RT execStyle
    class LLM,CG extStyle
```

**Core Components:**

* **API Layer:** FastAPI-based REST and WebSocket interface for multi-tenant access
* **Orchestration:** Temporal workflows for reliable, distributed task execution
* **Data Persistence:** PostgreSQL with row-level security for tenant isolation
* **Caching & State:** Redis for performance optimization and real-time communication
* **Security:** JWT/API key authentication with OPA-based policy enforcement
* **LLM Gateway:** LiteLLM for unified access to multiple AI model providers
* **Knowledge Layer:** Context Graph for organizational memory and cross-agent learning

***

## Control Plane API

The Control Plane API provides a comprehensive REST and WebSocket interface for managing the entire lifecycle of AI agents, teams, and workflows.

**Architectural Approach:**

* **Multi-tenant by design:** All data access enforced at the database level through row-level security
* **Async-first architecture:** Built on FastAPI with async/await for high-concurrency workloads
* **Strongly typed contracts:** Pydantic models ensure request/response validation
* **Real-time communication:** WebSocket endpoints for streaming execution logs and live status updates
* **Structured observability:** Correlation IDs and structured logging for distributed tracing

**API Categories:**

* **Agent & Team Management:** Create, configure, and orchestrate AI agents and multi-agent teams
* **Execution Control:** Submit tasks, monitor progress, stream outputs, and handle approvals
* **Resource Management:** Task queue registration, environment configuration, model registry
* **Governance:** Policy definition, secret management, integration credentials
* **Analytics & Observability:** Usage metrics, cost tracking, execution history

The API serves as the primary integration point for CI/CD pipelines, ChatOps interfaces, and custom automation workflows.

***

## Authentication & Authorization

Multi-tenant authentication with organization-level isolation ensures secure access across all Control Plane operations.

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart LR
    subgraph client[" 👤 Client Request "]
        C["API Request<br/>JWT or API Key"]
    end

    subgraph auth[" 🔐 Authentication Flow "]
        direction TB
        V["Token Validation"]
        CH["Cache Check"]
        EV["External Validation"]
        CH --> |Cache Hit| V
        CH --> |Cache Miss| EV
        EV --> V
    end

    subgraph context[" 🎯 Request Context "]
        OC["Organization Context<br/>Roles • Permissions<br/>Tenant Scope"]
    end

    subgraph data[" 💾 Data Access "]
        RLS["Row-Level Security<br/>Tenant Isolation"]
    end

    C --> CH
    V --> OC
    OC --> RLS

    classDef clientStyle fill:#3b82f6,stroke:#1e40af
    classDef authStyle fill:#8b5cf6,stroke:#6d28d9
    classDef contextStyle fill:#10b981,stroke:#047857
    classDef dataStyle fill:#f59e0b,stroke:#d97706

    class C clientStyle
    class V,CH,EV authStyle
    class OC contextStyle
    class RLS dataStyle
```

**Authentication Methods:**

* **JWT Bearer Tokens:** User-scoped authentication with role-based access control
* **API Keys:** Service-to-service authentication for automation and integrations

**Security Architecture:**

* **Token Caching:** Redis-backed cache layer minimizes external validation latency
* **Organization Scoping:** Every request is automatically scoped to the authenticated organization
* **Database-Level Isolation:** PostgreSQL row-level security (RLS) enforces tenant boundaries
* **Role-Based Access Control (RBAC):** Fine-grained permissions for resource access
* **Automatic Token Rotation:** Built-in support for credential lifecycle management

All authenticated requests establish an organization context that flows through the entire request lifecycle, ensuring consistent security enforcement from API entry through database access.
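
The cache-then-validate flow in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Control Plane's implementation: `OrgContext`, `TokenCache`, `authenticate`, and the `validate_externally` callback are hypothetical names, and an in-memory dictionary stands in for Redis.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class OrgContext:
    """Hypothetical request context derived from a validated token."""
    organization_id: str
    roles: Tuple[str, ...]


class TokenCache:
    """In-memory stand-in for a Redis-backed token cache with a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, OrgContext]] = {}

    def get(self, token: str) -> Optional[OrgContext]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        expires_at, context = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the caller revalidates.
            del self._entries[token]
            return None
        return context

    def put(self, token: str, context: OrgContext) -> None:
        self._entries[token] = (self._clock() + self._ttl, context)


def authenticate(
    token: str,
    cache: TokenCache,
    validate_externally: Callable[[str], Optional[OrgContext]],
) -> OrgContext:
    """Check the cache first; fall back to external validation on a miss."""
    cached = cache.get(token)
    if cached is not None:
        return cached
    context = validate_externally(token)
    if context is None:
        raise PermissionError("invalid or expired credentials")
    cache.put(token, context)
    return context
```

The returned `OrgContext` is what a request handler would carry forward to scope every database query to one tenant.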

***

## Policy Enforcement

OPA-based access control and governance.

**Policy Types:** Tool usage control • Resource access restrictions • Execution constraints (limits/quotas) • Approval workflows

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart LR
    A["🚀 <b>Execution Request</b><br/>Agent/Team/Job"]

    subgraph pre[" 🔍 Pre-Execution "]
        P["<b>Policy Enforcer</b><br/>OPA Watchdog"]
        D{"<b>Policy Check</b><br/>✓ Tools allowed?<br/>✓ Resources accessible?<br/>✓ Quotas OK?"}
    end

    subgraph exec[" ⚙️ Execution "]
        E["<b>Workflow Starts</b><br/>Temporal Orchestration"]
        M["<b>Runtime Monitor</b><br/>Continuous Enforcement"]
    end

    subgraph outcomes[" 📊 Outcomes "]
        S["✅ <b>Success</b><br/>Task completed"]
        R["❌ <b>Rejected</b><br/>Policy violation"]
        L["📝 <b>Audit Log</b><br/>Compliance trail"]
    end

    A -->|"1. Submit"| P
    P -->|"2. Evaluate"| D
    D -->|"✅ Allow"| E
    D -->|"❌ Deny"| R
    E -->|"3. Monitor"| M
    M -->|"4. Complete"| S
    S --> L
    R --> L

    classDef reqStyle fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    classDef policyStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#fff
    classDef execStyle fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    classDef successStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef rejectStyle fill:#ef4444,stroke:#b91c1c,stroke-width:2px,color:#fff
    classDef auditStyle fill:#64748b,stroke:#475569,stroke-width:2px,color:#fff

    class A reqStyle
    class P,D policyStyle
    class E,M execStyle
    class S successStyle
    class R rejectStyle
    class L auditStyle
```

**Architecture Decision: Why OPA?**

* **Decoupled policy logic:** Policies defined separately from application code enable non-engineers to manage governance
* **Declarative approach:** Rego policy language allows expressing complex rules as data
* **Centralized enforcement:** Single policy engine across all agents, teams, and workflows
* **Audit transparency:** Policy decisions are logged with full context for compliance reporting
* **Dynamic evaluation:** Policies evaluated at execution time based on current state and context

**Policy Scope:**

* **Tool Usage Control:** Restrict which Skills and MCP servers agents can invoke
* **Resource Access:** Limit file paths, network destinations, or cloud resources
* **Execution Constraints:** Enforce quotas on compute time, token usage, or cost
* **Approval Workflows:** Require human-in-the-loop approval for sensitive operations

**Enforcement Points:**

* **Pre-execution:** Validate permissions before workflow submission
* **During execution:** Monitor and enforce limits in real-time
* **Post-execution:** Generate audit logs for compliance and security review

Policies can be attached at multiple levels (organization, environment, team, agent) with inheritance and override semantics that balance flexibility with security.
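
The exact inheritance and override rules are not spelled out here, so the sketch below assumes the simplest reading: the most specific level wins for any given key. Real policies are written in Rego and evaluated by OPA; this Python version only illustrates the layering and a pre-execution check. `LEVELS`, `resolve_policy`, `pre_execution_check`, and the `allowed_tools` and `max_tokens` keys are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional

# Least specific first; later levels override earlier ones (an assumption).
LEVELS = ("organization", "environment", "team", "agent")


def resolve_policy(attached: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge the policies attached at each level into one effective policy."""
    effective: Dict[str, Any] = {}
    for level in LEVELS:
        effective.update(attached.get(level, {}))
    return effective


def pre_execution_check(
    policy: Mapping[str, Any], tool: str, estimated_tokens: int
) -> Optional[str]:
    """Return a denial reason, or None when the request is allowed."""
    allowed = policy.get("allowed_tools")
    if allowed is not None and tool not in allowed:
        return f"tool '{tool}' is not allowed"
    max_tokens = policy.get("max_tokens")
    if max_tokens is not None and estimated_tokens > max_tokens:
        return f"estimated tokens {estimated_tokens} exceed quota {max_tokens}"
    return None
```

Under this assumption, an agent-level `max_tokens` would tighten or loosen the organization default without touching the organization's tool allowlist.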

***

## Worker Registration & Lifecycle

Workers register with the Control Plane to receive and execute tasks, establishing a durable connection to the distributed orchestration system.

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart TD
    subgraph worker[" 🔧 Worker Instance "]
        W["Worker Process<br/>Queue ID<br/>System Info"]
    end

    subgraph registration[" 📋 Registration "]
        direction TB
        R["Authenticate<br/>& Register"]
        RC["Receive Config<br/>Temporal • LiteLLM<br/>Credentials"]
    end

    subgraph orchestration[" ⚙️ Orchestration "]
        T["Temporal<br/>Task Queue<br/>Connection"]
    end

    subgraph heartbeat[" 💓 Health Monitoring "]
        direction TB
        LH["Lightweight:<br/>Status • Task Count"]
        FH["Full:<br/>System Metrics<br/>Recent Logs"]
    end

    W --> R
    R --> RC
    RC --> T
    W -.-> LH
    W -.-> FH

    classDef workerStyle fill:#06b6d4,stroke:#0891b2
    classDef regStyle fill:#8b5cf6,stroke:#6d28d9
    classDef orchStyle fill:#f59e0b,stroke:#d97706
    classDef hbStyle fill:#10b981,stroke:#047857

    class W workerStyle
    class R,RC regStyle
    class T orchStyle
    class LH,FH hbStyle
```

**Registration Flow:**

1. **Initial Connection:** Worker authenticates with API key and queue identifier
2. **Configuration Receipt:** Control Plane provides connection credentials for Temporal and LiteLLM
3. **Queue Binding:** Worker connects to organization-specific Temporal task queue
4. **Polling Activation:** Worker begins polling for available workflow tasks

**Health Monitoring Strategy:**

* **Lightweight Heartbeats:** Frequent status updates with minimal overhead (task count, current state)
* **Full Heartbeats:** Periodic detailed reports including system metrics and execution logs
* **Redis-Based State:** Fast lookups for worker status without database queries
* **Automatic Staleness Detection:** Workers that stop reporting are marked inactive

This dual-level heartbeat approach balances real-time visibility with system efficiency, enabling the Control Plane to route tasks only to healthy workers while minimizing network and storage overhead.
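
Staleness detection can be pictured as a timestamp comparison against the last heartbeat. The sketch below is illustrative only: `WorkerRegistry`, `WorkerStatus`, and the `stale_after_seconds` threshold are hypothetical, and a dictionary stands in for the Redis-based state.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WorkerStatus:
    worker_id: str
    state: str        # for example "active", "idle", or "busy"
    task_count: int
    last_seen: float  # clock reading at the most recent heartbeat


class WorkerRegistry:
    """In-memory stand-in for Redis-backed worker state."""

    def __init__(self, stale_after_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._stale_after = stale_after_seconds
        self._clock = clock
        self._workers: Dict[str, WorkerStatus] = {}

    def heartbeat(self, worker_id: str, state: str, task_count: int) -> None:
        """Record a lightweight heartbeat: status and task count only."""
        self._workers[worker_id] = WorkerStatus(worker_id, state, task_count, self._clock())

    def healthy_workers(self) -> List[str]:
        """Workers whose last heartbeat falls within the staleness window."""
        cutoff = self._clock() - self._stale_after
        return sorted(s.worker_id for s in self._workers.values() if s.last_seen >= cutoff)
```

A full heartbeat would carry system metrics and recent logs in addition to these fields; it is omitted to keep the sketch short.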

***

## Task Queue Architecture

Hierarchical task routing and worker management.

<Note>
  For a beginner-friendly introduction to workers and task queues, see [Workers Overview](/core-concepts/execution-infrastructure/workers) and [Task Queues](/core-concepts/execution-infrastructure/task-queues).
</Note>

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart TD
    subgraph hierarchy[" 📊 Queue Hierarchy "]
        direction TB
        O["🏢 <b>Organization</b><br/>Multi-tenant isolation"]
        E["🌍 <b>Environment</b><br/>Production, Staging, Dev"]
        WQ["📦 <b>Task Queue</b><br/>Task distribution"]
        W["🔧 <b>Workers</b><br/>Execution engines"]

        O -->|"1:N"| E
        E -->|"1:N"| WQ
        WQ -->|"1:N"| W
    end

    subgraph example[" 💼 Example: Acme Corp "]
        direction TB
        org["<b>Acme Corp</b><br/>org-uuid-1234"]

        subgraph envs[" Environments "]
            prod["<b>Production</b><br/>High availability"]
            stage["<b>Staging</b><br/>Pre-prod testing"]
        end

        subgraph queues[" Task Queues "]
            q1["<b>fast-workers</b><br/>General purpose<br/>Queue: uuid-abc"]
            q2["<b>gpu-workers</b><br/>ML workloads<br/>Queue: uuid-def"]
        end

        subgraph workers[" Active Workers "]
            w1["<b>worker-1</b><br/>Status: active<br/>Tasks: 142"]
            w2["<b>worker-2</b><br/>Status: idle<br/>Tasks: 89"]
            w3["<b>worker-3</b><br/>Status: busy<br/>Tasks: 201"]
        end

        org --> prod & stage
        prod --> q1 & q2
        q1 --> w1 & w2 & w3
    end

    classDef orgStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#fff
    classDef envStyle fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    classDef queueStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef workerStyle fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff

    class O,org orgStyle
    class E,prod,stage envStyle
    class WQ,q1,q2 queueStyle
    class W,w1,w2,w3 workerStyle
```

**Architecture Rationale:**

The four-level hierarchy (Organization → Environment → Task Queue → Workers) solves several key challenges:

1. **Tenant Isolation:** Organization-level separation ensures complete data and execution isolation
2. **Environment Segregation:** Separate queues for production, staging, and development prevent cross-contamination
3. **Capacity Management:** Task queues provide named capacity pools that can be scaled independently
4. **Flexible Routing:** Multiple routing strategies support different use cases:
   * **AUTO:** Temporal load-balances across available workers (default)
   * **SPECIFIC\_QUEUE:** Direct targeting for specialized hardware (GPU, high-memory)
   * **ENVIRONMENT:** Restrict execution to specific deployment contexts

**Queue Naming Convention:**
Each Temporal task queue is named `{organization_id}.{queue_uuid}`, ensuring global uniqueness while maintaining logical hierarchy. This naming scheme enables:

* Fast organization-level filtering and metrics
* Queue-level access control policies
* Clear audit trails in execution logs
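
As a small illustration of the `{organization_id}.{queue_uuid}` convention, the helpers below build and parse a queue name. The function names are hypothetical, and the sketch assumes the organization ID itself contains no dot, which is what makes splitting on the first dot safe.

```python
from typing import NamedTuple


class QueueName(NamedTuple):
    organization_id: str
    queue_uuid: str


def build_queue_name(organization_id: str, queue_uuid: str) -> str:
    """Compose a task queue name from its two parts."""
    if not organization_id or not queue_uuid:
        raise ValueError("organization_id and queue_uuid must be non-empty")
    if "." in organization_id:
        # Assumption: organization IDs never contain the separator.
        raise ValueError("organization_id must not contain '.'")
    return f"{organization_id}.{queue_uuid}"


def parse_queue_name(name: str) -> QueueName:
    """Split a task queue name on the first dot."""
    organization_id, sep, queue_uuid = name.partition(".")
    if not sep or not organization_id or not queue_uuid:
        raise ValueError(f"not a valid queue name: {name!r}")
    return QueueName(organization_id, queue_uuid)
```

With the placeholder IDs from the diagram, `build_queue_name("org-uuid-1234", "uuid-abc")` yields `org-uuid-1234.uuid-abc`, and filtering by organization reduces to a prefix match.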

**Load Balancing:**
Temporal's built-in task distribution ensures work is fairly distributed across all workers polling a queue, with automatic retries and dead-letter handling for failed tasks.

***

## Execution Flow

End-to-end orchestration from user request through task completion, with support for long-running workflows and human-in-the-loop interactions.

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart LR
    subgraph submission[" 📤 Submission "]
        REQ["User Request<br/>Agent/Team/Job"]
    end

    subgraph orchestration[" ⚙️ Orchestration "]
        direction TB
        CP["Control Plane<br/>Create Execution"]
        T["Temporal<br/>Route to Queue"]
    end

    subgraph execution[" 🚀 Execution "]
        direction TB
        W["Worker<br/>Poll Task"]
        RT["Runtime<br/>Execute"]
        LLM["LLM Provider<br/>AI Inference"]
    end

    subgraph completion[" ✅ Results "]
        direction TB
        S["Success<br/>Final Output"]
        H["Waiting<br/>User Input"]
        F["Failed<br/>Error Details"]
    end

    REQ --> CP
    CP --> T
    T --> W
    W --> RT
    RT --> LLM
    RT --> S
    RT --> H
    RT --> F
    H -.Resume.-> W

    classDef subStyle fill:#3b82f6,stroke:#1e40af
    classDef orchStyle fill:#8b5cf6,stroke:#6d28d9
    classDef execStyle fill:#f59e0b,stroke:#d97706
    classDef resultStyle fill:#10b981,stroke:#047857

    class REQ subStyle
    class CP,T orchStyle
    class W,RT,LLM execStyle
    class S,H,F resultStyle
```

**Workflow Types:**

* **Agent Execution:** Single-agent task processing with tool invocation and multi-turn conversations
* **Team Execution:** Multi-agent coordination with inter-agent communication and shared context
* **Scheduled Jobs:** Cron-based recurring workflows for automation and monitoring

**Key Architectural Features:**

**Durable Execution:**
Temporal's workflow engine ensures executions survive process restarts, network failures, and infrastructure changes. Workflow state is persisted and can be resumed exactly where it left off.

**Real-Time Observability:**

* **WebSocket Streaming:** Live execution logs and status updates
* **Query Endpoints:** Synchronous status checks without interrupting execution
* **Signal Handling:** External events can influence running workflows (user input, cancellation)

**Human-in-the-Loop (HITL):**
Workflows can pause for human approval or input, maintaining full state while waiting. When input arrives via signal, execution resumes seamlessly with complete conversation context.

**State Decisions:**
After each agent turn, an AI-powered state analyzer determines whether the task is complete, requires user input, or should continue execution. This enables natural multi-turn interactions without hard-coded state machines.

***

## Temporal Workflow Architecture

```mermaid theme={null}
stateDiagram-v2
    [*] --> Pending
    Pending --> Running: Worker picks up
    Running --> Executing: Execute runtime
    Executing --> Analyzing: Submit turn analytics
    Analyzing --> Decision: AI state decision

    Decision --> Completed: Task done
    Decision --> WaitingForInput: Needs user input
    Decision --> Running: Continue execution
    Decision --> Failed: Error occurred

    WaitingForInput --> Running: User sends message (signal)

    Completed --> [*]
    Failed --> [*]
```
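
The state diagram above maps directly onto a transition table. The sketch below is plain Python rather than Temporal SDK code; `State`, `TRANSITIONS`, `advance`, and `is_terminal` are hypothetical names chosen to mirror the diagram.

```python
from enum import Enum
from typing import Dict, FrozenSet


class State(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    EXECUTING = "Executing"
    ANALYZING = "Analyzing"
    DECISION = "Decision"
    WAITING_FOR_INPUT = "WaitingForInput"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Each state maps to the set of states the diagram allows next.
TRANSITIONS: Dict[State, FrozenSet[State]] = {
    State.PENDING: frozenset({State.RUNNING}),
    State.RUNNING: frozenset({State.EXECUTING}),
    State.EXECUTING: frozenset({State.ANALYZING}),
    State.ANALYZING: frozenset({State.DECISION}),
    State.DECISION: frozenset(
        {State.COMPLETED, State.WAITING_FOR_INPUT, State.RUNNING, State.FAILED}
    ),
    State.WAITING_FOR_INPUT: frozenset({State.RUNNING}),
    State.COMPLETED: frozenset(),
    State.FAILED: frozenset(),
}


def advance(current: State, target: State) -> State:
    """Move to target, rejecting any transition the diagram does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(state: State) -> bool:
    """Terminal states have no outgoing transitions."""
    return not TRANSITIONS[state]
```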

**Architecture Decision: Why Temporal?**

Temporal addresses several critical challenges in distributed agent orchestration:

1. **Durability:** Workflow state persists across failures, restarts, and infrastructure changes
2. **Deterministic Replay:** Workflows can be reconstructed and replayed for debugging or recovery
3. **Built-in Retry Logic:** Configurable retry policies with exponential backoff for transient failures
4. **Long-Running Workflows:** Support for workflows that run for hours or days (HITL, scheduled jobs)
5. **Distributed Task Queues:** Native load balancing across heterogeneous worker pools
6. **Versioning:** Safe deployment of workflow changes without disrupting in-flight executions

**Workflow Structure:**

Each workflow type implements a distinct orchestration pattern:

* **Agent Workflows:** Linear execution with dynamic tool invocation and state transitions
* **Team Workflows:** Parallel and sequential agent coordination with shared context
* **Scheduled Workflows:** Wrapper workflows that handle cron triggers and error handling

**Activity Pattern:**

Workflows delegate actual work to activities (atomic units of execution):

* **LLM Inference:** Calls to model providers through LiteLLM gateway
* **Database Operations:** Session storage, execution status updates
* **Analytics:** Token usage, cost tracking, turn metrics
* **External Integration:** Calls to Context Graph, policy enforcer, storage services

Activities can be retried independently of the workflow, enabling fine-grained error recovery without restarting entire executions.
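
To make the retry behavior concrete, here is a standalone sketch of retrying a single activity with exponential backoff. In Temporal, retries are configured through retry policies rather than hand-written loops, so treat this only as an illustration of the idea; `run_with_retries`, `backoff_delays`, and the default values are hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float, coefficient: float, maximum: float, retries: int) -> List[float]:
    """Delay before each retry: initial * coefficient**n, capped at maximum."""
    return [min(initial * coefficient ** n, maximum) for n in range(retries)]


def run_with_retries(
    activity: Callable[[], T],
    max_attempts: int = 3,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run one activity, retrying on any exception until attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(
        initial_interval, backoff_coefficient, maximum_interval, max_attempts - 1
    )
    for attempt in range(max_attempts):
        try:
            return activity()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Attempts exhausted: surface the last error.
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Only the failing activity is re-run; the surrounding workflow keeps its state and continues once the activity returns.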

**Execution Guarantees:**

* **At-least-once execution** for all activities (idempotency required)
* **Exactly-once workflow decisions** (deterministic replay)
* **Strong consistency** for workflow state
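
At-least-once execution means an activity may run more than once for the same logical step, which is why idempotency is required. One common way to achieve it is an idempotency key, sketched below. `IdempotentStore` and the key format are hypothetical, and a dictionary stands in for a durable store such as a database table.

```python
from typing import Any, Callable, Dict


class IdempotentStore:
    """Remembers results by key so a retried activity does not repeat its side effect."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run_once(self, key: str, effect: Callable[[], Any]) -> Any:
        """Run effect the first time a key is seen; return the stored result afterwards."""
        if key in self._results:
            return self._results[key]
        result = effect()
        self._results[key] = result
        return result


def turn_key(execution_id: str, turn: int) -> str:
    """Hypothetical key format: one key per execution turn."""
    return f"{execution_id}:turn-{turn}"
```

If the same turn is delivered twice, the second call returns the stored result instead of, for example, recording token usage a second time.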

***

## LLM Gateway

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart TB
    subgraph control[" 🎯 Control Plane "]
        CP["<b>API</b><br/>Model Registry"]
    end

    subgraph gateway[" 🧠 LiteLLM Gateway "]
        direction TB
        LR["<b>Router</b><br/>Unified OpenAI API"]
        MC[("<b>Cache</b><br/>Response Caching")]
        LR <--> MC
    end

    subgraph providers[" ☁️ LLM Providers "]
        direction LR
        O["<b>OpenAI</b><br/>GPT-4, GPT-4o"]
        A["<b>Anthropic</b><br/>Claude Sonnet/Opus"]
        G["<b>Google</b><br/>Gemini"]
        M["<b>Mistral</b><br/>Mistral Large"]
        X["<b>Custom</b><br/>Self-hosted Models"]
    end

    subgraph workers[" 🔧 Workers "]
        W["<b>Runtime</b><br/>Agno/Claude Code"]
    end

    CP -->|"Configure Models"| LR
    W -->|"Unified Interface"| LR
    LR -->|"Route by Model"| O & A & G & M & X

    Note1["💡 <b>Automatic Fallback</b><br/>Default model on error"]
    LR -.->|"If unavailable"| Note1

    classDef controlStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#fff
    classDef gatewayStyle fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
    classDef providerStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef workerStyle fill:#06b6d4,stroke:#0891b2,stroke-width:2px,color:#fff

    class CP controlStyle
    class LR,MC gatewayStyle
    class O,A,G,M,X providerStyle
    class W workerStyle
```

**Architecture Decision: Why LiteLLM?**

LiteLLM solves the challenge of managing multiple LLM providers with incompatible APIs:

1. **Unified Interface:** All providers are exposed through an OpenAI-compatible API
2. **Zero Code Changes:** Switch providers by changing the model identifier; application code stays the same
3. **Automatic Retries:** Built-in retry logic with exponential backoff for rate limits
4. **Response Caching:** Reduce costs and latency for repeated queries
5. **Cost Tracking:** Unified token counting and cost calculation across providers
6. **Fallback Chains:** Automatic failover to backup models on provider outages
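
The fallback idea can be sketched independently of any gateway. The example below does not use LiteLLM's API; `complete_with_fallbacks` and the `call_model` callback are hypothetical and only show the control flow of trying models in order.

```python
from typing import Callable, List, Sequence, Tuple


def complete_with_fallbacks(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response) from the first success."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:
            # Record the failure and move on to the next model in the chain.
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the model that actually served the request keeps cost tracking accurate when a fallback is used.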

**Provider Support:**

* **Commercial:** OpenAI, Anthropic, Google, Microsoft Azure OpenAI, Mistral, Cohere
* **Hosted Open Models:** Replicate, Together AI, Anyscale, Perplexity
* **Self-Hosted:** vLLM, Ollama, LM Studio, custom OpenAI-compatible endpoints

**Routing Strategy:**
Model identifiers follow the pattern `{provider}/{model-name}`:

* `openai/gpt-4o` → OpenAI GPT-4o
* `anthropic/claude-sonnet-4` → Anthropic Claude Sonnet
* `gemini/gemini-2.0-flash` → Google Gemini
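
Splitting an identifier of this form is straightforward. The helper below is a hypothetical illustration: it splits on the first slash only, so a model name that itself contains a slash stays intact.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split '{provider}/{model-name}' into (provider, model_name)."""
    provider, sep, model_name = model_id.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected '<provider>/<model-name>', got {model_id!r}")
    return provider, model_name
```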

**Caching Behavior:**
Responses are cached based on model, messages, and parameters. Cache hits eliminate model provider calls entirely, reducing both cost and latency. Cache invalidation is automatic based on configured TTL.
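
One plausible way to derive a cache key from model, messages, and parameters is to hash a canonical JSON encoding of all three. This is a sketch of the general technique, not the gateway's actual key scheme; `cache_key` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Dict, List


def cache_key(model: str, messages: List[Dict[str, Any]], **params: Any) -> str:
    """Hash a canonical encoding so identical requests map to the same key."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,          # parameter order must not change the key
        separators=(",", ":"),   # fixed separators keep the encoding stable
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to the model, the message list, or a parameter such as temperature produces a different key, so a cached response is never reused for a different request.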

**Streaming Support:**
Both synchronous (full response) and streaming (token-by-token) modes are supported, with unified error handling and timeout management across all providers.

***

## Context Graph Integration

The Context Graph provides persistent organizational knowledge that enhances agent intelligence across all executions.

**Architectural Role:**

The Context Graph serves as a central knowledge repository that agents query to:

* **Historical Context:** Learn from past executions, decisions, and outcomes
* **Organizational Knowledge:** Access company-specific information, terminology, and processes
* **Cross-Agent Learning:** Benefit from insights gathered by other agents and teams
* **Entity Relationships:** Understand connections between projects, resources, and team members

**Integration Pattern:**

The Control Plane acts as an HTTP proxy to the Context Graph API, providing:

* **Authenticated Access:** All queries inherit organization context from the authenticated session
* **Query Translation:** Natural language questions translated to structured graph queries
* **Result Enrichment:** Graph results augmented with execution-time context
* **Caching:** Frequently accessed knowledge cached in Redis for performance

**Use Cases:**

* **Onboarding:** New agents immediately benefit from accumulated organizational knowledge
* **Consistency:** All agents reference the same authoritative sources
* **Evolution:** Knowledge base grows automatically from agent interactions and human feedback
* **Retrieval-Augmented Generation (RAG):** Context-aware responses based on organizational data

This integration enables agents to operate with institutional knowledge rather than relying solely on their foundation model's training data.

***

## Cognitive Memory

Persistent session storage enables agents to maintain context across multi-turn conversations and long-running workflows.

**Memory Architecture:**

Cognitive memory is stored in PostgreSQL using a structured schema that captures:

* **Conversation History:** Complete message sequences with timestamps and metadata
* **Agent State:** Current context variables, active tools, and execution phase
* **Session Metadata:** User identifiers, environment context, policy constraints
* **Turn Analytics:** Token usage, latency, and cost per interaction

**Persistence Strategy:**

Unlike stateless API interactions, agent executions maintain durable sessions that:

* **Survive Restarts:** Sessions persist across worker failures and deployments
* **Enable Resumption:** Long-running tasks can pause and resume without losing context
* **Support HITL:** Human-in-the-loop workflows maintain full conversation state while waiting
* **Facilitate Debugging:** Complete execution history available for post-mortem analysis

**Multi-Turn Conversation Support:**

Memory enables natural conversational interactions where agents:

* Reference earlier parts of the conversation
* Build incrementally on previous decisions
* Clarify ambiguous requests through back-and-forth dialogue
* Maintain task context across multiple human inputs

**Performance Optimization:**

Recent conversation history is cached in Redis for fast access during active executions, with full history retrieved from PostgreSQL only when needed for longer context windows or historical analysis.

***

## Storage System

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart LR
    subgraph api[" 🎯 Control Plane "]
        CP["<b>API</b><br/>File Management"]
    end

    subgraph service[" 📦 Storage Service "]
        SS["<b>Router</b><br/>Multi-cloud"]
        Q["<b>Quota Enforcer</b><br/>Per-org limits"]
        SS --> Q
    end

    subgraph providers[" ☁️ Cloud Providers "]
        S3["<b>AWS S3</b><br/>Object storage"]
        GCS["<b>Google Cloud</b><br/>Cloud Storage"]
        AZ["<b>Azure</b><br/>Blob Storage"]
    end

    subgraph data[" 💾 Database "]
        SF[("<b>File Metadata</b><br/>Tracking")]
        SU[("<b>Usage Quotas</b><br/>Per-Org")]
    end

    CP -->|"Upload/Download"| SS
    SS -->|"Route by config"| S3 & GCS & AZ
    SS <-->|"Track usage"| SF & SU

    classDef apiStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#fff
    classDef serviceStyle fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
    classDef providerStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef dataStyle fill:#0891b2,stroke:#0e7490,stroke-width:2px,color:#fff

    class CP apiStyle
    class SS,Q serviceStyle
    class S3,GCS,AZ providerStyle
    class SF,SU dataStyle
```

**Architecture Decision: Multi-Cloud Strategy**

Supporting multiple cloud providers solves several challenges:

1. **Deployment Flexibility:** Customers can use their existing cloud infrastructure
2. **Compliance Requirements:** Data residency regulations may dictate storage location
3. **Cost Optimization:** Organizations can leverage existing cloud commitments
4. **Vendor Independence:** Avoid lock-in to a single cloud provider
5. **Hybrid Deployments:** Self-hosted control planes can use on-premises storage

**Supported Providers:**

* **AWS S3:** Industry-standard object storage with broad regional availability
* **Google Cloud Storage:** Deep integration with GCP services
* **Azure Blob Storage:** Native support for Azure-based deployments
* **S3-Compatible:** MinIO, Wasabi, Backblaze B2, or any S3-compatible service

**Storage Features:**

**Quota Management:**
Per-organization storage limits with soft warnings and hard enforcement. Quota tracking includes both file count and total bytes, with visibility in analytics dashboards.
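
The soft-warning and hard-enforcement split might look like the sketch below. The 80% warning threshold, the `Quota` fields, and `check_upload` are assumptions made for illustration; the actual thresholds are not documented here.

```python
from dataclasses import dataclass
from enum import Enum


class QuotaDecision(Enum):
    ALLOW = "allow"
    WARN = "warn"   # soft warning: upload proceeds, usage is near the limit
    DENY = "deny"   # hard enforcement: upload is rejected


@dataclass(frozen=True)
class Quota:
    max_bytes: int
    max_files: int
    warn_ratio: float = 0.8  # assumed soft-warning threshold


def check_upload(quota: Quota, used_bytes: int, used_files: int, upload_bytes: int) -> QuotaDecision:
    """Evaluate an upload against both the byte quota and the file-count quota."""
    new_bytes = used_bytes + upload_bytes
    new_files = used_files + 1
    if new_bytes > quota.max_bytes or new_files > quota.max_files:
        return QuotaDecision.DENY
    if (
        new_bytes >= quota.warn_ratio * quota.max_bytes
        or new_files >= quota.warn_ratio * quota.max_files
    ):
        return QuotaDecision.WARN
    return QuotaDecision.ALLOW
```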

**Metadata Tracking:**
File metadata stored in PostgreSQL enables:

* Fast searches without scanning object storage
* Tag-based filtering and organization
* Relationship tracking (which executions used which files)
* Versioning and lifecycle management

**Soft Delete:**
Deleted files are marked inactive but retained for a configurable retention period, enabling recovery from accidental deletions and supporting audit requirements.

**Access Control:**
All storage operations respect organization-level isolation and OPA policies. Agents can only access files within their organization scope, with optional policy-based restrictions on file types or sizes.

**API Design:**
RESTful endpoints for standard CRUD operations, with support for:

* Presigned URLs for direct uploads (bypassing Control Plane for large files)
* Batch operations for efficiency
* Streaming downloads for large files
* Tag-based search and filtering

***

## Technology Stack & Dependencies

The Control Plane's architecture is built on proven, production-grade technologies chosen for specific technical requirements.

```mermaid theme={null}
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'14px'}}}%%
flowchart TB
    subgraph core[" 🎯 Control Plane Core "]
        CP["<b>API</b><br/>REST & WebSocket"]
    end

    subgraph infra[" 🏗️ Infrastructure "]
        PG[("<b>PostgreSQL</b><br/>Multi-tenant RLS")]
        RD[("<b>Redis</b><br/>Cache & Pub/Sub")]
        T["<b>Temporal</b><br/>Workflows"]
    end

    subgraph external[" 🌐 External Services "]
        K["<b>Kubiya API</b><br/>Auth"]
        OPA["<b>OPA Enforcer</b><br/>Policies"]
        CG["<b>Context Graph</b><br/>Knowledge"]
        LLM["<b>LiteLLM</b><br/>Models"]
        CS["<b>Cloud Storage</b><br/>S3/GCS/Azure"]
    end

    subgraph execution[" 🚀 Execution "]
        W["<b>Workers</b><br/>Task Executors"]
        RT["<b>Runtimes</b><br/>Agno/Claude"]
    end

    CP <-->|"CRUD"| PG
    CP <-->|"Cache"| RD
    CP -->|"Submit"| T
    CP <-->|"Validate"| K
    CP <-->|"Enforce"| OPA
    CP <-->|"Retrieve"| CG
    CP <-->|"Route"| LLM
    CP <-->|"Files"| CS

    W -->|"Poll"| T
    W -->|"Execute"| RT
    RT -->|"LLM"| LLM
    RT <-->|"Sessions"| PG
    W -->|"Heartbeat"| RD

    classDef coreStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:3px,color:#fff
    classDef infraStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#fff
    classDef extStyle fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
    classDef execStyle fill:#06b6d4,stroke:#0891b2,stroke-width:2px,color:#fff

    class CP coreStyle
    class PG,RD,T infraStyle
    class K,OPA,CG,LLM,CS extStyle
    class W,RT execStyle
```

### Core Infrastructure

**PostgreSQL** - Multi-tenant relational database

* **Why chosen:** Native row-level security (RLS) for tenant isolation, ACID guarantees, rich query capabilities
* **Role:** Primary data store for agents, teams, executions, policies, and all configuration
* **Key features:** JSON columns for flexible schemas, full-text search, complex queries with joins

**Redis** - In-memory data store

* **Why chosen:** Microsecond latency for caching, pub/sub for real-time updates, simple data structures
* **Role:** Authentication token cache, worker heartbeats, WebSocket session state
* **Key features:** TTL-based expiration, atomic operations, pub/sub messaging

**Temporal** - Workflow orchestration engine

* **Why chosen:** Durable execution, built-in retry logic, distributed task queues, workflow versioning
* **Role:** Orchestrate all agent and team executions, manage long-running workflows, ensure reliable task distribution
* **Key features:** Deterministic replay, signal handling, query support, horizontal scalability

**LiteLLM** - LLM gateway and router

* **Why chosen:** Unified API across providers, automatic retries, cost tracking, response caching
* **Role:** Abstract model provider differences, enable multi-model support, manage API keys and quotas
* **Key features:** OpenAI-compatible interface, fallback chains, streaming support

### External Services

**Kubiya API** - Authentication service

* **Why integrated:** Centralized user management, SSO integration, organization provisioning
* **Role:** Validate JWT tokens, provide organization context, manage API keys
* **Integration pattern:** External validation with Redis caching for performance
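
The "external validation with caching" pattern can be sketched as follows. An in-process dictionary stands in for Redis here, and the `validate` callable is a hypothetical placeholder for the call to the authentication service; neither reflects the actual client code.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """In-process stand-in for a TTL-based Redis cache of validation results."""

    def __init__(self, validate: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._validate = validate  # hypothetical call to the external auth service
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get_context(self, token: str) -> Optional[dict]:
        """Return the organization context for a token, or None if it is invalid."""
        # Key on a digest so raw tokens are never stored in the cache.
        key = hashlib.sha256(token.encode("utf-8")).hexdigest()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: no external round trip
        context = self._validate(token)
        if context is None:
            self._entries.pop(key, None)  # failed validations are not cached
            return None
        self._entries[key] = (now + self._ttl, context)
        return context
```

A short TTL bounds how long a revoked token can keep working from cache, so the value is a trade-off between latency and revocation delay.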

**OPA (Open Policy Agent)** - Policy enforcement

* **Why chosen:** Declarative policy language, decoupled from application code, industry standard for cloud-native authorization
* **Role:** Enforce access control, usage limits, and governance policies
* **Integration pattern:** Synchronous policy evaluation before and during execution

**Context Graph** - Knowledge API

* **Why integrated:** Organizational memory, cross-agent learning, RAG support
* **Role:** Provide historical context and domain knowledge to enhance agent intelligence
* **Integration pattern:** On-demand queries with caching

**Cloud Storage Providers** - Object storage (S3/GCS/Azure)

* **Why multi-cloud:** Deployment flexibility, compliance requirements, cost optimization
* **Role:** Store agent artifacts, execution outputs, uploaded files
* **Integration pattern:** Provider-agnostic abstraction with metadata in PostgreSQL
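
One way to picture a provider-agnostic abstraction with separately stored metadata is the sketch below. The `ObjectStore` interface, `InMemoryStore`, and `FileService` are illustrative names, and a dictionary stands in for the PostgreSQL metadata table.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Minimal interface each provider adapter (S3, GCS, Azure) would implement."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Test double; a real adapter would call the provider's SDK instead."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass(frozen=True)
class FileMetadata:
    organization_id: str
    name: str
    size_bytes: int
    storage_key: str


class FileService:
    """Writes bytes to the object store and keeps metadata separately."""

    def __init__(self, store: ObjectStore) -> None:
        self._store = store
        self._metadata: Dict[str, FileMetadata] = {}  # stands in for a PostgreSQL table

    def upload(self, organization_id: str, name: str, data: bytes) -> FileMetadata:
        key = f"{organization_id}/{name}"  # org-prefixed keys keep tenants apart
        self._store.put(key, data)
        meta = FileMetadata(organization_id, name, len(data), key)
        self._metadata[key] = meta
        return meta

    def download(self, organization_id: str, name: str) -> bytes:
        key = f"{organization_id}/{name}"
        if key not in self._metadata:  # metadata lookup, no object-store scan
            raise FileNotFoundError(name)
        return self._store.get(key)
```

Because `FileService` depends only on the `ObjectStore` interface, swapping providers means supplying a different adapter without touching the calling code.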

### Architectural Principles

1. **Separation of Concerns:** Each technology addresses specific requirements (data persistence vs. caching vs. orchestration)
2. **Best-of-Breed:** Choose proven technologies rather than monolithic solutions
3. **Horizontal Scalability:** All components can scale independently
4. **Cloud-Native:** Designed for containerized, distributed deployments
5. **Operational Maturity:** Established technologies with strong community support and tooling

***

## Data Model Overview

The Control Plane's data model supports multi-tenancy, hierarchical resource organization, and complete execution traceability.

**Entity Hierarchy:**

```
Organization (Root Tenant)
├── Environments
│   ├── Task Queues
│   │   └── Workers
│   ├── Agents
│   └── Teams
├── Projects
│   ├── Knowledge Items
│   └── Resources
├── Policies
├── Secrets
└── Integration Credentials
```

**Multi-Tenancy Design:**

All data is scoped to an organization, with enforcement at the database level:

* **Row-Level Security (RLS):** PostgreSQL policies automatically filter queries by organization
* **Isolation Guarantees:** No query can access data from another organization
* **Shared Infrastructure:** All organizations use the same database instance with logical separation
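
To show what database-level enforcement typically looks like in PostgreSQL, the helper below generates standard row-level security statements for a table. The `organization_id` column and the `app.current_org_id` session setting are assumptions for illustration, not the actual schema.

```python
from typing import List


def rls_statements(table: str, org_column: str = "organization_id",
                   setting: str = "app.current_org_id") -> List[str]:
    """Build PostgreSQL statements that restrict a table to the current organization.

    The column and setting names are illustrative defaults, not a real schema.
    """
    parts = [table, org_column, *setting.split(".")]
    if not all(p.isidentifier() for p in parts):
        raise ValueError("table, column, and setting must be plain identifiers")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        (
            f"CREATE POLICY {table}_org_isolation ON {table} "
            f"USING ({org_column}::text = current_setting('{setting}'));"
        ),
    ]


for statement in rls_statements("executions"):
    print(statement)
```

With a policy like this in place, the application would set the session value once per request (for example with `SET LOCAL`), and every query on the table is then filtered by the database rather than by application code.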

**Execution Tracking:**

Complete audit trail from task submission through completion:

* **Execution Records:** Status, timestamps, initiator, target agent/team
* **Session History:** Full conversation logs with token usage and costs
* **Analytics Data:** Aggregated metrics for dashboards and reporting
* **Relationship Tracking:** Links between executions, jobs, and schedules

**Schema Evolution:**

Database migrations managed through automated tooling:

* **Version Control:** All schema changes tracked in migration history
* **Zero-Downtime Deployments:** Migrations designed for rolling updates
* **Rollback Support:** Backward-compatible changes enable safe rollbacks

***

## Monitoring & Analytics

Comprehensive observability enables operational insight, troubleshooting, and cost optimization.

**Structured Logging:**

JSON-formatted logs with:

* **Correlation IDs:** Track requests across distributed services
* **Contextual Fields:** Organization ID, execution ID, user ID automatically included
* **Log Levels:** Debug, info, warning, error for filtering
* **Structured Data:** Machine-readable format for log aggregation and analysis
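
A minimal sketch of JSON logging with automatically attached context, using only the Python standard library. The field names (`correlation_id`, `organization_id`) and the `log_context` variable are assumptions; the Control Plane's real log schema may differ.

```python
import json
import logging
from contextvars import ContextVar

# Hypothetical per-request context; it is replaced per request, never mutated.
log_context = ContextVar("log_context", default={})


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, merged with the request context."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        payload.update(log_context.get())  # correlation_id, organization_id, ...
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger
```

A request handler would call `log_context.set({...})` once at the start of a request, after which every log line emitted in that context carries the same identifiers without each call site passing them explicitly.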

**Execution Analytics:**

Real-time and historical metrics for:

* **Token Usage:** Prompt tokens, completion tokens, total tokens per execution
* **Cost Tracking:** Per-execution costs based on model pricing
* **Latency Metrics:** Time spent in different workflow phases
* **State Transitions:** Track how executions move through the pending, running, waiting, completed, and failed states
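
Per-execution cost can be derived from token counts and a price table, as in this sketch. The model name and prices are placeholders, and `TurnUsage`, `turn_cost`, and `execution_cost` are hypothetical names rather than part of the platform API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical prices in USD per 1,000 tokens: (prompt, completion).
PRICING: Dict[str, Tuple[Decimal, Decimal]] = {
    "example-model": (Decimal("0.50"), Decimal("1.50")),
}


@dataclass(frozen=True)
class TurnUsage:
    model: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def turn_cost(usage: TurnUsage) -> Decimal:
    """Cost of a single turn, priced separately for prompt and completion tokens."""
    prompt_price, completion_price = PRICING[usage.model]
    return (Decimal(usage.prompt_tokens) * prompt_price
            + Decimal(usage.completion_tokens) * completion_price) / Decimal(1000)


def execution_cost(turns: Iterable[TurnUsage]) -> Decimal:
    """Sum turn costs to get the cost of a whole execution."""
    return sum((turn_cost(t) for t in turns), Decimal("0"))
```

`Decimal` is used instead of floats so that aggregated costs do not accumulate binary rounding error.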

**Operational Health:**

System health endpoints provide:

* **Service Status:** Component-level health checks (database, Redis, Temporal connectivity)
* **Worker Presence:** Real-time view of active workers per queue
* **Queue Depth:** Pending task counts for capacity planning
* **Error Rates:** Failed execution percentages and error categorization
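
Component-level health checks are commonly aggregated into one report, roughly as sketched here. The `health_report` function and the status strings are illustrative assumptions, not the actual endpoint contract.

```python
from typing import Callable, Dict


def health_report(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each component check and summarize the results."""
    components = {}
    for name, check in checks.items():
        try:
            components[name] = "ok" if check() else "down"
        except Exception as exc:  # a failing probe must not break the endpoint
            components[name] = f"error: {type(exc).__name__}"
    healthy = all(state == "ok" for state in components.values())
    return {"status": "healthy" if healthy else "degraded", "components": components}
```

In a real service each callable would perform a cheap connectivity probe against the database, Redis, or Temporal, ideally with a short timeout so a slow dependency cannot stall the health endpoint.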

**Analytics Architecture:**

**Real-Time Layer:**

* Turn-by-turn analytics captured during execution
* Immediate visibility into token usage and costs
* WebSocket streaming for live dashboard updates

**Historical Layer:**

* Aggregated metrics in PostgreSQL for dashboards
* Time-series analysis for trend identification
* Exportable data for external BI tools

**Integration Points:**

* **Prometheus:** Metrics export for alerting and monitoring
* **OpenTelemetry:** Distributed tracing for request path analysis
* **Custom Webhooks:** Real-time event notifications for external systems

This multi-layer observability approach balances real-time operational needs with long-term analytical requirements.

***

## Architectural Qualities

The Control Plane is designed to meet enterprise requirements for security, reliability, and scale.

### Security & Governance

* **Multi-Tenancy:** Database-level tenant isolation with row-level security policies
* **Authentication:** JWT and API key support with flexible provider integration
* **Authorization:** Declarative OPA policies for fine-grained access control
* **Audit Trail:** Complete execution history with immutable logs
* **Secret Management:** Secure storage and injection of credentials
* **Network Isolation:** Support for private networking in self-hosted deployments

### Reliability & Resilience

* **Durable Execution:** Temporal workflows survive failures and infrastructure changes
* **Automatic Retries:** Configurable retry policies with exponential backoff
* **Graceful Degradation:** System remains operational even if external services are unavailable
* **Health Monitoring:** Continuous worker health checks with automatic routing around failures
* **State Persistence:** All critical state stored in durable data stores (PostgreSQL, Temporal)
* **Idempotent Operations:** Safe to retry any operation without unintended side effects
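
Retrying with exponential backoff can be illustrated in plain Python as below. In the platform itself retries are described as a Temporal capability; this sketch only shows the general technique, and the function names and default values are assumptions.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float, factor: float, maximum: float,
                   attempts: int) -> Iterator[float]:
    """Yield the wait before each retry: initial, initial*factor, ... capped at maximum."""
    delay = initial
    for _ in range(attempts - 1):  # no wait is needed after the final attempt
        yield min(delay, maximum)
        delay *= factor


def call_with_retries(fn: Callable[[], T], attempts: int = 3, initial: float = 1.0,
                      factor: float = 2.0, maximum: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds or attempts are exhausted, re-raising the last error."""
    delays = backoff_delays(initial, factor, maximum, attempts)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # out of attempts: surface the last failure
            sleep(delay)
```

Retrying is only safe when `fn` is idempotent, which is why idempotent operations and retry policies are listed together above.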

### Scalability & Performance

* **Horizontal Scaling:** All components scale independently (API servers, workers, databases)
* **Distributed Task Queues:** Temporal provides natural work distribution across worker pools
* **Caching Strategy:** Multi-tier caching (Redis) reduces database load and external API calls
* **Asynchronous Operations:** Non-blocking I/O for high-concurrency workloads
* **Connection Pooling:** Efficient database connection management
* **Resource Isolation:** Individual executions don't impact each other

### Operational Excellence

* **Structured Observability:** JSON logs, metrics, and traces for complete system visibility
* **Zero-Downtime Deployments:** Rolling updates with backward-compatible changes
* **Configuration Management:** Environment-based config with secret separation
* **API-Driven:** All operations available via REST API for automation
* **Self-Service:** Teams can provision and manage resources without operator intervention
* **Cost Transparency:** Detailed usage tracking and cost attribution per organization

### Integration & Extensibility

* **Multi-Cloud Support:** Run on any Kubernetes cluster or serverless platform
* **Storage Flexibility:** Support for AWS S3, Google Cloud Storage, Azure Blob Storage, or S3-compatible storage
* **Model Agnostic:** Support for any LLM provider through the LiteLLM gateway
* **MCP Protocol:** Standardized integration with external tools and services
* **Webhook Support:** Real-time event notifications for external systems
* **API-First Design:** Full platform capabilities accessible programmatically

***

## Next Steps

Explore related documentation to understand how components work together:

<CardGroup cols={2}>
  <Card title="Control Plane Overview" icon="sitemap" href="/core-concepts/control-plane/overview">
    Deployment options and operational concepts
  </Card>

  <Card title="Developers Guide" icon="code" href="/core-concepts/control-plane/developers-guide">
    Local development setup and API reference
  </Card>

  <Card title="Agents" icon="robot" href="/core-concepts/agents">
    Create and configure AI agents
  </Card>

  <Card title="Teams" icon="users" href="/core-concepts/teams">
    Coordinate multi-agent teams
  </Card>

  <Card title="Runtimes" icon="code" href="/core-concepts/runtimes/overview">
    Choose execution runtimes
  </Card>

  <Card title="Environments" icon="server" href="/core-concepts/environments">
    Configure execution environments
  </Card>
</CardGroup>
