Central orchestration platform for AI agent and team execution. Manages workflows, worker coordination, policy enforcement, and integrations. The Control Plane serves as the central nervous system of Kubiya’s distributed AI execution platform: it coordinates multi-tenant agent workloads, enforces security policies, and maintains consistent state across heterogeneous execution environments. It provides a unified orchestration layer that abstracts away the complexity of workflow management, resource allocation, and integration coordination.

Deployment Options

Choose between fully managed SaaS or self-hosted deployment based on your requirements.

Kubiya Hosted (SaaS)

Fully managed infrastructure with zero operational overhead. All components, scaling, security, and maintenance handled by Kubiya. Benefits: Zero ops • Auto-scaling • Enterprise security • 99.9% uptime SLA • Continuous updates

Self-Hosted

Stack: Docker Compose (local) • Kubernetes/Helm (production) • Temporal Server • PostgreSQL • Redis Benefits: Full control • On-premises • Custom security • Air-gapped support

High-Level Architecture

Distributed, multi-tenant architecture for AI agent orchestration. Core Components:
  • API Layer: FastAPI-based REST and WebSocket interface for multi-tenant access
  • Orchestration: Temporal workflows for reliable, distributed task execution
  • Data Persistence: PostgreSQL with row-level security for tenant isolation
  • Caching & State: Redis for performance optimization and real-time communication
  • Security: JWT/API key authentication with OPA-based policy enforcement
  • LLM Gateway: LiteLLM for unified access to multiple AI model providers
  • Knowledge Layer: Context Graph for organizational memory and cross-agent learning

Control Plane API

The Control Plane API provides a comprehensive REST and WebSocket interface for managing the entire lifecycle of AI agents, teams, and workflows. Architectural Approach:
  • Multi-tenant by design: All data access enforced at the database level through row-level security
  • Async-first architecture: Built on FastAPI with async/await for high-concurrency workloads
  • Strongly typed contracts: Pydantic models ensure request/response validation
  • Real-time communication: WebSocket endpoints for streaming execution logs and live status updates
  • Structured observability: Correlation IDs and structured logging for distributed tracing
API Categories:
  • Agent & Team Management: Create, configure, and orchestrate AI agents and multi-agent teams
  • Execution Control: Submit tasks, monitor progress, stream outputs, and handle approvals
  • Resource Management: Task queue registration, environment configuration, model registry
  • Governance: Policy definition, secret management, integration credentials
  • Analytics & Observability: Usage metrics, cost tracking, execution history
The API serves as the primary integration point for CI/CD pipelines, ChatOps interfaces, and custom automation workflows.
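As an illustration of that integration pattern, the sketch below builds an authenticated task-submission request from a script. The endpoint path (`/api/v1/tasks`) and payload shape are hypothetical, chosen only to show the bearer-token pattern; consult the API reference for the real routes.

```python
import json
import urllib.request

# The endpoint path and payload fields below are hypothetical; this sketch
# only illustrates calling the REST API from an automation script.
def build_task_request(base_url: str, api_key: str,
                       agent_id: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated POST request submitting a task to an agent."""
    return urllib.request.Request(
        f"{base_url}/api/v1/tasks",  # hypothetical path
        data=json.dumps({"agent_id": agent_id, "prompt": prompt}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

The request object can then be sent with `urllib.request.urlopen` (or any HTTP client) from a CI/CD step or ChatOps handler.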

Authentication & Authorization

Multi-tenant authentication with organization-level isolation ensures secure access across all Control Plane operations. Authentication Methods:
  • JWT Bearer Tokens: User-scoped authentication with role-based access control
  • API Keys: Service-to-service authentication for automation and integrations
Security Architecture:
  • Token Caching: Redis-backed cache layer minimizes external validation latency
  • Organization Scoping: Every request automatically scoped to authenticated organization
  • Database-Level Isolation: PostgreSQL row-level security (RLS) enforces tenant boundaries
  • Role-Based Access Control (RBAC): Fine-grained permissions for resource access
  • Automatic Token Rotation: Built-in support for credential lifecycle management
All authenticated requests establish an organization context that flows through the entire request lifecycle, ensuring consistent security enforcement from API entry through database access.

Policy Enforcement

OPA-based access control and governance.
Policy Types: Tool usage control • Resource access restrictions • Execution constraints (limits/quotas) • Approval workflows
Architecture Decision: Why OPA?
  • Decoupled policy logic: Policies defined separately from application code enable non-engineers to manage governance
  • Declarative approach: Rego policy language allows expressing complex rules as data
  • Centralized enforcement: Single policy engine across all agents, teams, and workflows
  • Audit transparency: Policy decisions are logged with full context for compliance reporting
  • Dynamic evaluation: Policies evaluated at execution time based on current state and context
Policy Scope:
  • Tool Usage Control: Restrict which Skills and MCP servers agents can invoke
  • Resource Access: Limit file paths, network destinations, or cloud resources
  • Execution Constraints: Enforce quotas on compute time, token usage, or cost
  • Approval Workflows: Require human-in-the-loop approval for sensitive operations
Enforcement Points:
  • Pre-execution: Validate permissions before workflow submission
  • During execution: Monitor and enforce limits in real-time
  • Post-execution: Generate audit logs for compliance and security review
Policies can be attached at multiple levels (organization, environment, team, agent) with inheritance and override semantics that balance flexibility with security.
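The multi-level attachment with inheritance and override can be sketched as a merge from broadest to narrowest scope. The merge semantics shown (narrower levels override individual keys) are an assumption for illustration.

```python
# Scope levels mirror the hierarchy named above; merge order is assumed.
LEVELS = ["organization", "environment", "team", "agent"]

def resolve_policy(policies: dict) -> dict:
    """Merge policies from broadest to narrowest scope; a narrower
    level overrides any key also set at a broader level."""
    effective = {}
    for level in LEVELS:
        effective.update(policies.get(level, {}))
    return effective
```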

Worker Registration & Lifecycle

Workers register with the Control Plane to receive and execute tasks, establishing a durable connection to the distributed orchestration system. Registration Flow:
  1. Initial Connection: Worker authenticates with API key and queue identifier
  2. Configuration Receipt: Control Plane provides connection credentials for Temporal and LiteLLM
  3. Queue Binding: Worker connects to organization-specific Temporal task queue
  4. Polling Activation: Worker begins polling for available workflow tasks
Health Monitoring Strategy:
  • Lightweight Heartbeats: Frequent status updates with minimal overhead (task count, current state)
  • Full Heartbeats: Periodic detailed reports including system metrics and execution logs
  • Redis-Based State: Fast lookups for worker status without database queries
  • Automatic Staleness Detection: Workers that stop reporting are marked inactive
This dual-level heartbeat approach balances real-time visibility with system efficiency, enabling the Control Plane to route tasks only to healthy workers while minimizing network and storage overhead.

Task Queue Architecture

Hierarchical task routing and worker management.
For a beginner-friendly introduction to workers and task queues, see Workers Overview and Task Queues.
Architecture Rationale: The four-level hierarchy (Organization → Environment → Task Queue → Workers) solves several key challenges:
  1. Tenant Isolation: Organization-level separation ensures complete data and execution isolation
  2. Environment Segregation: Separate queues for production, staging, and development prevent cross-contamination
  3. Capacity Management: Task queues provide named capacity pools that can be scaled independently
  4. Flexible Routing: Multiple routing strategies support different use cases:
    • AUTO: Temporal load-balances across available workers (default)
    • SPECIFIC_QUEUE: Direct targeting for specialized hardware (GPU, high-memory)
    • ENVIRONMENT: Restrict execution to specific deployment contexts
Queue Naming Convention: Each Temporal task queue is named {organization_id}.{queue_uuid}, ensuring global uniqueness while maintaining logical hierarchy. This naming scheme enables:
  • Fast organization-level filtering and metrics
  • Queue-level access control policies
  • Clear audit trails in execution logs
Load Balancing: Temporal’s built-in task distribution ensures work is fairly distributed across all workers polling a queue, with automatic retries and dead-letter handling for failed tasks.
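The `{organization_id}.{queue_uuid}` convention above makes organization-level filtering a cheap string operation, as this small sketch shows:

```python
def queue_name(organization_id: str, queue_uuid: str) -> str:
    """Build a globally unique Temporal queue name from the hierarchy."""
    return f"{organization_id}.{queue_uuid}"

def parse_queue_name(name: str) -> tuple[str, str]:
    """Split on the first '.' to recover the organization for
    filtering, metrics, and audit-trail grouping."""
    org, _, qid = name.partition(".")
    return org, qid
```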

Execution Flow

End-to-end orchestration from user request through task completion, with support for long-running workflows and human-in-the-loop interactions. Workflow Types:
  • Agent Execution: Single-agent task processing with tool invocation and multi-turn conversations
  • Team Execution: Multi-agent coordination with inter-agent communication and shared context
  • Scheduled Jobs: Cron-based recurring workflows for automation and monitoring
Key Architectural Features:
Durable Execution: Temporal’s workflow engine ensures executions survive process restarts, network failures, and infrastructure changes. Workflow state is persisted and can be resumed exactly where it left off.
Real-Time Observability:
  • WebSocket Streaming: Live execution logs and status updates
  • Query Endpoints: Synchronous status checks without interrupting execution
  • Signal Handling: External events can influence running workflows (user input, cancellation)
Human-in-the-Loop (HITL): Workflows can pause for human approval or input, maintaining full state while waiting. When input arrives via signal, execution resumes seamlessly with complete conversation context.
State Decisions: After each agent turn, an AI-powered state analyzer determines whether the task is complete, requires user input, or should continue execution. This enables natural multi-turn interactions without hard-coded state machines.
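The three outcomes of the state decision can be sketched as an enum. The real analyzer is LLM-powered; the keyword heuristics below are a toy stand-in used only to make the three branches concrete.

```python
from enum import Enum

class StateDecision(Enum):
    COMPLETE = "complete"
    NEEDS_USER_INPUT = "needs_user_input"
    CONTINUE = "continue"

def analyze_turn(last_message: str) -> StateDecision:
    """Toy stand-in for the AI-powered state analyzer: simple
    keyword heuristics illustrate the three possible decisions."""
    text = last_message.lower()
    if text.endswith("?"):
        return StateDecision.NEEDS_USER_INPUT
    if "done" in text or "completed" in text:
        return StateDecision.COMPLETE
    return StateDecision.CONTINUE
```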

Temporal Workflow Architecture

Architecture Decision: Why Temporal? Temporal addresses several critical challenges in distributed agent orchestration:
  1. Durability: Workflow state persists across failures, restarts, and infrastructure changes
  2. Deterministic Replay: Workflows can be reconstructed and replayed for debugging or recovery
  3. Built-in Retry Logic: Configurable retry policies with exponential backoff for transient failures
  4. Long-Running Workflows: Support for workflows that run hours or days (HITL, scheduled jobs)
  5. Distributed Task Queues: Native load balancing across heterogeneous worker pools
  6. Versioning: Safe deployment of workflow changes without disrupting in-flight executions
Workflow Structure: Each workflow type implements a distinct orchestration pattern:
  • Agent Workflows: Linear execution with dynamic tool invocation and state transitions
  • Team Workflows: Parallel and sequential agent coordination with shared context
  • Scheduled Workflows: Wrapper workflows that handle cron triggers and error handling
Activity Pattern: Workflows delegate actual work to activities (atomic units of execution):
  • LLM Inference: Calls to model providers through LiteLLM gateway
  • Database Operations: Session storage, execution status updates
  • Analytics: Token usage, cost tracking, turn metrics
  • External Integration: Calls to Context Graph, policy enforcer, storage services
Activities can be retried independently of the workflow, enabling fine-grained error recovery without restarting entire executions. Execution Guarantees:
  • At-least-once execution for all activities (idempotency required)
  • Exactly-once workflow decisions (deterministic replay)
  • Strong consistency for workflow state
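The at-least-once guarantee with exponential backoff can be sketched as below; the attempt count and delay are assumed values, and in the real system Temporal drives the retries rather than an in-process loop.

```python
import time

def run_with_retry(activity, *, max_attempts=4, base_delay=0.01):
    """At-least-once execution with exponential backoff. The activity
    may run more than once on transient failure, which is exactly why
    activities must be idempotent."""
    for attempt in range(max_attempts):
        try:
            return activity()
        except Exception:
            if attempt == max_attempts - 1:
                raise                      # retries exhausted
            time.sleep(base_delay * (2 ** attempt))
```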

LLM Gateway

Architecture Decision: Why LiteLLM? LiteLLM solves the challenge of managing multiple LLM providers with incompatible APIs:
  1. Unified Interface: All providers exposed through OpenAI-compatible API
  2. Zero Code Changes: Switch providers by changing only the model identifier
  3. Automatic Retries: Built-in retry logic with exponential backoff for rate limits
  4. Response Caching: Reduce costs and latency for repeated queries
  5. Cost Tracking: Unified token counting and cost calculation across providers
  6. Fallback Chains: Automatic failover to backup models on provider outages
Provider Support:
  • Commercial: OpenAI, Anthropic, Google, Microsoft Azure OpenAI, Mistral, Cohere
  • Open Source: Replicate, Together AI, Anyscale, Perplexity
  • Self-Hosted: vLLM, Ollama, LM Studio, custom OpenAI-compatible endpoints
Routing Strategy: Model identifiers follow the pattern {provider}/{model-name}:
  • openai/gpt-4o → OpenAI GPT-4o
  • anthropic/claude-sonnet-4 → Anthropic Claude Sonnet
  • gemini/gemini-2.0-flash → Google Gemini
Caching Behavior: Responses are cached based on model, messages, and parameters. Cache hits eliminate model provider calls entirely, reducing both cost and latency. Cache invalidation is automatic based on the configured TTL.
Streaming Support: Both synchronous (full response) and streaming (token-by-token) modes are supported, with unified error handling and timeout management across all providers.
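A cache keyed on model, messages, and parameters can be derived deterministically by hashing a canonical serialization. This is a sketch of the general pattern, not LiteLLM's actual key format.

```python
import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    """Derive a deterministic cache key: identical requests map to
    the same key, so repeated queries hit the cache instead of the
    model provider."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,  # canonical ordering makes the hash stable
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```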

Context Graph Integration

The Context Graph provides persistent organizational knowledge that enhances agent intelligence across all executions. Architectural Role: The Context Graph serves as a central knowledge repository that agents query to:
  • Historical Context: Learn from past executions, decisions, and outcomes
  • Organizational Knowledge: Access company-specific information, terminology, and processes
  • Cross-Agent Learning: Benefit from insights gathered by other agents and teams
  • Entity Relationships: Understand connections between projects, resources, and team members
Integration Pattern: The Control Plane acts as an HTTP proxy to the Context Graph API, providing:
  • Authenticated Access: All queries inherit organization context from the authenticated session
  • Query Translation: Natural language questions translated to structured graph queries
  • Result Enrichment: Graph results augmented with execution-time context
  • Caching: Frequently accessed knowledge cached in Redis for performance
Use Cases:
  • Onboarding: New agents immediately benefit from accumulated organizational knowledge
  • Consistency: All agents reference the same authoritative sources
  • Evolution: Knowledge base grows automatically from agent interactions and human feedback
  • Retrieval-Augmented Generation (RAG): Context-aware responses based on organizational data
This integration enables agents to operate with institutional knowledge rather than relying solely on their foundation model’s training data.

Cognitive Memory

Persistent session storage enables agents to maintain context across multi-turn conversations and long-running workflows. Memory Architecture: Cognitive memory is stored in PostgreSQL using a structured schema that captures:
  • Conversation History: Complete message sequences with timestamps and metadata
  • Agent State: Current context variables, active tools, and execution phase
  • Session Metadata: User identifiers, environment context, policy constraints
  • Turn Analytics: Token usage, latency, and cost per interaction
Persistence Strategy: Unlike stateless API interactions, agent executions maintain durable sessions that:
  • Survive Restarts: Sessions persist across worker failures and deployments
  • Enable Resumption: Long-running tasks can pause and resume without losing context
  • Support HITL: Human-in-the-loop workflows maintain full conversation state while waiting
  • Facilitate Debugging: Complete execution history available for post-mortem analysis
Multi-Turn Conversation Support: Memory enables natural conversational interactions where agents:
  • Reference earlier parts of the conversation
  • Build incrementally on previous decisions
  • Clarify ambiguous requests through back-and-forth dialogue
  • Maintain task context across multiple human inputs
Performance Optimization: Recent conversation history is cached in Redis for fast access during active executions, with full history retrieved from PostgreSQL only when needed for longer context windows or historical analysis.
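The two-tier read path can be sketched with a bounded recent-history cache standing in for Redis and a full-history list standing in for PostgreSQL; the cache size is an assumed parameter.

```python
class SessionMemory:
    """Sketch of the two-tier memory read path: a bounded
    recent-history cache (Redis stand-in) backed by full history
    (PostgreSQL stand-in)."""
    def __init__(self, cache_size: int = 3):
        self.cache_size = cache_size
        self._full = []    # durable store stand-in
        self._recent = []  # cache stand-in

    def append(self, message: str):
        self._full.append(message)
        self._recent = self._full[-self.cache_size:]

    def recent(self) -> list:
        return list(self._recent)   # fast path during active execution

    def full_history(self) -> list:
        return list(self._full)     # slow path for long context windows
```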

Storage System

Architecture Decision: Multi-Cloud Strategy
Supporting multiple cloud providers solves several challenges:
  1. Deployment Flexibility: Customers can use their existing cloud infrastructure
  2. Compliance Requirements: Data residency regulations may dictate storage location
  3. Cost Optimization: Organizations can leverage existing cloud commitments
  4. Vendor Independence: Avoid lock-in to single cloud provider
  5. Hybrid Deployments: Self-hosted control planes can use on-premises storage
Supported Providers:
  • AWS S3: Industry-standard object storage with broad regional availability
  • Google Cloud Storage: Deep integration with GCP services
  • Azure Blob Storage: Native support for Azure-based deployments
  • S3-Compatible: MinIO, Wasabi, Backblaze B2, or any S3-compatible service
Storage Features:
Quota Management: Per-organization storage limits with soft warnings and hard enforcement. Quota tracking includes both file count and total bytes, with visibility in analytics dashboards.
Metadata Tracking: File metadata stored in PostgreSQL enables:
  • Fast searches without scanning object storage
  • Tag-based filtering and organization
  • Relationship tracking (which executions used which files)
  • Versioning and lifecycle management
Soft Delete: Deleted files are marked inactive but retained for a configurable retention period, enabling recovery from accidental deletions and supporting audit requirements.
Access Control: All storage operations respect organization-level isolation and OPA policies. Agents can only access files within their organization scope, with optional policy-based restrictions on file types or sizes.
API Design: RESTful endpoints for standard CRUD operations, with support for:
  • Presigned URLs for direct uploads (bypassing Control Plane for large files)
  • Batch operations for efficiency
  • Streaming downloads for large files
  • Tag-based search and filtering
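The soft-warning/hard-enforcement quota check described above can be sketched as below; the 80% warning threshold is an assumption, since the source does not specify one.

```python
SOFT_THRESHOLD = 0.8  # assumed warning point, not specified in the source

def check_quota(used_bytes: int, limit_bytes: int, upload_bytes: int) -> str:
    """Return 'ok', 'warn' (soft threshold crossed), or 'reject'
    (the hard limit would be exceeded by this upload)."""
    projected = used_bytes + upload_bytes
    if projected > limit_bytes:
        return "reject"
    if projected > limit_bytes * SOFT_THRESHOLD:
        return "warn"
    return "ok"
```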

Technology Stack & Dependencies

The Control Plane’s architecture is built on proven, production-grade technologies chosen for specific technical requirements.

Core Infrastructure

PostgreSQL - Multi-tenant relational database
  • Why chosen: Native row-level security (RLS) for tenant isolation, ACID guarantees, rich query capabilities
  • Role: Primary data store for agents, teams, executions, policies, and all configuration
  • Key features: JSON columns for flexible schemas, full-text search, complex queries with joins
Redis - In-memory data store
  • Why chosen: Microsecond latency for caching, pub/sub for real-time updates, simple data structures
  • Role: Authentication token cache, worker heartbeats, WebSocket session state
  • Key features: TTL-based expiration, atomic operations, pub/sub messaging
Temporal - Workflow orchestration engine
  • Why chosen: Durable execution, built-in retry logic, distributed task queues, workflow versioning
  • Role: Orchestrate all agent and team executions, manage long-running workflows, ensure reliable task distribution
  • Key features: Deterministic replay, signal handling, query support, horizontal scalability
LiteLLM - LLM gateway and router
  • Why chosen: Unified API across providers, automatic retries, cost tracking, response caching
  • Role: Abstract model provider differences, enable multi-model support, manage API keys and quotas
  • Key features: OpenAI-compatible interface, fallback chains, streaming support

External Services

Kubiya API - Authentication service
  • Why integrated: Centralized user management, SSO integration, organization provisioning
  • Role: Validate JWT tokens, provide organization context, manage API keys
  • Integration pattern: External validation with Redis caching for performance
OPA (Open Policy Agent) - Policy enforcement
  • Why chosen: Declarative policy language, decoupled from application code, industry standard for cloud-native authorization
  • Role: Enforce access control, usage limits, and governance policies
  • Integration pattern: Synchronous policy evaluation before and during execution
Context Graph - Knowledge API
  • Why integrated: Organizational memory, cross-agent learning, RAG support
  • Role: Provide historical context and domain knowledge to enhance agent intelligence
  • Integration pattern: On-demand queries with caching
Cloud Storage Providers - Object storage (S3/GCS/Azure)
  • Why multi-cloud: Deployment flexibility, compliance requirements, cost optimization
  • Role: Store agent artifacts, execution outputs, uploaded files
  • Integration pattern: Provider-agnostic abstraction with metadata in PostgreSQL

Architectural Principles

  1. Separation of Concerns: Each technology addresses specific requirements (data persistence vs. caching vs. orchestration)
  2. Best-of-Breed: Choose proven technologies rather than monolithic solutions
  3. Horizontal Scalability: All components can scale independently
  4. Cloud-Native: Designed for containerized, distributed deployments
  5. Operational Maturity: Established technologies with strong community support and tooling

Data Model Overview

The Control Plane’s data model supports multi-tenancy, hierarchical resource organization, and complete execution traceability. Entity Hierarchy:
Organization (Root Tenant)
├── Environments
│   ├── Task Queues
│   │   └── Workers
│   ├── Agents
│   └── Teams
├── Projects
│   ├── Knowledge Items
│   └── Resources
├── Policies
├── Secrets
└── Integration Credentials
Multi-Tenancy Design: All data is organizationally scoped with database-level enforcement:
  • Row-Level Security (RLS): PostgreSQL policies automatically filter queries by organization
  • Isolation Guarantees: No query can access data from another organization
  • Shared Infrastructure: All organizations use same database instance with logical separation
Execution Tracking: Complete audit trail from task submission through completion:
  • Execution Records: Status, timestamps, initiator, target agent/team
  • Session History: Full conversation logs with token usage and costs
  • Analytics Data: Aggregated metrics for dashboards and reporting
  • Relationship Tracking: Links between executions, jobs, and schedules
Schema Evolution: Database migrations managed through automated tooling:
  • Version Control: All schema changes tracked in migration history
  • Zero-Downtime Deployments: Migrations designed for rolling updates
  • Rollback Support: Backward-compatible changes enable safe rollbacks

Monitoring & Analytics

Comprehensive observability enables operational insight, troubleshooting, and cost optimization. Structured Logging: JSON-formatted logs with:
  • Correlation IDs: Track requests across distributed services
  • Contextual Fields: Organization ID, execution ID, user ID automatically included
  • Log Levels: Debug, info, warning, error for filtering
  • Structured Data: Machine-readable format for log aggregation and analysis
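One such log line can be sketched as follows; the field names are illustrative rather than the platform's actual schema.

```python
import datetime
import json

def log_event(level: str, message: str, *,
              correlation_id: str, **context) -> str:
    """Emit one JSON log line carrying the correlation ID and
    contextual fields; field names here are illustrative."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
        **context,  # e.g. organization_id, execution_id, user_id
    }
    return json.dumps(record)
```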
Execution Analytics: Real-time and historical metrics for:
  • Token Usage: Prompt tokens, completion tokens, total tokens per execution
  • Cost Tracking: Per-execution costs based on model pricing
  • Latency Metrics: Time spent in different workflow phases
  • State Transitions: Track how executions move through pending, running, waiting, completed, failed states
Operational Health: System health endpoints provide:
  • Service Status: Component-level health checks (database, Redis, Temporal connectivity)
  • Worker Presence: Real-time view of active workers per queue
  • Queue Depth: Pending task counts for capacity planning
  • Error Rates: Failed execution percentages and error categorization
Analytics Architecture:
Real-Time Layer:
  • Turn-by-turn analytics captured during execution
  • Immediate visibility into token usage and costs
  • WebSocket streaming for live dashboard updates
Historical Layer:
  • Aggregated metrics in PostgreSQL for dashboards
  • Time-series analysis for trend identification
  • Exportable data for external BI tools
Integration Points:
  • Prometheus: Metrics export for alerting and monitoring
  • OpenTelemetry: Distributed tracing for request path analysis
  • Custom Webhooks: Real-time event notifications for external systems
This multi-layer observability approach balances real-time operational needs with long-term analytical requirements.

Architectural Qualities

The Control Plane is designed to meet enterprise requirements for security, reliability, and scale.

Security & Governance

  • Multi-Tenancy: Database-level tenant isolation with row-level security policies
  • Authentication: JWT and API key support with flexible provider integration
  • Authorization: Declarative OPA policies for fine-grained access control
  • Audit Trail: Complete execution history with immutable logs
  • Secret Management: Secure storage and injection of credentials
  • Network Isolation: Support for private networking in self-hosted deployments

Reliability & Resilience

  • Durable Execution: Temporal workflows survive failures and infrastructure changes
  • Automatic Retries: Configurable retry policies with exponential backoff
  • Graceful Degradation: System remains operational even if external services are unavailable
  • Health Monitoring: Continuous worker health checks with automatic routing around failures
  • State Persistence: All critical state stored in durable data stores (PostgreSQL, Temporal)
  • Idempotent Operations: Safe to retry any operation without unintended side effects

Scalability & Performance

  • Horizontal Scaling: All components scale independently (API servers, workers, databases)
  • Distributed Task Queues: Temporal provides natural work distribution across worker pools
  • Caching Strategy: Multi-tier caching (Redis) reduces database load and external API calls
  • Asynchronous Operations: Non-blocking I/O for high-concurrency workloads
  • Connection Pooling: Efficient database connection management
  • Resource Isolation: Individual executions don’t impact each other

Operational Excellence

  • Structured Observability: JSON logs, metrics, and traces for complete system visibility
  • Zero-Downtime Deployments: Rolling updates with backward-compatible changes
  • Configuration Management: Environment-based config with secret separation
  • API-Driven: All operations available via REST API for automation
  • Self-Service: Teams can provision and manage resources without operator intervention
  • Cost Transparency: Detailed usage tracking and cost attribution per organization

Integration & Extensibility

  • Multi-Cloud Support: Run on any Kubernetes cluster or serverless platform
  • Storage Flexibility: Support for AWS S3, GCP, Azure, or S3-compatible storage
  • Model Agnostic: Support for any LLM provider through the LiteLLM gateway
  • MCP Protocol: Standardized integration with external tools and services
  • Webhook Support: Real-time event notifications for external systems
  • API-First Design: Full platform capabilities accessible programmatically

Next Steps

Explore related documentation to understand how components work together: