Central orchestration platform for AI agent and team execution. Manages workflows, worker coordination, policy enforcement, and integrations.
The Control Plane serves as the central nervous system for Kubiya’s distributed AI execution platform, addressing the challenges of coordinating multi-tenant agent workloads, enforcing security policies, and maintaining consistent state across heterogeneous execution environments. It provides a unified orchestration layer that abstracts the complexity of workflow management, resource allocation, and integration coordination.
Deployment Options
Choose between fully managed SaaS or self-hosted deployment based on your requirements.
Kubiya Hosted (SaaS)
Fully managed infrastructure with zero operational overhead. All components, scaling, security, and maintenance are handled by Kubiya.
Benefits: Zero ops • Auto-scaling • Enterprise security • 99.9% uptime SLA • Continuous updates
Self-Hosted
Stack: Docker Compose (local) • Kubernetes/Helm (production) • Temporal Server • PostgreSQL • Redis
Benefits: Full control • On-premises • Custom security • Air-gapped support
High-Level Architecture
Distributed, multi-tenant architecture for AI agent orchestration.
Core Components:
- API Layer: FastAPI-based REST and WebSocket interface for multi-tenant access
- Orchestration: Temporal workflows for reliable, distributed task execution
- Data Persistence: PostgreSQL with row-level security for tenant isolation
- Caching & State: Redis for performance optimization and real-time communication
- Security: JWT/API key authentication with OPA-based policy enforcement
- LLM Gateway: LiteLLM for unified access to multiple AI model providers
- Knowledge Layer: Context Graph for organizational memory and cross-agent learning
Control Plane API
The Control Plane API provides a comprehensive REST and WebSocket interface for managing the entire lifecycle of AI agents, teams, and workflows.
Architectural Approach:
- Multi-tenant by design: All data access enforced at the database level through row-level security
- Async-first architecture: Built on FastAPI with async/await for high-concurrency workloads
- Strongly typed contracts: Pydantic models ensure request/response validation
- Real-time communication: WebSocket endpoints for streaming execution logs and live status updates
- Structured observability: Correlation IDs and structured logging for distributed tracing
- Agent & Team Management: Create, configure, and orchestrate AI agents and multi-agent teams
- Execution Control: Submit tasks, monitor progress, stream outputs, and handle approvals
- Resource Management: Task queue registration, environment configuration, model registry
- Governance: Policy definition, secret management, integration credentials
- Analytics & Observability: Usage metrics, cost tracking, execution history
Authentication & Authorization
Multi-tenant authentication with organization-level isolation ensures secure access across all Control Plane operations.
Authentication Methods:
- JWT Bearer Tokens: User-scoped authentication with role-based access control
- API Keys: Service-to-service authentication for automation and integrations
- Token Caching: Redis-backed cache layer minimizes external validation latency
- Organization Scoping: Every request automatically scoped to authenticated organization
- Database-Level Isolation: PostgreSQL row-level security (RLS) enforces tenant boundaries
- Role-Based Access Control (RBAC): Fine-grained permissions for resource access
- Automatic Token Rotation: Built-in support for credential lifecycle management
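The token caching described above can be sketched as a small TTL cache placed in front of the external validation call. This is a minimal illustration only: an in-memory dict stands in for Redis, and `TokenCache`, `resolve_org`, and `validate_remote` are hypothetical names, not part of the Control Plane API.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """TTL cache mapping a token digest to an organization ID (stand-in for Redis)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _key(token: str) -> str:
        # Store a digest rather than the raw credential.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def get(self, token: str) -> Optional[str]:
        entry = self._entries.get(self._key(token))
        if entry is None:
            return None
        org_id, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the next lookup revalidates.
            del self._entries[self._key(token)]
            return None
        return org_id

    def put(self, token: str, org_id: str) -> None:
        self._entries[self._key(token)] = (org_id, self._clock() + self._ttl)


def resolve_org(token: str, cache: TokenCache, validate_remote: Callable[[str], str]) -> str:
    """Return the organization for a token, calling the validator only on a cache miss."""
    cached = cache.get(token)
    if cached is not None:
        return cached
    org_id = validate_remote(token)  # hypothetical external validation call
    cache.put(token, org_id)
    return org_id
```

A short TTL bounds how long a revoked token can keep resolving from the cache, which is the usual trade-off against validation latency.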
Policy Enforcement
OPA-based access control and governance.
Policy Types: Tool usage control • Resource access restrictions • Execution constraints (limits/quotas) • Approval workflows
Architecture Decision: Why OPA?
- Decoupled policy logic: Policies defined separately from application code enable non-engineers to manage governance
- Declarative approach: Rego policy language allows expressing complex rules as data
- Centralized enforcement: Single policy engine across all agents, teams, and workflows
- Audit transparency: Policy decisions are logged with full context for compliance reporting
- Dynamic evaluation: Policies evaluated at execution time based on current state and context
- Tool Usage Control: Restrict which Skills and MCP servers agents can invoke
- Resource Access: Limit file paths, network destinations, or cloud resources
- Execution Constraints: Enforce quotas on compute time, token usage, or cost
- Approval Workflows: Require human-in-the-loop approval for sensitive operations
- Pre-execution: Validate permissions before workflow submission
- During execution: Monitor and enforce limits in real-time
- Post-execution: Generate audit logs for compliance and security review
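A pre-execution check of this kind can be sketched against OPA's Data API, which accepts a `POST` to `/v1/data/<policy path>` with an `{"input": ...}` body and returns the decision under `"result"`. The policy path `kubiya/execution/allow`, the input fields, and the helper names below are assumptions made for illustration; the actual policy layout is not described here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_opa_request(
    opa_url: str, policy_path: str, policy_input: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a Data API query for a single policy decision."""
    body = json.dumps({"input": policy_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(response_body: bytes) -> bool:
    """Interpret an OPA response, denying by default.

    OPA omits "result" when the queried rule is undefined, so anything
    other than an explicit `true` is treated as a denial.
    """
    decision = json.loads(response_body)
    return decision.get("result") is True


# Hypothetical input document for a tool-usage check.
request = build_opa_request(
    "http://localhost:8181",
    "kubiya/execution/allow",
    {"organization_id": "org-1", "agent": "deployer", "tool": "shell"},
)
```

Sending the request with `urllib.request.urlopen(request)` and passing the response body to `is_allowed` would complete the check.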
Worker Registration & Lifecycle
Workers register with the Control Plane to receive and execute tasks, establishing a durable connection to the distributed orchestration system.
Registration Flow:
- Initial Connection: Worker authenticates with API key and queue identifier
- Configuration Receipt: Control Plane provides connection credentials for Temporal and LiteLLM
- Queue Binding: Worker connects to organization-specific Temporal task queue
- Polling Activation: Worker begins polling for available workflow tasks
- Lightweight Heartbeats: Frequent status updates with minimal overhead (task count, current state)
- Full Heartbeats: Periodic detailed reports including system metrics and execution logs
- Redis-Based State: Fast lookups for worker status without database queries
- Automatic Staleness Detection: Workers that stop reporting are marked inactive
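Heartbeat-based staleness detection can be sketched as a last-seen timestamp per worker compared against a cutoff. In this illustration a plain dict stands in for the Redis-backed state, and `WorkerRegistry` and the 30-second threshold are hypothetical, not documented values.

```python
import time
from typing import Callable, Dict, List


class WorkerRegistry:
    """Tracks the most recent heartbeat per worker (stand-in for Redis state)."""

    def __init__(self, stale_after_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._stale_after = stale_after_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, worker_id: str) -> None:
        # Each heartbeat simply refreshes the worker's last-seen time.
        self._last_seen[worker_id] = self._clock()

    def active_workers(self) -> List[str]:
        # Workers whose last heartbeat is older than the cutoff count as inactive.
        cutoff = self._clock() - self._stale_after
        return sorted(w for w, seen in self._last_seen.items() if seen >= cutoff)


registry = WorkerRegistry(stale_after_seconds=30)
registry.heartbeat("worker-a")
```

With Redis, the same effect is commonly achieved by writing a key per worker with an expiry, so that silent workers disappear without an explicit sweep.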
Task Queue Architecture
Hierarchical task routing and worker management.
For a beginner-friendly introduction to workers and task queues, see Workers Overview and Task Queues.
- Tenant Isolation: Organization-level separation ensures complete data and execution isolation
- Environment Segregation: Separate queues for production, staging, and development prevent cross-contamination
- Capacity Management: Task queues provide named capacity pools that can be scaled independently
- Flexible Routing: Multiple routing strategies support different use cases:
  - AUTO: Temporal load-balances across available workers (default)
  - SPECIFIC_QUEUE: Direct targeting for specialized hardware (GPU, high-memory)
  - ENVIRONMENT: Restrict execution to specific deployment contexts
Task queue names follow the format {organization_id}.{queue_uuid}, ensuring global uniqueness while maintaining logical hierarchy. This naming scheme enables:
- Fast organization-level filtering and metrics
- Queue-level access control policies
- Clear audit trails in execution logs
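The `{organization_id}.{queue_uuid}` naming scheme can be sketched as a small value type that builds and parses queue names. `QueueName` and `parse_queue_name` are hypothetical helpers; splitting on the last dot is an assumption that keeps parsing correct even if an organization ID itself contains dots.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueName:
    organization_id: str
    queue_uuid: uuid.UUID

    def __str__(self) -> str:
        return f"{self.organization_id}.{self.queue_uuid}"


def parse_queue_name(name: str) -> QueueName:
    """Split a queue name on its last dot; a UUID never contains a dot."""
    org_id, sep, raw_uuid = name.rpartition(".")
    if not sep or not org_id:
        raise ValueError(f"expected '<organization_id>.<queue_uuid>', got {name!r}")
    # uuid.UUID raises ValueError for a malformed UUID string.
    return QueueName(org_id, uuid.UUID(raw_uuid))


def belongs_to(queue: QueueName, organization_id: str) -> bool:
    """Organization-level filter, e.g. for access checks or per-tenant metrics."""
    return queue.organization_id == organization_id


queue = QueueName("acme", uuid.uuid4())
round_tripped = parse_queue_name(str(queue))
```

Because the organization ID is a prefix of every queue name, organization-level filtering reduces to a string comparison on the parsed value.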
Execution Flow
End-to-end orchestration from user request through task completion, with support for long-running workflows and human-in-the-loop interactions.
Workflow Types:
- Agent Execution: Single-agent task processing with tool invocation and multi-turn conversations
- Team Execution: Multi-agent coordination with inter-agent communication and shared context
- Scheduled Jobs: Cron-based recurring workflows for automation and monitoring
- WebSocket Streaming: Live execution logs and status updates
- Query Endpoints: Synchronous status checks without interrupting execution
- Signal Handling: External events can influence running workflows (user input, cancellation)
Temporal Workflow Architecture
Architecture Decision: Why Temporal?
Temporal addresses several critical challenges in distributed agent orchestration:
- Durability: Workflow state persists across failures, restarts, and infrastructure changes
- Deterministic Replay: Workflows can be reconstructed and replayed for debugging or recovery
- Built-in Retry Logic: Configurable retry policies with exponential backoff for transient failures
- Long-Running Workflows: Support for workflows that run hours or days (HITL, scheduled jobs)
- Distributed Task Queues: Native load balancing across heterogeneous worker pools
- Versioning: Safe deployment of workflow changes without disrupting in-flight executions
- Agent Workflows: Linear execution with dynamic tool invocation and state transitions
- Team Workflows: Parallel and sequential agent coordination with shared context
- Scheduled Workflows: Wrapper workflows that handle cron triggers and error handling
- LLM Inference: Calls to model providers through LiteLLM gateway
- Database Operations: Session storage, execution status updates
- Analytics: Token usage, cost tracking, turn metrics
- External Integration: Calls to Context Graph, policy enforcer, storage services
- At-least-once execution for all activities (idempotency required)
- Exactly-once workflow decisions (deterministic replay)
- Strong consistency for workflow state
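Because activities run at least once, a retried activity must not repeat its side effect. One common way to get there is an idempotency key, sketched below without the Temporal SDK. `IdempotentExecutor` is a hypothetical helper, and the in-memory dict stands in for a durable store, which in practice would need an atomic check-and-set.

```python
from typing import Callable, Dict


class IdempotentExecutor:
    """Runs a side effect once per idempotency key and replays the stored result."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}

    def run(self, idempotency_key: str, side_effect: Callable[[], object]) -> object:
        if idempotency_key in self._results:
            # A retry of an already-completed attempt returns the recorded result.
            return self._results[idempotency_key]
        result = side_effect()
        self._results[idempotency_key] = result
        return result


executor = IdempotentExecutor()
charges = []


def record_usage() -> int:
    charges.append("exec-42:turn-1")
    return len(charges)


# The same key is used for the first attempt and for any retry of it.
first = executor.run("exec-42:turn-1", record_usage)
retry = executor.run("exec-42:turn-1", record_usage)
```

Deriving the key from stable identifiers, such as an execution ID plus a step index, keeps it identical across retries of the same logical step.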
LLM Gateway
Architecture Decision: Why LiteLLM?
LiteLLM solves the challenge of managing multiple LLM providers with incompatible APIs:
- Unified Interface: All providers exposed through OpenAI-compatible API
- Zero Code Changes: Switch providers by changing only the model identifier
- Automatic Retries: Built-in retry logic with exponential backoff for rate limits
- Response Caching: Reduce costs and latency for repeated queries
- Cost Tracking: Unified token counting and cost calculation across providers
- Fallback Chains: Automatic failover to backup models on provider outages
- Commercial: OpenAI, Anthropic, Google, Microsoft Azure OpenAI, Mistral, Cohere
- Open Source: Replicate, Together AI, Anyscale, Perplexity
- Self-Hosted: vLLM, Ollama, LM Studio, custom OpenAI-compatible endpoints
Model identifiers use the format {provider}/{model-name}:
- openai/gpt-4o → OpenAI GPT-4o
- anthropic/claude-sonnet-4 → Anthropic Claude Sonnet
- gemini/gemini-2.0-flash → Google Gemini
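The `{provider}/{model-name}` identifiers and the fallback-chain idea can be sketched in a few lines. This is a generic illustration and does not use or mirror the LiteLLM API: `parse_model_id`, `complete_with_fallback`, and the `call` parameter are hypothetical, and treating `ConnectionError` as the outage signal is an assumption.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class ModelRef(NamedTuple):
    provider: str
    model_name: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split on the first slash so model names containing '/' stay intact."""
    provider, sep, model_name = model_id.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected '<provider>/<model-name>', got {model_id!r}")
    return ModelRef(provider, model_name)


def complete_with_fallback(chain: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order, moving to the next one on a connection failure."""
    last_error: Optional[Exception] = None
    for model_id in chain:
        parse_model_id(model_id)  # reject malformed identifiers up front
        try:
            return call(model_id)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

Switching providers then amounts to changing the strings in `chain`; the calling code stays the same.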
Context Graph Integration
The Context Graph provides persistent organizational knowledge that enhances agent intelligence across all executions.
Architectural Role: The Context Graph serves as a central knowledge repository that agents query to:
- Historical Context: Learn from past executions, decisions, and outcomes
- Organizational Knowledge: Access company-specific information, terminology, and processes
- Cross-Agent Learning: Benefit from insights gathered by other agents and teams
- Entity Relationships: Understand connections between projects, resources, and team members
- Authenticated Access: All queries inherit organization context from the authenticated session
- Query Translation: Natural language questions translated to structured graph queries
- Result Enrichment: Graph results augmented with execution-time context
- Caching: Frequently accessed knowledge cached in Redis for performance
- Onboarding: New agents immediately benefit from accumulated organizational knowledge
- Consistency: All agents reference the same authoritative sources
- Evolution: Knowledge base grows automatically from agent interactions and human feedback
- Retrieval-Augmented Generation (RAG): Context-aware responses based on organizational data
Cognitive Memory
Persistent session storage enables agents to maintain context across multi-turn conversations and long-running workflows.
Memory Architecture: Cognitive memory is stored in PostgreSQL using a structured schema that captures:
- Conversation History: Complete message sequences with timestamps and metadata
- Agent State: Current context variables, active tools, and execution phase
- Session Metadata: User identifiers, environment context, policy constraints
- Turn Analytics: Token usage, latency, and cost per interaction
- Survive Restarts: Sessions persist across worker failures and deployments
- Enable Resumption: Long-running tasks can pause and resume without losing context
- Support HITL: Human-in-the-loop workflows maintain full conversation state while waiting
- Facilitate Debugging: Complete execution history available for post-mortem analysis
- Reference earlier parts of the conversation
- Build incrementally on previous decisions
- Clarify ambiguous requests through back-and-forth dialogue
- Maintain task context across multiple human inputs
Storage System
Architecture Decision: Multi-Cloud Strategy
Supporting multiple cloud providers solves several challenges:
- Deployment Flexibility: Customers can use their existing cloud infrastructure
- Compliance Requirements: Data residency regulations may dictate storage location
- Cost Optimization: Organizations can leverage existing cloud commitments
- Vendor Independence: Avoid lock-in to single cloud provider
- Hybrid Deployments: Self-hosted control planes can use on-premises storage
- AWS S3: Industry-standard object storage with broad regional availability
- Google Cloud Storage: Deep integration with GCP services
- Azure Blob Storage: Native support for Azure-based deployments
- S3-Compatible: MinIO, Wasabi, Backblaze B2, or any S3-compatible service
- Fast searches without scanning object storage
- Tag-based filtering and organization
- Relationship tracking (which executions used which files)
- Versioning and lifecycle management
- Presigned URLs for direct uploads (bypassing Control Plane for large files)
- Batch operations for efficiency
- Streaming downloads for large files
- Tag-based search and filtering
Technology Stack & Dependencies
The Control Plane’s architecture is built on proven, production-grade technologies chosen for specific technical requirements.
Core Infrastructure
PostgreSQL - Multi-tenant relational database
- Why chosen: Native row-level security (RLS) for tenant isolation, ACID guarantees, rich query capabilities
- Role: Primary data store for agents, teams, executions, policies, and all configuration
- Key features: JSON columns for flexible schemas, full-text search, complex queries with joins
Redis - Caching and state
- Why chosen: Microsecond latency for caching, pub/sub for real-time updates, simple data structures
- Role: Authentication token cache, worker heartbeats, WebSocket session state
- Key features: TTL-based expiration, atomic operations, pub/sub messaging
Temporal - Workflow orchestration
- Why chosen: Durable execution, built-in retry logic, distributed task queues, workflow versioning
- Role: Orchestrate all agent and team executions, manage long-running workflows, ensure reliable task distribution
- Key features: Deterministic replay, signal handling, query support, horizontal scalability
LiteLLM - LLM gateway
- Why chosen: Unified API across providers, automatic retries, cost tracking, response caching
- Role: Abstract model provider differences, enable multi-model support, manage API keys and quotas
- Key features: OpenAI-compatible interface, fallback chains, streaming support
External Services
Kubiya API - Authentication service
- Why integrated: Centralized user management, SSO integration, organization provisioning
- Role: Validate JWT tokens, provide organization context, manage API keys
- Integration pattern: External validation with Redis caching for performance
OPA (Open Policy Agent) - Policy enforcement
- Why chosen: Declarative policy language, decoupled from application code, industry standard for cloud-native authorization
- Role: Enforce access control, usage limits, and governance policies
- Integration pattern: Synchronous policy evaluation before and during execution
Context Graph - Knowledge layer
- Why integrated: Organizational memory, cross-agent learning, RAG support
- Role: Provide historical context and domain knowledge to enhance agent intelligence
- Integration pattern: On-demand queries with caching
Cloud Storage - Multi-cloud object storage
- Why multi-cloud: Deployment flexibility, compliance requirements, cost optimization
- Role: Store agent artifacts, execution outputs, uploaded files
- Integration pattern: Provider-agnostic abstraction with metadata in PostgreSQL
Architectural Principles
- Separation of Concerns: Each technology addresses specific requirements (data persistence vs. caching vs. orchestration)
- Best-of-Breed: Choose proven technologies rather than monolithic solutions
- Horizontal Scalability: All components can scale independently
- Cloud-Native: Designed for containerized, distributed deployments
- Operational Maturity: Established technologies with strong community support and tooling
Data Model Overview
The Control Plane’s data model supports multi-tenancy, hierarchical resource organization, and complete execution traceability.
Entity Hierarchy:
- Row-Level Security (RLS): PostgreSQL policies automatically filter queries by organization
- Isolation Guarantees: No query can access data from another organization
- Shared Infrastructure: All organizations use same database instance with logical separation
- Execution Records: Status, timestamps, initiator, target agent/team
- Session History: Full conversation logs with token usage and costs
- Analytics Data: Aggregated metrics for dashboards and reporting
- Relationship Tracking: Links between executions, jobs, and schedules
- Version Control: All schema changes tracked in migration history
- Zero-Downtime Deployments: Migrations designed for rolling updates
- Rollback Support: Backward-compatible changes enable safe rollbacks
Monitoring & Analytics
Comprehensive observability enables operational insight, troubleshooting, and cost optimization.
Structured Logging: JSON-formatted logs with:
- Correlation IDs: Track requests across distributed services
- Contextual Fields: Organization ID, execution ID, user ID automatically included
- Log Levels: Debug, info, warning, error for filtering
- Structured Data: Machine-readable format for log aggregation and analysis
- Token Usage: Prompt tokens, completion tokens, total tokens per execution
- Cost Tracking: Per-execution costs based on model pricing
- Latency Metrics: Time spent in different workflow phases
- State Transitions: Track how executions move through pending, running, waiting, completed, failed states
- Service Status: Component-level health checks (database, Redis, Temporal connectivity)
- Worker Presence: Real-time view of active workers per queue
- Queue Depth: Pending task counts for capacity planning
- Error Rates: Failed execution percentages and error categorization
- Turn-by-turn analytics captured during execution
- Immediate visibility into token usage and costs
- WebSocket streaming for live dashboard updates
- Aggregated metrics in PostgreSQL for dashboards
- Time-series analysis for trend identification
- Exportable data for external BI tools
- Prometheus: Metrics export for alerting and monitoring
- OpenTelemetry: Distributed tracing for request path analysis
- Custom Webhooks: Real-time event notifications for external systems
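The structured logging described in this section can be sketched with the standard library alone. The field names follow the ones listed above; `JsonFormatter` and the `correlation_id` context variable are hypothetical names, and the exact log schema used by the Control Plane is not specified here.

```python
import json
import logging
from contextvars import ContextVar

# Set once per request so every log line in that request carries the same ID.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")

CONTEXT_FIELDS = ("organization_id", "execution_id", "user_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Values passed through `extra=` become attributes on the record.
        for key in CONTEXT_FIELDS:
            value = getattr(record, key, None)
            if value is not None:
                payload[key] = value
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("control_plane.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

correlation_id.set("req-123")
logger.info("execution started", extra={"organization_id": "org-1", "execution_id": "exec-9"})
```

Using a context variable keeps the correlation ID out of function signatures while remaining safe under async/await, where each task sees its own value.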
Architectural Qualities
The Control Plane is designed to meet enterprise requirements for security, reliability, and scale.
Security & Governance
- Multi-Tenancy: Database-level tenant isolation with row-level security policies
- Authentication: JWT and API key support with flexible provider integration
- Authorization: Declarative OPA policies for fine-grained access control
- Audit Trail: Complete execution history with immutable logs
- Secret Management: Secure storage and injection of credentials
- Network Isolation: Support for private networking in self-hosted deployments
Reliability & Resilience
- Durable Execution: Temporal workflows survive failures and infrastructure changes
- Automatic Retries: Configurable retry policies with exponential backoff
- Graceful Degradation: System remains operational even if external services are unavailable
- Health Monitoring: Continuous worker health checks with automatic routing around failures
- State Persistence: All critical state stored in durable data stores (PostgreSQL, Temporal)
- Idempotent Operations: Safe to retry any operation without unintended side effects
Scalability & Performance
- Horizontal Scaling: All components scale independently (API servers, workers, databases)
- Distributed Task Queues: Temporal provides natural work distribution across worker pools
- Caching Strategy: Multi-tier caching (Redis) reduces database load and external API calls
- Asynchronous Operations: Non-blocking I/O for high-concurrency workloads
- Connection Pooling: Efficient database connection management
- Resource Isolation: Individual executions don’t impact each other
Operational Excellence
- Structured Observability: JSON logs, metrics, and traces for complete system visibility
- Zero-Downtime Deployments: Rolling updates with backward-compatible changes
- Configuration Management: Environment-based config with secret separation
- API-Driven: All operations available via REST API for automation
- Self-Service: Teams can provision and manage resources without operator intervention
- Cost Transparency: Detailed usage tracking and cost attribution per organization
Integration & Extensibility
- Multi-Cloud Support: Run on any Kubernetes cluster or serverless platform
- Storage Flexibility: Support for AWS S3, GCP, Azure, or S3-compatible storage
- Model Agnostic: Support for any LLM provider through LiteLLM gateway
- MCP Protocol: Standardized integration with external tools and services
- Webhook Support: Real-time event notifications for external systems
- API-First Design: Full platform capabilities accessible programmatically
Next Steps
Explore related documentation to understand how components work together:
- Control Plane Overview: Deployment options and operational concepts
- Developers Guide: Local development setup and API reference
- Agents: Create and configure AI agents
- Teams: Coordinate multi-agent teams
- Runtimes: Choose execution runtimes
- Environments: Configure execution environments