Terraform Modules
The Kubiya Control Plane Provider includes pre-built Terraform modules to help you quickly deploy common infrastructure patterns. These modules encapsulate best practices and reduce boilerplate code.
Available Modules
engineering-org Module
A comprehensive module for creating a complete engineering organization setup. This module creates and manages all necessary resources for running AI agents in your organization.
Module Source: kubiya/control-plane//modules/engineering-org
engineering-org Module
Overview
The engineering-org module provides a complete infrastructure-as-code solution for setting up your AI agent organization. It creates all necessary resources including environments, projects, teams, agents, skills, policies, worker queues, and jobs.
Features
- Flexible Configuration: Use maps to create multiple instances of each resource type
- Sensible Defaults: Ready-to-use configuration for quick setup
- Easy Extension: Add or modify resources by updating variable maps
- Complete Setup: Creates all resources for a production-ready organization
- Automatic Dependencies: Handles relationships between resources automatically
- Resource References: Reference resources by name for easy relationship management
Resources Created
This module can create:
| Resource Type | Description |
|---|---|
| Environments | Isolated execution environments for agents and workers |
| Projects | Organizational units for grouping related work |
| Teams | Groups of agents working together with shared configuration |
| Agents | AI-powered automation agents with custom LLM configurations |
| Skills | Reusable capabilities (shell, filesystem, docker) for agents |
| Policies | OPA Rego policies for governance and security |
| Worker Queues | Task queues for managing worker distribution |
| Jobs | Scheduled, webhook-triggered, or manual tasks |
Usage
Minimal Example (Using Defaults)
The simplest way to use the module is with all defaults:
terraform {
required_providers {
controlplane = {
source = "kubiya/control-plane"
version = "~> 1.0"
}
}
}
provider "controlplane" {
# Credentials are read from the KUBIYA_CONTROL_PLANE_API_KEY environment variable
}
module "engineering_org" {
source = "kubiya/control-plane//modules/engineering-org"
}
output "summary" {
value = module.engineering_org.summary
}
This creates:
- 1 production environment
- 1 platform project
- 1 devops team
- 2 agents (deployer, monitor)
- 2 skills (shell, filesystem)
- 1 security policy
- 1 default worker queue
- 1 daily health check job
Custom Configuration Example
Customize the module by providing your own variable maps:
module "engineering_org" {
source = "kubiya/control-plane//modules/engineering-org"
# Multiple environments
environments = {
production = {
description = "Production environment"
settings = jsonencode({
region = "us-east-1"
max_workers = 20
auto_scaling = true
retention_days = 90
})
execution_environment = jsonencode({
env_vars = {
LOG_LEVEL = "info"
APP_ENV = "production"
}
})
}
staging = {
description = "Staging environment"
settings = jsonencode({
region = "us-west-2"
max_workers = 10
auto_scaling = true
retention_days = 30
})
execution_environment = jsonencode({
env_vars = {
LOG_LEVEL = "debug"
APP_ENV = "staging"
}
})
}
}
# Multiple teams
teams = {
devops = {
description = "DevOps and platform engineering team"
runtime = "claude_code"
configuration = jsonencode({
max_agents = 15
enable_monitoring = true
})
}
sre = {
description = "Site reliability engineering team"
runtime = "claude_code"
configuration = jsonencode({
max_agents = 10
})
}
}
# Multiple agents with team assignments
agents = {
deployer = {
description = "Production deployment agent"
model_id = "kubiya/claude-sonnet-4"
runtime = "claude_code"
llm_config = jsonencode({
temperature = 0.3
max_tokens = 4000
})
capabilities = ["kubernetes_deploy", "helm_deploy", "rollback"]
configuration = jsonencode({
max_retries = 3
timeout = 900
approval_needed = true
})
team_name = "devops" # References the devops team
}
monitor = {
description = "Monitoring and alerting agent"
model_id = "kubiya/claude-sonnet-4"
runtime = "claude_code"
llm_config = jsonencode({
temperature = 0.5
max_tokens = 2000
})
capabilities = ["metrics_collection", "alerting", "log_analysis"]
configuration = jsonencode({
check_interval = 60
alert_channels = ["slack", "pagerduty"]
})
team_name = "sre" # References the sre team
}
incident_responder = {
description = "Incident response agent"
model_id = "kubiya/claude-sonnet-4"
runtime = "claude_code"
llm_config = jsonencode({
temperature = 0.4
max_tokens = 3000
})
capabilities = ["incident_management", "root_cause_analysis"]
configuration = jsonencode({
escalation_timeout = 600
})
team_name = "sre"
}
}
# Multiple skills
skills = {
shell = {
description = "Shell command execution"
type = "shell"
enabled = true
configuration = jsonencode({
allowed_commands = ["kubectl", "helm", "aws", "terraform"]
timeout = 600
working_dir = "/app"
})
}
filesystem = {
description = "File system operations"
type = "file_system"
enabled = true
configuration = jsonencode({
allowed_paths = ["/app/configs", "/app/data"]
max_file_size = 52428800 # 50MB
operations = ["read", "write", "list", "delete"]
})
}
docker = {
description = "Docker container management"
type = "docker"
enabled = true
configuration = jsonencode({
allowed_registries = ["docker.io", "gcr.io", "ghcr.io"]
max_containers = 20
network_mode = "bridge"
})
}
}
# Multiple policies
policies = {
security = {
description = "Security policy for production"
enabled = true
policy_content = <<-EOT
package kubiya.security
# Deny destructive operations without approval
deny[msg] {
input.operation == "delete"
input.environment == "production"
count(input.approvals) < 2
msg := "Delete operations in production require at least 2 approvals"
}
# Require MFA for sensitive operations
deny[msg] {
input.operation == "deploy"
input.environment == "production"
not input.mfa_verified
msg := "Production deployments require MFA verification"
}
EOT
tags = ["security", "production", "compliance"]
}
cost_control = {
description = "Cost control and resource limits"
enabled = true
policy_content = <<-EOT
package kubiya.cost
# Limit instance sizes
deny[msg] {
input.action == "create_instance"
input.instance_type == "x2.32xlarge"
msg := "Instance type too large, maximum allowed is m5.2xlarge"
}
# Require cost tags
deny[msg] {
input.action == "create_resource"
not input.tags.cost_center
msg := "All resources must have a cost_center tag"
}
EOT
tags = ["cost", "governance", "finops"]
}
}
# Multiple worker queues
worker_queues = {
production-primary = {
environment_name = "production"
display_name = "Production Primary Queue"
description = "Primary worker queue for production workloads"
heartbeat_interval = 60
max_workers = 20
tags = ["production", "primary", "high-priority"]
settings = {
region = "us-east-1"
tier = "production"
priority = "high"
}
}
production-batch = {
environment_name = "production"
display_name = "Production Batch Queue"
description = "Batch processing queue"
heartbeat_interval = 120
max_workers = 10
tags = ["production", "batch"]
settings = {
region = "us-east-1"
tier = "production"
priority = "normal"
}
}
}
# Multiple jobs
jobs = {
health_check = {
description = "Daily health check"
enabled = true
trigger_type = "cron"
cron_schedule = "0 9 * * *" # 9 AM UTC daily
cron_timezone = "UTC"
planning_mode = "predefined_agent"
entity_type = "agent"
entity_name = "monitor" # References the monitor agent
prompt_template = "Run daily health check for all production services"
system_prompt = "Check the health of all production services and report any issues"
executor_type = "auto"
execution_env_vars = {
CHECK_TYPE = "comprehensive"
ALERT_ON_FAILURE = "true"
}
}
deployment_webhook = {
description = "Handle deployment webhook events"
enabled = true
trigger_type = "webhook"
planning_mode = "predefined_agent"
entity_type = "agent"
entity_name = "deployer" # References the deployer agent
prompt_template = "Deploy {{service}} version {{version}} to {{environment}}"
system_prompt = "Process deployment request and verify prerequisites"
executor_type = "environment"
environment_name = "production"
config = jsonencode({
timeout = 1800 # 30 minutes
retry_policy = {
max_attempts = 3
backoff = "exponential"
}
})
}
incident_response = {
description = "Manual incident response"
enabled = true
trigger_type = "manual"
planning_mode = "predefined_agent"
entity_type = "agent"
entity_name = "incident_responder"
prompt_template = "Handle incident: {{incident_id}} - {{description}}"
system_prompt = "Coordinate incident response and resolution"
executor_type = "auto"
execution_secrets = ["pagerduty_token", "slack_webhook"]
}
}
}
# Outputs
output "summary" {
value = module.engineering_org.summary
}
output "environment_ids" {
value = module.engineering_org.environment_ids
}
output "agent_ids" {
value = module.engineering_org.agent_ids
}
output "webhook_urls" {
value = module.engineering_org.job_webhook_urls
sensitive = true
}
Module Variables
environments
Map of environments to create.
Type:
map(object({
description = string
settings = optional(string, null) # JSON-encoded
execution_environment = optional(string, null) # JSON-encoded
}))
Default: Creates a production environment
projects
Map of projects to create.
Type:
map(object({
key = string
description = string
settings = optional(string, null) # JSON-encoded
}))
Default: Creates a platform project
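The custom configuration example earlier on this page does not set projects. The sketch below shows what a projects map matching the type above might look like. The project names, the `key` values, and the field inside `settings` are illustrative assumptions, not values documented by the module.

```hcl
module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  # Hypothetical projects; map keys, `key` values, and settings are placeholders.
  projects = {
    platform = {
      key         = "PLAT" # assumed short-identifier format
      description = "Platform engineering project"
      settings = jsonencode({
        owner = "platform-team" # illustrative field, not a documented setting
      })
    }
    data = {
      key         = "DATA"
      description = "Data pipelines project"
      # settings is optional and defaults to null when omitted
    }
  }
}
```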
teams
Map of teams to create.
Type:
map(object({
description = string
runtime = optional(string, "default")
configuration = optional(string, null) # JSON-encoded
}))
Default: Creates a devops team
agents
Map of agents to create.
Type:
map(object({
description = string
model_id = optional(string, "gpt-4")
runtime = optional(string, "default")
llm_config = optional(string, null) # JSON-encoded
capabilities = optional(list(string), [])
configuration = optional(string, null) # JSON-encoded
team_name = optional(string, null)
}))
Default: Creates deployer and monitor agents
skills
Map of skills to create.
Type:
map(object({
description = string
type = string
enabled = optional(bool, true)
configuration = optional(string, null) # JSON-encoded
}))
Default: Creates shell and filesystem skills
policies
Map of policies to create.
Type:
map(object({
description = string
enabled = optional(bool, true)
policy_content = string
tags = optional(list(string), [])
}))
Default: Creates a security policy
worker_queues
Map of worker queues to create.
Type:
map(object({
environment_name = string
display_name = string
description = string
heartbeat_interval = optional(number, 60)
max_workers = optional(number, 10)
tags = optional(list(string), [])
settings = optional(map(string), {})
}))
Default: Creates a default production queue
jobs
Map of jobs to create.
Type:
map(object({
description = string
enabled = optional(bool, true)
trigger_type = string # "cron", "webhook", or "manual"
cron_schedule = optional(string, null)
cron_timezone = optional(string, "UTC")
planning_mode = string
entity_type = optional(string, null)
entity_name = optional(string, null)
prompt_template = string
system_prompt = optional(string, null)
executor_type = optional(string, "auto")
environment_name = optional(string, null)
execution_env_vars = optional(map(string), {})
execution_secrets = optional(list(string), [])
config = optional(string, null) # JSON-encoded
}))
Default: Creates a daily health check job
Module Outputs
Resource Collections
- environments - Map of created environments with full details
- projects - Map of created projects with full details
- teams - Map of created teams with full details
- agents - Map of created agents with full details
- skills - Map of created skills with full details
- policies - Map of created policies with full details
- worker_queues - Map of created worker queues with full details
- jobs - Map of created jobs with full details
ID Maps
- environment_ids - Map of environment names to IDs
- project_ids - Map of project names to IDs
- team_ids - Map of team names to IDs
- agent_ids - Map of agent names to IDs
- skill_ids - Map of skill names to IDs
- policy_ids - Map of policy names to IDs
- worker_queue_ids - Map of worker queue names to IDs
- job_ids - Map of job names to IDs
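Because the ID maps are keyed by resource name, a single ID can be looked up with ordinary index syntax. The sketch below assumes the module instance is named `engineering_org` and that it defines an agent keyed `deployer`, an environment keyed `production`, and a team keyed `devops`; the output and local names are hypothetical.

```hcl
# Expose one agent ID from the root module.
output "deployer_agent_id" {
  value = module.engineering_org.agent_ids["deployer"]
}

locals {
  # Collect a few IDs for use elsewhere in the root module.
  production_refs = {
    environment_id = module.engineering_org.environment_ids["production"]
    team_id        = module.engineering_org.team_ids["devops"]
  }
}
```

Indexing a key that is not present in the map fails at plan time, which surfaces naming mistakes early.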
Special Outputs
- job_webhook_urls - Map of webhook job names to webhook URLs (sensitive)
- worker_queue_task_names - Map of queue names to task queue names for worker registration
- summary - Count of all created resources
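A minimal sketch of consuming the special outputs follows. It assumes a module instance named `engineering_org`, a worker queue keyed `production-primary`, and a webhook job keyed `deployment_webhook`, as in the custom configuration example; the output names are hypothetical.

```hcl
# Task queue name that a worker would use when registering with this queue.
output "primary_queue_task_name" {
  value = module.engineering_org.worker_queue_task_names["production-primary"]
}

# Values derived from a sensitive output must also be marked sensitive,
# otherwise Terraform rejects the configuration.
output "deployment_webhook_url" {
  value     = module.engineering_org.job_webhook_urls["deployment_webhook"]
  sensitive = true
}
```

A sensitive output is redacted in plan and apply output; `terraform output deployment_webhook_url` prints the value when it is requested by name.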
Resource Relationships
The module automatically handles dependencies between resources:
Team Assignment
Agents can reference teams using the team_name field:
teams = {
devops = {
description = "DevOps team"
runtime = "claude_code"
}
}
agents = {
deployer = {
description = "Deployment agent"
team_name = "devops" # References the devops team
# ...
}
}
Environment References
Worker queues reference environments, and jobs can reference environments:
environments = {
production = {
description = "Production environment"
}
}
worker_queues = {
prod-queue = {
environment_name = "production" # References the production environment
# ...
}
}
jobs = {
deploy = {
environment_name = "production" # References the production environment
# ...
}
}
Entity References
Jobs reference agents or teams:
agents = {
deployer = {
description = "Deployment agent"
# ...
}
}
jobs = {
deploy_job = {
entity_type = "agent"
entity_name = "deployer" # References the deployer agent
# ...
}
}
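The example above targets an agent. A job that targets a team would presumably follow the same pattern; in the fragment below, the `"team"` value for `entity_type` is an assumption inferred from the sentence "Jobs reference agents or teams" and should be confirmed against the provider's job schema. The job name is hypothetical.

```hcl
teams = {
  sre = {
    description = "SRE team"
  }
}

jobs = {
  weekly_review = {
    entity_type = "team" # assumed value, not confirmed by this page
    entity_name = "sre"  # must match a key in the teams map
    # ...
  }
}
```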
Best Practices
1. Start with Defaults
Begin with the default configuration and customize incrementally:
module "engineering_org" {
source = "kubiya/control-plane//modules/engineering-org"
# Any variable you omit keeps its default value
agents = {
# A variable you set replaces its default map, so list every agent you want to keep
}
}
2. Use Meaningful Resource Names
Map keys become part of resource names, so use clear, descriptive names:
environments = {
production = { # Good: clear and descriptive
# ...
}
prod = { # Avoid: abbreviated and unclear
# ...
}
}
3. Organize by Environment
Create separate environments for different stages:
environments = {
production = {
description = "Production environment"
settings = jsonencode({
max_workers = 20
retention_days = 90
})
}
staging = {
description = "Staging environment"
settings = jsonencode({
max_workers = 10
retention_days = 30
})
}
development = {
description = "Development environment"
settings = jsonencode({
max_workers = 5
retention_days = 7
})
}
}
4. Group Agents by Team
Organize agents into teams based on their function:
teams = {
devops = { description = "DevOps team" }
sre = { description = "SRE team" }
data = { description = "Data team" }
}
agents = {
deployer = { team_name = "devops" /* ... */ }
monitor = { team_name = "sre" /* ... */ }
pipeline = { team_name = "data" /* ... */ }
}
5. Implement Policies Early
Define governance policies from the start:
policies = {
security = {
description = "Security policy"
enabled = true
policy_content = "..."
tags = ["security", "required"]
}
cost_control = {
description = "Cost control policy"
enabled = true
policy_content = "..."
tags = ["cost", "governance"]
}
}
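Longer policies are often easier to maintain as standalone Rego files than as inline strings. The sketch below loads the policy text with Terraform's built-in `file()` function; the `policies/security.rego` path is a hypothetical location relative to the root module.

```hcl
module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  policies = {
    security = {
      description = "Security policy"
      enabled     = true
      # Hypothetical path: the Rego source lives in its own file.
      policy_content = file("${path.module}/policies/security.rego")
      tags           = ["security", "required"]
    }
  }
}
```

Keeping the Rego in a separate file also lets you lint and test it with OPA tooling independently of Terraform.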
6. Create Dedicated Worker Queues
Use separate queues for different priorities:
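worker_queues = {
high-priority = {
environment_name = "production"
display_name = "High Priority Queue"
description = "Latency-sensitive production workloads"
max_workers = 20
tags = ["high-priority", "realtime"]
}
batch-processing = {
environment_name = "production"
display_name = "Batch Processing Queue"
description = "Background and batch workloads"
max_workers = 10
tags = ["batch", "background"]
}
}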
Advanced Patterns
Multi-Environment Setup
Create a complete multi-environment infrastructure:
module "engineering_org" {
source = "kubiya/control-plane//modules/engineering-org"
environments = {
for env in ["production", "staging", "development"] :
env => {
description = "${title(env)} environment"
settings = jsonencode({
max_workers = env == "production" ? 20 : (env == "staging" ? 10 : 5)
retention_days = env == "production" ? 90 : (env == "staging" ? 30 : 7)
})
}
}
}
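As the number of environments or settings grows, nested conditionals become hard to read. One alternative, sketched here with the same values, is to keep the per-environment sizing in a lookup map and iterate over it. The local name `env_sizing` is hypothetical.

```hcl
locals {
  # Per-environment sizing, using the same values as the conditional version.
  env_sizing = {
    production  = { max_workers = 20, retention_days = 90 }
    staging     = { max_workers = 10, retention_days = 30 }
    development = { max_workers = 5, retention_days = 7 }
  }
}

module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  environments = {
    for env, sizing in local.env_sizing :
    env => {
      description = "${title(env)} environment"
      settings    = jsonencode(sizing)
    }
  }
}
```

Adding an environment then means adding one entry to `env_sizing` rather than extending every conditional.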
Module Composition
Combine multiple module instances:
# Base infrastructure
module "base" {
source = "kubiya/control-plane//modules/engineering-org"
environments = { /* ... */ }
teams = { /* ... */ }
}
# Additional agents using base infrastructure
module "extended" {
source = "kubiya/control-plane//modules/engineering-org"
# Reference base infrastructure
agents = {
custom_agent = {
description = "Custom agent"
team_name = "devops" # Must exist in base module
# ...
}
}
# Don't recreate base resources (an omitted variable would fall back to its default)
environments = {}
projects = {}
teams = {}
skills = {}
policies = {}
worker_queues = {}
jobs = {}
}
Troubleshooting
Resource Already Exists
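If you see “resource already exists” errors, import the existing resource into Terraform state so the module manages it instead of trying to create it:
# Import an existing agent by its ID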
terraform import 'module.engineering_org.controlplane_agent.this["deployer"]' agent-xxxxx
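With Terraform 1.5 or later, the same import can be declared in configuration instead of run as a one-off command. This is a sketch that reuses the resource address and placeholder ID from the command above; the block belongs in the root module.

```hcl
# Declarative equivalent of the CLI import (requires Terraform >= 1.5).
import {
  to = module.engineering_org.controlplane_agent.this["deployer"]
  id = "agent-xxxxx" # placeholder ID, as in the command above
}
```

The import is then previewed by `terraform plan` and carried out by `terraform apply`, after which the block can be removed.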
Dependency Errors
If you see dependency errors, verify resource names match:
agents = {
deployer = {
team_name = "devops" # Must match a key in teams map
}
}
teams = {
devops = { # Must match team_name in agents
description = "DevOps team"
}
}
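Name mismatches can also be caught in your own configuration. The sketch below assumes you keep the teams and agents maps in locals before passing them to the module, and uses a `check` block (Terraform 1.5 or later). All local and check names are hypothetical, and the `monitor` agent deliberately references a team that does not exist.

```hcl
locals {
  teams = {
    devops = { description = "DevOps team" }
  }

  agents = {
    deployer = { description = "Deployment agent", team_name = "devops" }
    monitor  = { description = "Monitoring agent", team_name = "sre" } # no such team
  }

  # Names of agents whose team_name does not match any key in the teams map.
  agents_with_unknown_team = [
    for name, agent in local.agents : name
    if !contains(keys(local.teams), agent.team_name)
  ]
}

check "agent_team_references" {
  assert {
    condition     = length(local.agents_with_unknown_team) == 0
    error_message = "Agents reference unknown teams: ${join(", ", local.agents_with_unknown_team)}"
  }
}
```

A failed `check` assertion is reported as a warning during plan and apply rather than stopping the run, so treat it as an early signal and fix the names before relying on the result.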
JSON Encoding
Use jsonencode() for JSON-encoded fields such as configuration, settings, and llm_config:
# Good
configuration = jsonencode({
key = "value"
})
# Avoid - hand-escaped JSON is easy to get wrong and hard to maintain
configuration = "{\"key\":\"value\"}"
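Another reason to prefer `jsonencode()` is that it accepts ordinary Terraform values, so shared settings can be composed before encoding. A minimal sketch, assuming a hypothetical `base_settings` local and reusing setting names from the examples on this page:

```hcl
locals {
  # Hypothetical settings shared by every environment.
  base_settings = {
    auto_scaling   = true
    retention_days = 30
  }
}

module "engineering_org" {
  source = "kubiya/control-plane//modules/engineering-org"

  environments = {
    production = {
      description = "Production environment"
      # merge() applies the overrides on top of the shared values,
      # then jsonencode() serializes the result as one JSON string.
      settings = jsonencode(merge(local.base_settings, {
        max_workers    = 20
        retention_days = 90 # overrides the shared value
      }))
    }
  }
}
```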