The `kubiya exec` command provides intelligent task execution with automatic agent/team selection, detailed planning, and cost optimization. No infrastructure setup is required - just execute and go!
Overview
Key Features
- Automatic Agent Selection: Analyzes your task and selects the best agent or team
- Intelligent Planning: Creates step-by-step execution plans
- Cost Estimation: Shows estimated costs before execution
- On-Demand Workers: Auto-provisions ephemeral workers (no setup needed)
- Plan Storage: Saves all plans for repeatability and audit
- Multiple Execution Modes: Auto-planning, direct execution, or from saved plans
Quick Start
Execute with On-Demand Worker (Easiest)
Run AI tasks without needing a queue in advance - perfect for CI/CD pipelines!

When you run a task:

- Analyzes your task requirements
- Selects the best agent or team
- Creates a detailed execution plan
- Shows cost estimates
- Control Plane provisions ephemeral queue + worker
- Executes the task
- Auto-cleans up queue and worker
- Returns results
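A minimal invocation might look like the following; the task prompt is illustrative, and `--stream` is the documented flag for real-time logs:

```shell
# Execute a task on an ephemeral on-demand worker.
# The Control Plane provisions the queue and worker, runs the task,
# and cleans everything up automatically afterward.
kubiya exec "Deploy version 2.5.0 to production with health checks" --stream
```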
Execution Modes
Auto-Planning Mode (Recommended)
Let the planner automatically select the best agent/team for your task:

- ✓ Task analyzed and categorized
- ✓ Available agents/teams discovered
- ✓ Best match selected with reasoning
- ✓ Execution plan generated
- ✓ Cost estimate calculated
- ✓ User approval requested (unless `--yes`)
- ✓ Plan saved to `~/.kubiya/plans/<plan-id>.json`
- ✓ Task executed
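A sketch of auto-planning usage; the prompt and plan path are examples, while `--yes` and `--save-plan` are documented flags:

```shell
# Let the planner pick the agent/team and generate a plan.
# --yes skips the interactive approval step; --save-plan overrides
# the default plan location (~/.kubiya/plans/<plan-id>.json).
kubiya exec "Scale api-service to 5 replicas in production" \
  --yes \
  --save-plan ./scale-plan.json
```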
Direct Execution Mode
Execute with a specific agent or team (skips planning):

- With Agent
- With Team
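The exact flags for direct selection are not listed in the options reference on this page, so the `--agent` and `--team` flags below are assumptions for illustration; check `kubiya exec --help` for the actual syntax:

```shell
# Hypothetical direct-execution syntax (flag names are assumptions):
# With a specific agent
kubiya exec "Restart the payment service" --agent devops-agent

# With a specific team
kubiya exec "Run the full security audit" --team security-team
```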
Load from Saved Plan
Re-execute a previously saved plan:

Local Mode Execution
Execute tasks with an ephemeral local worker - perfect for development, testing, and experimentation.

What is Local Mode?
Local mode runs an ephemeral Python worker on your local machine to execute tasks. The CLI:

- Generates a quick execution plan (simplified planning)
- Creates a temporary worker queue (5-minute TTL)
- Starts a Python worker in the foreground
- Executes your task
- Automatically cleans up the queue and worker
Benefits:

- 🆓 Free: Uses your local compute resources
- 🧪 Perfect for testing: Quick iterations without infrastructure setup
- 🎓 Learning: Experiment with agents and tasks locally
- 🔒 Private: Everything runs on your machine
Basic Usage
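A minimal local-mode invocation, using the documented `--local` flag (the prompt is an example):

```shell
# Run the task with an ephemeral local worker on this machine.
# The first run installs Python dependencies (40-120s); later runs
# reuse the cache and start in seconds.
kubiya exec "Summarize the failing tests in this repo" --local
```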
How It Works
- First Run
- Subsequent Runs
When to Use Local Mode
Development
Quick iterations and testing during development without setting up infrastructure
Learning
Experiment with Kubiya agents and tasks in a safe local environment
Demos
Show Kubiya capabilities without depending on remote infrastructure
Comparison: Local vs On-Demand vs Persistent Workers
| Feature | Local Mode | On-Demand Workers | Persistent Workers |
|---|---|---|---|
| Execution Location | Your machine | Kubiya Cloud | Your infrastructure |
| Setup Time | Medium (first run), fast (cached) | None | Configuration required |
| Best For | Dev/Testing | CI/CD/Production | High-frequency tasks |
| Cleanup | Automatic | Automatic | Manual |
| Cost | Free (your compute) | Pay per execution | Infrastructure costs |
| Network Access | Your machine's network | Kubiya's network | Your network |
| Custom Dependencies | Local Python env | Managed by Kubiya | Full control |
Technical Details
Ephemeral Queue:

- Created automatically with 5-minute TTL
- Auto-deleted after execution
- Isolated from other queues

Local Worker:

- Runs in foreground (attached to CLI process)
- Single execution mode (exits after task completion)
- 180-second readiness timeout
- Graceful cleanup on exit or Ctrl+C

Quick Planning:

- Uses "quick mode" for faster planning
- Simplified LLM prompts
- Optimized for local execution scenarios
Troubleshooting
Slow First Run (40-120 seconds)
This is normal! The first local execution installs Python dependencies, which are then cached for subsequent runs.

Solution: Be patient on the first run. Future runs will be much faster (5-10 seconds).
Worker Readiness Timeout
If the worker doesn't become ready within 180 seconds, you'll see a timeout error.

Common causes:
- Slow internet connection (dependency download)
- Resource constraints on your machine
- Conflicting Python installations
Port Conflicts
If you see port binding errors, another process may be using the required port.

Solution:
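One way to identify the conflicting process; the port number below is a placeholder, so substitute the port from your error message:

```shell
# Find the process bound to the conflicting port (8080 is a placeholder)
lsof -i :8080

# Or, on Linux, with ss:
ss -ltnp | grep :8080
```

Once identified, stop the conflicting process or free the port before retrying.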
Python Environment Issues
If you encounter Python-related errors:

Solutions:
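A sketch of common fixes; the virtual environment path is an example, not a requirement:

```shell
# 1. Confirm which Python interpreter will be picked up
python3 --version
which python3

# 2. Create a clean virtual environment to avoid conflicting packages
#    (~/.kubiya-venv is an example path)
python3 -m venv ~/.kubiya-venv
. ~/.kubiya-venv/bin/activate

# 3. Upgrade packaging tools inside the fresh environment
pip install --upgrade pip
```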
Command Syntax
Options
| Flag | Short | Description | Default |
|---|---|---|---|
| `--yes` | `-y` | Auto-approve without confirmation | `false` |
| `--output` | `-o` | Output format: text, json, yaml | `text` |
| `--plan-file` | | Execute from saved plan file | |
| `--save-plan` | | Custom path to save plan | `~/.kubiya/plans/<id>.json` |
| `--non-interactive` | | Skip all prompts (for CI/CD) | `false` |
| `--priority` | | Task priority: low, medium, high, critical | `medium` |
| `--stream` | | Stream real-time execution logs | `false` |
| `--local` | | Run with ephemeral local worker | `false` |
| `--queue` | | Worker queue ID(s) - repeatable for parallel execution | |
| `--queue-name` | | Worker queue name(s) - alternative to IDs | |
| `--environment` | | Environment ID for execution | |
| `--parent-execution` | | Parent execution ID for conversation continuation | |
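For example, the repeatable `--queue` flag fans a task out across workers; the queue IDs below are placeholders:

```shell
# Run a task against two pre-existing worker queues in parallel.
# "queue-us-east" and "queue-eu-west" are placeholder queue IDs.
kubiya exec "Rotate TLS certificates" \
  --queue queue-us-east \
  --queue queue-eu-west \
  --stream
```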
Environment Variables
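For example, `KUBIYA_NON_INTERACTIVE` (documented in the planning section below) can be exported ahead of time for scripted runs:

```shell
# Disable prompts and progress streaming for scripted/CI runs
export KUBIYA_NON_INTERACTIVE=true

kubiya exec "Run nightly database backup" --output json
```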
Examples
DevOps & Deployment
Production Deployment
Rollback
Staging Environment
Infrastructure Management
Kubernetes Operations
Cloud Resources
Security & Compliance
Security Audit
Access Management
Monitoring & Troubleshooting
Incident Response
Health Checks
Development & Testing
Test Execution
Code Quality
Data & Analytics
Database Operations
Reporting
CI/CD Integration
GitHub Actions
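A sketch of a workflow step; how the CLI is installed and authenticated is omitted here and depends on your setup, and the prompt is an example:

```yaml
# Minimal GitHub Actions sketch (installation/auth steps omitted).
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Execute Kubiya task
        env:
          KUBIYA_NON_INTERACTIVE: "true"
        run: |
          kubiya exec "Deploy ${GITHUB_SHA} to staging" \
            --non-interactive --yes --output json
```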
GitLab CI
Jenkins Pipeline
Task Planning
Understanding the Planning Process
When you use auto-planning mode, the system:

1. Analyzes Your Task
   - Extracts intent and requirements
   - Identifies needed skills and tools
   - Determines complexity and risk level
2. Discovers Resources
   - Fetches available agents and teams
   - Checks agent capabilities and skills
   - Evaluates team composition
3. Selects Best Match
   - Matches task requirements to agent capabilities
   - Considers agent availability and workload
   - Provides reasoning for selection
4. Generates Plan
   - Creates step-by-step execution plan
   - Identifies dependencies between steps
   - Adds error handling and rollback steps
5. Estimates Costs
   - Calculates LLM token costs
   - Estimates tool execution costs
   - Shows total estimated cost
6. Requests Approval
   - Displays plan summary
   - Shows cost breakdown
   - Waits for user confirmation (unless `--yes`)
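A sketch of a save-then-replay flow using the documented `--save-plan` and `--plan-file` flags; the prompt and file path are examples:

```shell
# 1. Plan and execute, keeping the plan for later reuse.
kubiya exec "Scale api-service to 5 replicas in production" \
  --save-plan ./scale-plan.json --yes

# 2. Re-execute the same saved plan later, for repeatability.
kubiya exec --plan-file ./scale-plan.json --yes
```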
Planning Modes
The planner adapts its behavior based on your execution mode:

- Normal Mode
- Quick Mode

Normal Mode: Full Planning with Comprehensive Analysis

Used by default and for on-demand worker execution.

Characteristics:

- Comprehensive LLM prompts with full context
- Detailed resource discovery and matching
- Complete risk analysis and prerequisites
- Thorough task breakdown with dependencies
- Full cost estimation with breakdown

Best for:

- Production deployments
- Complex multi-step workflows
- Team-based executions
- When you need detailed planning insights
Real-Time Progress Streaming
When running in interactive mode, the CLI streams real-time progress updates via Server-Sent Events (SSE):

- Initializing: Setting up planner context
- Discovering: Fetching agents, teams, environments, queues (parallel)
- Analyzing: Matching task requirements to capabilities
- Generating: Creating detailed execution plan
- Finalizing: Cost estimation and success criteria
- Complete: Plan ready for review
Event types:

- `thinking`: Planner is processing information
- `tool_call`: Planner is calling internal tools
- `resources_summary`: Resource discovery results
- `complete`: Planning finished successfully
- `error`: Planning encountered an issue
Non-Interactive Mode: Use `--non-interactive` or set `KUBIYA_NON_INTERACTIVE=true` to disable progress streaming and get direct output.

Resource Discovery
The planner performs parallel resource discovery to make intelligent recommendations.

Discovery Process:

- Agents Discovery: Fetches all available agents with their skills and capabilities
- Teams Discovery: Retrieves team configurations and member agents
- Environments Discovery: Gets environment configurations and access policies
- Queue Discovery: Lists available worker queues and their status
Matching Criteria:

- Task keywords → Agent/Team skills
- Required tools → Agent capabilities
- Complexity level → Agent experience/configuration
- Availability → Agent/Queue status (running/idle)
- Performance history → Previous success rates
Interactive Plan Display
When reviewing plans interactively, you can drill down into details.

Display Options:

- Summary View (default): High-level overview with key information
- Task Breakdown: Detailed step-by-step task list with responsibilities
- Cost Analysis: Detailed cost breakdown by model and tool
- Risk Assessment: Full risk analysis with mitigation strategies
- Dependencies: Task dependencies and execution order
Tips:

- Use the `--yes` flag to skip interactive display and approve automatically
- Use `--non-interactive` for fully automated execution
- Use `--output json` for machine-readable output without prompts
Plan Storage
All plans are automatically saved for audit and repeatability. Each saved plan includes:

- Task prompt and metadata
- Selected agent/team with reasoning
- Step-by-step execution plan
- Cost estimates and actuals
- Execution results and logs
- Timestamp and duration
Plan File Format
Plans are saved as JSON files with comprehensive metadata for programmatic access and automation.

File Structure
Each plan file contains a `SavedPlan` object with the following structure:
Field Reference
Top-Level Fields:

- `plan` (object): The complete plan response from the planner
- `saved_at` (timestamp): When the plan was saved locally
- `prompt` (string): Original user prompt that generated the plan
- `approved` (boolean): Whether the user approved the plan
- `approved_at` (timestamp, optional): When the plan was approved
- `executed_at` (timestamp, optional): When the plan was executed
- `execution_id` (string, optional): Associated execution identifier
- `file_path` (string): Full path to the plan file

Plan Fields (inside `plan`):

- `plan_id` (string): Unique identifier for the plan
- `title` (string): Human-readable plan title
- `summary` (string): Detailed plan description
- `complexity` (object): Effort estimation
  - `story_points` (number): Estimated complexity (1-13)
  - `confidence` (string): Confidence level (low/medium/high)
  - `reasoning` (string): Explanation of the complexity assessment
- `team_breakdown` (array): Execution phases and task assignments
- `recommended_execution` (object): Planner's execution recommendation
  - `entity_type` (string): "agent" or "team"
  - `entity_id` (string): ID of the recommended agent/team
  - `entity_name` (string): Name of the recommended entity
  - `recommended_environment_id` (string): Environment to use
  - `recommended_worker_queue_id` (string): Queue to use
  - `reasoning` (string): Why this entity was selected
- `cost_estimate` (object): Financial cost breakdown
  - `estimated_cost_usd` (number): Total estimated cost in USD
  - `llm_costs` (array): Per-model cost breakdown
  - `tool_costs` (array): Per-tool cost breakdown
- `realized_savings` (object): Value generated
  - `time_saved_hours` (number): Estimated time saved
  - `money_saved` (number): Estimated cost saved vs manual work
- `risks` (array): Identified risks and concerns
- `prerequisites` (array): Requirements before execution
- `success_criteria` (array): Conditions for successful completion
Programmatic Usage
- Extract Fields
- List Plans
- Automation
- Analysis
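A sketch of field extraction with `jq` (assuming it is installed), using a hand-written sample that mirrors the documented field names; real plan files contain many more fields:

```shell
# Write a minimal sample plan file matching the documented structure.
cat > /tmp/sample-plan.json <<'EOF'
{
  "prompt": "Deploy version 2.5.0 to production",
  "saved_at": "2024-01-15T10:30:00Z",
  "approved": true,
  "plan": {
    "plan_id": "plan-123",
    "title": "Production deployment",
    "cost_estimate": { "estimated_cost_usd": 0.42 }
  }
}
EOF

# Extract Fields: pull out the plan ID and the estimated cost.
jq -r '.plan.plan_id' /tmp/sample-plan.json
jq -r '.plan.cost_estimate.estimated_cost_usd' /tmp/sample-plan.json

# List Plans: print only approved plan files (here, just the sample).
for f in /tmp/sample-plan.json; do
  [ "$(jq -r '.approved' "$f")" = "true" ] && echo "$f"
done
```

The same pattern applies to real plans under `~/.kubiya/plans/`.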
On-Demand Workers
What are On-Demand Workers?
On-demand workers let you run AI tasks without needing a queue in advance - perfect for CI/CD pipelines! When you use `kubiya exec`, the Control Plane creates an ephemeral worker queue specifically for your execution, provisions a worker on managed infrastructure, runs your task, and automatically cleans everything up.
Benefits:
- ✓ No queue setup needed - run tasks instantly without pre-configuring queues
- ✓ Perfect for CI/CD - integrate into pipelines without infrastructure management
- ✓ Auto-cleanup - queue and worker automatically removed after execution
- ✓ Cost-effective - pay only for execution time
- ✓ Secure - isolated ephemeral environments
- ✓ Fully managed - handled by Kubiya Control Plane
How It Works
When to Use On-Demand vs Persistent Workers
On-Demand Workers
Best for:
- Occasional tasks
- One-off operations
- CI/CD pipelines
- Development/testing
- Cost optimization
Benefits:

- Zero setup
- Auto-scaling
- Cost-effective
- Latest versions
Persistent Workers
Best for:
- High-frequency tasks
- Real-time monitoring
- Custom compute environments
- Specific network access
- Low-latency needs
Benefits:

- Faster execution
- Run on any infrastructure
- Custom configuration
- Persistent state
Runs on:

- Your local machine (Mac, Linux, Windows)
- Kubernetes clusters
- Docker containers
- VMs (EC2, GCE, Azure)
- Bare metal servers
Worker Flexibility: Persistent workers can run on nearly any compute environment, making them incredibly powerful and easy to deploy wherever your infrastructure lives!
Output Formats
Text Output (Default)
Human-readable output with formatting.

JSON Output

Machine-readable output for automation.

YAML Output

Kubernetes-friendly output.

Best Practices
Task Prompts
- Be specific: "Deploy version 2.5.0 to production with health checks" vs "deploy"
- Include context: "Scale api-service to 5 replicas in production" vs "scale service"
- Specify environment: "Run tests on staging" vs "run tests"
- Add constraints: "Deploy during maintenance window (2-4am UTC)" vs "deploy now"
Error Handling
- Check plan before approving: Review steps and cost estimates
- Use `--stream` for long tasks: Monitor progress in real-time
- Save important plans: Use `--save-plan` for repeatability
- Set appropriate priority: Use `--priority` for critical tasks
Cost Optimization
- Use on-demand workers: No infrastructure costs
- Auto-approve simple tasks: Skip confirmation with `--yes`
- Batch similar operations: Combine related tasks in one prompt
- Monitor costs: Review estimated vs actual costs in plans
Security
- Review plan steps: Ensure no unintended actions
- Use policies: Enforce compliance with OPA policies
- Rotate credentials: Don't hardcode secrets in prompts
- Audit executions: Review saved plans and execution logs