A runtime is the execution engine that powers your Kubiya agents. It’s the bridge between your agent’s configuration and the underlying AI models, managing everything from model interactions to tool execution and conversation state. Understanding runtimes helps you make informed decisions about agent configuration, optimize performance, and troubleshoot issues effectively.

What is a Runtime?

At its core, a runtime is responsible for:
  1. Model Orchestration: Routing requests to the appropriate LLM provider (OpenAI, Anthropic, Google, etc.) and managing model interactions
  2. Tool Integration: Executing Skills and MCP servers that give your agents capabilities
  3. State Management: Maintaining conversation history and context across multi-turn interactions
  4. Streaming: Providing real-time execution feedback as agents process requests
  5. Resource Management: Handling cancellation, timeouts, and resource cleanup
Think of runtimes as specialized interpreters: just as different programming languages excel at different tasks, each runtime is optimized for particular execution patterns and use cases.

Runtime Capabilities

Each runtime declares its capabilities, which determine what features are available to your agents:
Streaming
Real-time execution feedback. Streaming runtimes provide immediate visibility into agent execution. As your agent processes a request, you can see:
  • Tool invocations as they happen
  • Partial responses as they’re generated
  • Token usage metrics in real-time
All Kubiya runtimes support streaming for optimal user experience.
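
To make this concrete, here is a minimal Python sketch of consuming a streamed execution feed. The endpoint URL and the event fields (`type`, `tool`, `text`, `total_tokens`) are illustrative assumptions, not the actual Kubiya API; consult the API reference for the real streaming interface.

```python
import json
import requests

# Hypothetical endpoint and event schema; check the Kubiya API reference
# for the actual streaming interface.
AGENT_URL = "https://api.example.com/v1/agents/my-agent/executions"

with requests.post(AGENT_URL, json={"prompt": "Summarize open incidents"},
                   stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "tool_call":    # tool invocations as they happen
            print(f"-> calling {event['tool']}")
        elif event["type"] == "token":      # partial response text
            print(event["text"], end="", flush=True)
        elif event["type"] == "usage":      # real-time token metrics
            print(f"\n[tokens: {event['total_tokens']}]")
```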
Tool Calling
Integration with Skills and capabilities. Tool calling enables agents to perform actions beyond text generation:
  • Execute shell commands (Shell skill)
  • Read and write files (File System skill)
  • Query databases and APIs
  • Manage infrastructure (Docker, Kubernetes)
Runtimes handle tool discovery, parameter validation, execution, and result parsing.
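
The following sketch illustrates that discover-validate-execute-parse cycle in miniature. The registry and error handling are simplified stand-ins for illustration, not Kubiya internals:

```python
from typing import Any, Callable

# Stand-in tool registry; in Kubiya, Skills populate this via discovery.
TOOLS: dict[str, Callable[..., Any]] = {
    "shell.run": lambda command: f"ran: {command}",  # stand-in for the Shell skill
}

def execute_tool_call(name: str, arguments: dict) -> str:
    tool = TOOLS.get(name)             # 1. discovery: is the tool registered?
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        result = tool(**arguments)     # 2. validation: bad parameters raise
    except TypeError as exc:           #    TypeError before execution proceeds
        return f"error: invalid arguments: {exc}"
    return str(result)                 # 3. result parsed back for the model

print(execute_tool_call("shell.run", {"command": "uptime"}))
```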
MCP Servers
Model Context Protocol integration. MCP servers provide standardized interfaces for extending agent capabilities:
  • Connect to external APIs and services
  • Access proprietary data sources
  • Integrate custom tooling
  • Standardized protocol for tool discovery and execution
Both built-in runtimes fully support MCP servers.
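
For illustration, here is a minimal MCP server built with the official MCP Python SDK's FastMCP helper (`pip install mcp`); the `lookup_host` tool and its inventory data are hypothetical examples:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

@mcp.tool()
def lookup_host(hostname: str) -> str:
    """Return basic facts about a host from an internal inventory."""
    # Hypothetical: replace with a real lookup against your data source.
    return f"{hostname}: region=us-east-1, role=web"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```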
Conversation History
Multi-turn conversation memory. Conversation history enables agents to maintain context across interactions:
  • Remember previous requests and responses
  • Build on earlier context
  • Provide consistent, contextual answers
  • Support complex, multi-step workflows
Different runtimes support different history lengths (100-200 messages).
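
As a rough sketch of what such a limit implies, a runtime might trim history like this before each model call. The message shape (dicts with a `role` key) is an assumption for illustration:

```python
def trim_history(messages: list[dict], max_messages: int = 100) -> list[dict]:
    """Keep the system prompt plus the most recent turns within the cap."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    keep = max(max_messages - len(system), 0)
    return system + (turns[-keep:] if keep else [])
```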
Cancellation
Stop long-running executions. Cancellation allows you to interrupt agent execution:
  • Stop unresponsive agents
  • Terminate expensive operations
  • Clean up resources gracefully
  • Prevent runaway token consumption
Critical for production deployments and cost control.
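
In Kubiya, cancellation is handled by the platform; the following is just a minimal sketch of the underlying pattern using Python's standard `asyncio` timeout machinery, with `run_agent` as a hypothetical coroutine:

```python
import asyncio

async def run_with_timeout(run_agent, timeout_s: float = 120.0):
    task = asyncio.create_task(run_agent())
    try:
        return await asyncio.wait_for(task, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the task on timeout; cleanup code in the task's
        # finally blocks runs, preventing runaway token consumption.
        print("execution cancelled after timeout")
        return None
```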
Custom Tools
Extend with your own capabilities. Custom tool support enables runtime-specific extensions:
  • Agno: Python classes with get_tools() method
  • Claude Code: MCP servers with @tool decorator
  • Register tools dynamically at execution time
  • Validate tool interfaces before execution
Essential for integrating proprietary systems and workflows.
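
Here is a sketch of the Agno-style pattern described above. Only the `get_tools()` method comes from this page; the rest of the class shape is illustrative, so check the Agno documentation for the exact interface:

```python
class DeployTools:
    """Hypothetical toolset exposing a proprietary deployment system."""

    def deploy_service(self, name: str, version: str) -> str:
        """Deploy a service to the current environment."""
        # Hypothetical: call your internal deployment API here.
        return f"deployed {name}@{version}"

    def get_tools(self):
        # The runtime calls this to discover the callables to register.
        return [self.deploy_service]
```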

How Runtimes Fit in Kubiya

Runtimes sit at the heart of the agent execution pipeline:
User Request
    ↓
[Agent Configuration]
    ↓
[Runtime Selection]
    ↓
[Runtime Execution Engine]
    ├─→ [Model Provider] (OpenAI, Anthropic, Google, etc.)
    ├─→ [Skills & Tools] (File System, Shell, Docker, etc.)
    ├─→ [MCP Servers] (Custom integrations)
    └─→ [Conversation State] (History & context)
    ↓
Response & Tool Results

Integration Points:

Environments
  • Provide runtime configuration (model settings, timeouts)
  • Define execution boundaries (dev, staging, prod)
  • Set environment variables for runtime behavior
Models
  • Runtimes route requests to different LLM providers
  • Handle model-specific features (function calling, vision, etc.)
  • Manage token usage and cost tracking
Skills
  • Runtimes discover and execute configured Skills
  • Handle tool parameter validation and error recovery
  • Track tool execution for analytics
Teams
  • Teams can specify runtimes for all agents
  • Runtime selection is flexible: configure at agent, team, or environment level
Control Plane
  • Manages runtime registry and lifecycle
  • Routes execution requests to appropriate runtimes
  • Collects execution metrics and analytics

Selecting a Runtime

Choosing the right runtime depends on your use case, model preferences, and operational requirements:

Decision Framework:

1. What’s your primary use case?
  • General-purpose operations (Q&A, workflow orchestration, data processing) → Agno Runtime
  • Code generation, analysis, refactoring → Claude Code Runtime
  • Specialized framework needs (LangChain, CrewAI, AutoGen) → Custom Runtime
2. What’s your model provider strategy?
  • Multiple providers (OpenAI + Anthropic + Google) → Agno Runtime
  • Claude-committed or exploring Claude capabilities → Claude Code Runtime
  • Custom provider integration → Custom Runtime
3. How complex are your conversations?
  • Short interactions (< 50 messages) → Either Agno or Claude Code
  • Long-running sessions (50-200 messages) → Claude Code (extended history)
  • Extremely long context (200+ messages) → Consider chunking or summarization
4. What are your performance priorities?
  • Fast startup time → Agno Runtime
  • Token efficiency → Depends on model choice (both runtimes are efficient)
  • Specialized optimizations (code parsing, file operations) → Claude Code Runtime

Compare Runtimes Side-by-Side

See detailed feature comparison and use case recommendations

Common Questions

Can I switch an agent's runtime after creation?
Yes, and it's easy! You can change an agent's runtime at any time by updating its configuration. The change takes effect on the next execution; no restart is required.
Do runtimes affect cost?
Indirectly, through model selection. Runtimes themselves don't have separate pricing. However:
  • Agno Runtime supports all model providers, so you can choose cost-effective options (GPT-3.5, Claude Haiku, Gemini Flash)
  • Claude Code Runtime requires Claude models, which have specific pricing
  • Token efficiency is comparable across runtimes when using the same model
The bigger cost factor is your model selection and usage patterns, not the runtime itself.
Which runtime performs better?
Both runtimes are production-grade. Performance characteristics:
  • Startup latency: Agno is slightly faster (< 100ms difference)
  • Streaming throughput: Comparable for both runtimes
  • Token processing: Determined by the model, not the runtime
  • Tool execution: Both runtimes execute tools efficiently
Choose based on features and model support, not performance - both are optimized for production.
Can I use different runtimes for different agents?
Absolutely! Runtime selection is per-agent, so you can:
  • Use Agno for general-purpose agents
  • Use Claude Code for development-focused agents
  • Mix and match within the same organization, or even within the same team
This flexibility lets you optimize each agent for its specific use case.
How do I debug runtime issues?
Multiple approaches:
  1. Enable debug logging in your agent configuration
  2. Check execution logs in the Kubiya dashboard
  3. Inspect tool execution details for failures
  4. Use runtime validation endpoints to verify configuration
  5. Review analytics for performance patterns
Which runtime should I choose?
Choose based on your needs:
  • Agno Runtime: For multi-model flexibility and provider choice
  • Claude Code Runtime: For code-focused development workflows
  • Custom Runtime: For specialized frameworks (LangChain, CrewAI, etc.)
Each runtime can be configured at the agent, team, or environment level.
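
As an illustration of that layering, the configuration might look roughly like this. The field names are hypothetical, so consult the configuration reference for the actual schema:

```python
# Hypothetical configuration shapes showing per-level runtime selection.
agent_config = {
    "name": "code-reviewer",
    "runtime": "claude-code",    # agent-level override wins
}

team_config = {
    "name": "platform-team",
    "default_runtime": "agno",   # applies to agents without an override
}
```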

Next Steps