A runtime is the execution engine that powers your Kubiya agents. It’s the bridge between your agent’s configuration and the underlying AI models, managing everything from model interactions to tool execution and conversation state. Understanding runtimes helps you make informed decisions about agent configuration, optimize performance, and troubleshoot issues effectively.

What is a Runtime?

At its core, a runtime is responsible for:
  1. Model Orchestration: Routing requests to the appropriate LLM provider (OpenAI, Anthropic, Google, etc.) and managing model interactions
  2. Tool Integration: Executing Skills and MCP servers that give your agents capabilities
  3. State Management: Maintaining conversation history and context across multi-turn interactions
  4. Streaming: Providing real-time execution feedback as agents process requests
  5. Resource Management: Handling cancellation, timeouts, and resource cleanup
Think of runtimes as specialized interpreters: just as different programming languages excel at different tasks, each runtime is optimized for particular execution patterns and use cases.

Runtime Capabilities

Each runtime declares its capabilities, which determine what features are available to your agents:
Streaming
Real-time execution feedback. Streaming runtimes provide immediate visibility into agent execution. As your agent processes a request, you can see:
  • Tool invocations as they happen
  • Partial responses as they’re generated
  • Token usage metrics in real-time
All Kubiya runtimes support streaming for optimal user experience.
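
To make this concrete, here is a minimal Python sketch of consuming a streamed execution feed. The endpoint URL and the event fields (`type`, `tool`, `text`, `total_tokens`) are illustrative assumptions, not the actual Kubiya API; consult the API reference for the real streaming interface.

```python
import json
import requests

# Hypothetical endpoint and event schema; check the Kubiya API reference
# for the actual streaming interface.
AGENT_URL = "https://api.example.com/v1/agents/my-agent/executions"

with requests.post(AGENT_URL, json={"prompt": "Summarize open incidents"},
                   stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "tool_call":    # tool invocations as they happen
            print(f"-> calling {event['tool']}")
        elif event["type"] == "token":      # partial response text
            print(event["text"], end="", flush=True)
        elif event["type"] == "usage":      # real-time token metrics
            print(f"\n[tokens: {event['total_tokens']}]")
```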
Tool Calling
Integration with Skills and capabilities. Tool calling enables agents to perform actions beyond text generation:
  • Execute shell commands (Shell skill)
  • Read and write files (File System skill)
  • Query databases and APIs
  • Manage infrastructure (Docker, Kubernetes)
Runtimes handle tool discovery, parameter validation, execution, and result parsing.
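
The following sketch illustrates that discover-validate-execute-parse cycle in miniature. The registry and error handling are simplified stand-ins for illustration, not Kubiya internals:

```python
from typing import Any, Callable

# Stand-in tool registry; in Kubiya, Skills populate this via discovery.
TOOLS: dict[str, Callable[..., Any]] = {
    "shell.run": lambda command: f"ran: {command}",  # stand-in for the Shell skill
}

def execute_tool_call(name: str, arguments: dict) -> str:
    tool = TOOLS.get(name)             # 1. discovery: is the tool registered?
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        result = tool(**arguments)     # 2. validation: bad parameters raise
    except TypeError as exc:           #    TypeError before execution proceeds
        return f"error: invalid arguments: {exc}"
    return str(result)                 # 3. result parsed back for the model

print(execute_tool_call("shell.run", {"command": "uptime"}))
```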
MCP Servers
Model Context Protocol integration. MCP servers provide standardized interfaces for extending agent capabilities:
  • Connect to external APIs and services
  • Access proprietary data sources
  • Integrate custom tooling
  • Standardized protocol for tool discovery and execution
Both built-in runtimes fully support MCP servers.
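
For illustration, here is a minimal MCP server built with the official MCP Python SDK's FastMCP helper (`pip install mcp`); the `lookup_host` tool and its inventory data are hypothetical examples:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

@mcp.tool()
def lookup_host(hostname: str) -> str:
    """Return basic facts about a host from an internal inventory."""
    # Hypothetical: replace with a real lookup against your data source.
    return f"{hostname}: region=us-east-1, role=web"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```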
Conversation History
Multi-turn conversation memory. Conversation history enables agents to maintain context across interactions:
  • Remember previous requests and responses
  • Build on earlier context
  • Provide consistent, contextual answers
  • Support complex, multi-step workflows
Different runtimes support different history lengths (100-200 messages).
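
As a rough sketch of what such a limit implies, a runtime might trim history like this before each model call. The message shape (dicts with a `role` key) is an assumption for illustration:

```python
def trim_history(messages: list[dict], max_messages: int = 100) -> list[dict]:
    """Keep the system prompt plus the most recent turns within the cap."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    keep = max(max_messages - len(system), 0)
    return system + (turns[-keep:] if keep else [])
```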
Cancellation
Stop long-running executions. Cancellation allows you to interrupt agent execution:
  • Stop unresponsive agents
  • Terminate expensive operations
  • Clean up resources gracefully
  • Prevent runaway token consumption
Critical for production deployments and cost control.
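
In Kubiya, cancellation is handled by the platform; the following is just a minimal sketch of the underlying pattern using Python's standard `asyncio` timeout machinery, with `run_agent` as a hypothetical coroutine:

```python
import asyncio

async def run_with_timeout(run_agent, timeout_s: float = 120.0):
    task = asyncio.create_task(run_agent())
    try:
        return await asyncio.wait_for(task, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the task on timeout; cleanup code in the task's
        # finally blocks runs, preventing runaway token consumption.
        print("execution cancelled after timeout")
        return None
```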
Custom Tools
Extend with your own capabilities. Custom tool support enables runtime-specific extensions:
  • Agno: Python classes with get_tools() method
  • Claude Code: MCP servers with @tool decorator
  • Register tools dynamically at execution time
  • Validate tool interfaces before execution
Essential for integrating proprietary systems and workflows.
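
Here is a sketch of the Agno-style pattern described above. Only the `get_tools()` method comes from this page; the rest of the class shape is illustrative, so check the Agno documentation for the exact interface:

```python
class DeployTools:
    """Hypothetical toolset exposing a proprietary deployment system."""

    def deploy_service(self, name: str, version: str) -> str:
        """Deploy a service to the current environment."""
        # Hypothetical: call your internal deployment API here.
        return f"deployed {name}@{version}"

    def get_tools(self):
        # The runtime calls this to discover the callables to register.
        return [self.deploy_service]
```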

How Runtimes Fit in Kubiya

Runtimes sit at the heart of the agent execution pipeline:
User Request
    ↓
[Agent Configuration]
    ↓
[Runtime Selection]
    ↓
[Runtime Execution Engine]
    ├─→ [Model Provider] (OpenAI, Anthropic, Google, etc.)
    ├─→ [Skills & Tools] (File System, Shell, Docker, etc.)
    ├─→ [MCP Servers] (Custom integrations)
    └─→ [Conversation State] (History & context)
    ↓
Response & Tool Results

Integration Points:

Environments
  • Provide runtime configuration (model settings, timeouts)
  • Define execution boundaries (dev, staging, prod)
  • Set environment variables for runtime behavior
Models
  • Runtimes route requests to different LLM providers
  • Handle model-specific features (function calling, vision, etc.)
  • Manage token usage and cost tracking
Skills
  • Runtimes discover and execute configured Skills
  • Handle tool parameter validation and error recovery
  • Track tool execution for analytics
Teams
  • Teams can specify runtimes for all agents
  • Runtime selection is flexible: configure at agent, team, or environment level
Control Plane
  • Manages runtime registry and lifecycle
  • Routes execution requests to appropriate runtimes
  • Collects execution metrics and analytics

Selecting a Runtime

Choosing the right runtime depends on your use case, model preferences, and operational requirements:

Decision Framework:

1. What’s your primary use case?
  • General-purpose operations (Q&A, workflow orchestration, data processing) → Agno Runtime
  • Code generation, analysis, refactoring → Claude Code Runtime
  • Specialized framework needs (LangChain, CrewAI, AutoGen) → Custom Runtime
2. What’s your model provider strategy?
  • Multiple providers (OpenAI + Anthropic + Google) → Agno Runtime
  • Claude-committed or exploring Claude capabilities → Claude Code Runtime
  • Custom provider integration → Custom Runtime
3. How complex are your conversations?
  • Short interactions (< 50 messages) → Either Agno or Claude Code
  • Long-running sessions (50-200 messages) → Claude Code (extended history)
  • Extremely long context (200+ messages) → Consider chunking or summarization
4. What are your performance priorities?
  • Fast startup time → Agno Runtime
  • Token efficiency → Depends on model choice (both runtimes are efficient)
  • Specialized optimizations (code parsing, file operations) → Claude Code Runtime

Compare Runtimes Side-by-Side

See detailed feature comparison and use case recommendations

Common Questions

Can I switch an agent's runtime after creation?
Yes, and it's easy! You can change an agent's runtime at any time by updating its configuration. The change takes effect on the next execution; no restart is required.
Do runtimes affect cost?
Indirectly, through model selection. Runtimes themselves don't have separate pricing. However:
  • Agno Runtime supports all model providers, so you can choose cost-effective options (GPT-3.5, Claude Haiku, Gemini Flash)
  • Claude Code Runtime requires Claude models, which have specific pricing
  • Token efficiency is comparable across runtimes when using the same model
The bigger cost factor is your model selection and usage patterns, not the runtime itself.
Which runtime performs better?
Both runtimes are production-grade. Performance characteristics:
  • Startup latency: Agno is slightly faster (< 100ms difference)
  • Streaming throughput: Comparable for both runtimes
  • Token processing: Determined by the model, not the runtime
  • Tool execution: Both runtimes execute tools efficiently
Choose based on features and model support, not performance - both are optimized for production.
Can I use different runtimes for different agents?
Absolutely! Runtime selection is per-agent, so you can:
  • Use Agno for general-purpose agents
  • Use Claude Code for development-focused agents
  • Mix and match within the same organization, or even within the same team
This flexibility lets you optimize each agent for its specific use case.
How do I debug runtime issues?
Multiple approaches:
  1. Enable debug logging in your agent configuration
  2. Check execution logs in the Kubiya dashboard
  3. Inspect tool execution details for failures
  4. Use runtime validation endpoints to verify configuration
  5. Review analytics for performance patterns
Which runtime should I choose?
Choose based on your needs:
  • Agno Runtime: For multi-model flexibility and provider choice
  • Claude Code Runtime: For code-focused development workflows
  • Custom Runtime: For specialized frameworks (LangChain, CrewAI, etc.)
Each runtime can be configured at the agent, team, or environment level.
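
As an illustration of that layering, the configuration might look roughly like this. The field names are hypothetical, so consult the configuration reference for the actual schema:

```python
# Hypothetical configuration shapes showing per-level runtime selection.
agent_config = {
    "name": "code-reviewer",
    "runtime": "claude-code",    # agent-level override wins
}

team_config = {
    "name": "platform-team",
    "default_runtime": "agno",   # applies to agents without an override
}
```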

Next Steps