A runtime is the execution engine that powers your Kubiya agents. It’s the bridge between your agent’s configuration and the underlying AI models, managing everything from model interactions to tool execution and conversation state. Understanding runtimes helps you make informed decisions about agent configuration, optimize performance, and troubleshoot issues effectively.Documentation Index
Fetch the complete documentation index at: https://docs.kubiya.ai/llms.txt
Use this file to discover all available pages before exploring further.
What is a Runtime?
At its core, a runtime is responsible for:- Model Orchestration: Routing requests to the appropriate LLM provider (OpenAI, Anthropic, Google, etc.) and managing model interactions
- Tool Integration: Executing Skills and MCP servers that give your agents capabilities
- State Management: Maintaining conversation history and context across multi-turn interactions
- Streaming: Providing real-time execution feedback as agents process requests
- Resource Management: Handling cancellation, timeouts, and resource cleanup
Runtime Capabilities
Each runtime declares its capabilities, which determine what features are available to your agents:Streaming
Streaming
Real-time execution feedbackStreaming runtimes provide immediate visibility into agent execution. As your agent processes a request, you can see:
- Tool invocations as they happen
- Partial responses as they’re generated
- Token usage metrics in real-time
Tool Calling
Tool Calling
Integration with Skills and capabilitiesTool calling enables agents to perform actions beyond text generation:
- Execute shell commands (Shell skill)
- Read and write files (File System skill)
- Query databases and APIs
- Manage infrastructure (Docker, Kubernetes)
MCP Server Support
MCP Server Support
Model Context Protocol integrationMCP servers provide standardized interfaces for extending agent capabilities:
- Connect to external APIs and services
- Access proprietary data sources
- Integrate custom tooling
- Standardized protocol for tool discovery and execution
Conversation History
Conversation History
Multi-turn conversation memoryConversation history enables agents to maintain context across interactions:
- Remember previous requests and responses
- Build on earlier context
- Provide consistent, contextual answers
- Support complex, multi-step workflows
Cancellation
Cancellation
Stop long-running executionsCancellation allows you to interrupt agent execution:
- Stop unresponsive agents
- Terminate expensive operations
- Clean up resources gracefully
- Prevent runaway token consumption
Custom Tools
Custom Tools
Extend with your own capabilitiesCustom tool support enables runtime-specific extensions:
- Agno: Python classes with
get_tools()method - Claude Code: MCP servers with
@tooldecorator - Register tools dynamically at execution time
- Validate tool interfaces before execution
How Runtimes Fit in Kubiya
Runtimes sit at the heart of the agent execution pipeline:Integration Points:
Environments- Provide runtime configuration (model settings, timeouts)
- Define execution boundaries (dev, staging, prod)
- Set environment variables for runtime behavior
- Runtimes route requests to different LLM providers
- Handle model-specific features (function calling, vision, etc.)
- Manage token usage and cost tracking
- Runtimes discover and execute configured Skills
- Handle tool parameter validation and error recovery
- Track tool execution for analytics
- Teams can specify runtimes for all agents
- Runtime selection is flexible: configure at agent, team, or environment level
- Manages runtime registry and lifecycle
- Routes execution requests to appropriate runtimes
- Collects execution metrics and analytics
Selecting a Runtime
Choosing the right runtime depends on your use case, model preferences, and operational requirements:Decision Framework:
1. What’s your primary use case?- General-purpose operations (Q&A, workflow orchestration, data processing) → Agno Runtime
- Code generation, analysis, refactoring → Claude Code Runtime
- Specialized framework needs (LangChain, CrewAI, AutoGen) → Custom Runtime
- Multiple providers (OpenAI + Anthropic + Google) → Agno Runtime
- Claude-committed or exploring Claude capabilities → Claude Code Runtime
- Custom provider integration → Custom Runtime
- Short interactions (< 50 messages) → Either Agno or Claude Code
- Long-running sessions (50-200 messages) → Claude Code (extended history)
- Extremely long context (200+ messages) → Consider chunking or summarization
- Fast startup time → Agno Runtime
- Token efficiency → Depends on model choice (both runtimes efficient)
- Specialized optimizations (code parsing, file operations) → Claude Code Runtime
Compare Runtimes Side-by-Side
See detailed feature comparison and use case recommendations
Common Questions
Can I switch runtimes?
Can I switch runtimes?
Yes, and it’s easy!You can change an agent’s runtime at any time by updating its configuration. The change takes effect on the next execution - no restart required.
Do runtimes affect cost?
Do runtimes affect cost?
Indirectly, through model selectionRuntimes themselves don’t have separate pricing. However:
- Agno Runtime supports all model providers, so you can choose cost-effective options (GPT-3.5, Claude Haiku, Gemini Flash)
- Claude Code Runtime requires Claude models, which have specific pricing
- Token efficiency is comparable across runtimes when using the same model
What about performance?
What about performance?
Both runtimes are production-gradePerformance characteristics:
- Startup latency: Agno is slightly faster (< 100ms difference)
- Streaming throughput: Comparable for both runtimes
- Token processing: Determined by the model, not the runtime
- Tool execution: Both runtimes execute tools efficiently
Can I run both runtimes?
Can I run both runtimes?
Absolutely!Runtime selection is per-agent, so you can:
- Use Agno for general-purpose agents
- Use Claude Code for development-focused agents
- Mix and match within the same organization
- Even within the same team
How do I debug runtime issues?
How do I debug runtime issues?
Multiple approaches
- Enable debug logging in your agent configuration
- Check execution logs in the Kubiya dashboard
- Inspect tool execution details for failures
- Use runtime validation endpoints to verify configuration
- Review analytics for performance patterns
Which runtime should I choose?
Which runtime should I choose?
Choose based on your needs:
- Agno Runtime: For multi-model flexibility and provider choice
- Claude Code Runtime: For code-focused development workflows
- Custom Runtime: For specialized frameworks (LangChain, CrewAI, etc.)
Next Steps
Agno Runtime
Multi-model runtime with flexible provider support
Claude Code Runtime
Learn about the code-optimized runtime
Runtime Comparison
Compare features and choose the right runtime
Custom Runtimes
Build your own runtime with custom frameworks