What is a Runtime?
At its core, a runtime is responsible for:
- Model Orchestration: Routing requests to the appropriate LLM provider (OpenAI, Anthropic, Google, etc.) and managing model interactions
- Tool Integration: Executing Skills and MCP servers that give your agents capabilities
- State Management: Maintaining conversation history and context across multi-turn interactions
- Streaming: Providing real-time execution feedback as agents process requests
- Resource Management: Handling cancellation, timeouts, and resource cleanup
Runtime Capabilities
Each runtime declares its capabilities, which determine what features are available to your agents:
Streaming
Real-time execution feedback
Streaming runtimes provide immediate visibility into agent execution. As your agent processes a request, you can see:
- Tool invocations as they happen
- Partial responses as they’re generated
- Token usage metrics in real-time
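The event kinds listed above can be sketched as a small consumer loop. The `ExecutionEvent` type, the `fake_stream` feed, and all field names below are illustrative stand-ins, not the Kubiya streaming API:

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical event shape; real runtimes will differ.
@dataclass
class ExecutionEvent:
    kind: str      # "tool_call", "token", or "usage"
    payload: str

def fake_stream() -> Iterator[ExecutionEvent]:
    """Stand-in for a streaming runtime's real-time event feed."""
    yield ExecutionEvent("tool_call", "shell: ls /tmp")
    yield ExecutionEvent("token", "Listing ")
    yield ExecutionEvent("token", "files...")
    yield ExecutionEvent("usage", "tokens=42")

def collect_response(events: Iterator[ExecutionEvent]) -> str:
    """Assemble partial response chunks while other events stream past."""
    parts = []
    for event in events:
        if event.kind == "token":
            parts.append(event.payload)  # partial response as it is generated
    return "".join(parts)
```

The point of the sketch is that tool invocations, partial text, and usage metrics all arrive interleaved on one stream, and the consumer decides which to surface.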
Tool Calling
Integration with Skills and capabilities
Tool calling enables agents to perform actions beyond text generation:
- Execute shell commands (Shell skill)
- Read and write files (File System skill)
- Query databases and APIs
- Manage infrastructure (Docker, Kubernetes)
MCP Server Support
Model Context Protocol integration
MCP servers provide standardized interfaces for extending agent capabilities:
- Connect to external APIs and services
- Access proprietary data sources
- Integrate custom tooling
- Standardized protocol for tool discovery and execution
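The "standardized protocol for tool discovery" point can be illustrated with a drastically simplified exchange in the spirit of MCP's JSON-RPC `tools/list` request. This is a sketch of the idea, not the full protocol, and the `query_db` tool is a made-up example:

```python
import json

def handle_request(request: str) -> str:
    """Toy MCP-style server: answer a tool-discovery request over JSON."""
    req = json.loads(request)
    if req.get("method") == "tools/list":
        # A real server would enumerate its registered tools here.
        tools = [{"name": "query_db", "description": "Run a read-only query"}]
        return json.dumps({"id": req["id"], "result": {"tools": tools}})
    return json.dumps({"id": req.get("id"), "error": "method not found"})
```

Because discovery is standardized, any runtime that speaks the protocol can list and call a server's tools without server-specific integration code.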
Conversation History
Multi-turn conversation memory
Conversation history enables agents to maintain context across interactions:
- Remember previous requests and responses
- Build on earlier context
- Provide consistent, contextual answers
- Support complex, multi-step workflows
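A minimal sketch of the idea: multi-turn memory is a bounded list of turns that the runtime replays as context. The class and its `max_turns` window are illustrative assumptions, not Kubiya's implementation:

```python
class ConversationHistory:
    """Minimal multi-turn memory: a bounded list of (role, content) turns."""

    def __init__(self, max_turns: int = 50):
        self.max_turns = max_turns
        self.turns: list[tuple[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append((role, content))
        # Drop the oldest turns once the window is exceeded.
        self.turns = self.turns[-self.max_turns:]

    def context(self) -> str:
        """Render the remembered turns for inclusion in the next request."""
        return "\n".join(f"{role}: {content}" for role, content in self.turns)
```

The bounded window is the design trade-off behind the conversation-length guidance later in this page: longer windows preserve more context but cost more tokens per request.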
Cancellation
Stop long-running executions
Cancellation allows you to interrupt agent execution:
- Stop unresponsive agents
- Terminate expensive operations
- Clean up resources gracefully
- Prevent runaway token consumption
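The usual mechanism behind this is cooperative cancellation: the execution loop checks a flag between steps and stops cleanly when it is set. A minimal sketch using Python's standard `threading.Event` (the step/loop shape is an assumption, not Kubiya's internals):

```python
import threading

def run_agent_steps(steps, cancel: threading.Event):
    """Run steps in order until done or until cancellation is requested."""
    completed = []
    for step in steps:
        if cancel.is_set():
            break  # stop before starting the next step; resources stay consistent
        completed.append(step())
    return completed
```

Checking the flag at step boundaries is what makes the shutdown graceful: no step is killed mid-flight, so cleanup is straightforward and token spend stops at the next boundary.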
Custom Tools
Extend with your own capabilities
Custom tool support enables runtime-specific extensions:
- Agno: Python classes with a get_tools() method
- Claude Code: MCP servers with the @tool decorator
- Register tools dynamically at execution time
- Validate tool interfaces before execution
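A hypothetical sketch of the Agno-style pattern named above: a plain class exposes its capabilities through a `get_tools()` method, and the runtime validates the interface before registering anything. The class, tool, and `validate_tools` helper are illustrative, not the actual Agno API:

```python
class EchoToolkit:
    """Toy toolkit exposing one capability via get_tools()."""

    def echo(self, text: str) -> str:
        """Return the input unchanged: a trivial example capability."""
        return text

    def get_tools(self):
        # Expose callables for the runtime to register at execution time.
        return [self.echo]

def validate_tools(toolkit) -> list[str]:
    """Check each exposed tool is callable before execution, per the doc's note."""
    tools = toolkit.get_tools()
    if not all(callable(t) for t in tools):
        raise TypeError("get_tools() must return callables")
    return [t.__name__ for t in tools]
```

The same shape generalizes: whatever the runtime-specific convention (a method, a decorator, an MCP server), the contract is "enumerate callables, validate them, register them."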
How Runtimes Fit in Kubiya
Runtimes sit at the heart of the agent execution pipeline.
Integration Points:
Environments
- Provide runtime configuration (model settings, timeouts)
- Define execution boundaries (dev, staging, prod)
- Set environment variables for runtime behavior
LLM Providers
- Runtimes route requests to different LLM providers
- Handle model-specific features (function calling, vision, etc.)
- Manage token usage and cost tracking
Skills
- Runtimes discover and execute configured Skills
- Handle tool parameter validation and error recovery
- Track tool execution for analytics
Teams
- Teams can specify runtimes for all agents
- Runtime selection is flexible: configure at agent, team, or environment level
Platform
- Manages runtime registry and lifecycle
- Routes execution requests to appropriate runtimes
- Collects execution metrics and analytics
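The flexible selection mentioned above (agent, team, or environment level) implies a precedence order. A sketch of one plausible resolution rule, where the more specific level wins; the config shape, field name, and "agno" default are assumptions for illustration:

```python
def resolve_runtime(agent_cfg: dict, team_cfg: dict, env_cfg: dict) -> str:
    """Resolve the effective runtime: agent overrides team overrides environment."""
    for cfg in (agent_cfg, team_cfg, env_cfg):
        if cfg.get("runtime"):
            return cfg["runtime"]
    return "agno"  # assumed platform default when nothing is configured
```

For example, an agent configured with `{"runtime": "claude-code"}` would use Claude Code even if its team or environment defaults to Agno.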
Selecting a Runtime
Choosing the right runtime depends on your use case, model preferences, and operational requirements.
Decision Framework:
1. What’s your primary use case?
- General-purpose operations (Q&A, workflow orchestration, data processing) → Agno Runtime
- Code generation, analysis, refactoring → Claude Code Runtime
- Specialized framework needs (LangChain, CrewAI, AutoGen) → Custom Runtime
2. Which model providers do you need?
- Multiple providers (OpenAI + Anthropic + Google) → Agno Runtime
- Claude-committed or exploring Claude capabilities → Claude Code Runtime
- Custom provider integration → Custom Runtime
3. How long are your conversations?
- Short interactions (< 50 messages) → Either Agno or Claude Code
- Long-running sessions (50-200 messages) → Claude Code (extended history)
- Extremely long context (200+ messages) → Consider chunking or summarization
4. What are your performance requirements?
- Fast startup time → Agno Runtime
- Token efficiency → Depends on model choice (both runtimes efficient)
- Specialized optimizations (code parsing, file operations) → Claude Code Runtime
Compare Runtimes Side-by-Side
See detailed feature comparison and use case recommendations
Common Questions
Can I switch runtimes?
Yes, and it’s easy!
You can change an agent’s runtime at any time by updating its configuration. The change takes effect on the next execution; no restart is required.
Do runtimes affect cost?
Indirectly, through model selection
Runtimes themselves don’t have separate pricing. However:
- Agno Runtime supports all model providers, so you can choose cost-effective options (GPT-3.5, Claude Haiku, Gemini Flash)
- Claude Code Runtime requires Claude models, which have specific pricing
- Token efficiency is comparable across runtimes when using the same model
What about performance?
Both runtimes are production-grade
Performance characteristics:
- Startup latency: Agno is slightly faster (< 100ms difference)
- Streaming throughput: Comparable for both runtimes
- Token processing: Determined by the model, not the runtime
- Tool execution: Both runtimes execute tools efficiently
Can I run both runtimes?
Absolutely!
Runtime selection is per-agent, so you can:
- Use Agno for general-purpose agents
- Use Claude Code for development-focused agents
- Mix and match within the same organization, even within the same team
How do I debug runtime issues?
Multiple approaches
- Enable debug logging in your agent configuration
- Check execution logs in the Kubiya dashboard
- Inspect tool execution details for failures
- Use runtime validation endpoints to verify configuration
- Review analytics for performance patterns
Which runtime should I choose?
Choose based on your needs:
- Agno Runtime: For multi-model flexibility and provider choice
- Claude Code Runtime: For code-focused development workflows
- Custom Runtime: For specialized frameworks (LangChain, CrewAI, etc.)