> ## Documentation Index > Fetch the complete documentation index at: https://docs.kubiya.ai/llms.txt > Use this file to discover all available pages before exploring further. # Runtime Comparison > Side-by-side comparison of Agno and Claude Code runtimes to help you choose the right execution engine. Choose the right runtime for your agents by understanding the capabilities, tradeoffs, and ideal use cases for each option. *** ## Quick Comparison | Feature | Agno | Claude Code | | --------------------- | ------------------------- | --------------------- | | **Framework** | Agno + LiteLLM | Claude Code SDK | | **Model Support** | All providers via LiteLLM | Claude only | | **Streaming** | ✅ Yes | ✅ Yes | | **Tool Calling** | ✅ Python-based | ✅ MCP-based | | **MCP Servers** | ✅ Via MCPTools | ✅ First-class support | | **Max History** | 100 messages | 200 messages | | **Cancellation** | ✅ Yes | ✅ Yes | | **Custom Tools** | Python classes | MCP servers | | **Session Resume** | ❌ No | ✅ Yes | | **Code Optimization** | General | Specialized | | **Startup Time** | Fast (\~50ms) | Moderate (\~150ms) | | **Best For** | General-purpose | Code & development | *** ## Detailed Comparison ### Model Support **All major LLM providers via LiteLLM** Supported providers: * **OpenAI**: GPT-4, GPT-4 Turbo, GPT-3.5 * **Anthropic**: Claude 3 Opus, Sonnet, Haiku * **Google**: Gemini Pro, Gemini Flash * **Mistral**: Mistral Large, Mistral Medium * **Cohere**: Command, Command R+ * **Custom providers**: Any LiteLLM-compatible endpoint **Model selection example:** ```json theme={null} { "runtime": "agno", "model_id": "gpt-4o", // or "kubiya/claude-sonnet-4", "gemini-pro", etc. "model_config": { "temperature": 0.7, "max_tokens": 4096 } } ``` **Why choose Agno for models?** * Need to use multiple providers * Want flexibility to switch providers * Cost optimization across different models * Provider-agnostic architecture **Claude models only (optimized integration)** Supported models: * **Claude 3.5 Sonnet** (claude-sonnet-4) - Most capable, balanced * **Claude 3 Opus** - Maximum intelligence * **Claude 3 Sonnet** - Balanced performance * **Claude 3 Haiku** - Fast and economical **Model selection example:** ```json theme={null} { "runtime": "claude_code", "model_id": "anthropic/claude-3-opus", "runtime_config": { "max_history": 200, "session_resumption": true } } ``` **Why Claude-only?** * Deep integration with Claude SDK features * Optimized for Claude's extended context * Code-specific prompt engineering * Advanced file operation support *** ### Tool Integration **Python-based tools with Agno Toolkit** **Skills Integration:** * Python classes implementing Kubiya Skills interface * `get_tools()` method returns Agno Toolkit * Rich type system for parameters * Automatic validation and error handling **MCP Server Support:** * Via Agno's MCPTools adapter * Stdio and HTTP/SSE transports * Automatic tool discovery * Parameter mapping to Agno format **Custom Tool Example:** ```python theme={null} from agno.tools import Toolkit class CustomDatabaseTool: def get_tools(self) -> Toolkit: return Toolkit( name="database", tools=[self.query_database] ) def query_database(self, sql: str) -> str: # Execute database query return results ``` **Strengths:** * Flexible Python integration * Rich ecosystem of Agno tools * Easy local development * Type-safe parameter handling **MCP-based tools with first-class support** **Skills Integration:** * Skills mapped to MCP tool names * Automatic discovery by Claude Code SDK * Tools appear as `mcp____` * Built-in permission handling **MCP Server Support:** * Native MCP protocol support * Stdio and SSE transports * Resource discovery and caching * Tool name sanitization **Custom MCP Server Example:** ```python theme={null} from claude_agent_sdk import tool, create_sdk_mcp_server @tool( "query_database", "Execute a database query", {"sql": str} ) async def query_database(args: dict) -> dict: # Execute database query return { "content": [{ "type": "text", "text": str(results) }] } mcp_server = create_sdk_mcp_server( name="database", version="1.0.0", tools=[query_database] ) ``` **Strengths:** * Standardized MCP protocol * First-class Claude Code integration * Advanced permission control * Built-in resource management *** ### Streaming & Real-time Execution **Both runtimes support streaming**, but with different implementation details: **Agno Runtime:** * Event batching for efficiency * Tool execution hooks (`tool_start`, `tool_result`) * Real-time token streaming * Custom event callbacks **Claude Code Runtime:** * Character-by-character streaming (partial messages) * Direct SDK streaming integration * Real-time tool execution events * Session-aware streaming (resume support) **Performance:** Comparable for both - streaming adds \~10-20ms latency but provides significantly better UX. *** ### Conversation History **100-message capacity** **History Management:** * Messages stored in Control Plane database * Automatic history pruning at 100 messages * FIFO (first-in, first-out) pruning strategy * Per-agent history isolation **Message Format:** ```json theme={null} { "role": "user", // or "assistant", "system" "content": "Message text" } ``` **Best Practices:** * Use for short to medium conversations (\< 50 messages typical) * Consider summarization for longer workflows * History impacts token usage linearly * Monitor history length in analytics **When to use:** * Standard agent interactions * Q\&A workflows * Task-based executions * Most production use cases **200-message extended capacity** **History Management:** * Extended history for complex sessions * Session resumption across executions * Smart context management * Optimized for code context **Session Resumption:** ```json theme={null} { "runtime": "claude_code", "user_metadata": { "claude_code_session_id": "session-123" } } ``` **Best Practices:** * Leverage for long-running development sessions * Use session resumption for multi-turn refactoring * Extended history ideal for code review conversations * Monitor token usage carefully **When to use:** * Complex code generation (multiple files) * Iterative refactoring sessions * Architecture discussions * Technical design reviews *** ### Specialization & Optimization **Agno Runtime:** * **General-purpose** execution optimized for flexibility * No domain-specific optimizations * Model-agnostic performance tuning * Broad use case coverage **Claude Code Runtime:** * **Code-specific** optimizations: * Advanced file operation handling * Multi-file context awareness * Code parsing and analysis * Repository structure understanding * Syntax highlighting and formatting * Development workflow patterns * Extended context for complex codebases *** ## Decision Matrix ### Choose Agno Runtime if... ✅ You need **multiple LLM providers** (OpenAI, Anthropic, Google, Mistral, etc.) ✅ You want **maximum flexibility** in model selection ✅ Your use case is **general-purpose** (Q\&A, workflows, data processing) ✅ You prioritize **fast startup time** (\< 50ms) ✅ You're building **Python-based custom tools** ✅ You need **cost optimization** across different model tiers ✅ Your conversations are **short to medium** (\< 100 messages) **Ideal for:** * Customer support agents * Data processing workflows * General automation tasks * Multi-model testing and comparison * Organizations with diverse LLM strategies *** ### Choose Claude Code Runtime if... ✅ You're **Claude-committed** or heavily using Claude models ✅ Your primary use case is **code generation or analysis** ✅ You need **extended conversation history** (100-200 messages) ✅ You want **session resumption** for multi-turn workflows ✅ You're working with **complex codebases** or multi-file operations ✅ You need **optimized file operation** handling ✅ You're building **developer tools** or automation **Ideal for:** * Code generation and scaffolding * Automated refactoring * Repository analysis and audits * Technical documentation generation * Development workflow automation * Architecture review agents *** ## Switching Between Runtimes **Can you switch?** Yes! Runtime selection is per-agent and can be changed at any time. **Impact:** * Configuration change only - no code changes required * Takes effect on next execution - no restart needed * History is preserved (format is runtime-agnostic) * Tool configurations may need adjustment (Python vs MCP) **Migration Strategy:** 1. **Test in development** environment first 2. **Run parallel** (new runtime alongside old) 3. **Selective migration** (low-risk agents first) 4. **Monitor performance** and cost metrics 5. **Full cutover** once validated *** ## Runtime Interoperability **Can you use both runtimes simultaneously?** Absolutely! **Scenarios:** * **Agent-level selection**: Agent A uses Agno, Agent B uses Claude Code * **Team-level configuration**: Team 1 uses Agno, Team 2 uses Claude Code * **Use case optimization**: General agents on Agno, dev agents on Claude Code * **A/B testing**: Compare runtime performance for specific workflows **No conflicts** - runtimes are completely isolated at execution time. *** ## Performance Comparison ### Startup Latency | Runtime | Cold Start | Warm Start | | ----------- | ---------- | ---------- | | Agno | \~50ms | \~10ms | | Claude Code | \~150ms | \~30ms | **Note**: Startup difference is negligible for most use cases. Total execution time is dominated by model inference, not runtime overhead. ### Streaming Throughput | Runtime | Tokens/sec | Notes | | ----------- | ---------- | ----------------- | | Agno | \~40-60 | Via LiteLLM proxy | | Claude Code | \~50-70 | Direct Claude SDK | **Note**: Throughput depends more on model selection and provider API performance than runtime choice. ### Token Efficiency **Comparable** - both runtimes use tokens efficiently. Token consumption is determined by: * Model selection (primary factor) * Conversation history length * Tool usage patterns * Prompt engineering **Cost optimization tip**: Choose cheaper models (GPT-3.5, Claude Haiku, Gemini Flash) rather than switching runtimes. *** ## Cost Comparison **Runtime costs:** No separate charge - runtimes are included in Kubiya platform **Model costs:** Pay for underlying model usage (determined by provider) **Cost factors:** 1. **Model selection** (largest impact): * Premium: GPT-4, Claude Opus (\~\$0.015-0.03/1K tokens) * Balanced: Claude Sonnet, GPT-3.5 Turbo (\~\$0.001-0.003/1K tokens) * Economical: Claude Haiku, Gemini Flash (\~\$0.0001-0.0005/1K tokens) 2. **Conversation history** (linear impact): * 100-message history: \~10-30K tokens per execution * 200-message history: \~20-60K tokens per execution 3. **Tool usage** (moderate impact): * Each tool call adds \~500-1000 tokens overhead * Complex tools increase token usage **Cost optimization strategies:** * Use **Agno Runtime** with economical models (Haiku, Gemini Flash) * Implement **history pruning** for long conversations * Cache **frequent tool results** * Use **Claude Code** only for code-intensive tasks *** ## Frequently Asked Questions **Choose based on your needs:** * **Multi-model flexibility?** Start with Agno Runtime * **Code-focused tasks?** Start with Claude Code Runtime You can easily switch between runtimes at any time as your needs evolve. **Yes, easily** Update your agent configuration with the new runtime value. The change takes effect on the next execution. No downtime, no data loss. **No runtime charges** You pay for the underlying Claude model usage, but the runtime itself is included. Agno and Claude Code have the same platform cost. **Use multiple agents** Create one agent with Agno for general tasks and another with Claude Code for development tasks. You can use both simultaneously. **Maximum flexibility, custom integration** Custom runtimes let you integrate frameworks like LangChain, CrewAI, or AutoGen. You control all execution logic, tool integration, and model interaction. See our [custom runtime guide](/core-concepts/runtimes/custom-runtimes) for details. *** ## Next Steps Deep dive into Agno runtime configuration and capabilities Explore Claude Code runtime features and use cases Build your own runtime with custom frameworks Runtime registry and orchestration