> ## Documentation Index
> Fetch the complete documentation index at: https://docs.kubiya.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Agno Runtime

> Multi-model runtime with support for all major LLM providers via Agno framework and LiteLLM.

Multi-model execution engine supporting all major LLM providers. Built on [Agno](https://agno.dev) framework with [LiteLLM](https://docs.litellm.ai) integration for maximum flexibility.

***

## Key Features

<CardGroup cols={3}>
  <Card title="All Providers" icon="shuffle">
    OpenAI, Anthropic, Google, Mistral, Cohere, and any LiteLLM-compatible endpoint
  </Card>

  <Card title="Python Tools" icon="python">
    Native Python integration for custom tools and Skills
  </Card>

  <Card title="Fast Startup" icon="bolt">
    \~50ms cold start, \~10ms warm start
  </Card>
</CardGroup>

***

## Capabilities

| Feature      | Support      |
| ------------ | ------------ |
| Streaming    | ✅ Yes        |
| Tool Calling | ✅ Yes        |
| MCP Servers  | ✅ Yes        |
| History      | 100 messages |
| Cancellation | ✅ Yes        |
| Custom Tools | ✅ Yes        |

***

## Supported Models

Agno Runtime supports **any model available through LiteLLM**. There are no restrictions in the runtime itself - model support is determined by your LiteLLM proxy configuration.

**Commonly used providers:**

* **OpenAI**: GPT-4, GPT-4 Turbo, GPT-4o, GPT-3.5 Turbo
* **Anthropic**: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
* **Google**: Gemini Pro, Gemini Flash
* **Mistral AI**: Mistral Large, Mistral Medium
* **Cohere**: Command, Command R+
* **Any LiteLLM-compatible endpoint**

**Model selection**: Configure via `model_id` parameter or `LITELLM_DEFAULT_MODEL` environment variable.

Learn more: [LiteLLM Supported Models](https://docs.litellm.ai/docs/providers)

***

## When to Use

**Best for:**

* Multi-provider flexibility
* Cost optimization across providers
* General-purpose agents (support, Q\&A, workflows)
* Organizations avoiding vendor lock-in

**Consider alternatives:**

* Code-intensive tasks → Use [Claude Code Runtime](/core-concepts/runtimes/claude-code-runtime)
* Extended conversations (200+ messages) → Use [Claude Code Runtime](/core-concepts/runtimes/claude-code-runtime)

***

## Trade-offs

**Pros:**

* ✅ All LLM providers supported
* ✅ Mix models from different providers
* ✅ Python ecosystem integration
* ✅ Fast startup time

**Cons:**

* ❌ 100-message history limit (vs 200 for Claude Code)
* ❌ Not code-specialized
* ❌ Requires LiteLLM proxy

***

## Framework Links

<CardGroup cols={2}>
  <Card title="Agno Framework" icon="book" href="https://agno.dev">
    Official documentation and API reference
  </Card>

  <Card title="LiteLLM" icon="book" href="https://docs.litellm.ai">
    Provider list and configuration
  </Card>
</CardGroup>

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Compare Runtimes" icon="scale-balanced" href="/core-concepts/runtimes/comparison">
    See side-by-side comparison
  </Card>

  <Card title="Agent Setup" icon="robot" href="/core-concepts/agents">
    Configure agents with Agno
  </Card>
</CardGroup>
