Models & Providers
AgentFlow uses LiteLLM as its model gateway, providing unified access to 100+ LLM providers through a single interface. Switch between models per-agent, per-request, or per-user, with full visibility into capabilities, context windows, and pricing.
Multi-provider architecture
All LLM calls go through LiteLLM, which handles:
- Provider dispatch — routes to OpenAI, Anthropic, Google, xAI, Azure, AWS Bedrock, and 100+ others
- Unified API — same request/response format regardless of provider
- Automatic retries — exponential backoff with configurable retry count
- Rate limit handling — concurrency semaphore prevents flooding provider rate limits
- Semantic caching — optional deduplication of identical requests
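The retry and rate-limit behaviors above can be sketched as follows. This is a hypothetical helper illustrating the pattern, not AgentFlow's actual implementation: exponential backoff on failure, plus a semaphore capping concurrent provider calls.

```python
import threading
import time

MAX_CONCURRENT_CALLS = 8
_provider_slots = threading.Semaphore(MAX_CONCURRENT_CALLS)

def call_with_retries(fn, max_retries=3, base_delay=0.01):
    """Run fn(), retrying with exponential backoff on exceptions."""
    with _provider_slots:  # never exceed the concurrency cap
        for attempt in range(max_retries + 1):
            try:
                return fn()
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...

# Example: a provider call that fails twice before succeeding.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = call_with_retries(flaky_call)  # succeeds on the third attempt
```

In the real gateway, the retry count is configurable and the semaphore width is tuned to each provider's rate limits.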
Supported providers
| Provider | Models | Highlights |
|---|---|---|
| OpenAI | GPT-5, GPT-5 Mini, GPT-4.1, o-series | Reasoning effort controls, vision, structured output |
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Large context windows, vision |
| Google | Gemini 2.5, Gemini 3 Preview | Multi-modal, long context |
| xAI | Grok 3, Grok 3 Fast | Up to 2M token context |
| Azure OpenAI | GPT-4.1, GPT-5 (via Azure) | Enterprise compliance, private endpoints |
| AWS Bedrock | Claude, Titan, Llama | VPC-native, no data leaves AWS |
| 100+ more | Via LiteLLM | Any provider LiteLLM supports works out of the box |
Bring your own API keys
AgentFlow supports tenant-scoped API key management. Each tenant can configure their own provider API keys, ensuring LLM traffic routes through their accounts for billing, compliance, and data-residency requirements.
Model catalog API
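As a sketch of what the catalog endpoint might return (the field names and prices below are illustrative, not the exact API), a response could be filtered with plain JSON handling:

```python
import json

# Illustrative catalog response; real field names and prices may differ.
catalog_json = """
{
  "models": [
    {"id": "gpt-5", "provider": "openai", "context_window": 400000,
     "supports_vision": true, "input_price_per_1m": 1.25},
    {"id": "claude-sonnet-4", "provider": "anthropic", "context_window": 200000,
     "supports_vision": true, "input_price_per_1m": 3.0}
  ]
}
"""

catalog = json.loads(catalog_json)

# Pick only vision-capable models, cheapest first.
vision_models = sorted(
    (m for m in catalog["models"] if m["supports_vision"]),
    key=lambda m: m["input_price_per_1m"],
)
ids = [m["id"] for m in vision_models]
```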
The model catalog API lists all available models with their capabilities and pricing.
Model selection
Models can be set at three levels; a per-request override takes highest priority.
Per-agent default
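A minimal sketch of an agent-level default (field names are illustrative, not the exact configuration schema):

```python
# Hypothetical agent configuration; field names are illustrative.
agent_config = {
    "name": "support-bot",
    # Used for every request served by this agent unless a user or
    # per-request override is present.
    "model": "anthropic/claude-sonnet-4",
}
```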
Per-user default
Users set a preferred model via the settings API. All requests use this model unless overridden at the agent or request level.
Per-request override
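A per-request override might look like the following sketch (hypothetical payload; field names are illustrative):

```python
# Hypothetical per-request override; field names are illustrative.
request = {
    "agent": "support-bot",
    # Beats both the agent default and the user's preferred model
    # for this one request only.
    "model": "xai/grok-3-fast",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
}
```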
Reasoning effort
For models that support reasoning (OpenAI o-series, GPT-5, and others), control the depth of chain-of-thought reasoning:
| Level | Use case |
|---|---|
| none | Skip reasoning entirely |
| minimal | Lightest reasoning pass |
| low | Fast, simple tasks |
| medium | Balanced (default) |
| high | Complex analysis, multi-step reasoning |
| xhigh | Maximum reasoning depth |
Reasoning summary
Control whether and how the model's reasoning is surfaced:
| Mode | Behavior |
|---|---|
| "auto" | Model decides whether to include reasoning |
| "concise" | Brief reasoning summary included |
| "detailed" | Full reasoning trace included |
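A request exercising both reasoning controls might look like this sketch (hypothetical payload; exact field names may differ from the real API):

```python
# Hypothetical request payload; exact field names may differ.
request = {
    "model": "gpt-5",
    "reasoning_effort": "high",      # deeper multi-step reasoning
    "reasoning_summary": "concise",  # surface a brief reasoning summary
    "messages": [
        {"role": "user", "content": "Plan a zero-downtime database migration."}
    ],
}
```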
Structured output
Force the model to return valid JSON matching a specific schema.
Via SDK (Pydantic models)
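A sketch of the SDK path, assuming Pydantic v2. The schema derivation below is real Pydantic; how the resulting response_format is passed to AgentFlow's client is illustrative.

```python
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float

# Derive a JSON schema from the Pydantic model (real Pydantic v2 API),
# then wrap it in a response_format payload (illustrative shape).
schema = Invoice.model_json_schema()
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "Invoice", "schema": schema, "strict": True},
}
```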
Via REST API (JSON schema)
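Over REST, the schema is embedded directly in the request body. The payload below is a sketch (field names modeled on common chat-completion shapes; the exact endpoint and fields are assumptions):

```python
import json

# Hypothetical REST request body; field names are illustrative.
payload = {
    "model": "gpt-5",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "sentiment",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "label": {"type": "string"},
                    "confidence": {"type": "number"},
                },
                "required": ["label", "confidence"],
                "additionalProperties": False,
            },
        },
    },
    "messages": [{"role": "user", "content": "Classify: 'Great product!'"}],
}
body = json.dumps(payload)
```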
response_format types:
- `{"type": "json_object"}` — model returns valid JSON (schema not enforced)
- `{"type": "json_schema", ...}` — model returns JSON matching the exact schema (strict mode)
Vision
Models with vision support automatically handle image attachments.
image_detail controls resolution: "low" (faster, cheaper), "high" (full resolution), or "auto" (model decides; default).
When images are present and no model override is specified, AgentFlow automatically selects a vision-capable model.
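A request with an image attachment might look like this sketch (message shape modeled on common chat-completion APIs; field names are illustrative):

```python
# Hypothetical request with an image attachment; field names are illustrative.
request = {
    "model": "gpt-5",
    "image_detail": "low",  # "low" (faster, cheaper) | "high" | "auto" (default)
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the total on this receipt?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/receipt.png"}},
        ],
    }],
}
```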
Context window management
AgentFlow tracks token usage against each model's context window:
- SSE events include `context_window_size` and `context_usage_percentage` in metadata
- Per-request metrics: `primary_total_tokens`, `primary_model`, `primary_context_usage_percentage`
- Conversation memory management automatically triggers when approaching context limits
- The LLM API itself enforces hard limits; oversized requests return clear errors through the SSE error handler
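The usage metric above reduces to simple arithmetic. A sketch (hypothetical helper, not AgentFlow's actual code):

```python
def context_usage_percentage(total_tokens: int, context_window_size: int) -> float:
    """Percentage of the model's context window consumed so far."""
    return round(100.0 * total_tokens / context_window_size, 1)

# A memory-management trigger might fire above some threshold, e.g. 80%.
usage = context_usage_percentage(150_000, 200_000)
should_compact = usage >= 80.0
```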

