AgentFlow gives operators a live view into AI usage and a practical control plane for LLM spend. The Usage settings dashboard combines durable per-call telemetry with live monthly budget counters, so teams can see where cost is coming from, tune caps, and understand what happens when a workspace reaches its limit.

Tenant-wide visibility

Track request volume, token usage, cost, failures, embeddings, and model mix across the selected time window.

Per-user accountability

See current-month user spend, effective caps, and selected-range cost by principal.

Live budget enforcement

Monthly tenant and user caps are checked before outbound LLM calls and reset automatically each month.

Flexible operations

Defaults can come from environment variables, then admins can override tenant and user caps in settings.

Dashboard experience

The Usage tab in settings is admin-only and built for operators who need to answer four questions quickly:
| Question | Dashboard section |
| --- | --- |
| How much have we spent this month? | Tenant budget card with spend, cap, percent used, and remaining amount |
| Is usage accelerating? | Daily cost trend for 7d, 30d, 90d, or all |
| Which models are driving cost? | By-model donut chart with provider/model labels |
| Which agents and users are driving usage? | By-agent chart and by-user table with monthly caps |
The dashboard intentionally separates the selected reporting window from the monthly budget window. For example, an operator can view the last 7 days of cost while still seeing the current-month spend against the active cap. While the page is open, it quietly refreshes usage and budget data every 30 seconds and refreshes again when the browser tab becomes visible.

Access control

Usage and budget endpoints require an authenticated human user with an admin scope. This prevents ordinary tenant users from viewing tenant-wide spend or changing caps.
| Layer | Behavior |
| --- | --- |
| API guard | /api/v1/admin/usage/* and /api/v1/admin/budget* reject non-admin users with HTTP 403 |
| Scope source | Tokens can carry one of AGENTFLOW_ADMIN_SCOPES, and Auth0 role or permissionType values of admin map to agentflow:admin |
| SPA guard | The Usage settings tab and cap-edit controls are hidden unless the current user is an admin |
Admin authorization is enforced server-side. The SPA guard is an operator experience improvement, not the security boundary.
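The server-side check described above can be sketched roughly as follows. This is a minimal illustration, not AgentFlow's actual code: the function names, the fallback default scope, and the shape of the token claims are assumptions made for the example. Only the space-separated AGENTFLOW_ADMIN_SCOPES value, the Auth0 admin mapping, and the HTTP 403 outcome come from the text.

```python
import os

# Assumed fallback when AGENTFLOW_ADMIN_SCOPES is unset (hypothetical default).
DEFAULT_ADMIN_SCOPES = "agentflow:admin"


def load_admin_scopes(env=None):
    """Parse the space-separated AGENTFLOW_ADMIN_SCOPES value into a set."""
    env = os.environ if env is None else env
    return set(env.get("AGENTFLOW_ADMIN_SCOPES", DEFAULT_ADMIN_SCOPES).split())


def is_admin(token_scopes, auth0_roles, admin_scopes):
    """True if the caller holds at least one accepted admin scope."""
    scopes = set(token_scopes)
    # Auth0 role or permissionType values of "admin" map to agentflow:admin.
    if "admin" in auth0_roles:
        scopes.add("agentflow:admin")
    return bool(scopes & admin_scopes)


def guard_status(token_scopes, auth0_roles, admin_scopes):
    """HTTP status an admin-only endpoint would return for this caller."""
    return 200 if is_admin(token_scopes, auth0_roles, admin_scopes) else 403
```

Keeping the decision in one server-side function means the SPA can hide controls for convenience without ever being trusted to enforce access.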

Tracking pipeline

Every tracked LLM call flows through the LiteLLM gateway. The gateway resolves tenant and user attribution from request context, checks budgets and shared request-rate limits, calls the provider, extracts usage, computes cost from the model registry, updates the live budget counter, and writes a durable event through the outbox.
The same accounting path is used for chat completions, Responses API calls, batch secondary model calls, and LiteLLM embeddings used by knowledge-base ingestion or search. That means RAG-heavy tenants see embedding spend in the dashboard instead of only generation spend.
Durable usage events include:
| Field group | Examples |
| --- | --- |
| Attribution | tenant_name, user_id, principal_type, agent_id, agent_name, conversation_id, call_id |
| Model | provider, model_name, pricing_version |
| Tokens | input, output, cached-read, cache-write, and reasoning token counts |
| Cost | input, output, cached-read, cache-write, and total USD cost |
| Outcome | duration, success flag, error type, and metadata |
The dashboard uses durable rows for historical breakdowns, while the budget card uses the larger of durable monthly spend and the live counter. That gives operators immediate enforcement visibility even if the outbox is still catching up.
For cost-tracked generation and embedding models, AgentFlow fails closed when pricing is missing. Add pricing to the model registry before enabling a new model in production; otherwise the request is rejected instead of being recorded as zero-cost.
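To make the two rules above concrete, here is a small sketch of fail-closed cost computation and of the budget card's spend figure. The registry layout, the per-million-token price fields, the example model name, and the prices are all hypothetical; the real registry also tracks cached-read, cache-write, and reasoning tokens, which this example leaves out for brevity.

```python
from dataclasses import dataclass


class MissingPricingError(Exception):
    """Raised instead of silently recording a zero-cost call."""


@dataclass(frozen=True)
class Pricing:
    input_per_mtok_usd: float   # USD per million input tokens (assumed unit)
    output_per_mtok_usd: float  # USD per million output tokens (assumed unit)


# Hypothetical registry keyed by (provider, model_name); prices are made up.
MODEL_REGISTRY = {("example-provider", "example-model"): Pricing(2.0, 8.0)}


def compute_cost_usd(provider, model_name, input_tokens, output_tokens,
                     registry=MODEL_REGISTRY):
    """Fail closed: refuse to price a model that has no registry entry."""
    pricing = registry.get((provider, model_name))
    if pricing is None:
        raise MissingPricingError(f"no pricing for {provider}/{model_name}")
    total = (input_tokens * pricing.input_per_mtok_usd
             + output_tokens * pricing.output_per_mtok_usd)
    return total / 1_000_000


def budget_card_spend(durable_monthly_usd, live_counter_usd):
    """The budget card shows the larger of the durable sum and the live counter."""
    return max(durable_monthly_usd, live_counter_usd)
```

Taking the maximum means a lagging outbox can only make the card read high relative to the durable rows, never low, which is the safer direction for a spend guardrail.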

Budget model

AgentFlow supports two monthly spend caps:
| Cap | Default | Override location | Enforcement |
| --- | --- | --- | --- |
| Tenant cap | LLM_BUDGET_PER_TENANT | Settings > Usage, tenant budget modal | Checked before every tracked LLM call |
| User cap | LLM_BUDGET_PER_USER | Settings > Usage, by-user table | Checked for authenticated user-attributed calls |
Tenant caps are the workspace-level guardrail. User caps provide accountability and prevent one user from consuming the whole allocation. In the UI, a user’s effective cap is clamped by the tenant cap so the table reflects the maximum spend that user can consume under the workspace policy.
When a cap is exceeded, AgentFlow returns HTTP 429 with a structured payload and a Retry-After header. The reset timestamp is the first instant of the next UTC month.
{
  "error": "tenant_budget_exceeded",
  "message": "Monthly LLM budget reached. Contact your admin or wait until the next reset.",
  "current_spend_usd": 100.12,
  "limit_usd": 100,
  "period": "month",
  "period_resets_at": "2026-05-01T00:00:00Z"
}
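A preflight check that produces a payload like the one above might look like the sketch below. Several details are assumptions rather than documented behavior: the function names, the user-scope error code (only tenant_budget_exceeded appears in the source), the use of a greater-than-or-equal comparison, and Retry-After being expressed as whole seconds until the reset.

```python
import math
from datetime import datetime, timezone


def next_month_start_utc(now):
    """First instant of the next UTC month; `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def effective_user_cap(user_cap_usd, tenant_cap_usd):
    """A user's cap is clamped by the tenant cap."""
    return min(user_cap_usd, tenant_cap_usd)


def check_budget(scope, current_spend_usd, limit_usd, now):
    """Return None when the call may proceed, else (status, headers, body)."""
    if current_spend_usd < limit_usd:
        return None
    resets_at = next_month_start_utc(now)
    # Assumption: Retry-After carries the seconds remaining until the reset.
    retry_after = max(1, math.ceil((resets_at - now).total_seconds()))
    body = {
        "error": f"{scope}_budget_exceeded",  # the "user" variant is assumed
        "message": "Monthly LLM budget reached. Contact your admin or wait until the next reset.",
        "current_spend_usd": round(current_spend_usd, 2),
        "limit_usd": limit_usd,
        "period": "month",
        "period_resets_at": resets_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return 429, {"Retry-After": str(retry_after)}, body
```

A client can read period_resets_at to tell users when service resumes, and use Retry-After to avoid retrying before the month rolls over.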

Rate controls

AgentFlow’s runtime controls are deliberately layered:
| Control | What it does | Configuration |
| --- | --- | --- |
| LLM concurrency | Caps simultaneous outbound LLM calls per API process | LLM_CONCURRENCY_LIMIT |
| Shared request-rate limits | Fixed-window tenant and user RPM limits shared through Redis | LLM_RATE_LIMIT_RPM_PER_TENANT, LLM_RATE_LIMIT_RPM_PER_USER |
| Monthly spend budgets | Stops new tracked LLM calls after the tenant or user cap is reached | LLM_BUDGET_PER_TENANT, LLM_BUDGET_PER_USER, settings overrides |
| Provider limits | Provider-enforced RPM/TPM and quota limits | Provider account or BYOK key tier |
LLM_CONCURRENCY_LIMIT is a pressure valve. The LLM_RATE_LIMIT_RPM_PER_* settings add shared request-per-minute throttles across horizontally scaled API replicas when Redis is configured. Provider TPM limits, quota limits, and any model-specific enforcement still come from the provider account or BYOK key tier. When AgentFlow rate-limits a call, it returns HTTP 429 with error: "llm_rate_limit_exceeded", the limit scope, limit value, and a Retry-After header.
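For readers unfamiliar with fixed-window limiting, the following sketch shows the idea with an in-process dictionary standing in for the shared Redis counters. The class name, key format, and the way the retry delay is computed are illustrative assumptions; the source only states that limits are fixed-window, per tenant and per user, and that 0 disables a limit.

```python
import math
import time


class FixedWindowLimiter:
    """Counts requests per key in one-minute windows aligned to the clock."""

    def __init__(self, limit_rpm, clock=time.time):
        self.limit_rpm = limit_rpm
        self.clock = clock
        self._counts = {}  # (key, window index) -> requests seen

    def check(self, key):
        """Return None if allowed, else seconds until the window resets."""
        if self.limit_rpm <= 0:  # 0 disables the limit
            return None
        now = self.clock()
        window = int(now // 60)
        # Forget counters from earlier windows so memory stays bounded.
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        count = self._counts.get((key, window), 0) + 1
        self._counts[(key, window)] = count
        if count <= self.limit_rpm:
            return None
        return max(1, math.ceil((window + 1) * 60 - now))


limiter = FixedWindowLimiter(limit_rpm=2)
for _ in range(3):
    retry_after = limiter.check("tenant:acme")  # hypothetical key format
```

Fixed windows are cheap to share across replicas because each check is a single counter increment, at the cost of allowing short bursts around a window boundary.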
Because exact spend is known only after a response returns, monthly budget enforcement is preflight-based: each call is checked against the spend recorded so far. Concurrent in-flight requests can therefore exceed the cap by the combined cost of the requests that passed the check before earlier requests recorded their spend. For high-volume tenants, use conservative caps, shared RPM limits, provider-side limits, and concurrency tuning.
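A rough way to size that overshoot is to bound it by the number of calls that can be in flight at once. The helper below is a back-of-the-envelope estimate, not a guarantee from AgentFlow: it assumes every call that passed the preflight check holds a concurrency slot until its spend is recorded, and the replica count and per-call cost ceiling are values you would supply yourself.

```python
def worst_case_overshoot_usd(concurrency_limit, replicas, max_cost_per_call_usd):
    """Upper bound on spend past the cap if every in-flight slot is occupied
    by a call that passed the budget check just before the cap was reached."""
    return concurrency_limit * replicas * max_cost_per_call_usd


# Example: LLM_CONCURRENCY_LIMIT=30 on 4 replicas, calls costing at most $0.25.
overshoot = worst_case_overshoot_usd(30, 4, 0.25)
```

If that figure is uncomfortable relative to the cap, lowering the concurrency limit or setting the cap below the true ceiling by that margin are the most direct levers.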

Configuration

# Per-process outbound LLM concurrency
LLM_CONCURRENCY_LIMIT=30

# Shared fixed-window request limits; 0 disables the limit
LLM_RATE_LIMIT_RPM_PER_TENANT=600
LLM_RATE_LIMIT_RPM_PER_USER=120

# Monthly caps used when settings overrides are unset
LLM_BUDGET_PER_TENANT=100.0
LLM_BUDGET_PER_USER=100.0

# Admin scopes accepted for Usage and Budget settings APIs
AGENTFLOW_ADMIN_SCOPES=agentflow:admin admin:agentflow admin:usage usage:admin admin:budget budget:admin

# Shared counter/cache backend for multi-replica deployments
REDIS_URL=redis://redis:6379/0
For production, configure Redis so budget counters and RPM counters are shared across API replicas. Without Redis, AgentFlow falls back to in-process counters, which are useful for local development and single-process deployments but are not global ledgers.
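As an illustration of how a service might read these settings, the sketch below loads them into a typed object. It is not AgentFlow's loader: the class and function names are hypothetical, and the fallback values used when a variable is unset simply mirror the sample above rather than documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmLimits:
    concurrency_limit: int
    rpm_per_tenant: Optional[int]  # None means the limit is disabled
    rpm_per_user: Optional[int]
    budget_per_tenant_usd: float
    budget_per_user_usd: float
    redis_url: Optional[str]

    @property
    def counters_are_shared(self):
        """Without Redis, counters are per-process and not a global ledger."""
        return self.redis_url is not None


def _rpm(env, name):
    """Read an RPM limit; 0 (or unset, by assumption) disables it."""
    value = int(env.get(name, "0"))
    return value if value > 0 else None


def load_limits(env=None):
    env = os.environ if env is None else env
    return LlmLimits(
        concurrency_limit=int(env.get("LLM_CONCURRENCY_LIMIT", "30")),
        rpm_per_tenant=_rpm(env, "LLM_RATE_LIMIT_RPM_PER_TENANT"),
        rpm_per_user=_rpm(env, "LLM_RATE_LIMIT_RPM_PER_USER"),
        budget_per_tenant_usd=float(env.get("LLM_BUDGET_PER_TENANT", "100.0")),
        budget_per_user_usd=float(env.get("LLM_BUDGET_PER_USER", "100.0")),
        redis_url=env.get("REDIS_URL") or None,
    )
```

Surfacing counters_are_shared at startup, for example in a log line, makes it harder to ship a multi-replica deployment that silently enforces limits per process.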

Operations checklist

  • Keep pricing entries current in the model registry before enabling a new model for production use.
  • Treat the Usage dashboard as the source for LLM call spend and adoption analytics; reconcile with provider invoices for finance-close workflows.
  • Grant one of AGENTFLOW_ADMIN_SCOPES only to operators who should see tenant-wide usage and update caps.
  • Tune LLM_CONCURRENCY_LIMIT to the provider tier and expected request duration.
  • Set tenant and user RPM limits when you need shared burst control across replicas.
  • Configure Redis for any horizontally scaled deployment.
  • Add the same usage-event and spend-recording hooks when introducing new provider paths or non-chat model calls.