Usage Tracking & Budgets

AgentFlow gives operators a live view into AI usage and a practical control plane for LLM spend. The Usage settings dashboard combines durable per-call telemetry with live monthly budget counters, so teams can see where cost is coming from, tune caps, and understand what happens when a workspace reaches its limit.

Tenant-wide visibility

Track request volume, token usage, cost, failures, embeddings, and model mix across the selected time window.

Per-user accountability

See current-month user spend, effective caps, and selected-range cost by principal.

Live budget enforcement

Monthly tenant and user caps are checked before outbound LLM calls and reset automatically each month.

Flexible operations

Defaults come from environment variables; admins can then override tenant and user caps in settings.

Dashboard experience

The Usage tab in settings is admin-only and built for operators who need to answer four questions quickly:

Question	Dashboard section
How much have we spent this month?	Tenant budget card with spend, cap, percent used, and remaining amount
Is usage accelerating?	Daily cost trend for `7d`, `30d`, `90d`, or `all`
Which models are driving cost?	By-model donut chart with provider/model labels
Which agents and users are driving usage?	By-agent chart and by-user table with monthly caps

The dashboard intentionally separates the selected reporting window from the monthly budget window. For example, an operator can view the last 7 days of cost while still seeing the current-month spend against the active cap. While the page is open, it refreshes usage and budget data in the background every 30 seconds, and again whenever the browser tab becomes visible.

Access control

Usage and budget endpoints require an authenticated human user with an admin scope. This prevents ordinary tenant users from viewing tenant-wide spend or changing caps.

Layer	Behavior
API guard	`/api/v1/admin/usage/` and `/api/v1/admin/budget` reject non-admin users with HTTP `403`
Scope source	Tokens can carry any scope listed in `AGENTFLOW_ADMIN_SCOPES`; Auth0 `role` or `permissionType` values of `admin` map to `agentflow:admin`
SPA guard	The Usage settings tab and cap-edit controls are hidden unless the current user is an admin

Admin authorization is enforced server-side. The SPA guard is an operator experience improvement, not the security boundary.
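The scope mapping in the table above can be sketched as follows. This is a minimal illustration, not AgentFlow's implementation: the function names are hypothetical, and the token layout (a space-separated `scope` claim plus top-level `role` and `permissionType` claims) is an assumption. The fallback scope list is copied from the sample value in the Configuration section.

```python
import os

# Fallback copied from the sample AGENTFLOW_ADMIN_SCOPES value in Configuration.
SAMPLE_ADMIN_SCOPES = (
    "agentflow:admin admin:agentflow admin:usage usage:admin "
    "admin:budget budget:admin"
)


def admin_scopes() -> frozenset:
    """Accepted admin scopes, read as a space-separated environment value."""
    raw = os.environ.get("AGENTFLOW_ADMIN_SCOPES", SAMPLE_ADMIN_SCOPES)
    return frozenset(raw.split())


def effective_scopes(claims: dict) -> set:
    """Collect scopes from a decoded token (hypothetical claim layout)."""
    scopes = set(str(claims.get("scope", "")).split())
    # Auth0 `role` or `permissionType` values of `admin` map to `agentflow:admin`.
    if claims.get("role") == "admin" or claims.get("permissionType") == "admin":
        scopes.add("agentflow:admin")
    return scopes


def is_admin(claims: dict) -> bool:
    """True when the token carries at least one accepted admin scope."""
    return bool(effective_scopes(claims) & admin_scopes())
```

A server-side guard would return HTTP `403` whenever a check like `is_admin` is false, regardless of what the SPA shows.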

Tracking pipeline

Every tracked LLM call flows through the LiteLLM gateway. The gateway resolves tenant and user attribution from request context, checks budgets and shared request-rate limits, calls the provider, extracts usage, computes cost from the model registry, updates the live budget counter, and writes a durable event through the outbox.

The same accounting path is used for chat completions, Responses API calls, batch secondary model calls, and LiteLLM embeddings used by knowledge-base ingestion or search. That means RAG-heavy tenants see embedding spend in the dashboard instead of only generation spend.

Durable usage events include:

Field group	Examples
Attribution	`tenant_name`, `user_id`, `principal_type`, `agent_id`, `agent_name`, `conversation_id`, `call_id`
Model	`provider`, `model_name`, `pricing_version`
Tokens	input, output, cached-read, cache-write, and reasoning token counts
Cost	input, output, cached-read, cache-write, and total USD cost
Outcome	duration, success flag, error type, and metadata

The dashboard uses durable rows for historical breakdowns, while the budget card uses the larger of durable monthly spend and the live counter. That gives operators immediate enforcement visibility even if the outbox is still catching up.

For cost-tracked generation and embedding models, AgentFlow fails closed when pricing is missing. Add pricing to the model registry before enabling a new model in production; otherwise the request is rejected instead of being recorded as zero-cost.
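To make the fail-closed rule and the budget-card calculation concrete, here is a small sketch. Everything in it is illustrative: the registry shape, the `Pricing` type, the exception name, and the prices are assumptions, and cached-read, cache-write, and reasoning tokens are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens (hypothetical registry entry shape)."""
    input_per_mtok: float
    output_per_mtok: float


class MissingPricingError(Exception):
    """Raised when a cost-tracked model has no pricing entry."""


# Placeholder prices for illustration only, not real provider rates.
MODEL_REGISTRY = {("example-provider", "example-model"): Pricing(2.0, 8.0)}


def lookup_pricing(provider: str, model_name: str) -> Pricing:
    """Fail closed: an unpriced model is rejected, never treated as zero-cost."""
    try:
        return MODEL_REGISTRY[(provider, model_name)]
    except KeyError:
        raise MissingPricingError(f"no pricing for {provider}/{model_name}") from None


def call_cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one call from its input and output token counts."""
    return (input_tokens * pricing.input_per_mtok
            + output_tokens * pricing.output_per_mtok) / 1_000_000


def budget_card_spend(durable_month_usd: float, live_counter_usd: float) -> float:
    """The budget card shows the larger of durable spend and the live counter."""
    return max(durable_month_usd, live_counter_usd)
```

Taking the maximum means the card never under-reports while durable events are still being written.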

Budget model

AgentFlow supports two monthly spend caps:

Cap	Default	Override location	Enforcement
Tenant cap	`LLM_BUDGET_PER_TENANT`	Settings > Usage, tenant budget modal	Checked before every tracked LLM call
User cap	`LLM_BUDGET_PER_USER`	Settings > Usage, by-user table	Checked for authenticated user-attributed calls

Tenant caps are the workspace-level guardrail. User caps provide accountability and prevent one user from consuming the whole allocation. In the UI, a user’s effective cap is clamped by the tenant cap so the table reflects the maximum spend that user can consume under the workspace policy.

When a cap is exceeded, AgentFlow returns HTTP `429` with a structured payload and a `Retry-After` header. The reset timestamp is the first instant of the next UTC month.

{
  "error": "tenant_budget_exceeded",
  "message": "Monthly LLM budget reached. Contact your admin or wait until the next reset.",
  "current_spend_usd": 100.12,
  "limit_usd": 100,
  "period": "month",
  "period_resets_at": "2026-05-01T00:00:00Z"
}
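The reset timestamp, the clamped user cap, and the payload above could be produced along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and treating `Retry-After` as whole seconds until the reset is a guess about the header's value, not something the payload confirms.

```python
import math
from datetime import datetime, timezone


def next_month_start_utc(now: datetime) -> datetime:
    """First instant of the next UTC month; `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def effective_user_cap(user_cap_usd: float, tenant_cap_usd: float) -> float:
    """A user's effective cap is clamped by the tenant cap."""
    return min(user_cap_usd, tenant_cap_usd)


def tenant_budget_exceeded(current_spend_usd: float, limit_usd: float, now: datetime):
    """Return (status, headers, body) for a tenant over its monthly cap."""
    resets_at = next_month_start_utc(now)
    # Assumption: Retry-After carries whole seconds until the monthly reset.
    retry_after = max(0, math.ceil((resets_at - now).total_seconds()))
    body = {
        "error": "tenant_budget_exceeded",
        "message": "Monthly LLM budget reached. Contact your admin or wait until the next reset.",
        "current_spend_usd": current_spend_usd,
        "limit_usd": limit_usd,
        "period": "month",
        "period_resets_at": resets_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return 429, {"Retry-After": str(retry_after)}, body
```

For instance, calling `tenant_budget_exceeded(100.12, 100, now)` with any `now` in April 2026 yields the `period_resets_at` value shown in the sample payload.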

Rate controls

AgentFlow’s runtime controls are deliberately layered:

Control	What it does	Configuration
LLM concurrency	Caps simultaneous outbound LLM calls per API process	`LLM_CONCURRENCY_LIMIT`
Shared request-rate limits	Fixed-window tenant and user RPM limits shared through Redis	`LLM_RATE_LIMIT_RPM_PER_TENANT`, `LLM_RATE_LIMIT_RPM_PER_USER`
Monthly spend budgets	Stops new tracked LLM calls after the tenant or user cap is reached	`LLM_BUDGET_PER_TENANT`, `LLM_BUDGET_PER_USER`, settings overrides
Provider limits	Provider-enforced RPM/TPM and quota limits	Provider account or BYOK key tier

`LLM_CONCURRENCY_LIMIT` is a per-process pressure valve: it caps simultaneous outbound calls within one API process and is not shared across replicas. The `LLM_RATE_LIMIT_RPM_PER_*` settings add shared request-per-minute throttles across horizontally scaled API replicas when Redis is configured. Provider TPM limits, quota limits, and any model-specific enforcement still come from the provider account or BYOK key tier. When AgentFlow rate-limits a call, it returns HTTP `429` with `error: "llm_rate_limit_exceeded"`, the limit scope, the limit value, and a `Retry-After` header.
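A fixed-window limit counts requests per key within each 60-second window and rejects once the count reaches the limit. The in-process sketch below shows the idea only; the class and method names are hypothetical, and it mirrors the single-process fallback rather than the Redis-backed shared counters.

```python
import math
import time
from typing import Optional, Tuple


class FixedWindowLimiter:
    """In-process fixed-window requests-per-minute counter (illustrative only)."""

    def __init__(self, limit_rpm: int, window_seconds: int = 60):
        self.limit_rpm = limit_rpm
        self.window_seconds = window_seconds
        self._counts = {}  # (key, window index) -> requests seen in that window

    def check(self, key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request under `key`."""
        if self.limit_rpm <= 0:  # 0 disables the limit
            return True, 0
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        # Drop counters from earlier windows so the dict stays small.
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        count = self._counts.get((key, window), 0)
        if count >= self.limit_rpm:
            # Seconds until the current window ends and the counter resets.
            retry_after = math.ceil((window + 1) * self.window_seconds - now)
            return False, retry_after
        self._counts[(key, window)] = count + 1
        return True, 0
```

A gateway would typically check a tenant key and a user key per call, for example `limiter.check("tenant:acme")`, and turn a rejection into the `429` response described above. Fixed windows are simple and cheap, at the cost of allowing a burst on either side of a window boundary.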

Because the exact cost of a call is known only after the response returns, monthly budget enforcement is a preflight check against spend recorded so far. Requests that are in flight concurrently can therefore push spend past the cap by the combined cost of the calls that passed the check before earlier calls recorded their spend. For high-volume tenants, use conservative caps, shared RPM limits, provider-side limits, and concurrency tuning.
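A short worked example shows how the overshoot arises. The numbers and the `preflight_allows` helper are hypothetical, and whether the real check uses a strict or non-strict comparison is an assumption.

```python
def preflight_allows(recorded_spend_usd: float, cap_usd: float) -> bool:
    """Preflight check: only spend recorded so far is visible."""
    return recorded_spend_usd < cap_usd


cap_usd = 100.0
recorded_spend_usd = 99.50
in_flight_costs = [0.40, 0.40, 0.40]  # hypothetical per-call costs in USD

# All three requests start before any of them records spend, so all pass.
admitted = [c for c in in_flight_costs if preflight_allows(recorded_spend_usd, cap_usd)]

# Spend is recorded only as each response returns.
for cost in admitted:
    recorded_spend_usd += cost

overshoot_usd = round(recorded_spend_usd - cap_usd, 2)
print(f"recorded spend: {recorded_spend_usd:.2f}, overshoot: {overshoot_usd:.2f}")
```

The overshoot is bounded by the number of concurrent in-flight calls times their cost, which is why lower concurrency and RPM limits tighten the effective cap. The sample payload in the Budget model section, with spend of 100.12 against a limit of 100, is consistent with this behavior.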

Configuration

# Per-process outbound LLM concurrency
LLM_CONCURRENCY_LIMIT=30

# Shared fixed-window request limits; 0 disables the limit
LLM_RATE_LIMIT_RPM_PER_TENANT=600
LLM_RATE_LIMIT_RPM_PER_USER=120

# Monthly caps used when settings overrides are unset
LLM_BUDGET_PER_TENANT=100.0
LLM_BUDGET_PER_USER=100.0

# Admin scopes accepted for Usage and Budget settings APIs
AGENTFLOW_ADMIN_SCOPES=agentflow:admin admin:agentflow admin:usage usage:admin admin:budget budget:admin

# Shared counter/cache backend for multi-replica deployments
REDIS_URL=redis://redis:6379/0

For production, configure Redis so budget counters and RPM counters are shared across API replicas. Without Redis, AgentFlow falls back to in-process counters, which are useful for local development and single-process deployments but are not global ledgers.
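As a rough sketch of how a process might read these settings, the loader below parses the variables listed above and reports whether counters would be shared. The `LimitsConfig` type and `load_limits` function are hypothetical, and the fallback values are copied from the sample configuration rather than confirmed product defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LimitsConfig:
    concurrency_limit: int
    rpm_per_tenant: int
    rpm_per_user: int
    budget_per_tenant_usd: float
    budget_per_user_usd: float
    redis_url: Optional[str]

    @property
    def counters_are_shared(self) -> bool:
        """Without Redis, counters are per-process and not shared across replicas."""
        return bool(self.redis_url)


def load_limits(env: Mapping[str, str] = os.environ) -> LimitsConfig:
    """Parse limit settings; fallbacks mirror the sample values, not known defaults."""
    return LimitsConfig(
        concurrency_limit=int(env.get("LLM_CONCURRENCY_LIMIT", "30")),
        rpm_per_tenant=int(env.get("LLM_RATE_LIMIT_RPM_PER_TENANT", "600")),
        rpm_per_user=int(env.get("LLM_RATE_LIMIT_RPM_PER_USER", "120")),
        budget_per_tenant_usd=float(env.get("LLM_BUDGET_PER_TENANT", "100.0")),
        budget_per_user_usd=float(env.get("LLM_BUDGET_PER_USER", "100.0")),
        redis_url=env.get("REDIS_URL") or None,
    )
```

A deployment check could call `load_limits()` at startup and log a warning when `counters_are_shared` is false but more than one replica is running.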

Operations checklist

Keep pricing entries in the model registry current, and add an entry before enabling any new model in production.
Treat the Usage dashboard as the source for LLM call spend and adoption analytics; reconcile with provider invoices for finance-close workflows.
Grant one of `AGENTFLOW_ADMIN_SCOPES` only to operators who should see tenant-wide usage and update caps.
Tune `LLM_CONCURRENCY_LIMIT` to the provider tier and expected request duration.
Set tenant and user RPM limits when you need shared burst control across replicas.
Configure Redis for any horizontally scaled deployment.
Add the same usage-event and spend-recording hooks when introducing new provider paths or non-chat model calls.
