Architecture

AgentFlow is built around a delegation-first orchestration model. A central agent receives every user request and routes it to the specialized sub-agent that owns the domain — never attempting to handle domain actions directly.

System overview

Request lifecycle

Every chat request follows the same execution path through the system. This diagram shows a single turn where the agent delegates to a sub-agent that calls a tool and produces an artifact:

The orchestration model

MainAgent: the router

MainAgent is the entry point for all user interactions. It primarily routes work to specialized sub-agents, while also owning shared/internal utility tools such as web context, retrieval, planning, and background sub-agent coordination:

Request type	Routed to	Example
Email operations	`EmailsAgent`	”Draft an email to Sarah about the proposal”
Task management	`TasksAgent`	”Create a follow-up task for the Acme call”
Calendar & meetings	`MeetingsAgent`	”What’s on my calendar tomorrow?”
Research & intelligence	`ResearchAgent`	”Find recent SEC filings for Acme Corp”
CRM & records	`RecordsAgent`	”Show me open opportunities over $100K”

Multi-part requests are decomposed automatically. “Draft an email and create a follow-up task” triggers sequential delegation to EmailsAgent then TasksAgent.

Sub-agents: domain specialists

Each sub-agent is a fully autonomous agent with:

Its own system prompt optimized for the domain
A dedicated tool set (e.g., EmailsAgent has Gmail tools; RecordsAgent has CRM query tools)
Optional knowledge bases for domain-specific RAG
Configurable capabilities — planning, reflection, retrieval can be toggled per agent
Independent LLM configuration — different models or parameters per agent

Call hierarchy

Every execution produces a tree of events with explicit parent-child relationships: Each event carries call_id, parent_call_id, and root_call_id, making it possible to reconstruct the full execution trace, attribute costs, and build timeline UIs.

Tool system

Tools are the atomic units of capability. They’re registered with a decorator and automatically discovered by the agents that need them.

Global tool registry

The tool registry ensures zero duplication — tools with the same name share a single instance across all agents:

@tool(
    name="search_accounts",
    description="Search CRM accounts by name or criteria",
    agent="RecordsAgent",
    tags=["crm", "search"],
    require_approval=False,
)
async def search_accounts(query: str, limit: int = 10) -> list[dict]:
    results = await crm.search("Account", query, limit=limit)
    return results

Key properties:

Decorator-based registration — @tool() handles discovery, schema generation, and agent targeting
Agent targeting — agent="RecordsAgent" binds the tool to one agent; agent=["RecordsAgent", "MainAgent"] binds it to several
Approval gates — require_approval=True pauses execution for human confirmation
Shared instances — tools are singletons in the registry; no memory waste from duplication
Thread-safe — concurrent access with proper locking

Tool categories

AgentFlow ships with tools across multiple domains:

Category	Examples
CRM / Records	Account search, opportunity management, contact lookup, pipeline queries
Email	Draft, send, reply, search inbox, manage threads
Meetings	Schedule, check availability, transcript analysis, calendar management
Research	Web search, company news, SEC filings, industry intelligence
Tasks	Create, assign, track, prioritize, follow-up management
Knowledge	Search knowledge bases, document retrieval
Data	Query execution, report generation, data visualization
Internal	Planning, reasoning, reflection (framework capabilities exposed as tools)

Artifact system

Artifacts are the structured, interactive outputs agents produce — email drafts, meeting invites, tasks, reports, CRM tables, calendars. Rather than relying on visible inline markers in assistant text, the runtime emits structured artifact lifecycle events that clients can render, fetch, edit, and act on.

Artifact library

The artifact library is a global registry of artifact types shared across all agents. Each type declares:

Summarizer — a (dict, str) -> dict function that compresses full artifact content into a compact summary for LLM context
Prompt block — artifact formatting instructions injected into the agent’s system prompt so it knows how to produce the artifact payload
Lifecycle — default_status, streaming_status, and actions that drive state transitions (Draft → Sent, Draft → Scheduled)
Display mode — panel (right sidebar) or inline (rendered in chat)

Types are registered in two ways:

Explicit — artifact() calls in specs.py for draft types, reports, and display artifacts
Tool-result backed — @artifact(...) registers the artifact type and summarizer, while sibling register_cached_tool(...) calls wire cacheable tool results to the same summarizer

Cache-ref system

When a tool returns large structured data (50-row CRM query, email thread, meeting list), the full payload would consume the context window. Instead:

The result is stored in a TTL-scoped cache
A compact summary (count, field names, 3-row preview) is returned to the LLM
The agent references the cache ID in its structured artifact payload
The artifact lifecycle event points the UI at the persisted artifact or cache-backed render payload

This keeps LLM token usage low (~200 tokens for a summary vs 10,000+ for full data) while the complete dataset remains accessible to both the agent (via get_cached_result) and the UI.

Registered types

Category	Types
Drafts	Email, meeting invite, Slack message, task, scheduled task, plan
Reports	Markdown documents with charts, tables, and images
Data display	Records table, calendar (inline), Slack channel/thread history
Auto-registered	Account health, account summary, company news, recommendations, activity timelines

See Artifacts for the full API, summarizer interface, and SDK usage.

Prompt & context system

Agent behavior is assembled from prompt blocks, then augmented with hidden current-turn system_context, hidden current-turn system_reminders, and durable system_events.

Core pieces

Piece	Purpose	Examples
`prompt_blocks`	Reusable prompt/context definitions	`role`, `tool_guidance`, `crm_context`, `available_skills`
`system_context`	Hidden model-only context attached to a user turn	resolved refs, KB snippets, attachments, inline turn blocks
`system_reminders`	Hidden current-turn reminders supplied by the system	automatic archival memory recall
`system_events`	Durable system-authored facts between turns	artifact actions, artifact summaries, compaction notices

Prompt blocks

Prompt blocks have one public shape: name, description, message, type, mode, scope, tags, order, ttl, agents, optional body, and enabled. Static blocks are DB-backed editable body text. Dynamic blocks are local AgentFlow functions registered with @prompt_block and returning body text at runtime. Inline blocks are request-time body text supplied by a client.

Runtime assembly

Before every agent request:

Cached conversation prompt blocks assemble the system prefix
Conversation history is rebuilt, including prior hidden turn context and system_events
Current-turn system_context is built from inline prompt blocks, context refs, retrieval, files, and page/selection context
Current-turn system_reminders are built from bounded system reminders such as automatic archival memory recall
The active user text is sent with the hidden context/reminder envelope

Volatile turn context stays out of the cached system prefix so provider prompt caching remains effective.

See Prompt System for the full taxonomy and Context Blocks for preloadable dynamic blocks.

Memory

Memory gives agents persistent awareness of user preferences, facts, and behavioral patterns across conversations. The memory service stores per-user memory blocks in two tiers:

Core — compact, high-signal facts (4K char limit) injected into every prompt via the memory context block
Archival — longer-form context (8K char limit) available for deeper recall

Sleep-time memory

After a conversation settles (no new messages for a configurable window), a background process reviews the conversation and distills new facts into the user’s memory block. This runs asynchronously — the user never waits for memory updates. If the agent explicitly wrote memory during the conversation (via the update_memory tool), sleep-time creation is suppressed to avoid conflicts. Memory updates automatically invalidate the prompt block cache, ensuring the next agent request sees fresh context.

Knowledge & retrieval

Knowledge bases provide agents with access to your organization’s documents and data through vector search.

PGVector storage with tenant isolation
Hybrid search combining semantic (embedding-based) and keyword (BM25-style) retrieval
HyDE (Hypothetical Document Embeddings) for improved recall
MMR (Maximal Marginal Relevance) for result diversity
Agent-scoped binding — attach knowledge bases to specific agents
Configurable chunking — control chunk size, overlap, and strategy per KB

Streaming architecture

All agent executions support Server-Sent Events (SSE) for real-time streaming:

Every event has a seq number for deterministic client-side ordering
Events carry call_id / parent_call_id / root_call_id for hierarchy reconstruction
Artifact middleware — intercepts structured artifact payloads, resolves cache refs, persists artifacts, and emits lifecycle events alongside text tokens
Cooperative cancellation — cancel any execution mid-stream without corrupting state
Back-pressure — configurable max concurrent SSE connections per replica (default: 200)
Connection lifecycle — heartbeats, timeout protection (30 min default), automatic cleanup
Disconnect behavior — chat subscribers can detach without cancelling work; explicit cancel endpoints stop executions

Caching architecture

AgentFlow uses a multi-tier caching strategy to minimize latency, reduce LLM token usage, and avoid redundant data fetches. Each tier serves a different purpose:

Tier	Storage	TTL	What it caches
Prompt block cache	In-process dict	5 min default, 1 hr max	Dynamic cached prompt block output (CRM context, memory, tasks, personalization)
Agent instance cache	In-process per-tenant	5 min	Agent objects loaded from DB, avoiding repeated queries
Tool definition cache	In-process	Until invalidated	Serialized tool schemas per agent per tool selection
Secrets cache	In-process	Lifetime (positive), 60s (negative)	AWS Secrets Manager lookups to avoid per-request calls
Shared cache	Redis (in-process fallback)	5 min	User profiles (from GraphQL), merged settings, conversation history snapshots
Result cache	PostgreSQL `cached_results`	10 min – 24 hr per tool	Full tool result payloads for cache-ref artifact resolution

Cache invalidation

Memory updates automatically invalidate the prompt block cache so the next request sees fresh user context
Settings changes invalidate the shared cache settings entry
Tool registration/removal invalidates the tool definition cache for affected agents
Result cache cleanup runs periodically across all tenants via the background worker, evicting expired entries

Fallback behavior

When Redis is unavailable, the shared cache falls back to an in-process TTL + LRU dict. This means single-replica deployments work without Redis, but multi-replica deployments need Redis for cross-worker cache coherence.

Database architecture

AgentFlow uses PostgreSQL with PGVector and enforces strict database-per-tenant isolation — each tenant gets its own PostgreSQL database with no tenant_id columns or row-level filtering.

Tenant isolation

The connection layer resolves tenant name → database name at request time. Each tenant’s database is fully independent:

No shared tables — tenant A cannot query tenant B’s data even with a SQL injection
Independent migrations — Alembic runs against each tenant database
Connection pooling — one cached Database instance per tenant, created on first access

Table landscape

Area	Tables	Purpose
Agents	`agents`, `tools`, `agent_tools`, `agent_subagents`, `agent_kbs`, `llm_configurations`	Agent definitions, tool bindings, sub-agent relationships, KB attachments, model configs
Conversations	`conversations`, `conversation_parts`, `execution_metrics`, `attachments`	Message history, execution traces, file attachments
Knowledge	`knowledge_bases`, `kb_documents`	KB metadata, document chunks with PGVector embeddings + full-text search
User	`user_settings`, `prompt_configs`, `memory_blocks`	Per-user preferences, prompt customizations, persistent memory
Artifacts	`artifacts`	Structured outputs with content, state, and lifecycle metadata
Caching	`cached_results`	Tool result payloads for cache-ref resolution (TTL-managed)

PGVector setup

The kb_documents table stores vector embeddings (1536 dimensions) with HNSW indexing for fast approximate nearest-neighbor search. A computed content_tsvector column enables BM25-style keyword search alongside semantic retrieval. The vector extension is enabled per-database via Alembic migrations.

Authentication & multi-tenancy

Auth0 bearer tokens are verified directly for human requests
Machine tokens use Auth0 client credentials plus snc-tenant and snc-userid headers
Database-per-tenant isolation - see Database architecture above
Audience-bound deployments accept only tokens for the configured API audience
CORS with per-domain regex matching for deployment domains
Dev auth bypass for local development (DEV_AUTH_BYPASS=true)

See Authentication for the Auth0, M2M, tenant claim, and local development contract.

Deployment model

AgentFlow runs as a FastAPI application with configurable worker modes:

Mode	Role
`web`	Serves HTTP/SSE requests
`worker`	Runs background jobs (batch processing, scheduled tasks, cleanup)
`all`	Both web and worker in a single process (development)

Health probes:

GET /health — liveness check (API + memory)
GET /ready — readiness check (verifies database connectivity)

​Architecture

​System overview

​Request lifecycle

​The orchestration model

​MainAgent: the router

​Sub-agents: domain specialists

​Call hierarchy

​Tool system

​Global tool registry

​Tool categories

​Artifact system

​Artifact library

​Cache-ref system

​Registered types

​Prompt & context system

​Core pieces

​Prompt blocks

​Runtime assembly

​Memory

​Sleep-time memory

​Knowledge & retrieval

​Streaming architecture

​Caching architecture

​Cache invalidation

​Fallback behavior

​Database architecture

​Tenant isolation

​Table landscape

​PGVector setup

​Authentication & multi-tenancy

​Deployment model

Architecture

System overview

Request lifecycle

The orchestration model

MainAgent: the router

Sub-agents: domain specialists

Call hierarchy

Tool system

Global tool registry

Tool categories

Artifact system

Artifact library

Cache-ref system

Registered types

Prompt & context system

Core pieces

Prompt blocks

Runtime assembly

Memory

Sleep-time memory

Knowledge & retrieval

Streaming architecture

Caching architecture

Cache invalidation

Fallback behavior

Database architecture

Tenant isolation

Table landscape

PGVector setup

Authentication & multi-tenancy

Deployment model