Architecture
AgentFlow is built around a delegation-first orchestration model. A central agent receives every user request and routes it to the specialized sub-agent that owns the domain — never attempting to handle domain actions directly.System overview
Request lifecycle
Every chat request follows the same execution path through the system. This diagram shows a single turn where the agent delegates to a sub-agent that calls a tool and produces an artifact:The orchestration model
MainAgent: the router
MainAgent is the entry point for all user interactions. It primarily routes work to specialized sub-agents, while also owning shared/internal utility tools such as web context, retrieval, planning, and background sub-agent coordination:
| Request type | Routed to | Example |
|---|---|---|
| Email operations | EmailsAgent | ”Draft an email to Sarah about the proposal” |
| Task management | TasksAgent | ”Create a follow-up task for the Acme call” |
| Calendar & meetings | MeetingsAgent | ”What’s on my calendar tomorrow?” |
| Research & intelligence | ResearchAgent | ”Find recent SEC filings for Acme Corp” |
| CRM & records | RecordsAgent | ”Show me open opportunities over $100K” |
EmailsAgent then TasksAgent.
Sub-agents: domain specialists
Each sub-agent is a fully autonomous agent with:- Its own system prompt optimized for the domain
- A dedicated tool set (e.g.,
EmailsAgenthas Gmail tools;RecordsAgenthas CRM query tools) - Optional knowledge bases for domain-specific RAG
- Configurable capabilities — planning, reflection, retrieval can be toggled per agent
- Independent LLM configuration — different models or parameters per agent
Call hierarchy
Every execution produces a tree of events with explicit parent-child relationships: Each event carriescall_id, parent_call_id, and root_call_id, making it possible to reconstruct the full execution trace, attribute costs, and build timeline UIs.
Tool system
Tools are the atomic units of capability. They’re registered with a decorator and automatically discovered by the agents that need them.Global tool registry
The tool registry ensures zero duplication — tools with the same name share a single instance across all agents:- Decorator-based registration —
@tool()handles discovery, schema generation, and agent targeting - Agent targeting —
agent="RecordsAgent"binds the tool to one agent;agent=["RecordsAgent", "MainAgent"]binds it to several - Approval gates —
require_approval=Truepauses execution for human confirmation - Shared instances — tools are singletons in the registry; no memory waste from duplication
- Thread-safe — concurrent access with proper locking
Tool categories
AgentFlow ships with tools across multiple domains:| Category | Examples |
|---|---|
| CRM / Records | Account search, opportunity management, contact lookup, pipeline queries |
| Draft, send, reply, search inbox, manage threads | |
| Meetings | Schedule, check availability, transcript analysis, calendar management |
| Research | Web search, company news, SEC filings, industry intelligence |
| Tasks | Create, assign, track, prioritize, follow-up management |
| Knowledge | Search knowledge bases, document retrieval |
| Data | Query execution, report generation, data visualization |
| Internal | Planning, reasoning, reflection (framework capabilities exposed as tools) |
Artifact system
Artifacts are the structured, interactive outputs agents produce — email drafts, meeting invites, tasks, reports, CRM tables, calendars. Rather than relying on visible inline markers in assistant text, the runtime emits structured artifact lifecycle events that clients can render, fetch, edit, and act on.Artifact library
The artifact library is a global registry of artifact types shared across all agents. Each type declares:- Summarizer — a
(dict, str) -> dictfunction that compresses full artifact content into a compact summary for LLM context - Prompt block — artifact formatting instructions injected into the agent’s system prompt so it knows how to produce the artifact payload
- Lifecycle —
default_status,streaming_status, andactionsthat drive state transitions (Draft → Sent, Draft → Scheduled) - Display mode —
panel(right sidebar) orinline(rendered in chat)
- Explicit —
artifact()calls inspecs.pyfor draft types, reports, and display artifacts - Tool-result backed —
@artifact(...)registers the artifact type and summarizer, while siblingregister_cached_tool(...)calls wire cacheable tool results to the same summarizer
Cache-ref system
When a tool returns large structured data (50-row CRM query, email thread, meeting list), the full payload would consume the context window. Instead:- The result is stored in a TTL-scoped cache
- A compact summary (count, field names, 3-row preview) is returned to the LLM
- The agent references the cache ID in its structured artifact payload
- The artifact lifecycle event points the UI at the persisted artifact or cache-backed render payload
get_cached_result) and the UI.
Registered types
| Category | Types |
|---|---|
| Drafts | Email, meeting invite, Slack message, task, scheduled task, plan |
| Reports | Markdown documents with charts, tables, and images |
| Data display | Records table, calendar (inline), Slack channel/thread history |
| Auto-registered | Account health, account summary, company news, recommendations, activity timelines |
See Artifacts for the full API, summarizer interface, and SDK usage.
Prompt & context system
Agent behavior is assembled from prompt blocks, then augmented with hidden current-turnsystem_context, hidden current-turn system_reminders, and durable system_events.
Core pieces
| Piece | Purpose | Examples |
|---|---|---|
prompt_blocks | Reusable prompt/context definitions | role, tool_guidance, crm_context, available_skills |
system_context | Hidden model-only context attached to a user turn | resolved refs, KB snippets, attachments, inline turn blocks |
system_reminders | Hidden current-turn reminders supplied by the system | automatic archival memory recall |
system_events | Durable system-authored facts between turns | artifact actions, artifact summaries, compaction notices |
Prompt blocks
Prompt blocks have one public shape:name, description, message, type, mode, scope, tags, order, ttl, agents, optional body, and enabled.
Static blocks are DB-backed editable body text. Dynamic blocks are local AgentFlow functions registered with @prompt_block and returning body text at runtime. Inline blocks are request-time body text supplied by a client.
Runtime assembly
Before every agent request:- Cached conversation prompt blocks assemble the system prefix
- Conversation history is rebuilt, including prior hidden turn context and
system_events - Current-turn
system_contextis built from inline prompt blocks, context refs, retrieval, files, and page/selection context - Current-turn
system_remindersare built from bounded system reminders such as automatic archival memory recall - The active user text is sent with the hidden context/reminder envelope
See Prompt System for the full taxonomy and Context Blocks for preloadable dynamic blocks.
Memory
Memory gives agents persistent awareness of user preferences, facts, and behavioral patterns across conversations. The memory service stores per-user memory blocks in two tiers:- Core — compact, high-signal facts (4K char limit) injected into every prompt via the
memorycontext block - Archival — longer-form context (8K char limit) available for deeper recall
Sleep-time memory
After a conversation settles (no new messages for a configurable window), a background process reviews the conversation and distills new facts into the user’s memory block. This runs asynchronously — the user never waits for memory updates. If the agent explicitly wrote memory during the conversation (via theupdate_memory tool), sleep-time creation is suppressed to avoid conflicts.
Memory updates automatically invalidate the prompt block cache, ensuring the next agent request sees fresh context.
Knowledge & retrieval
Knowledge bases provide agents with access to your organization’s documents and data through vector search.- PGVector storage with tenant isolation
- Hybrid search combining semantic (embedding-based) and keyword (BM25-style) retrieval
- HyDE (Hypothetical Document Embeddings) for improved recall
- MMR (Maximal Marginal Relevance) for result diversity
- Agent-scoped binding — attach knowledge bases to specific agents
- Configurable chunking — control chunk size, overlap, and strategy per KB
Streaming architecture
All agent executions support Server-Sent Events (SSE) for real-time streaming:- Every event has a
seqnumber for deterministic client-side ordering - Events carry
call_id/parent_call_id/root_call_idfor hierarchy reconstruction - Artifact middleware — intercepts structured artifact payloads, resolves cache refs, persists artifacts, and emits lifecycle events alongside text tokens
- Cooperative cancellation — cancel any execution mid-stream without corrupting state
- Back-pressure — configurable max concurrent SSE connections per replica (default: 200)
- Connection lifecycle — heartbeats, timeout protection (30 min default), automatic cleanup
- Disconnect behavior — chat subscribers can detach without cancelling work; explicit cancel endpoints stop executions
Caching architecture
AgentFlow uses a multi-tier caching strategy to minimize latency, reduce LLM token usage, and avoid redundant data fetches. Each tier serves a different purpose:| Tier | Storage | TTL | What it caches |
|---|---|---|---|
| Prompt block cache | In-process dict | 5 min default, 1 hr max | Dynamic cached prompt block output (CRM context, memory, tasks, personalization) |
| Agent instance cache | In-process per-tenant | 5 min | Agent objects loaded from DB, avoiding repeated queries |
| Tool definition cache | In-process | Until invalidated | Serialized tool schemas per agent per tool selection |
| Secrets cache | In-process | Lifetime (positive), 60s (negative) | AWS Secrets Manager lookups to avoid per-request calls |
| Shared cache | Redis (in-process fallback) | 5 min | User profiles (from GraphQL), merged settings, conversation history snapshots |
| Result cache | PostgreSQL cached_results | 10 min – 24 hr per tool | Full tool result payloads for cache-ref artifact resolution |
Cache invalidation
- Memory updates automatically invalidate the prompt block cache so the next request sees fresh user context
- Settings changes invalidate the shared cache settings entry
- Tool registration/removal invalidates the tool definition cache for affected agents
- Result cache cleanup runs periodically across all tenants via the background worker, evicting expired entries
Fallback behavior
When Redis is unavailable, the shared cache falls back to an in-process TTL + LRU dict. This means single-replica deployments work without Redis, but multi-replica deployments need Redis for cross-worker cache coherence.Database architecture
AgentFlow uses PostgreSQL with PGVector and enforces strict database-per-tenant isolation — each tenant gets its own PostgreSQL database with notenant_id columns or row-level filtering.
Tenant isolation
The connection layer resolves tenant name → database name at request time. Each tenant’s database is fully independent:- No shared tables — tenant A cannot query tenant B’s data even with a SQL injection
- Independent migrations — Alembic runs against each tenant database
- Connection pooling — one cached
Databaseinstance per tenant, created on first access
Table landscape
| Area | Tables | Purpose |
|---|---|---|
| Agents | agents, tools, agent_tools, agent_subagents, agent_kbs, llm_configurations | Agent definitions, tool bindings, sub-agent relationships, KB attachments, model configs |
| Conversations | conversations, conversation_parts, execution_metrics, attachments | Message history, execution traces, file attachments |
| Knowledge | knowledge_bases, kb_documents | KB metadata, document chunks with PGVector embeddings + full-text search |
| User | user_settings, prompt_configs, memory_blocks | Per-user preferences, prompt customizations, persistent memory |
| Artifacts | artifacts | Structured outputs with content, state, and lifecycle metadata |
| Caching | cached_results | Tool result payloads for cache-ref resolution (TTL-managed) |
PGVector setup
Thekb_documents table stores vector embeddings (1536 dimensions) with HNSW indexing for fast approximate nearest-neighbor search. A computed content_tsvector column enables BM25-style keyword search alongside semantic retrieval. The vector extension is enabled per-database via Alembic migrations.
Authentication & multi-tenancy
- Auth0 bearer tokens are verified directly for human requests
- Machine tokens use Auth0 client credentials plus
snc-tenantandsnc-useridheaders - Database-per-tenant isolation - see Database architecture above
- Audience-bound deployments accept only tokens for the configured API audience
- CORS with per-domain regex matching for deployment domains
- Dev auth bypass for local development (
DEV_AUTH_BYPASS=true)
Deployment model
AgentFlow runs as a FastAPI application with configurable worker modes:| Mode | Role |
|---|---|
web | Serves HTTP/SSE requests |
worker | Runs background jobs (batch processing, scheduled tasks, cleanup) |
all | Both web and worker in a single process (development) |
GET /health— liveness check (API + memory)GET /ready— readiness check (verifies database connectivity)

