Skip to main content

Architecture

AgentFlow is built around a delegation-first orchestration model. A central agent receives every user request and routes it to the specialized sub-agent that owns the domain — never attempting to handle domain actions directly.

System overview

Request lifecycle

Every chat request follows the same execution path through the system. This diagram shows a single turn where the agent delegates to a sub-agent that calls a tool and produces an artifact:

The orchestration model

MainAgent: the router

MainAgent is the entry point for all user interactions. It primarily routes work to specialized sub-agents, while also owning shared/internal utility tools such as web context, retrieval, planning, and background sub-agent coordination:
Request typeRouted toExample
Email operationsEmailsAgent”Draft an email to Sarah about the proposal”
Task managementTasksAgent”Create a follow-up task for the Acme call”
Calendar & meetingsMeetingsAgent”What’s on my calendar tomorrow?”
Research & intelligenceResearchAgent”Find recent SEC filings for Acme Corp”
CRM & recordsRecordsAgent”Show me open opportunities over $100K”
Multi-part requests are decomposed automatically. “Draft an email and create a follow-up task” triggers sequential delegation to EmailsAgent then TasksAgent.

Sub-agents: domain specialists

Each sub-agent is a fully autonomous agent with:
  • Its own system prompt optimized for the domain
  • A dedicated tool set (e.g., EmailsAgent has Gmail tools; RecordsAgent has CRM query tools)
  • Optional knowledge bases for domain-specific RAG
  • Configurable capabilities — planning, reflection, retrieval can be toggled per agent
  • Independent LLM configuration — different models or parameters per agent

Call hierarchy

Every execution produces a tree of events with explicit parent-child relationships: Each event carries call_id, parent_call_id, and root_call_id, making it possible to reconstruct the full execution trace, attribute costs, and build timeline UIs.

Tool system

Tools are the atomic units of capability. They’re registered with a decorator and automatically discovered by the agents that need them.

Global tool registry

The tool registry ensures zero duplication — tools with the same name share a single instance across all agents:
@tool(
    name="search_accounts",
    description="Search CRM accounts by name or criteria",
    agent="RecordsAgent",
    tags=["crm", "search"],
    require_approval=False,
)
async def search_accounts(query: str, limit: int = 10) -> list[dict]:
    results = await crm.search("Account", query, limit=limit)
    return results
Key properties:
  • Decorator-based registration@tool() handles discovery, schema generation, and agent targeting
  • Agent targetingagent="RecordsAgent" binds the tool to one agent; agent=["RecordsAgent", "MainAgent"] binds it to several
  • Approval gatesrequire_approval=True pauses execution for human confirmation
  • Shared instances — tools are singletons in the registry; no memory waste from duplication
  • Thread-safe — concurrent access with proper locking

Tool categories

AgentFlow ships with tools across multiple domains:
CategoryExamples
CRM / RecordsAccount search, opportunity management, contact lookup, pipeline queries
EmailDraft, send, reply, search inbox, manage threads
MeetingsSchedule, check availability, transcript analysis, calendar management
ResearchWeb search, company news, SEC filings, industry intelligence
TasksCreate, assign, track, prioritize, follow-up management
KnowledgeSearch knowledge bases, document retrieval
DataQuery execution, report generation, data visualization
InternalPlanning, reasoning, reflection (framework capabilities exposed as tools)

Artifact system

Artifacts are the structured, interactive outputs agents produce — email drafts, meeting invites, tasks, reports, CRM tables, calendars. Rather than relying on visible inline markers in assistant text, the runtime emits structured artifact lifecycle events that clients can render, fetch, edit, and act on.

Artifact library

The artifact library is a global registry of artifact types shared across all agents. Each type declares:
  • Summarizer — a (dict, str) -> dict function that compresses full artifact content into a compact summary for LLM context
  • Prompt block — artifact formatting instructions injected into the agent’s system prompt so it knows how to produce the artifact payload
  • Lifecycledefault_status, streaming_status, and actions that drive state transitions (Draft → Sent, Draft → Scheduled)
  • Display modepanel (right sidebar) or inline (rendered in chat)
Types are registered in two ways:
  1. Explicitartifact() calls in specs.py for draft types, reports, and display artifacts
  2. Tool-result backed@artifact(...) registers the artifact type and summarizer, while sibling register_cached_tool(...) calls wire cacheable tool results to the same summarizer

Cache-ref system

When a tool returns large structured data (50-row CRM query, email thread, meeting list), the full payload would consume the context window. Instead:
  1. The result is stored in a TTL-scoped cache
  2. A compact summary (count, field names, 3-row preview) is returned to the LLM
  3. The agent references the cache ID in its structured artifact payload
  4. The artifact lifecycle event points the UI at the persisted artifact or cache-backed render payload
This keeps LLM token usage low (~200 tokens for a summary vs 10,000+ for full data) while the complete dataset remains accessible to both the agent (via get_cached_result) and the UI.

Registered types

CategoryTypes
DraftsEmail, meeting invite, Slack message, task, scheduled task, plan
ReportsMarkdown documents with charts, tables, and images
Data displayRecords table, calendar (inline), Slack channel/thread history
Auto-registeredAccount health, account summary, company news, recommendations, activity timelines
See Artifacts for the full API, summarizer interface, and SDK usage.

Prompt & context system

Agent behavior is assembled from prompt blocks, then augmented with hidden current-turn system_context, hidden current-turn system_reminders, and durable system_events.

Core pieces

PiecePurposeExamples
prompt_blocksReusable prompt/context definitionsrole, tool_guidance, crm_context, available_skills
system_contextHidden model-only context attached to a user turnresolved refs, KB snippets, attachments, inline turn blocks
system_remindersHidden current-turn reminders supplied by the systemautomatic archival memory recall
system_eventsDurable system-authored facts between turnsartifact actions, artifact summaries, compaction notices

Prompt blocks

Prompt blocks have one public shape: name, description, message, type, mode, scope, tags, order, ttl, agents, optional body, and enabled. Static blocks are DB-backed editable body text. Dynamic blocks are local AgentFlow functions registered with @prompt_block and returning body text at runtime. Inline blocks are request-time body text supplied by a client.

Runtime assembly

Before every agent request:
  1. Cached conversation prompt blocks assemble the system prefix
  2. Conversation history is rebuilt, including prior hidden turn context and system_events
  3. Current-turn system_context is built from inline prompt blocks, context refs, retrieval, files, and page/selection context
  4. Current-turn system_reminders are built from bounded system reminders such as automatic archival memory recall
  5. The active user text is sent with the hidden context/reminder envelope
Volatile turn context stays out of the cached system prefix so provider prompt caching remains effective.
See Prompt System for the full taxonomy and Context Blocks for preloadable dynamic blocks.

Memory

Memory gives agents persistent awareness of user preferences, facts, and behavioral patterns across conversations. The memory service stores per-user memory blocks in two tiers:
  • Core — compact, high-signal facts (4K char limit) injected into every prompt via the memory context block
  • Archival — longer-form context (8K char limit) available for deeper recall

Sleep-time memory

After a conversation settles (no new messages for a configurable window), a background process reviews the conversation and distills new facts into the user’s memory block. This runs asynchronously — the user never waits for memory updates. If the agent explicitly wrote memory during the conversation (via the update_memory tool), sleep-time creation is suppressed to avoid conflicts. Memory updates automatically invalidate the prompt block cache, ensuring the next agent request sees fresh context.

Knowledge & retrieval

Knowledge bases provide agents with access to your organization’s documents and data through vector search.
  • PGVector storage with tenant isolation
  • Hybrid search combining semantic (embedding-based) and keyword (BM25-style) retrieval
  • HyDE (Hypothetical Document Embeddings) for improved recall
  • MMR (Maximal Marginal Relevance) for result diversity
  • Agent-scoped binding — attach knowledge bases to specific agents
  • Configurable chunking — control chunk size, overlap, and strategy per KB

Streaming architecture

All agent executions support Server-Sent Events (SSE) for real-time streaming:
  • Every event has a seq number for deterministic client-side ordering
  • Events carry call_id / parent_call_id / root_call_id for hierarchy reconstruction
  • Artifact middleware — intercepts structured artifact payloads, resolves cache refs, persists artifacts, and emits lifecycle events alongside text tokens
  • Cooperative cancellation — cancel any execution mid-stream without corrupting state
  • Back-pressure — configurable max concurrent SSE connections per replica (default: 200)
  • Connection lifecycle — heartbeats, timeout protection (30 min default), automatic cleanup
  • Disconnect behavior — chat subscribers can detach without cancelling work; explicit cancel endpoints stop executions

Caching architecture

AgentFlow uses a multi-tier caching strategy to minimize latency, reduce LLM token usage, and avoid redundant data fetches. Each tier serves a different purpose:
TierStorageTTLWhat it caches
Prompt block cacheIn-process dict5 min default, 1 hr maxDynamic cached prompt block output (CRM context, memory, tasks, personalization)
Agent instance cacheIn-process per-tenant5 minAgent objects loaded from DB, avoiding repeated queries
Tool definition cacheIn-processUntil invalidatedSerialized tool schemas per agent per tool selection
Secrets cacheIn-processLifetime (positive), 60s (negative)AWS Secrets Manager lookups to avoid per-request calls
Shared cacheRedis (in-process fallback)5 minUser profiles (from GraphQL), merged settings, conversation history snapshots
Result cachePostgreSQL cached_results10 min – 24 hr per toolFull tool result payloads for cache-ref artifact resolution

Cache invalidation

  • Memory updates automatically invalidate the prompt block cache so the next request sees fresh user context
  • Settings changes invalidate the shared cache settings entry
  • Tool registration/removal invalidates the tool definition cache for affected agents
  • Result cache cleanup runs periodically across all tenants via the background worker, evicting expired entries

Fallback behavior

When Redis is unavailable, the shared cache falls back to an in-process TTL + LRU dict. This means single-replica deployments work without Redis, but multi-replica deployments need Redis for cross-worker cache coherence.

Database architecture

AgentFlow uses PostgreSQL with PGVector and enforces strict database-per-tenant isolation — each tenant gets its own PostgreSQL database with no tenant_id columns or row-level filtering.

Tenant isolation

The connection layer resolves tenant name → database name at request time. Each tenant’s database is fully independent:
  • No shared tables — tenant A cannot query tenant B’s data even with a SQL injection
  • Independent migrations — Alembic runs against each tenant database
  • Connection pooling — one cached Database instance per tenant, created on first access

Table landscape

AreaTablesPurpose
Agentsagents, tools, agent_tools, agent_subagents, agent_kbs, llm_configurationsAgent definitions, tool bindings, sub-agent relationships, KB attachments, model configs
Conversationsconversations, conversation_parts, execution_metrics, attachmentsMessage history, execution traces, file attachments
Knowledgeknowledge_bases, kb_documentsKB metadata, document chunks with PGVector embeddings + full-text search
Useruser_settings, prompt_configs, memory_blocksPer-user preferences, prompt customizations, persistent memory
ArtifactsartifactsStructured outputs with content, state, and lifecycle metadata
Cachingcached_resultsTool result payloads for cache-ref resolution (TTL-managed)

PGVector setup

The kb_documents table stores vector embeddings (1536 dimensions) with HNSW indexing for fast approximate nearest-neighbor search. A computed content_tsvector column enables BM25-style keyword search alongside semantic retrieval. The vector extension is enabled per-database via Alembic migrations.

Authentication & multi-tenancy

  • Auth0 bearer tokens are verified directly for human requests
  • Machine tokens use Auth0 client credentials plus snc-tenant and snc-userid headers
  • Database-per-tenant isolation - see Database architecture above
  • Audience-bound deployments accept only tokens for the configured API audience
  • CORS with per-domain regex matching for deployment domains
  • Dev auth bypass for local development (DEV_AUTH_BYPASS=true)
See Authentication for the Auth0, M2M, tenant claim, and local development contract.

Deployment model

AgentFlow runs as a FastAPI application with configurable worker modes:
ModeRole
webServes HTTP/SSE requests
workerRuns background jobs (batch processing, scheduled tasks, cleanup)
allBoth web and worker in a single process (development)
Health probes:
  • GET /health — liveness check (API + memory)
  • GET /ready — readiness check (verifies database connectivity)