
High-Level Overview

AgentFlow follows a layered architecture with clear separation between the API surface, the core framework, and the persistence layer.
┌─────────────────────────────────────────────┐
│  Client (Browser / SDK / CLI)               │
└──────────────────┬──────────────────────────┘
                   │  HTTP / SSE
┌──────────────────▼──────────────────────────┐
│  FastAPI Endpoints                          │
│  (execution, management, conversations)     │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│  Framework Layer                            │
│  ┌───────────┐ ┌──────────┐ ┌────────────┐ │
│  │  Agents   │ │  Tools   │ │ Knowledge  │ │
│  │  Engine   │ │  Engine  │ │ Base Engine│ │
│  └─────┬─────┘ └────┬─────┘ └─────┬──────┘ │
│        │             │             │        │
│  ┌─────▼─────────────▼─────────────▼──────┐ │
│  │  LLM Service (LiteLLM)                 │ │
│  └────────────────────────────────────────┘ │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│  Persistence                                │
│  PostgreSQL + PGVector │ S3 │ Redis         │
└─────────────────────────────────────────────┘

Executable Protocol

Agents, tools, and sub-agents all implement the Executable base class. This gives every component a uniform streaming interface:
from abc import ABC, abstractmethod
from typing import AsyncIterator

class Executable(ABC):
    async def run(self, *args, **kwargs) -> AsyncIterator[Event]:
        # Shared streaming wrapper: every executable is consumed the same way
        async for event in self._do(*args, **kwargs):
            yield event

    @abstractmethod
    def _do(self, *args, **kwargs) -> AsyncIterator[Event]: ...
All executables produce Event objects with types like START, DELTA, END, and ERROR. This means the streaming infrastructure works identically regardless of whether content comes from an LLM, a tool, or a sub-agent.
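A minimal, self-contained sketch of the protocol in action. The `Event` dataclass, `EventType` enum, and `EchoTool` here are illustrative stand-ins, not the framework's actual types:

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum, auto
from typing import AsyncIterator

class EventType(Enum):
    START = auto()
    DELTA = auto()
    END = auto()
    ERROR = auto()

@dataclass
class Event:
    type: EventType
    data: str = ""

class Executable(ABC):
    """Uniform streaming interface shared by agents, tools, and sub-agents."""
    async def run(self) -> AsyncIterator[Event]:
        async for event in self._do():
            yield event

    @abstractmethod
    def _do(self) -> AsyncIterator[Event]: ...

class EchoTool(Executable):
    """Illustrative tool that streams its input back one word at a time."""
    def __init__(self, text: str):
        self.text = text

    async def _do(self) -> AsyncIterator[Event]:
        yield Event(EventType.START)
        for word in self.text.split():
            yield Event(EventType.DELTA, word)
        yield Event(EventType.END)

async def main() -> list[Event]:
    # The caller never cares what kind of executable it is consuming
    return [e async for e in EchoTool("hello world").run()]

events = asyncio.run(main())
```

Because agents and sub-agents expose the same `run()` generator, the consumer above would work unchanged for any of them.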

Agent Execution Flow

  1. Request arrives at a FastAPI endpoint
  2. Conversation memory is loaded and pruned to fit token limits
  3. Agent loop begins — the agent builds a prompt, calls the LLM, and streams DELTA events
  4. If the LLM requests a tool call, the tool is resolved, validated, and executed (with optional approval gating)
  5. If the LLM requests a sub-agent, a new execution context is created and delegated
  6. If retrieval is enabled, relevant knowledge base documents are injected into context
  7. The agent loops until the LLM produces a final response or hits max_turns
  8. The response and metadata are persisted, and the SSE stream closes

Mixin-Based Agent Design

The Agent class is composed from focused mixins:
Mixin             Responsibility
PromptMixin       Prompt building and context aggregation
CapabilityMixin   Planning, reflection, and retrieval features
PersistenceMixin  Database read/write for agent state
Executable        The streaming execution protocol
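Composition then looks roughly like this. The mixin bodies are abbreviated sketches (the streaming protocol is simplified to a plain generator here), not the framework's real implementations:

```python
from abc import ABC, abstractmethod

class PromptMixin:
    def build_prompt(self, task: str) -> str:
        # Aggregates system context, memory, and the current task
        return f"System context...\nTask: {task}"

class CapabilityMixin:
    def reflect(self, draft: str) -> str:
        return draft  # planning / reflection / retrieval hooks live here

class PersistenceMixin:
    def save_state(self) -> None:
        pass  # database read/write for agent state

class Executable(ABC):
    @abstractmethod
    def _do(self): ...

class Agent(PromptMixin, CapabilityMixin, PersistenceMixin, Executable):
    def _do(self):
        yield self.build_prompt("summarize")

agent = Agent()
prompt = next(agent._do())
```

Each mixin stays independently testable, and `Agent` itself is mostly wiring.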

Service Registry

Framework services are accessed through a central ServiceRegistry:
llm = ServiceRegistry.get(LLMService)
conversations = ServiceRegistry.get(ConversationService)
Services are singleton-scoped and tenant-aware.
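A minimal registry along these lines might look as follows. This is an illustrative sketch (the `LLMService` stub included); the real implementation also handles tenant scoping:

```python
from typing import TypeVar

T = TypeVar("T")

class ServiceRegistry:
    # Maps a service class to its singleton instance
    _services: dict[type, object] = {}

    @classmethod
    def register(cls, instance: object) -> None:
        cls._services[type(instance)] = instance

    @classmethod
    def get(cls, service_type: type[T]) -> T:
        try:
            return cls._services[service_type]  # type: ignore[return-value]
        except KeyError:
            raise LookupError(f"{service_type.__name__} is not registered")

# Usage with a stand-in service class:
class LLMService:
    def complete(self, prompt: str) -> str:
        return "..."

ServiceRegistry.register(LLMService())
llm = ServiceRegistry.get(LLMService)
```

Keying the registry by class keeps lookups type-safe: `get(LLMService)` returns an `LLMService` as far as the type checker is concerned.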

Multi-Tenant Isolation

Each tenant gets its own isolated:
  • Database schema (or separate database)
  • Tool registry state — agents can have different tool sets per tenant
  • LLM configuration — model, temperature, and token limits per tenant
  • Knowledge bases — documents and embeddings are tenant-scoped
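One way to model this per-tenant state is a configuration object. The field names and defaults below are assumptions for illustration, not the framework's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TenantConfig:
    # Illustrative per-tenant configuration; names are hypothetical.
    tenant_id: str
    db_schema: str                                  # isolated schema (or database)
    tools: set[str] = field(default_factory=set)    # per-tenant tool registry
    model: str = "gpt-4o"                           # per-tenant LLM config
    temperature: float = 0.2
    max_tokens: int = 4096
    knowledge_bases: list[str] = field(default_factory=list)

acme = TenantConfig(tenant_id="acme", db_schema="tenant_acme",
                    tools={"search", "calculator"})
```

Resolving services through a tenant-scoped config like this is what lets two tenants run the same agent definition with different tools, models, and knowledge bases.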