
Conversations & Memory

AgentFlow manages multi-turn conversations with persistent history, intelligent memory pruning, and automatic title generation.

Conversations

Every agent interaction happens within a conversation. Conversations are created automatically on the first message and persist the full execution history.

Creating a conversation

Conversations are created implicitly when you send a message with a new conversation_id:
import uuid

response = await agent.run(
    "What open opportunities do we have?",
    conversation_id=str(uuid.uuid4()),
)
Or via REST:
POST /agent/{agent_id}/chat
{
  "message": "What open opportunities do we have?",
  "conversation_id": "conv_new_001",
  "message_id": "msg_001",
  "stream": true
}

Listing conversations

conversations = await af.Conversation.list(agent_id=agent.id)
for conv in conversations:
    print(f"{conv.title}: {conv.message_count} messages")
GET /agent/conversations?limit=50&offset=0&agent_id={agent_id}

Retrieving conversation history

conv = await af.Conversation.get("conv_001")
for msg in conv.messages:
    print(f"[{msg.role}] {msg.content[:100]}...")

Timeline view

The timeline API provides the full execution trace — every agent delegation, tool call, and content segment:
GET /api/conversations/{conversation_id}/timeline
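As a sketch of working with the trace client-side, the snippet below walks a timeline payload and counts events by type. The event fields shown (`type`, `name`, `agent`, and so on) are illustrative assumptions, not a documented schema:

```python
# Illustrative timeline payload — the event shape is an assumption,
# not a documented AgentFlow schema.
timeline = [
    {"type": "tool_call", "name": "crm.search", "duration_ms": 120},
    {"type": "content", "text": "Found 3 open opportunities."},
    {"type": "delegation", "agent": "sales-analyst"},
]

def summarize_timeline(events):
    """Count timeline events by type for a quick execution overview."""
    counts = {}
    for event in events:
        counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts

print(summarize_timeline(timeline))
```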

LLM-format export

Export conversation history in the format expected by LLM APIs:
GET /api/conversations/{conversation_id}/llm-format
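If you need to reproduce this shape yourself — for example, to feed stored history into your own LLM call — a minimal conversion to the common role/content chat format might look like this (the stored-message field names here are assumptions):

```python
def to_llm_format(messages):
    """Convert stored conversation messages to the role/content list
    shape expected by most chat-completion APIs, dropping extra fields."""
    return [{"role": m["role"], "content": m["content"]} for m in messages]

# Hypothetical stored messages with extra bookkeeping fields.
history = [
    {"role": "user", "content": "What open opportunities do we have?", "id": "msg_001"},
    {"role": "assistant", "content": "There are 3 open opportunities.", "id": "msg_002"},
]
print(to_llm_format(history))
```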

Memory management

Long conversations accumulate history that eventually exceeds LLM context windows. AgentFlow handles this automatically with intelligent memory management:

How it works

  1. Message-based pruning — keeps the system prompt and most recent turns
  2. LLM summarization — older messages are summarized into a compact context block
  3. Recent-turn preservation — a configurable number of recent user turns are always preserved
  4. Checkpoint persistence — memory checkpoints are stored in conversation metadata for fast resumption
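The pruning flow above can be sketched in a few lines. This is a simplified illustration, not AgentFlow's implementation: the summarizer is stubbed out, and the target-count trim (step 2's interaction with `target_conversation_messages`) is omitted for brevity:

```python
def prune_history(messages, max_messages=50, preserve_recent_turns=10,
                  summarize=lambda older: "Summary of earlier conversation."):
    """Sketch of message-based pruning: keep system messages, replace
    older turns with a summary block, and always preserve the most
    recent user turns."""
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Walk backward to find the cutoff that keeps the last N user turns.
    cut, users_seen = 0, 0
    for i in range(len(rest) - 1, -1, -1):
        if rest[i]["role"] == "user":
            users_seen += 1
            if users_seen == preserve_recent_turns:
                cut = i
                break
    summary = {"role": "system", "content": summarize(rest[:cut])}
    return system + [summary] + rest[cut:]

# 61 messages: one system prompt plus 30 alternating user/assistant turns.
history = [{"role": "system", "content": "You are a CRM assistant."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
    for i in range(60)
]
pruned = prune_history(history)
print(len(history), "->", len(pruned))
```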

Configuration

Memory management is configured per-agent:
Parameter                      Default   Description
max_conversation_messages      50        Trigger pruning when history exceeds this count
target_conversation_messages   30        Target message count after pruning
preserve_recent_turns          10        Always keep this many recent user turns
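For reference, the parameters above could be modeled as a small config object. How these values are actually passed to an agent (constructor kwarg, config file, API field) is an assumption here; this sketch just captures the defaults:

```python
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    """Per-agent memory settings; defaults mirror the table above.
    The name MemoryConfig and this passing style are illustrative."""
    max_conversation_messages: int = 50
    target_conversation_messages: int = 30
    preserve_recent_turns: int = 10

# Override a single knob, keep the rest at their defaults.
config = MemoryConfig(max_conversation_messages=100)
print(config)
```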

On-demand compression

You can trigger memory compression explicitly:
POST /agent/{agent_id}/conversations/{conversation_id}/compress
This returns a streaming response showing the compression process and resulting summary.
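Assuming the stream is delivered as server-sent-event style `data:` lines (an assumption — the wire format is not specified above), consuming it might look like the following. The sketch parses a captured stream rather than opening a live connection, and the event fields are illustrative:

```python
import json

def parse_compression_stream(lines):
    """Parse SSE-style 'data: {...}' lines from a compression stream.
    The event shape below is illustrative, not a documented schema."""
    events = []
    for line in lines:
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

# A hypothetical captured stream from the compress endpoint.
captured = [
    'data: {"status": "compressing", "messages_before": 58}',
    'data: {"status": "done", "messages_after": 30, "summary": "..."}',
]
for event in parse_compression_stream(captured):
    print(event["status"])
```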

Automatic title generation

Conversation titles are generated automatically from the first user message using an LLM:
# Auto-generated on first message — or generate/regenerate manually:
title = await conv.generate_title()

# Set a custom title
await conv.set_title("Q3 Revenue Analysis")
# Generate title
POST /agent/conversations/{conversation_id}/title/generate
{ "seed_message": "What does our Q3 pipeline look like?", "force": false }

# Update title
PUT /agent/conversations/{conversation_id}/title
{ "title": "Q3 Pipeline Review" }

# Get title
GET /agent/conversations/{conversation_id}/title

Conversation lifecycle

# List conversations
GET /agent/conversations

# Get full conversation with messages
GET /agent/conversations/{conversation_id}

# Get timeline (execution trace)
GET /api/conversations/{conversation_id}/timeline

# Get LLM-format history
GET /api/conversations/{conversation_id}/llm-format

# Update conversation
PUT /api/conversations/{conversation_id}

# Delete conversation
DELETE /api/conversations/{conversation_id}