AI Agent System Architecture

16 min read · Feb 9, 2026

Systems design for multi-agent coordination


Architecture Notes

Vision

Evolve Ralph from a single-agent PRD executor into a multi-agent team system with:

  • SOUL-based personalities - agents with distinct roles/skills
  • Team coordination - multiple agents collaborating on shared work
  • Filesystem-native - no external DB, everything in files (matches Reference Document 2)
  • Observable - live dashboard with progress, costs, activity

User Workflow

# 1. Install to any project
curl -fsSL https://tybarho.com/ralph/install.sh | bash

# 2. Opens local dashboard (React)
# 3. Define teams + agents (AI-assisted)
# 4. Create tasks/PRDs/crons
# 5. Watch agents work with live progress + costs

Directory Structure (per project)

.ralph/
├── config.json              # Teams, agent assignments, schedules
├── dashboard/               # React app (served locally)
├── agents/                  # Agent definitions
│   ├── friday/
│   │   ├── SOUL.md          # Personality, skills, voice
│   │   └── MEMORY.md        # Agent-specific learnings
│   ├── loki/
│   │   ├── SOUL.md
│   │   └── MEMORY.md
│   └── shuri/
│       ├── SOUL.md
│       └── MEMORY.md
├── teams/                   # Team compositions
│   ├── engineering.json     # { agents: ["friday", "shuri"], focus: "..." }
│   ├── marketing.json       # { agents: ["loki", "quill", "vision"] }
│   └── research.json        # { agents: ["fury", "shuri"] }
├── state/                   # Shared state (filesystem-based)
│   ├── tasks.json           # Task board (inbox, assigned, in_progress, review, done)
│   ├── activity.jsonl       # Append-only activity log
│   └── agents/              # Per-agent runtime state
│       ├── friday.json      # { status, currentTask, lastHeartbeat, tokenUsage }
│       └── loki.json
├── work/                    # Task workspaces
│   ├── {task-id}/
│   │   ├── task.json        # Task definition
│   │   ├── thread.jsonl     # Comments/discussion
│   │   └── deliverables/    # Output files
├── memory/                  # Shared knowledge
│   ├── CODEBASE.md          # Project-wide patterns (current MEMORY.md)
│   └── CONTEXT.md           # User context (who you are)
└── logs/                    # Execution logs + costs
    ├── 2026-02-05.jsonl     # Daily log with token counts
    └── costs.json           # Aggregated cost tracking

Agent SOUL Structure

# SOUL.md — Friday (Developer)

## Identity
Name: Friday
Role: Developer
Team: Engineering

## Personality
Code is poetry. Clean, tested, documented.
Prefers small PRs. Runs tests before committing.
Asks Shuri for review on anything user-facing.

## Skills
- TypeScript, React, Next.js
- Testing (vitest, playwright)
- Database migrations
- API design

## Voice
Direct. Technical. Cites file paths and line numbers.
Uses code blocks liberally. Explains the "why" not just "what."

## Boundaries
- Won't merge without tests passing
- Escalates to human if touching auth/payments
- Asks for design review on UI changes

MCP Tools Integration

MCP (Model Context Protocol) tools like calendar, email, image generation, etc. are configured at the CLI level but need to be managed per-agent.

The Challenge:

  • Tools configured in ~/.cursor/mcp.json or ~/.claude/mcp.json (global)
  • Different agents need different tools (Friday doesn't need calendar, Pepper does)
  • Some tools are sensitive (email, calendar) — not every agent should have access
  • Tool availability should be part of agent identity

Solution: Tool Allowlists in SOUL

# SOUL.md — Pepper (Email Marketing)

## Tools
Allowed:
- email (send, draft, search)
- calendar (read, create events)
- generate_image (for email graphics)

Denied:
- shell (no arbitrary command execution)
- filesystem (only through orchestrator)

How It Works:

User's MCP config (global)
Orchestrator reads available tools
Agent SOUL specifies allowed tools
Orchestrator filters tools before spawning agent
Agent only sees tools in their allowlist

Implementation Options:

Option A: CLI flag filtering (if supported)

# Pass allowed tools to CLI
# (--tools and --prompt are placeholders; check each CLI's help output for its actual flags)
cursor agent --tools "email,calendar,generate_image" --prompt "..."
claude --tools "email,calendar" --prompt "..."

Option B: System prompt enforcement

# Injected into agent's system prompt
You have access to these tools ONLY:
- email: send, draft, search emails
- calendar: read and create events

Do NOT attempt to use: shell, filesystem, web_search

Option C: Orchestrator proxy (most control)

// Orchestrator intercepts tool calls
const allowedTools = loadAgentTools(agent);

const toolProxy = {
  async callTool(name: string, args: any) {
    if (!allowedTools.includes(name)) {
      throw new Error(`Agent ${agent} not authorized for tool: ${name}`);
    }
    return await mcpClient.callTool(name, args);
  }
};

Tool Categories:

| Category | Tools | Agents |
|---|---|---|
| Code | shell, filesystem, git | friday, shuri |
| Communication | email, slack, calendar | pepper, jarvis |
| Content | generate_image, web_search, pdf_reader | loki, wanda, fury |
| Data | database, analytics | vision, fury |

Team-Level Tool Inheritance:

// .ralph/teams/marketing.json
{
  "agents": ["loki", "quill", "pepper"],
  "tools": {
    "shared": ["generate_image", "web_search"],
    "per_agent": {
      "pepper": ["email", "calendar"],
      "quill": ["twitter", "linkedin"]
    }
  }
}

Agent Tool Config:

// .ralph/agents/pepper/config.json
{
  "tools": {
    "allow": ["email", "calendar", "generate_image"],
    "deny": ["shell", "filesystem"],
    "require_approval": ["email:send"]  // Human approves before sending
  }
}
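
One way to combine the team-level and agent-level configs above is to resolve a single effective allowlist when the orchestrator spawns an agent. The sketch below assumes that team grants and agent grants are additive and that a deny entry always wins. The function name `resolveTools` and these precedence rules are assumptions; the design above does not settle them.

```typescript
// Shapes mirror the JSON examples above; the precedence rules are assumptions.
interface TeamTools {
  shared: string[];
  per_agent: Record<string, string[]>;
}

interface AgentTools {
  allow: string[];
  deny: string[];
  require_approval: string[]; // entries like "email:send" (tool:action)
}

interface ResolvedTools {
  allowed: string[];
  requireApproval: string[];
}

// Hypothetical helper: union of team and agent grants, minus anything denied.
function resolveTools(agent: string, team: TeamTools, config: AgentTools): ResolvedTools {
  const granted = new Set<string>([
    ...team.shared,
    ...(team.per_agent[agent] ?? []),
    ...config.allow,
  ]);
  for (const tool of config.deny) {
    granted.delete(tool); // deny wins over every grant
  }
  const allowed = Array.from(granted).sort();
  // Keep only approval rules whose tool is still allowed.
  const requireApproval = config.require_approval.filter((rule) =>
    allowed.includes(rule.split(":")[0]),
  );
  return { allowed, requireApproval };
}
```

With the marketing team and Pepper's config from above, this would give Pepper `web_search` through the team's shared list even though her own `allow` list omits it. If that is not the intent, the agent-level list could be treated as a strict filter instead.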

Approval Workflow for Sensitive Tools:

Agent wants to send email
Tool marked as "require_approval"
Orchestrator pauses agent
Dashboard shows pending approval:
  "Pepper wants to send email to user@example.com
   Subject: Welcome to our newsletter
   [Approve] [Deny] [Edit]"
Human approves/denies
Agent continues or handles denial
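
The pause-and-approve flow above could hang off a small decision function that the orchestrator runs before forwarding any tool call. This is a minimal sketch: the names `decide` and `approvalLine`, the `tool:action` matching rule, and the record fields are all assumptions made for illustration.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

interface ToolPolicy {
  allow: string[];
  deny: string[];
  require_approval: string[]; // "tool" or "tool:action"
}

// Hypothetical gate the orchestrator could run before forwarding a tool call.
function decide(policy: ToolPolicy, tool: string, action: string): Decision {
  if (policy.deny.includes(tool) || !policy.allow.includes(tool)) {
    return "deny";
  }
  const needsApproval =
    policy.require_approval.includes(tool) ||
    policy.require_approval.includes(`${tool}:${action}`);
  return needsApproval ? "needs_approval" : "allow";
}

interface ApprovalRecord {
  ts: string;
  agent: string;
  tool: string;
  action: string;
  status: "pending" | "approved" | "denied";
}

// Serialize one line for an append-only log such as .ralph/tools/approvals.jsonl.
function approvalLine(record: ApprovalRecord): string {
  return JSON.stringify(record) + "\n";
}
```

On `needs_approval`, the orchestrator would append a `pending` record, pause the agent, and append a second record once the human responds, so the log stays append-only.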

Directory Structure Update:

.ralph/
├── agents/
│   └── pepper/
│       ├── SOUL.md
│       ├── MEMORY.md
│       └── config.json      # Tool permissions, model preferences
├── tools/                   # Tool configs and wrappers
│   ├── available.json       # Discovered from global MCP config
│   └── approvals.jsonl      # Pending/completed approval log

Discovery: Reading Global MCP Config:

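import { homedir } from 'node:os';
import { join } from 'node:path';

// orchestrator reads user's MCP setup
// (readJson, parseMcpServers, and writeJson are helpers assumed to be defined elsewhere)
function discoverTools() {
  // Node does not expand "~", so resolve the paths against the home directory
  const cursorConfig = readJson(join(homedir(), '.cursor', 'mcp.json'));
  const claudeConfig = readJson(join(homedir(), '.claude', 'mcp.json'));

  const allTools = [
    ...parseMcpServers(cursorConfig),
    ...parseMcpServers(claudeConfig)
  ];

  writeJson('.ralph/tools/available.json', allTools);
}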

Orchestration

Option A: Shell-based (simple, current Ralph pattern)

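# ralph-orchestrator.sh
# Runs in terminal, manages heartbeats for all agents
# (should_wake and ralph-agent are assumed to be defined elsewhere)

while true; do
  # Glob instead of parsing `ls` output, so unusual directory names are safe
  for dir in .ralph/agents/*/; do
    [ -d "$dir" ] || continue
    agent=$(basename "$dir")
    if should_wake "$agent"; then
      ralph-agent "$agent" &
    fi
  done
  sleep 60
done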

Option B: Node process (better for dashboard integration)

// orchestrator.ts
// Manages agents, serves dashboard, tracks costs

const orchestrator = new RalphOrchestrator({
  agents: loadAgents('.ralph/agents/'),
  teams: loadTeams('.ralph/teams/'),
  heartbeatInterval: 15 * 60 * 1000, // 15 min
});

orchestrator.on('agent:wake', (agent) => {
  // Spawn cursor agent CLI with agent's SOUL
});

orchestrator.on('agent:complete', (agent, result) => {
  // Log activity, update costs, notify subscribers
});

// Serve dashboard on localhost:3333
orchestrator.serveDashboard();

Heartbeat Flow (per agent)

Agent wakes (cron or orchestrator)
1. Read own SOUL.md (who am I?)
2. Read state/tasks.json (what needs doing?)
3. Check for @mentions in activity.jsonl
4. Check assigned tasks for my teams
5. If work found:
   - Claim task (update state)
   - Read task workspace (work/{task-id}/)
   - Read relevant MEMORY files
   - Do work (one unit)
   - Post to thread.jsonl
   - Update deliverables/
   - Log activity + tokens
6. If no work:
   - Log HEARTBEAT_OK
   - Go back to sleep
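
The claim step in the flow above can be sketched as a pure function over the task board, leaving the reading and writing of state/tasks.json to the caller. The names `claimNextTask` and `heartbeat` are hypothetical, and the sketch assumes that a lower `priority` number means more urgent, which the notes do not state.

```typescript
type Status = "inbox" | "assigned" | "in_progress" | "review" | "done";

interface BoardTask {
  id: string;
  status: Status;
  assignees: string[];
  team: string;
  priority: number;
}

// Find the most urgent task assigned to this agent and claim it.
// Assumption: lower `priority` means more urgent.
function claimNextTask(tasks: BoardTask[], agent: string): BoardTask | null {
  const candidates = tasks
    .filter((t) => t.status === "assigned" && t.assignees.includes(agent))
    .sort((a, b) => a.priority - b.priority);
  if (candidates.length === 0) {
    return null; // nothing to do
  }
  const claimed = candidates[0];
  claimed.status = "in_progress"; // mutates in place; caller writes the board back
  return claimed;
}

// One heartbeat: claim work if any exists, otherwise report HEARTBEAT_OK.
function heartbeat(tasks: BoardTask[], agent: string): string {
  const task = claimNextTask(tasks, agent);
  return task ? `CLAIMED ${task.id}` : "HEARTBEAT_OK";
}
```

Because several agents share one tasks.json, a real implementation would also need some guard against two agents claiming the same task at once, for example a lock file or having only the orchestrator write the board.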

Task Structure

{
  "id": "task_2026020512345",
  "title": "Add dark mode toggle to settings",
  "description": "...",
  "status": "in_progress",
  "assignees": ["friday", "shuri"],
  "team": "engineering",
  "created": "2026-02-05T10:00:00Z",
  "priority": 1,
  "type": "feature",  // feature | bug | research | content
  "steps": [
    { "description": "Add toggle component", "passes": true },
    { "description": "Wire up to theme context", "passes": false },
    { "description": "Add tests", "passes": false }
  ]
}
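
The `steps` array gives an agent a natural way to pick its one unit of work per heartbeat. A minimal sketch, assuming steps are completed in order (the helper names `nextStep` and `progress` are illustrative):

```typescript
interface Step {
  description: string;
  passes: boolean;
}

// Return the first step that has not passed yet, or null when every step passes.
function nextStep(steps: Step[]): Step | null {
  for (const step of steps) {
    if (!step.passes) {
      return step;
    }
  }
  return null;
}

// Summarize completion as "passed/total", e.g. for a dashboard card.
function progress(steps: Step[]): string {
  const passed = steps.filter((s) => s.passes).length;
  return `${passed}/${steps.length}`;
}
```

For the dark mode task above, `nextStep` would return the "Wire up to theme context" step and `progress` would report `1/3`.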

Agent Communication

@mentions in activity log:

{"ts":"...","agent":"friday","type":"comment","task":"task_123","content":"@shuri can you test this on mobile?"}
{"ts":"...","agent":"shuri","type":"comment","task":"task_123","content":"Tested. Found edge case with system preference..."}

Thread subscriptions: Agent auto-subscribed when they:

  • Get assigned to task
  • Comment on task
  • Get @mentioned
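
Checking for @mentions on each heartbeat amounts to scanning activity.jsonl for comments that name the agent. The sketch below parses the JSONL text directly; `parseMentions` and `mentionsFor` are hypothetical helpers, and the rule that an @ preceded by a letter, digit, or dot is not a mention (so email addresses are skipped) is an assumption.

```typescript
interface ActivityEntry {
  ts: string;
  agent: string;
  type: string;
  task: string;
  content: string;
}

// Extract lowercase agent names from "@name" tokens, ignoring email addresses.
function parseMentions(content: string): string[] {
  const names: string[] = [];
  const pattern = /(^|[^\w.])@([a-z][a-z0-9_-]*)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    const name = match[2].toLowerCase();
    if (!names.includes(name)) {
      names.push(name);
    }
  }
  return names;
}

// Parse JSONL text and return the entries that mention the given agent.
function mentionsFor(agent: string, jsonl: string): ActivityEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ActivityEntry)
    .filter((entry) => parseMentions(entry.content).includes(agent));
}
```

Since the log is append-only, an agent could store the timestamp of the last entry it processed in its runtime state and skip older lines on the next wake.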

Dashboard (React)

Views:

  1. Activity Feed - real-time stream of all agent activity
  2. Task Board - kanban by status (inbox → done)
  3. Agent Status - who's awake, what they're doing, last heartbeat
  4. Cost Tracker - tokens/$ per agent, per team, per day
  5. Team View - filter by team

Tech:

  • Vite + React (fast, local)
  • Watches .ralph/state/ for changes (chokidar or polling)
  • SSE or WebSocket from orchestrator for live updates
  • TailwindCSS for styling

Cost Tracking

// .ralph/logs/costs.json
{
  "daily": {
    "2026-02-05": {
      "total_tokens": 145000,
      "total_cost": 2.34,
      "by_agent": {
        "friday": { "tokens": 80000, "cost": 1.28 },
        "loki": { "tokens": 65000, "cost": 1.06 }
      }
    }
  },
  "lifetime": {
    "total_cost": 47.82
  }
}
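
The daily block in costs.json can be derived from the per-run entries in the daily log. A minimal sketch, assuming each log entry carries `agent`, `tokens`, and `cost` fields (the notes only say the log holds token counts, so these field names are hypothetical):

```typescript
interface LogEntry {
  agent: string;
  tokens: number;
  cost: number; // dollars
}

interface AgentCost {
  tokens: number;
  cost: number;
}

interface DailyCost {
  total_tokens: number;
  total_cost: number;
  by_agent: Record<string, AgentCost>;
}

// Round to cents so repeated float additions do not drift.
function roundCents(value: number): number {
  return Math.round(value * 100) / 100;
}

// Fold one day's log entries into the shape used under "daily" in costs.json.
function aggregateDay(entries: LogEntry[]): DailyCost {
  const day: DailyCost = { total_tokens: 0, total_cost: 0, by_agent: {} };
  for (const entry of entries) {
    const current = day.by_agent[entry.agent] ?? { tokens: 0, cost: 0 };
    day.by_agent[entry.agent] = {
      tokens: current.tokens + entry.tokens,
      cost: roundCents(current.cost + entry.cost),
    };
    day.total_tokens += entry.tokens;
    day.total_cost = roundCents(day.total_cost + entry.cost);
  }
  return day;
}
```

Recomputing the aggregate from the log on demand keeps costs.json a cache rather than a second source of truth.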

Eval Integration

Each agent can have evals (like evals/agents/calendar-assistant.yaml):

# .ralph/agents/friday/eval.yaml
name: friday-code-quality
cases:
  - input: "Add a button component"
    expected:
      - creates component file
      - exports named function (not arrow)
      - includes basic test

Migration from Current Ralph

| Current | New |
|---|---|
| plans/MEMORY.md | .ralph/memory/CODEBASE.md |
| plans/CONTEXT.md | .ralph/memory/CONTEXT.md |
| plans/PROMPT.md | .ralph/agents/{name}/SOUL.md (per agent) |
| plans/*.prd.json | .ralph/work/{task-id}/task.json |
| plans/*.progress.txt | .ralph/work/{task-id}/thread.jsonl |
| plans/ralph.sh | .ralph/orchestrator (shell or node) |

Decisions

  1. CLI: Support both claude CLI and cursor agent CLI (user choice)
  2. Dashboard: Local only (served from orchestrator)
  3. Model selection: Cheap models for heartbeats/routing, expensive for creative work
  4. Git workflow: Git worktrees (see primer below)
  5. Cross-project teams: No — agents defined per-project

Git Worktrees Primer

Git worktrees let you check out multiple branches simultaneously in separate directories, all linked to the same repo. That suits parallel agent work: each agent gets its own working directory without a second clone.

Basic Commands:

# Create a worktree for an agent's branch
git worktree add .ralph/worktrees/friday feature/friday-dark-mode

# List all worktrees
git worktree list

# Remove a worktree when done
git worktree remove .ralph/worktrees/friday

# Prune stale worktree references
git worktree prune

Directory Structure with Worktrees:

my-project/                    # Main worktree (main branch)
├── .ralph/
│   └── worktrees/
│       ├── friday/            # Friday's worktree (feature/friday-dark-mode)
│       │   ├── src/
│       │   └── ...
│       └── loki/              # Loki's worktree (content/loki-blog-post)
│           ├── src/
│           └── ...

How Agents Use Worktrees:

Agent claims task
1. Create branch: git branch feature/{agent}-{task-slug}
2. Create worktree: git worktree add .ralph/worktrees/{agent} feature/{agent}-{task-slug}
3. Agent works in worktree (isolated from main)
4. Agent commits to their branch
5. When task complete: PR or merge to main
6. Cleanup: git worktree remove .ralph/worktrees/{agent}

Why Worktrees (vs regular branches):

| Approach | Trade-off |
|---|---|
| Shared branch | Agents step on each other's changes |
| Branch switching | Can only work on one branch at a time |
| Separate clones | Wastes disk space, syncing headaches |
| Worktrees | Parallel work, shared .git, isolated working dirs |

Limitations:

  1. Can't double-checkout — Same branch can't be in two worktrees simultaneously
  2. Cleanup required — Must remove worktrees when done or they pile up
  3. Some tools confused — IDE file watchers, some git GUIs don't handle worktrees well
  4. Merge conflicts — Still need to resolve when merging back to main
  5. Disk space — Each worktree is a full working copy (but shares .git objects)

Orchestrator Worktree Management:

import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

// execFile takes arguments as an array, so agent and task names are
// never interpreted by a shell
const run = promisify(execFile);

// When agent claims task
async function setupAgentWorktree(agent: string, taskId: string) {
  const branch = `feature/${agent}-${taskId}`;
  const worktreePath = `.ralph/worktrees/${agent}`;

  // Create branch from main
  await run('git', ['branch', branch, 'main']);

  // Create worktree
  await run('git', ['worktree', 'add', worktreePath, branch]);

  return worktreePath;
}

// When task complete
async function cleanupAgentWorktree(agent: string, branch?: string) {
  const worktreePath = `.ralph/worktrees/${agent}`;

  // Remove worktree
  await run('git', ['worktree', 'remove', worktreePath]);

  // Optionally delete branch after merge (only when the caller passes it)
  if (branch) {
    await run('git', ['branch', '-d', branch]);
  }
}

Merge Strategy Options:

# Option A: Direct merge (simple, but messy history)
git checkout main
git merge feature/friday-dark-mode

# Option B: Squash merge (clean history, loses granular commits)
git checkout main
git merge --squash feature/friday-dark-mode
git commit -m "feat: dark mode toggle (friday)"

# Option C: PR-based (best for review, needs GitHub CLI)
gh pr create --base main --head feature/friday-dark-mode
# Human reviews, approves, merges

Recommended Flow for Teams:

main ─────────────────────────────────────────────►
       \                    /         \          /
        friday-task-1 ─────►           friday-task-2 ─►
       \              /
        loki-task-1 ─►

Each agent:

  1. Branches from latest main
  2. Works in isolation
  3. Squash merges back (or PR for review)
  4. Worktree cleaned up

Edge Case: Agent Needs Another Agent's Work

# Loki needs Friday's changes before they're merged
cd .ralph/worktrees/loki
# Worktrees share one repository, so Friday's local branch is visible here without a fetch
git merge feature/friday-dark-mode
# Or cherry-pick specific commits

Updated Directory Structure

my-project/
├── .ralph/
│   ├── config.json
│   ├── dashboard/
│   ├── agents/
│   ├── teams/
│   ├── state/
│   ├── work/
│   ├── memory/
│   ├── logs/
│   └── worktrees/           # Agent working directories
│       ├── friday/          # → feature/friday-{task}
│       └── loki/            # → content/loki-{task}


Reference Documents


Reference Document 1: Building Mission Control (AI Agent Squad)

Summary of @pbteja1998's guide on building a 10-agent AI team using Clawdbot/OpenClaw

The Problem

Every AI tool has the same issue: no continuity. Conversations start fresh, context from yesterday is gone, research gets lost in chat threads.

The goal: AI that works like a team, not a search box.

Core Architecture: Clawdbot (OpenClaw)

An open-source AI agent framework with three jobs:

  1. Connects AI to real world - file access, shell, web browsing, APIs
  2. Maintains persistent sessions - conversation history survives restarts
  3. Routes messages - Telegram, Discord, Slack, etc.

Sessions: The Key Concept

Each session has:

  • Unique session key (e.g., agent:main:main)
  • Independent conversation history (JSONL files on disk)
  • Own model and tools

Sessions are independent - each agent is just a Clawdbot session with specialized config.

The Workspace

/home/usr/clawd/           ← Workspace root
├── AGENTS.md              ← Operating manual
├── SOUL.md                ← Agent personality
├── memory/
│   ├── WORKING.md         ← Current task state
│   └── YYYY-MM-DD.md      ← Daily notes
├── scripts/
└── config/

Multi-Agent Setup

10 Agents = 10 Sessions

| Agent | Role | Session Key |
|---|---|---|
| Jarvis | Squad Lead | agent:main:main |
| Shuri | Product Analyst | agent:product-analyst:main |
| Fury | Customer Researcher | agent:customer-researcher:main |
| Vision | SEO Analyst | agent:seo-analyst:main |
| Loki | Content Writer | agent:content-writer:main |
| Quill | Social Media | agent:social-media-manager:main |
| Wanda | Designer | agent:designer:main |
| Pepper | Email Marketing | agent:email-marketing:main |
| Friday | Developer | agent:developer:main |
| Wong | Documentation | agent:notion-agent:main |

The Heartbeat System

Agents wake every 15 minutes via cron (staggered schedule):

  • :00 Pepper, :02 Shuri, :04 Friday, :06 Loki, :07 Wanda, :08 Vision, :10 Fury, :12 Quill

Each heartbeat:

  1. Load context (read WORKING.md)
  2. Check for @mentions and assigned tasks
  3. Scan activity feed
  4. Take action or report HEARTBEAT_OK

Why 15 minutes? 5 min = too expensive, 30 min = too slow.
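
A staggered schedule like the one above can be generated rather than written by hand. The sketch below builds a standard five-field cron expression for a given minute offset. `staggeredCron` is a hypothetical helper, and it lists the minutes explicitly instead of using step syntax so the result does not depend on how a particular cron implementation handles offsets.

```typescript
// Build a cron expression that fires every `intervalMin` minutes, shifted by `offsetMin`.
function staggeredCron(offsetMin: number, intervalMin: number): string {
  if (!Number.isInteger(offsetMin) || !Number.isInteger(intervalMin)) {
    throw new RangeError("offset and interval must be whole minutes");
  }
  if (intervalMin <= 0 || 60 % intervalMin !== 0) {
    throw new RangeError("interval must divide 60 evenly");
  }
  if (offsetMin < 0 || offsetMin >= intervalMin) {
    throw new RangeError("offset must be between 0 and interval - 1");
  }
  const minutes: number[] = [];
  for (let m = offsetMin; m < 60; m += intervalMin) {
    minutes.push(m);
  }
  // Minute field only; hour, day-of-month, month, and day-of-week stay wildcards.
  return `${minutes.join(",")} * * * *`;
}
```

For example, `staggeredCron(2, 15)` yields `2,17,32,47 * * * *`, which corresponds to the :02 slot in the schedule above.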

Mission Control: The Shared Brain

Convex database with 6 tables:

  • agents - name, role, status, currentTaskId, sessionKey
  • tasks - title, description, status, assigneeIds
  • messages - taskId, fromAgentId, content, attachments
  • activities - type, agentId, message
  • documents - title, content, type, taskId
  • notifications - mentionedAgentId, content, delivered

Agent Communication

Option 1: Direct session messaging

clawdbot sessions send --session "agent:seo-analyst:main" --message "Vision, review this?"

Option 2: Shared database (preferred) - all agents read/write to same Convex DB.

@Mentions & Thread Subscriptions

  • Type @Vision → Vision notified on next heartbeat
  • Type @all → everyone notified
  • Interact with a task → auto-subscribed to all future comments

The SOUL System (Agent Personalities)

# SOUL.md — Who You Are

**Name:** Shuri
**Role:** Product Analyst

## Personality
Skeptical tester. Thorough bug hunter. Finds edge cases.
Think like a first-time user. Question everything.

## What You're Good At
- Testing features from user perspective
- Finding UX issues and edge cases
- Competitive analysis

Key insight: An agent "good at everything" is mediocre. Constraints focus them.

Memory Stack

  1. Session Memory (built-in) - JSONL conversation history
  2. Working Memory (/memory/WORKING.md) - current task state, read on wake
  3. Daily Notes (/memory/YYYY-MM-DD.md) - raw logs
  4. Long-term Memory (MEMORY.md) - curated important stuff

Golden Rule: If you want to remember something, write it to a file.

Task Lifecycle

  1. Inbox - new, unassigned
  2. Assigned - has owner(s), not started
  3. In Progress - being worked on
  4. Review - done, needs approval
  5. Done - finished
  6. Blocked - stuck

Daily Standup

Cron at 11:30 PM sends summary to Telegram:

  • Completed today
  • In progress
  • Blocked items
  • Needs review
  • Key decisions

Lessons Learned

  1. Start smaller - get 2-3 agents solid before adding more
  2. Use cheaper models for routine work - heartbeats don't need expensive models
  3. Memory is hard - put everything in files, not "mental notes"
  4. Let agents surprise you - they'll contribute to unassigned tasks

Quick Start

npm install -g clawdbot
clawdbot init
clawdbot gateway start

# Add heartbeat
clawdbot cron add --name "agent-heartbeat" --cron "*/15 * * * *" \
  --session "isolated" \
  --message "Check for work. If nothing, reply HEARTBEAT_OK."

The Real Secret

Treat AI agents like team members: give them roles, memory, let them collaborate, hold them accountable.


Source: X Article by @pbteja1998 | Built on OpenClaw


Reference Document 2: Agents with Filesystems and Bash

Summary of Vercel's blog post by Ashka Stephen (Jan 9, 2026)

The Core Insight

Replace custom tooling with filesystem + bash. A sales call summarization agent went from ~$1.00 to ~$0.25 per call on Claude Opus 4.5, with improved output quality.

Why it works: LLMs are trained on massive amounts of code, so they have seen countless examples of navigating directories, grepping files, and managing state. If agents excel at filesystem ops for code, the same skills should carry over to filesystem ops in other domains.

How Agents Read Filesystems

Agent receives task
Explores filesystem (ls, find)
Searches for relevant content (grep, cat)
Sends context + request to LLM
Returns structured output

The agent runs in a sandbox: its reasoning is trusted, but the sandbox limits what it can actually do.

Why Filesystems Beat Vector Search

| Approach | Trade-off |
|---|---|
| Prompt stuffing | Hits token limits |
| Vector search | Imprecise for specific values |
| Filesystem | Structure matches domain, precise retrieval, minimal context |

Key advantages:

  • Structure matches domain - hierarchies map to directories
  • Retrieval is precise - grep -r "pricing objection" transcripts/ returns exact matches
  • Context stays minimal - agent loads files on demand, not upfront
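
To make the precision point concrete, here is a minimal grep-like sketch. It is not from the Vercel post: `grepFiles` is a hypothetical helper, and an in-memory map of path to file content stands in for files on disk. Only the matching lines, with their locations, would be handed to the model.

```typescript
interface Match {
  path: string;
  line: number; // 1-based line number
  text: string;
}

// Case-insensitive substring search across files, returning exact line hits.
function grepFiles(files: Record<string, string>, needle: string): Match[] {
  const matches: Match[] = [];
  const lowered = needle.toLowerCase();
  for (const path of Object.keys(files).sort()) {
    files[path].split("\n").forEach((text, index) => {
      if (text.toLowerCase().includes(lowered)) {
        matches.push({ path, line: index + 1, text });
      }
    });
  }
  return matches;
}
```

Unlike a similarity search, a query such as "pricing objection" either appears on a line or it does not, and each hit carries a path and line number the agent can cite.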

Domain Mapping Examples

Customer Support:

/customers/
  /cust_12345/
    profile.json
    tickets/
      ticket_001.md
      ticket_002.md
    conversations/
      2024-01-15.txt
    preferences.json

Document Analysis:

/documents/
  /uploaded/
    contract_abc123.pdf
  /extracted/
    contract_abc123.txt
  /analysis/
    contract_abc123/
      summary.md
      key_terms.json
      risk_assessment.md
/templates/
  contract_analysis_prompt.md

Sales Call Summary Agent Structure

gong-calls/
  demo-call-001-companyname-product-demo.md
  metadata.json
  previous-calls/
    demo-call-000-discovery-call.md
salesforce/
  account.md
  opportunity.md
  contacts.md
slack/
  slack-channel.md
research/
  company-research.md
  competitive-intel.md
playbooks/
  sales-playbook.md

Agent explores like a codebase:

$ ls gong-calls/
$ cat gong-calls/metadata.json
$ grep -i "concern\|worried\|issue" gong-calls/*.md

Why Bash + Filesystem

  1. Native model capabilities - grep, cat, find, awk are native ops, not bolted on
  2. Future-proof - as models improve at coding, agents improve automatically
  3. Debuggable - see exactly what files were read, what commands ran
  4. Secure - sandbox isolates execution
  5. Less code - no retrieval pipelines, just write files to directories

The Punchline

"The future of agents might be surprisingly simple. Maybe the best architecture is almost no architecture at all. Just filesystems and bash."

Source: Vercel Blog