AI Agent System Architecture

Systems design for multi-agent coordination

Architecture Notes

Vision

Evolve Ralph from single-agent PRD executor → multi-agent team system with:

SOUL-based personalities - agents with distinct roles/skills
Team coordination - multiple agents collaborating on shared work
Filesystem-native - no external DB, everything in files (matches Reference Document 2)
Observable - live dashboard with progress, costs, activity

User Workflow

# 1. Install to any project
curl -fsSL https://tybarho.com/ralph/install.sh | bash

# 2. Opens local dashboard (React)
# 3. Define teams + agents (AI-assisted)
# 4. Create tasks/PRDs/crons
# 5. Watch agents work with live progress + costs

Directory Structure (per project)

.ralph/
├── config.json              # Teams, agent assignments, schedules
├── dashboard/               # React app (served locally)
│
├── agents/                  # Agent definitions
│   ├── friday/
│   │   ├── SOUL.md          # Personality, skills, voice
│   │   └── MEMORY.md        # Agent-specific learnings
│   ├── loki/
│   │   ├── SOUL.md
│   │   └── MEMORY.md
│   └── shuri/
│       ├── SOUL.md
│       └── MEMORY.md
│
├── teams/                   # Team compositions
│   ├── engineering.json     # { agents: ["friday", "shuri"], focus: "..." }
│   ├── marketing.json       # { agents: ["loki", "quill", "vision"] }
│   └── research.json        # { agents: ["fury", "shuri"] }
│
├── state/                   # Shared state (filesystem-based)
│   ├── tasks.json           # Task board (inbox, assigned, in_progress, review, done)
│   ├── activity.jsonl       # Append-only activity log
│   └── agents/              # Per-agent runtime state
│       ├── friday.json      # { status, currentTask, lastHeartbeat, tokenUsage }
│       └── loki.json
│
├── work/                    # Task workspaces
│   ├── {task-id}/
│   │   ├── task.json        # Task definition
│   │   ├── thread.jsonl     # Comments/discussion
│   │   └── deliverables/    # Output files
│
├── memory/                  # Shared knowledge
│   ├── CODEBASE.md          # Project-wide patterns (replaces plans/MEMORY.md)
│   └── CONTEXT.md           # User context (who you are)
│
└── logs/                    # Execution logs + costs
    ├── 2026-02-05.jsonl     # Daily log with token counts
    └── costs.json           # Aggregated cost tracking

Agent SOUL Structure

# SOUL.md — Friday (Developer)

## Identity
Name: Friday
Role: Developer
Team: Engineering

## Personality
Code is poetry. Clean, tested, documented.
Prefers small PRs. Runs tests before committing.
Asks Shuri for review on anything user-facing.

## Skills
- TypeScript, React, Next.js
- Testing (vitest, playwright)
- Database migrations
- API design

## Voice
Direct. Technical. Cites file paths and line numbers.
Uses code blocks liberally. Explains the "why" not just "what."

## Boundaries
- Won't merge without tests passing
- Escalates to human if touching auth/payments
- Asks for design review on UI changes

MCP Tools Integration

MCP (Model Context Protocol) tools such as calendar, email, and image generation are configured globally at the CLI level, but access to them needs to be managed per agent.

The Challenge:

Tools configured in ~/.cursor/mcp.json or ~/.claude/mcp.json (global)
Different agents need different tools (Friday doesn't need calendar, Pepper does)
Some tools are sensitive (email, calendar) — not every agent should have access
Tool availability should be part of agent identity

Solution: Tool Allowlists in SOUL

# SOUL.md — Pepper (Email Marketing)

## Tools
Allowed:
- email (send, draft, search)
- calendar (read, create events)
- generate_image (for email graphics)

Denied:
- shell (no arbitrary command execution)
- filesystem (only through orchestrator)

How It Works:

User's MCP config (global)
    ↓
Orchestrator reads available tools
    ↓
Agent SOUL specifies allowed tools
    ↓
Orchestrator filters tools before spawning agent
    ↓
Agent only sees tools in their allowlist
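
The filtering step in this flow could be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: the `McpTool` shape and the `filterToolsForAgent` name are assumptions made for the example.

```typescript
// Hypothetical shape for a tool discovered from the global MCP config.
interface McpTool {
  name: string;
  server: string; // MCP server that provides the tool
}

// Keep only the tools the agent's SOUL allows; anything not listed is dropped.
function filterToolsForAgent(available: McpTool[], allowed: string[]): McpTool[] {
  return available.filter((tool) => allowed.includes(tool.name));
}

const availableTools: McpTool[] = [
  { name: "email", server: "mail" },
  { name: "calendar", server: "cal" },
  { name: "shell", server: "local" },
];

// Pepper's allowlist from her SOUL.md. generate_image is allowed but not
// installed here, so it simply does not appear in the result.
const pepperTools = filterToolsForAgent(availableTools, [
  "email",
  "calendar",
  "generate_image",
]);
```

Filtering on the intersection means an allowlist entry for a tool the user never installed is harmless rather than an error.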

Implementation Options:

Option A: CLI flag filtering (if supported)

# Pass allowed tools to the CLI (flag names are illustrative; check what each CLI actually supports)
cursor agent --tools "email,calendar,generate_image" --prompt "..."
claude --tools "email,calendar" --prompt "..."

Option B: System prompt enforcement (advisory only; the model can still attempt other tools)

# Injected into agent's system prompt
You have access to these tools ONLY:
- email: send, draft, search emails
- calendar: read and create events

Do NOT attempt to use: shell, filesystem, web_search

Option C: Orchestrator proxy (most control)

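// Orchestrator intercepts tool calls before they reach the MCP client.
// loadAgentTools and mcpClient are assumed to be provided by the orchestrator.
function createToolProxy(agent: string) {
  const allowedTools: string[] = loadAgentTools(agent);

  return {
    async callTool(name: string, args: Record<string, unknown>) {
      // Reject anything outside the agent's allowlist
      if (!allowedTools.includes(name)) {
        throw new Error(`Agent ${agent} not authorized for tool: ${name}`);
      }
      return mcpClient.callTool(name, args);
    }
  };
}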

Tool Categories:

Category	Tools	Agents
Code	shell, filesystem, git	friday, shuri
Communication	email, slack, calendar	pepper, jarvis
Content	generate_image, web_search, pdf_reader	loki, wanda, fury
Data	database, analytics	vision, fury

Team-Level Tool Inheritance:

// .ralph/teams/marketing.json
{
  "agents": ["loki", "quill", "pepper"],
  "tools": {
    "shared": ["generate_image", "web_search"],
    "per_agent": {
      "pepper": ["email", "calendar"],
      "quill": ["twitter", "linkedin"]
    }
  }
}
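
One way to read this config is that an agent's effective tools are the team's shared tools plus its own per-agent extras. The sketch below assumes exactly that merge rule; `TeamConfig` and `resolveTeamTools` are hypothetical names.

```typescript
// Mirrors the shape of .ralph/teams/marketing.json.
interface TeamConfig {
  agents: string[];
  tools: {
    shared: string[];
    per_agent: Partial<Record<string, string[]>>;
  };
}

// Effective tools = team-wide shared tools plus the agent's own extras.
// Agents outside the team inherit nothing from it.
function resolveTeamTools(team: TeamConfig, agent: string): string[] {
  if (!team.agents.includes(agent)) return [];
  const extras: string[] = team.tools.per_agent[agent] ?? [];
  // De-duplicate while preserving order.
  return [...team.tools.shared, ...extras].filter(
    (tool, index, all) => all.indexOf(tool) === index
  );
}

const marketing: TeamConfig = {
  agents: ["loki", "quill", "pepper"],
  tools: {
    shared: ["generate_image", "web_search"],
    per_agent: {
      pepper: ["email", "calendar"],
      quill: ["twitter", "linkedin"],
    },
  },
};

const pepperTeamTools = resolveTeamTools(marketing, "pepper");
// loki has no per_agent entry, so only the shared tools apply
const lokiTeamTools = resolveTeamTools(marketing, "loki");
```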

Agent Tool Config:

// .ralph/agents/pepper/config.json
{
  "tools": {
    "allow": ["email", "calendar", "generate_image"],
    "deny": ["shell", "filesystem"],
    "require_approval": ["email:send"]  // Human approves before sending
  }
}
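
A permission check over this config might look like the sketch below. The precedence (deny first, then require_approval, then allow, with everything else denied) is an assumption, as are the `ToolPolicy` and `decide` names; the notes do not specify how conflicts are resolved.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

// Mirrors .ralph/agents/{name}/config.json; entries are "tool" or "tool:action".
interface ToolPolicy {
  allow: string[];
  deny: string[];
  require_approval: string[];
}

// Assumed precedence: deny beats require_approval, which beats allow.
// Anything unlisted is denied by default.
function decide(policy: ToolPolicy, tool: string, action?: string): Decision {
  const keys = action ? [`${tool}:${action}`, tool] : [tool];
  const matches = (list: string[]): boolean =>
    keys.some((key) => list.includes(key));

  if (matches(policy.deny)) return "deny";
  if (matches(policy.require_approval)) return "needs_approval";
  if (matches(policy.allow)) return "allow";
  return "deny";
}

const pepperPolicy: ToolPolicy = {
  allow: ["email", "calendar", "generate_image"],
  deny: ["shell", "filesystem"],
  require_approval: ["email:send"],
};

const sendDecision = decide(pepperPolicy, "email", "send");   // "needs_approval"
const draftDecision = decide(pepperPolicy, "email", "draft"); // "allow"
const shellDecision = decide(pepperPolicy, "shell");          // "deny"
```

Defaulting to deny keeps a newly installed MCP tool invisible to every agent until someone explicitly allows it.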

Approval Workflow for Sensitive Tools:

Agent wants to send email
    ↓
Tool marked as "require_approval"
    ↓
Orchestrator pauses agent
    ↓
Dashboard shows pending approval:
  "Pepper wants to send email to user@example.com
   Subject: Welcome to our newsletter
   [Approve] [Deny] [Edit]"
    ↓
Human approves/denies
    ↓
Agent continues or handles denial
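
If approvals are recorded in an append-only JSONL file, the dashboard's pending list can be derived by replaying the log. The record shape below (`id`, `agent`, `tool`, `state`) is purely hypothetical; the notes do not define the format.

```typescript
// Hypothetical record shape for one line of an approvals log.
interface ApprovalEvent {
  id: string;
  agent: string;
  tool: string;
  state: "pending" | "approved" | "denied";
}

// Parse JSONL text: one JSON object per non-empty line.
function parseApprovalLog(text: string): ApprovalEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ApprovalEvent);
}

// The log is append-only, so the latest event per id wins.
function pendingApprovals(log: ApprovalEvent[]): ApprovalEvent[] {
  const latest = new Map<string, ApprovalEvent>();
  for (const event of log) latest.set(event.id, event);
  return Array.from(latest.values()).filter((event) => event.state === "pending");
}

const approvalLog = parseApprovalLog(
  [
    '{"id":"a1","agent":"pepper","tool":"email:send","state":"pending"}',
    '{"id":"a2","agent":"pepper","tool":"email:send","state":"pending"}',
    '{"id":"a1","agent":"pepper","tool":"email:send","state":"approved"}',
  ].join("\n")
);

// Only a2 is still waiting on a human decision.
const stillPending = pendingApprovals(approvalLog);
```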

Directory Structure Update:

.ralph/
├── agents/
│   └── pepper/
│       ├── SOUL.md
│       ├── MEMORY.md
│       └── config.json      # Tool permissions, model preferences
│
├── tools/                   # Tool configs and wrappers
│   ├── available.json       # Discovered from global MCP config
│   └── approvals.jsonl      # Pending/completed approval log

Discovery: Reading Global MCP Config:

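// Orchestrator reads the user's global MCP setup.
// readJson, parseMcpServers, and writeJson are assumed orchestrator helpers.
import { homedir } from 'node:os';
import { join } from 'node:path';

function discoverTools() {
  // Node does not expand "~", so resolve the paths against the home directory
  const cursorConfig = readJson(join(homedir(), '.cursor', 'mcp.json'));
  const claudeConfig = readJson(join(homedir(), '.claude', 'mcp.json'));

  const allTools = [
    ...parseMcpServers(cursorConfig),
    ...parseMcpServers(claudeConfig)
  ];

  writeJson('.ralph/tools/available.json', allTools);
}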

Orchestration

Option A: Shell-based (simple, current Ralph pattern)

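# ralph-orchestrator.sh
# Runs in terminal, manages heartbeats for all agents

while true; do
  # Glob the agent directories instead of parsing ls output
  for dir in .ralph/agents/*/; do
    [ -d "$dir" ] || continue   # skip the literal pattern when no agents exist
    agent=$(basename "$dir")
    if should_wake "$agent"; then
      ralph-agent "$agent" &
    fi
  done
  sleep 60
done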

Option B: Node process (better for dashboard integration)

// orchestrator.ts
// Manages agents, serves dashboard, tracks costs

const orchestrator = new RalphOrchestrator({
  agents: loadAgents('.ralph/agents/'),
  teams: loadTeams('.ralph/teams/'),
  heartbeatInterval: 15 * 60 * 1000, // 15 min
});

orchestrator.on('agent:wake', (agent) => {
  // Spawn cursor agent CLI with agent's SOUL
});

orchestrator.on('agent:complete', (agent, result) => {
  // Log activity, update costs, notify subscribers
});

// Serve dashboard on localhost:3333
orchestrator.serveDashboard();

Heartbeat Flow (per agent)

Agent wakes (cron or orchestrator)
    ↓
1. Read own SOUL.md (who am I?)
2. Read state/tasks.json (what needs doing?)
3. Check for @mentions in activity.jsonl
4. Check assigned tasks for my teams
    ↓
5. If work found:
   - Claim task (update state)
   - Read task workspace (work/{task-id}/)
   - Read relevant MEMORY files
   - Do work (one unit)
   - Post to thread.jsonl
   - Update deliverables/
   - Log activity + tokens
    ↓
6. If no work:
   - Log HEARTBEAT_OK
   - Go back to sleep
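
The work-or-sleep decision at the heart of this flow could be sketched as below. It covers only task selection (not @mentions), and it assumes a lower `priority` number means more urgent, which the notes do not state. `BoardTask` and `heartbeat` are hypothetical names.

```typescript
interface BoardTask {
  id: string;
  status: "inbox" | "assigned" | "in_progress" | "review" | "done";
  assignees: string[];
  priority: number; // assumption: lower number = more urgent
}

type HeartbeatResult =
  | { kind: "work"; taskId: string }
  | { kind: "HEARTBEAT_OK" };

// Pick the most urgent open task assigned to this agent, or report idle.
function heartbeat(agent: string, tasks: BoardTask[]): HeartbeatResult {
  const candidates = tasks
    .filter((task) => task.assignees.includes(agent))
    .filter((task) => task.status === "assigned" || task.status === "in_progress")
    .sort((a, b) => a.priority - b.priority);

  const next = candidates[0];
  if (!next) return { kind: "HEARTBEAT_OK" };
  return { kind: "work", taskId: next.id };
}

const board: BoardTask[] = [
  { id: "task_1", status: "done", assignees: ["friday"], priority: 1 },
  { id: "task_2", status: "in_progress", assignees: ["friday", "shuri"], priority: 1 },
  { id: "task_3", status: "assigned", assignees: ["friday"], priority: 2 },
];

const fridayBeat = heartbeat("friday", board); // works on task_2
const lokiBeat = heartbeat("loki", board);     // nothing assigned: HEARTBEAT_OK
```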

Task Structure

{
  "id": "task_2026020512345",
  "title": "Add dark mode toggle to settings",
  "description": "...",
  "status": "in_progress",
  "assignees": ["friday", "shuri"],
  "team": "engineering",
  "created": "2026-02-05T10:00:00Z",
  "priority": 1,
  "type": "feature",  // feature | bug | research | content
  "steps": [
    { "description": "Add toggle component", "passes": true },
    { "description": "Wire up to theme context", "passes": false },
    { "description": "Add tests", "passes": false }
  ]
}
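
Because each step carries a `passes` flag, "one unit of work" can be read as "the first step that does not pass yet". A small sketch under that assumption, with hypothetical helper names:

```typescript
interface Step {
  description: string;
  passes: boolean;
}

// The next unit of work is the first step that has not passed yet.
function nextStep(steps: Step[]): Step | undefined {
  return steps.find((step) => !step.passes);
}

// Fraction of steps completed, in the range 0..1 (0 for an empty list).
function progress(steps: Step[]): number {
  if (steps.length === 0) return 0;
  return steps.filter((step) => step.passes).length / steps.length;
}

const darkModeSteps: Step[] = [
  { description: "Add toggle component", passes: true },
  { description: "Wire up to theme context", passes: false },
  { description: "Add tests", passes: false },
];

const upcoming = nextStep(darkModeSteps);    // the "Wire up to theme context" step
const fractionDone = progress(darkModeSteps); // one of three steps passes
```

A task whose `nextStep` is `undefined` has every step passing and is a natural candidate to move to review.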

Agent Communication

@mentions in activity log:

{"ts":"...","agent":"friday","type":"comment","task":"task_123","content":"@shuri can you test this on mobile?"}
{"ts":"...","agent":"shuri","type":"comment","task":"task_123","content":"Tested. Found edge case with system preference..."}

Thread subscriptions: an agent is auto-subscribed to a task's thread when it:

Gets assigned to the task
Comments on the task
Gets @mentioned in the task
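
The subscription rules can be derived directly from the activity log. The sketch below is one possible reading: the entry fields match the log lines shown earlier, while `extractMentions` and `subscribers` are hypothetical helpers.

```typescript
interface ActivityEntry {
  agent: string;
  type: string;
  task: string;
  content: string;
}

// Find @mentions; requiring start-of-text or whitespace before "@"
// keeps email addresses like user@example.com from counting as mentions.
function extractMentions(content: string): string[] {
  const found: string[] = content.match(/(?:^|\s)@([a-z0-9_-]+)/gi) ?? [];
  return found.map((match) => match.trim().slice(1).toLowerCase());
}

// Subscribers = assignees + anyone who commented + anyone @mentioned.
function subscribers(
  taskId: string,
  assignees: string[],
  activity: ActivityEntry[]
): string[] {
  const subs: string[] = [...assignees];
  for (const entry of activity) {
    if (entry.task !== taskId) continue;
    for (const who of [entry.agent, ...extractMentions(entry.content)]) {
      if (!subs.includes(who)) subs.push(who);
    }
  }
  return subs;
}

const activityLog: ActivityEntry[] = [
  { agent: "friday", type: "comment", task: "task_123", content: "@shuri can you test this on mobile?" },
  { agent: "loki", type: "comment", task: "task_999", content: "Draft is ready" },
];

// friday is the assignee; shuri joins through the @mention; loki is on another task.
const taskSubscribers = subscribers("task_123", ["friday"], activityLog);
```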

Dashboard (React)

Views:

Activity Feed - real-time stream of all agent activity
Task Board - kanban by status (inbox → done)
Agent Status - who's awake, what they're doing, last heartbeat
Cost Tracker - tokens/$ per agent, per team, per day
Team View - filter by team

Tech:

Vite + React (fast, local)
Watches .ralph/state/ for changes (chokidar or polling)
SSE or WebSocket from orchestrator for live updates
TailwindCSS for styling

Cost Tracking

// .ralph/logs/costs.json
{
  "daily": {
    "2026-02-05": {
      "total_tokens": 145000,
      "total_cost": 2.34,
      "by_agent": {
        "friday": { "tokens": 80000, "cost": 1.28 },
        "loki": { "tokens": 65000, "cost": 1.06 }
      }
    }
  },
  "lifetime": {
    "total_cost": 47.82
  }
}
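
A costs.json of this shape could be rebuilt from per-run log entries. In the sketch below the `UsageEntry` fields are assumptions about what a daily log line contains; sums are kept unrounded during aggregation and rounded to cents once at the end, so sub-cent runs are not lost.

```typescript
// Hypothetical shape for one usage line in a daily log under .ralph/logs/.
interface UsageEntry {
  date: string; // YYYY-MM-DD
  agent: string;
  tokens: number;
  cost: number; // USD
}

interface AgentCost { tokens: number; cost: number }
interface DayCost {
  total_tokens: number;
  total_cost: number;
  by_agent: Record<string, AgentCost>;
}
interface CostReport {
  daily: Record<string, DayCost>;
  lifetime: { total_cost: number };
}

const toCents = (usd: number): number => Math.round(usd * 100) / 100;

function aggregateCosts(entries: UsageEntry[]): CostReport {
  const report: CostReport = { daily: {}, lifetime: { total_cost: 0 } };

  for (const entry of entries) {
    const day: DayCost =
      report.daily[entry.date] ?? { total_tokens: 0, total_cost: 0, by_agent: {} };
    const agent: AgentCost = day.by_agent[entry.agent] ?? { tokens: 0, cost: 0 };

    agent.tokens += entry.tokens;
    agent.cost += entry.cost;
    day.total_tokens += entry.tokens;
    day.total_cost += entry.cost;
    report.lifetime.total_cost += entry.cost;

    day.by_agent[entry.agent] = agent;
    report.daily[entry.date] = day;
  }

  // Round once, after summing, so small per-run costs still accumulate.
  for (const date of Object.keys(report.daily)) {
    const day = report.daily[date];
    day.total_cost = toCents(day.total_cost);
    for (const name of Object.keys(day.by_agent)) {
      day.by_agent[name].cost = toCents(day.by_agent[name].cost);
    }
  }
  report.lifetime.total_cost = toCents(report.lifetime.total_cost);
  return report;
}

const costReport = aggregateCosts([
  { date: "2026-02-05", agent: "friday", tokens: 50000, cost: 0.8 },
  { date: "2026-02-05", agent: "friday", tokens: 30000, cost: 0.48 },
  { date: "2026-02-05", agent: "loki", tokens: 65000, cost: 1.06 },
]);
```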

Eval Integration

Each agent can have evals (like evals/agents/calendar-assistant.yaml):

# .ralph/agents/friday/eval.yaml
name: friday-code-quality
cases:
  - input: "Add a button component"
    expected:
      - creates component file
      - exports named function (not arrow)
      - includes basic test

Migration from Current Ralph

Current	New
plans/MEMORY.md	.ralph/memory/CODEBASE.md
plans/CONTEXT.md	.ralph/memory/CONTEXT.md
plans/PROMPT.md	.ralph/agents/{name}/SOUL.md (per agent)
plans/*.prd.json	.ralph/work/{task-id}/task.json
plans/*.progress.txt	.ralph/work/{task-id}/thread.jsonl
plans/ralph.sh	.ralph/orchestrator (shell or node)

Decisions

CLI: Support both claude CLI and cursor agent CLI (user choice)
Dashboard: Local only (served from orchestrator)
Model selection: Cheap models for heartbeats/routing, expensive for creative work
Git workflow: Git worktrees (see primer below)
Cross-project teams: No — agents defined per-project

Git Worktrees Primer

Git worktrees let you check out multiple branches simultaneously in separate directories, all linked to the same repo. Each agent gets its own working directory, so agents can work in parallel without switching branches underneath each other.

Basic Commands:

# Create a worktree for an agent's branch
git worktree add .ralph/worktrees/friday feature/friday-dark-mode

# List all worktrees
git worktree list

# Remove a worktree when done
git worktree remove .ralph/worktrees/friday

# Prune stale worktree references
git worktree prune

Directory Structure with Worktrees:

my-project/                    # Main worktree (main branch)
├── .ralph/
│   └── worktrees/
│       ├── friday/            # Friday's worktree (feature/friday-dark-mode)
│       │   ├── src/
│       │   └── ...
│       └── loki/              # Loki's worktree (content/loki-blog-post)
│           ├── src/
│           └── ...

How Agents Use Worktrees:

Agent claims task
    ↓
1. Create branch: git branch feature/{agent}-{task-slug}
2. Create worktree: git worktree add .ralph/worktrees/{agent} feature/{agent}-{task-slug}
3. Agent works in worktree (isolated from main)
4. Agent commits to their branch
5. When task complete: PR or merge to main
6. Cleanup: git worktree remove .ralph/worktrees/{agent}
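
The `{task-slug}` part of the branch name has to come from somewhere. One possible derivation from the task title is sketched below; `slugify`, `worktreePlan`, and the 40-character cap are assumptions for illustration.

```typescript
// Turn a task title into a branch-safe slug: lowercase, hyphen-separated.
function slugify(title: string, maxLength = 40): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into one hyphen
    .replace(/^-+|-+$/g, "")     // trim leading and trailing hyphens
    .slice(0, maxLength)
    .replace(/-+$/g, "");        // re-trim if the cut landed on a hyphen
}

// Branch and worktree path following the feature/{agent}-{task-slug} convention.
function worktreePlan(agent: string, title: string): { branch: string; path: string } {
  return {
    branch: `feature/${agent}-${slugify(title)}`,
    path: `.ralph/worktrees/${agent}`,
  };
}

const fridayPlan = worktreePlan("friday", "Add dark mode toggle to settings");
// fridayPlan.branch is "feature/friday-add-dark-mode-toggle-to-settings"
// fridayPlan.path is ".ralph/worktrees/friday"
```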

Why Worktrees (vs regular branches):

Approach	Problem
Shared branch	Agents step on each other's changes
Branch switching	Can only work on one branch at a time
Separate clones	Wastes disk space, syncing headaches
Worktrees	Parallel work, shared .git, isolated working dirs

Limitations:

Can't double-checkout — Same branch can't be in two worktrees simultaneously
Cleanup required — Must remove worktrees when done or they pile up
Some tools confused — IDE file watchers, some git GUIs don't handle worktrees well
Merge conflicts — Still need to resolve when merging back to main
Disk space — Each worktree is a full working copy (but shares .git objects)

Orchestrator Worktree Management:

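import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

// execFile takes arguments as an array, so agent and task names
// are never interpreted by a shell
const run = promisify(execFile);

// When agent claims task
async function setupAgentWorktree(agent: string, taskId: string) {
  const branch = `feature/${agent}-${taskId}`;
  const worktreePath = `.ralph/worktrees/${agent}`;

  // Create branch from main
  await run('git', ['branch', branch, 'main']);

  // Create worktree
  await run('git', ['worktree', 'add', worktreePath, branch]);

  return worktreePath;
}

// When task complete
async function cleanupAgentWorktree(agent: string, mergedBranch?: string) {
  const worktreePath = `.ralph/worktrees/${agent}`;

  // Remove worktree (git refuses if it has uncommitted changes)
  await run('git', ['worktree', 'remove', worktreePath]);

  // Optionally delete branch after merge
  if (mergedBranch) {
    await run('git', ['branch', '-d', mergedBranch]);
  }
}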

Merge Strategy Options:

# Option A: Direct merge (simple, but messy history)
git checkout main
git merge feature/friday-dark-mode

# Option B: Squash merge (clean history, loses granular commits)
git checkout main
git merge --squash feature/friday-dark-mode
git commit -m "feat: dark mode toggle (friday)"

# Option C: PR-based (best for review, needs GitHub CLI)
gh pr create --base main --head feature/friday-dark-mode
# Human reviews, approves, merges

Recommended Flow for Teams:

main ─────────────────────────────────────────────►
       \                    /         \          /
        friday-task-1 ─────►           friday-task-2 ─►
       \              /
        loki-task-1 ─►

Each agent:

Branches from latest main
Works in isolation
Squash merges back (or PR for review)
Worktree cleaned up

Edge Case: Agent Needs Another Agent's Work

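# Loki needs Friday's changes before they're merged.
# Worktrees share one repository, so Friday's local branch is visible from
# Loki's worktree without a fetch (origin/... only exists if it was pushed).
cd .ralph/worktrees/loki
git merge feature/friday-dark-mode
# Or cherry-pick specific commits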

Updated Directory Structure

my-project/
├── .ralph/
│   ├── config.json
│   ├── dashboard/
│   ├── agents/
│   ├── teams/
│   ├── state/
│   ├── work/
│   ├── memory/
│   ├── logs/
│   └── worktrees/           # Agent working directories
│       ├── friday/          # → feature/friday-{task}
│       └── loki/            # → content/loki-{task}

Reference Documents

Reference Document 1: Building Mission Control (AI Agent Squad)

Summary of @pbteja1998's guide on building a 10-agent AI team using Clawdbot/OpenClaw

The Problem

Every AI tool has the same issue: no continuity. Conversations start fresh, context from yesterday is gone, research gets lost in chat threads.

The goal: AI that works like a team, not a search box.

Core Architecture: Clawdbot (OpenClaw)

An open-source AI agent framework with three jobs:

Connects AI to real world - file access, shell, web browsing, APIs
Maintains persistent sessions - conversation history survives restarts
Routes messages - Telegram, Discord, Slack, etc.

Sessions: The Key Concept

Each session has:

Unique session key (e.g., agent:main:main)
Independent conversation history (JSONL files on disk)
Own model and tools

Sessions are independent - each agent is just a Clawdbot session with specialized config.

The Workspace

/home/usr/clawd/           ← Workspace root
├── AGENTS.md              ← Operating manual
├── SOUL.md                ← Agent personality
├── memory/
│   ├── WORKING.md         ← Current task state
│   └── YYYY-MM-DD.md      ← Daily notes
├── scripts/
└── config/

Multi-Agent Setup

10 Agents = 10 Sessions

Agent	Role	Session Key
Jarvis	Squad Lead	agent:main:main
Shuri	Product Analyst	agent:product-analyst:main
Fury	Customer Researcher	agent:customer-researcher:main
Vision	SEO Analyst	agent:seo-analyst:main
Loki	Content Writer	agent:content-writer:main
Quill	Social Media	agent:social-media-manager:main
Wanda	Designer	agent:designer:main
Pepper	Email Marketing	agent:email-marketing:main
Friday	Developer	agent:developer:main
Wong	Documentation	agent:notion-agent:main

The Heartbeat System

Agents wake every 15 minutes via cron (staggered schedule):

:00 Pepper, :02 Shuri, :04 Friday, :06 Loki, :07 Wanda, :08 Vision, :10 Fury, :12 Quill

Each heartbeat:

Load context (read WORKING.md)
Check for @mentions and assigned tasks
Scan activity feed
Take action or report HEARTBEAT_OK

Why 15 minutes? 5 min = too expensive, 30 min = too slow.
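
A staggered schedule like this maps to one cron expression per agent, each with its own minute offset. The sketch below lists the minutes explicitly; `staggeredCron` is a hypothetical helper, and it assumes the interval divides 60 evenly.

```typescript
// Build a five-field cron expression that fires every `intervalMinutes`,
// shifted by `offsetMinutes` past the hour.
function staggeredCron(offsetMinutes: number, intervalMinutes = 15): string {
  const minutes: number[] = [];
  for (let m = offsetMinutes % intervalMinutes; m < 60; m += intervalMinutes) {
    minutes.push(m);
  }
  return `${minutes.join(",")} * * * *`;
}

const pepperCron = staggeredCron(0); // "0,15,30,45 * * * *"
const shuriCron = staggeredCron(2);  // "2,17,32,47 * * * *"
const quillCron = staggeredCron(12); // "12,27,42,57 * * * *"
```

Spreading the offsets keeps agents from all waking in the same minute and competing for the same tasks and rate limits.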

Mission Control: The Shared Brain

Convex database with 6 tables:

agents - name, role, status, currentTaskId, sessionKey
tasks - title, description, status, assigneeIds
messages - taskId, fromAgentId, content, attachments
activities - type, agentId, message
documents - title, content, type, taskId
notifications - mentionedAgentId, content, delivered

Agent Communication

Option 1: Direct session messaging

clawdbot sessions send --session "agent:seo-analyst:main" --message "Vision, review this?"

Option 2: Shared database (preferred) - all agents read/write to same Convex DB.

@Mentions & Thread Subscriptions

Type @Vision → Vision notified on next heartbeat
Type @all → everyone notified
Interact with a task → auto-subscribed to all future comments

The SOUL System (Agent Personalities)

# SOUL.md — Who You Are

**Name:** Shuri
**Role:** Product Analyst

## Personality
Skeptical tester. Thorough bug hunter. Finds edge cases.
Think like a first-time user. Question everything.

## What You're Good At
- Testing features from user perspective
- Finding UX issues and edge cases
- Competitive analysis

Key insight: An agent "good at everything" is mediocre. Constraints focus them.

Memory Stack

Session Memory (built-in) - JSONL conversation history
Working Memory (/memory/WORKING.md) - current task state, read on wake
Daily Notes (/memory/YYYY-MM-DD.md) - raw logs
Long-term Memory (MEMORY.md) - curated important stuff

Golden Rule: If you want to remember something, write it to a file.

Task Lifecycle

Inbox - new, unassigned
Assigned - has owner(s), not started
In Progress - being worked on
Review - done, needs approval
Done - finished
Blocked - stuck

Daily Standup

Cron at 11:30 PM sends summary to Telegram:

Completed today
In progress
Blocked items
Needs review
Key decisions

Lessons Learned

Start smaller - get 2-3 agents solid before adding more
Use cheaper models for routine work - heartbeats don't need expensive models
Memory is hard - put everything in files, not "mental notes"
Let agents surprise you - they'll contribute to unassigned tasks

Quick Start

npm install -g clawdbot
clawdbot init
clawdbot gateway start

# Add heartbeat
clawdbot cron add --name "agent-heartbeat" --cron "*/15 * * * *" \
  --session "isolated" \
  --message "Check for work. If nothing, reply HEARTBEAT_OK."

The Real Secret

Treat AI agents like team members: give them roles, memory, let them collaborate, hold them accountable.

Source: X Article by @pbteja1998 | Built on OpenClaw

Reference Document 2: Agents with Filesystems and Bash

Summary of Vercel's blog post by Ashka Stephen (Jan 9, 2026)

The Core Insight

Replace custom tooling with filesystem + bash. Sales call summarization agent went from ~$1.00 to ~$0.25 per call on Claude Opus 4.5, with improved output quality.

Why it works: LLMs are trained on massive amounts of code, so navigating directories, grepping files, and managing state are already familiar operations. If agents excel at filesystem ops for code, the same skills should carry over to other domains laid out as files.

How Agents Read Filesystems

Agent receives task
    ↓
Explores filesystem (ls, find)
    ↓
Searches for relevant content (grep, cat)
    ↓
Sends context + request to LLM
    ↓
Returns structured output

The agent runs in a sandbox: its reasoning is trusted, but the sandbox limits what it can actually do.

Why Filesystems Beat Vector Search

Approach	Problem
Prompt stuffing	Hits token limits
Vector search	Imprecise for specific values
Filesystem	Structure matches domain, precise retrieval, minimal context

Key advantages:

Structure matches domain - hierarchies map to directories
Retrieval is precise - grep -r "pricing objection" transcripts/ returns exact matches
Context stays minimal - agent loads files on demand, not upfront

Domain Mapping Examples

Customer Support:

/customers/
  /cust_12345/
    profile.json
    tickets/
      ticket_001.md
      ticket_002.md
    conversations/
      2024-01-15.txt
    preferences.json

Document Analysis:

/documents/
  /uploaded/
    contract_abc123.pdf
  /extracted/
    contract_abc123.txt
  /analysis/
    contract_abc123/
      summary.md
      key_terms.json
      risk_assessment.md
/templates/
  contract_analysis_prompt.md

Sales Call Summary Agent Structure

gong-calls/
  demo-call-001-companyname-product-demo.md
  metadata.json
  previous-calls/
    demo-call-000-discovery-call.md
salesforce/
  account.md
  opportunity.md
  contacts.md
slack/
  slack-channel.md
research/
  company-research.md
  competitive-intel.md
playbooks/
  sales-playbook.md

Agent explores like a codebase:

$ ls gong-calls/
$ cat gong-calls/metadata.json
$ grep -i "concern\|worried\|issue" gong-calls/*.md

Why Bash + Filesystem

Native model capabilities - grep, cat, find, awk are native ops, not bolted on
Future-proof - as models improve at coding, agents improve automatically
Debuggable - see exactly what files were read, what commands ran
Secure - sandbox isolates execution
Less code - no retrieval pipelines, just write files to directories

Tools

AI SDK - tool execution and model calls
bash-tool - sandboxed filesystem access
Sales Call Summary template - full pattern example

The Punchline

"The future of agents might be surprisingly simple. Maybe the best architecture is almost no architecture at all. Just filesystems and bash."

Source: Vercel Blog