AI Agent System Architecture
Systems design for multi-agent coordination
Architecture Notes
Vision
Evolve Ralph from single-agent PRD executor → multi-agent team system with:
- SOUL-based personalities - agents with distinct roles/skills
- Team coordination - multiple agents collaborating on shared work
- Filesystem-native - no external DB, everything in files (matches Reference Document 2)
- Observable - live dashboard with progress, costs, activity
User Workflow
# 1. Install to any project
curl -fsSL https://tybarho.com/ralph/install.sh | bash
# 2. Opens local dashboard (React)
# 3. Define teams + agents (AI-assisted)
# 4. Create tasks/PRDs/crons
# 5. Watch agents work with live progress + costs
Directory Structure (per project)
.ralph/
├── config.json # Teams, agent assignments, schedules
├── dashboard/ # React app (served locally)
│
├── agents/ # Agent definitions
│ ├── friday/
│ │ ├── SOUL.md # Personality, skills, voice
│ │ └── MEMORY.md # Agent-specific learnings
│ ├── loki/
│ │ ├── SOUL.md
│ │ └── MEMORY.md
│ └── shuri/
│ ├── SOUL.md
│ └── MEMORY.md
│
├── teams/ # Team compositions
│ ├── engineering.json # { agents: ["friday", "shuri"], focus: "..." }
│ ├── marketing.json # { agents: ["loki", "quill", "vision"] }
│ └── research.json # { agents: ["fury", "shuri"] }
│
├── state/ # Shared state (filesystem-based)
│ ├── tasks.json # Task board (inbox, assigned, in_progress, review, done)
│ ├── activity.jsonl # Append-only activity log
│ └── agents/ # Per-agent runtime state
│ ├── friday.json # { status, currentTask, lastHeartbeat, tokenUsage }
│ └── loki.json
│
├── work/ # Task workspaces
│ ├── {task-id}/
│ │ ├── task.json # Task definition
│ │ ├── thread.jsonl # Comments/discussion
│ │ └── deliverables/ # Output files
│
├── memory/ # Shared knowledge
│ ├── CODEBASE.md # Project-wide patterns (current MEMORY.md)
│ └── CONTEXT.md # User context (who you are)
│
└── logs/ # Execution logs + costs
├── 2026-02-05.jsonl # Daily log with token counts
└── costs.json # Aggregated cost tracking
Agent SOUL Structure
# SOUL.md — Friday (Developer)
## Identity
Name: Friday
Role: Developer
Team: Engineering
## Personality
Code is poetry. Clean, tested, documented.
Prefers small PRs. Runs tests before committing.
Asks Shuri for review on anything user-facing.
## Skills
- TypeScript, React, Next.js
- Testing (vitest, playwright)
- Database migrations
- API design
## Voice
Direct. Technical. Cites file paths and line numbers.
Uses code blocks liberally. Explains the "why" not just "what."
## Boundaries
- Won't merge without tests passing
- Escalates to human if touching auth/payments
- Asks for design review on UI changes
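For the orchestrator to act on a SOUL file (for example, to read the Skills or Boundaries list), it needs the sections in structured form. A minimal sketch, assuming SOUL.md always uses `## ` headings as in the example above; `parseSoul` and the `Soul` shape are hypothetical names, not existing Ralph code.

```typescript
// Hypothetical helper: splits SOUL.md text on its "## " headings.
interface Soul {
  title: string;                      // text of the first "# " line
  sections: Record<string, string[]>; // section name -> its non-empty lines
}

function parseSoul(markdown: string): Soul {
  const soul: Soul = { title: "", sections: {} };
  let bucket: string[] | null = null; // lines of the section currently being read
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      bucket = [];
      soul.sections[line.slice(3).trim()] = bucket;
    } else if (line.startsWith("# ") && soul.title === "") {
      soul.title = line.slice(2).trim();
    } else if (bucket !== null && line !== "") {
      // Drop a leading "- " so list items and plain lines are stored alike
      bucket.push(line.replace(/^- /, ""));
    }
  }
  return soul;
}

const fridaySoul = parseSoul(
  ["# SOUL.md: Friday (Developer)", "## Identity", "Name: Friday", "## Skills", "- API design"].join("\n"),
);
console.log(fridaySoul.sections["Skills"]);
```

Keeping the parser this loose means agents can add new sections to their SOUL without a schema change.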
MCP Tools Integration
MCP (Model Context Protocol) tools like calendar, email, image generation, etc. are configured at the CLI level but need to be managed per-agent.
The Challenge:
- Tools configured in ~/.cursor/mcp.json or ~/.claude/mcp.json (global)
- Different agents need different tools (Friday doesn't need calendar, Pepper does)
- Some tools are sensitive (email, calendar) — not every agent should have access
- Tool availability should be part of agent identity
Solution: Tool Allowlists in SOUL
# SOUL.md — Pepper (Email Marketing)
## Tools
Allowed:
- email (send, draft, search)
- calendar (read, create events)
- generate_image (for email graphics)
Denied:
- shell (no arbitrary command execution)
- filesystem (only through orchestrator)
How It Works:
User's MCP config (global)
↓
Orchestrator reads available tools
↓
Agent SOUL specifies allowed tools
↓
Orchestrator filters tools before spawning agent
↓
Agent only sees tools in their allowlist
Implementation Options:
Option A: CLI flag filtering (if supported)
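# Pass allowed tools to the CLI. The --tools and --prompt flags below are illustrative;
# check which tool-filtering flags each CLI actually provides.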
cursor agent --tools "email,calendar,generate_image" --prompt "..."
claude --tools "email,calendar" --prompt "..."
Option B: System prompt enforcement
# Injected into agent's system prompt
You have access to these tools ONLY:
- email: send, draft, search emails
- calendar: read and create events
Do NOT attempt to use: shell, filesystem, web_search
Option C: Orchestrator proxy (most control)
// Orchestrator intercepts tool calls
const allowedTools = loadAgentTools(agent);
const toolProxy = {
async callTool(name: string, args: any) {
if (!allowedTools.includes(name)) {
throw new Error(`Agent ${agent} not authorized for tool: ${name}`);
}
return await mcpClient.callTool(name, args);
}
};
Tool Categories:
| Category | Tools | Agents |
|---|---|---|
| Code | shell, filesystem, git | friday, shuri |
| Communication | email, slack, calendar | pepper, jarvis |
| Content | generate_image, web_search, pdf_reader | loki, wanda, fury |
| Data | database, analytics | vision, fury |
Team-Level Tool Inheritance:
// .ralph/teams/marketing.json
{
"agents": ["loki", "quill", "pepper"],
"tools": {
"shared": ["generate_image", "web_search"],
"per_agent": {
"pepper": ["email", "calendar"],
"quill": ["twitter", "linkedin"]
}
}
}
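One way to read the team file above: an agent's effective tool set is the team's shared tools plus its own per-agent additions. A minimal sketch under that assumption; `TeamConfig` and `effectiveTools` are hypothetical names, and the rule that non-members inherit nothing is a guess at the intended behavior.

```typescript
// Hypothetical shape mirroring .ralph/teams/marketing.json
interface TeamConfig {
  agents: string[];
  tools: { shared: string[]; per_agent: Record<string, string[]> };
}

// Effective tools = team-wide shared tools + the agent's own additions, deduplicated.
function effectiveTools(team: TeamConfig, agent: string): string[] {
  if (!team.agents.includes(agent)) return []; // assumption: non-members inherit nothing
  const own = team.tools.per_agent[agent] ?? [];
  return Array.from(new Set(team.tools.shared.concat(own)));
}

const marketing: TeamConfig = {
  agents: ["loki", "quill", "pepper"],
  tools: {
    shared: ["generate_image", "web_search"],
    per_agent: { pepper: ["email", "calendar"], quill: ["twitter", "linkedin"] },
  },
};
console.log(effectiveTools(marketing, "pepper"));
```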
Agent Tool Config:
// .ralph/agents/pepper/config.json
{
"tools": {
"allow": ["email", "calendar", "generate_image"],
"deny": ["shell", "filesystem"],
"require_approval": ["email:send"] // Human approves before sending
}
}
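The allow, deny, and require_approval lists need a precedence rule before the orchestrator can enforce them. A minimal sketch, assuming deny beats allow, anything not explicitly allowed is denied, and approval entries may name either a whole tool or a `tool:action` pair; `decide` and its types are hypothetical.

```typescript
type Decision = "allow" | "deny" | "needs_approval";

interface ToolPermissions {
  allow: string[];
  deny: string[];
  require_approval: string[]; // entries are "tool" or "tool:action"
}

// Assumed precedence: deny > not-in-allowlist > require_approval > allow.
function decide(perms: ToolPermissions, tool: string, action?: string): Decision {
  if (perms.deny.includes(tool) || !perms.allow.includes(tool)) return "deny";
  const scoped = action ? `${tool}:${action}` : tool;
  const gated = perms.require_approval.includes(tool) || perms.require_approval.includes(scoped);
  return gated ? "needs_approval" : "allow";
}

const pepperPerms: ToolPermissions = {
  allow: ["email", "calendar", "generate_image"],
  deny: ["shell", "filesystem"],
  require_approval: ["email:send"],
};
console.log(decide(pepperPerms, "email", "send"), decide(pepperPerms, "email", "draft"));
```

Defaulting to "deny" for unlisted tools keeps a newly discovered MCP tool from silently becoming available to every agent.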
Approval Workflow for Sensitive Tools:
Agent wants to send email
↓
Tool marked as "require_approval"
↓
Orchestrator pauses agent
↓
Dashboard shows pending approval:
"Pepper wants to send email to user@example.com
Subject: Welcome to our newsletter
[Approve] [Deny] [Edit]"
↓
Human approves/denies
↓
Agent continues or handles denial
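If approvals are recorded in an append-only JSONL log, the dashboard can derive the pending list by replaying it. A minimal sketch; the `ApprovalEvent` fields and `pendingApprovals` are assumptions about a format the notes do not define.

```typescript
type ApprovalStatus = "pending" | "approved" | "denied";

// Hypothetical record: one JSON object per line of the approval log.
interface ApprovalEvent {
  id: string;
  agent: string;
  tool: string;
  status: ApprovalStatus;
}

// Replays the append-only log; the last event written for an id wins.
function pendingApprovals(jsonl: string): ApprovalEvent[] {
  const latest = new Map<string, ApprovalEvent>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const event = JSON.parse(line) as ApprovalEvent;
    latest.set(event.id, event);
  }
  return Array.from(latest.values()).filter((e) => e.status === "pending");
}

const approvalLog = [
  '{"id":"a1","agent":"pepper","tool":"email:send","status":"pending"}',
  '{"id":"a2","agent":"pepper","tool":"email:send","status":"pending"}',
  '{"id":"a1","agent":"pepper","tool":"email:send","status":"approved"}',
].join("\n");
console.log(pendingApprovals(approvalLog).map((e) => e.id));
```

Appending a new line for each decision, rather than rewriting earlier ones, keeps the file safe for concurrent writers and doubles as an audit trail.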
Directory Structure Update:
.ralph/
├── agents/
│ └── pepper/
│ ├── SOUL.md
│ ├── MEMORY.md
│ └── config.json # Tool permissions, model preferences
│
├── tools/ # Tool configs and wrappers
│ ├── available.json # Discovered from global MCP config
│ └── approvals.jsonl # Pending/completed approval log
Discovery: Reading Global MCP Config:
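// Orchestrator reads the user's MCP setup
// (readJson, writeJson, and parseMcpServers are orchestrator helpers, not shown)
import { homedir } from 'node:os';
import { join } from 'node:path';
function discoverTools() {
  // Node's fs APIs do not expand "~", so resolve the home directory explicitly
  const cursorConfig = readJson(join(homedir(), '.cursor', 'mcp.json'));
  const claudeConfig = readJson(join(homedir(), '.claude', 'mcp.json'));
  const allTools = [
    ...parseMcpServers(cursorConfig),
    ...parseMcpServers(claudeConfig)
  ];
  writeJson('.ralph/tools/available.json', allTools);
}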
Orchestration
Option A: Shell-based (simple, current Ralph pattern)
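# ralph-orchestrator.sh
# Runs in terminal, manages heartbeats for all agents
while true; do
  # Glob the agent directories instead of parsing ls output
  for dir in .ralph/agents/*/; do
    [ -d "$dir" ] || continue   # skip the literal pattern when no agents exist
    agent=$(basename "$dir")
    if should_wake "$agent"; then
      ralph-agent "$agent" &
    fi
  done
  sleep 60
done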
Option B: Node process (better for dashboard integration)
// orchestrator.ts
// Manages agents, serves dashboard, tracks costs
const orchestrator = new RalphOrchestrator({
agents: loadAgents('.ralph/agents/'),
teams: loadTeams('.ralph/teams/'),
heartbeatInterval: 15 * 60 * 1000, // 15 min
});
orchestrator.on('agent:wake', (agent) => {
// Spawn cursor agent CLI with agent's SOUL
});
orchestrator.on('agent:complete', (agent, result) => {
// Log activity, update costs, notify subscribers
});
// Serve dashboard on localhost:3333
orchestrator.serveDashboard();
Heartbeat Flow (per agent)
Agent wakes (cron or orchestrator)
↓
1. Read own SOUL.md (who am I?)
2. Read state/tasks.json (what needs doing?)
3. Check for @mentions in activity.jsonl
4. Check assigned tasks for my teams
↓
5. If work found:
- Claim task (update state)
- Read task workspace (work/{task-id}/)
- Read relevant MEMORY files
- Do work (one unit)
- Post to thread.jsonl
- Update deliverables/
- Log activity + tokens
↓
6. If no work:
- Log HEARTBEAT_OK
- Go back to sleep
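Steps 2 through 6 boil down to one decision per wake-up: is there work, and which task? A minimal sketch of that decision, assuming a lower priority number means more urgent, that only mentions newer than the last heartbeat count, and that team membership has already been resolved into each task's assignees; `nextAction` and its types are hypothetical.

```typescript
interface BoardTask { id: string; status: string; assignees: string[]; priority: number; }
interface ActivityEntry { ts: string; agent: string; task: string; content: string; }

type HeartbeatResult =
  | { kind: "work"; taskId: string; reason: "mention" | "assigned" }
  | { kind: "idle" }; // caller logs HEARTBEAT_OK and goes back to sleep

function nextAction(
  agent: string,
  tasks: BoardTask[],
  activity: ActivityEntry[],
  lastHeartbeat: string, // ISO-8601 UTC; same-format strings compare chronologically
): HeartbeatResult {
  // Step 3: an @mention newer than the last heartbeat wins (assumed precedence)
  const mention = activity.find(
    (e) => e.ts > lastHeartbeat && e.agent !== agent && e.content.includes(`@${agent}`),
  );
  if (mention) return { kind: "work", taskId: mention.task, reason: "mention" };

  // Step 4: otherwise take the open assigned task with the lowest priority number
  const open = tasks
    .filter((t) => t.assignees.includes(agent) && ["assigned", "in_progress"].includes(t.status))
    .sort((a, b) => a.priority - b.priority);
  const top = open[0];
  return top ? { kind: "work", taskId: top.id, reason: "assigned" } : { kind: "idle" };
}
```

Because the function is pure, the same logic can run inside a cheap routing model's tool call or directly in the orchestrator before any model is invoked.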
Task Structure
{
"id": "task_2026020512345",
"title": "Add dark mode toggle to settings",
"description": "...",
"status": "in_progress",
"assignees": ["friday", "shuri"],
"team": "engineering",
"created": "2026-02-05T10:00:00Z",
"priority": 1,
"type": "feature", // feature | bug | research | content
"steps": [
{ "description": "Add toggle component", "passes": true },
{ "description": "Wire up to theme context", "passes": false },
{ "description": "Add tests", "passes": false }
]
}
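Typing the task file lets the orchestrator and dashboard share one definition. A minimal sketch: the field names come from the JSON above, the status values from the task board in the directory layout, and `taskProgress` is a hypothetical helper for the dashboard.

```typescript
type TaskStatus = "inbox" | "assigned" | "in_progress" | "review" | "done";
type TaskType = "feature" | "bug" | "research" | "content";

interface Step { description: string; passes: boolean; }

interface Task {
  id: string;
  title: string;
  description: string;
  status: TaskStatus;
  assignees: string[];
  team: string;
  created: string; // ISO-8601 timestamp
  priority: number;
  type: TaskType;
  steps: Step[];
}

// Summarizes step completion and names the first step that has not passed yet.
function taskProgress(task: Task): { done: number; total: number; next: string | null } {
  const done = task.steps.filter((s) => s.passes).length;
  const pending = task.steps.find((s) => !s.passes);
  return { done, total: task.steps.length, next: pending ? pending.description : null };
}

const darkMode: Task = {
  id: "task_2026020512345",
  title: "Add dark mode toggle to settings",
  description: "...",
  status: "in_progress",
  assignees: ["friday", "shuri"],
  team: "engineering",
  created: "2026-02-05T10:00:00Z",
  priority: 1,
  type: "feature",
  steps: [
    { description: "Add toggle component", passes: true },
    { description: "Wire up to theme context", passes: false },
    { description: "Add tests", passes: false },
  ],
};
console.log(taskProgress(darkMode));
```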
Agent Communication
@mentions in activity log:
{"ts":"...","agent":"friday","type":"comment","task":"task_123","content":"@shuri can you test this on mobile?"}
{"ts":"...","agent":"shuri","type":"comment","task":"task_123","content":"Tested. Found edge case with system preference..."}
Thread subscriptions: Agent auto-subscribed when they:
- Get assigned to task
- Comment on task
- Get @mentioned
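The three subscription rules can be computed directly from the task's assignees and the activity log. A minimal sketch; `mentionsIn` and `subscribers` are hypothetical names, and treating a mention as any whitespace-delimited word that starts with `@` is an assumption.

```typescript
interface ThreadEvent { agent: string; task: string; content: string; }

// Extracts @mentions from whitespace-separated words, so "user@example.com" is ignored.
function mentionsIn(content: string): string[] {
  return content
    .split(/\s+/)
    .filter((word) => word.startsWith("@"))
    .map((word) => word.slice(1).replace(/[^a-z0-9_-].*$/i, "").toLowerCase())
    .filter((name) => name !== "");
}

// Subscribers = assignees + anyone who commented + anyone who was @mentioned.
function subscribers(assignees: string[], events: ThreadEvent[], taskId: string): string[] {
  const subs = new Set<string>(assignees);
  for (const e of events) {
    if (e.task !== taskId) continue;
    subs.add(e.agent);
    for (const name of mentionsIn(e.content)) subs.add(name);
  }
  return Array.from(subs).sort();
}

const threadEvents: ThreadEvent[] = [
  { agent: "friday", task: "task_123", content: "@shuri can you test this on mobile?" },
  { agent: "loki", task: "task_999", content: "@fury thoughts?" },
];
console.log(subscribers(["friday"], threadEvents, "task_123"));
```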
Dashboard (React)
Views:
- Activity Feed - real-time stream of all agent activity
- Task Board - kanban by status (inbox → done)
- Agent Status - who's awake, what they're doing, last heartbeat
- Cost Tracker - tokens/$ per agent, per team, per day
- Team View - filter by team
Tech:
- Vite + React (fast, local)
- Watches .ralph/state/ for changes (chokidar or polling)
- SSE or WebSocket from orchestrator for live updates
- TailwindCSS for styling
Cost Tracking
// .ralph/logs/costs.json
{
"daily": {
"2026-02-05": {
"total_tokens": 145000,
"total_cost": 2.34,
"by_agent": {
"friday": { "tokens": 80000, "cost": 1.28 },
"loki": { "tokens": 65000, "cost": 1.06 }
}
}
},
"lifetime": {
"total_cost": 47.82
}
}
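The aggregate file can be rebuilt at any time from the daily log entries. A minimal sketch, assuming each log line already carries a date, agent, token count, and dollar cost; `UsageEntry` and `aggregateCosts` are hypothetical, and the lifetime total here covers only the entries passed in.

```typescript
interface UsageEntry { date: string; agent: string; tokens: number; cost: number; }
interface Usage { tokens: number; cost: number; }
interface DayCosts { total_tokens: number; total_cost: number; by_agent: Record<string, Usage>; }
interface Costs { daily: Record<string, DayCosts>; lifetime: { total_cost: number }; }

// Round to whole cents after every addition so float drift never accumulates.
const round2 = (usd: number): number => Math.round(usd * 100) / 100;

function aggregateCosts(entries: UsageEntry[]): Costs {
  const costs: Costs = { daily: {}, lifetime: { total_cost: 0 } };
  for (const e of entries) {
    const day = costs.daily[e.date] ?? { total_tokens: 0, total_cost: 0, by_agent: {} };
    const agent = day.by_agent[e.agent] ?? { tokens: 0, cost: 0 };
    agent.tokens += e.tokens;
    agent.cost = round2(agent.cost + e.cost);
    day.by_agent[e.agent] = agent;
    day.total_tokens += e.tokens;
    day.total_cost = round2(day.total_cost + e.cost);
    costs.daily[e.date] = day;
    costs.lifetime.total_cost = round2(costs.lifetime.total_cost + e.cost);
  }
  return costs;
}

const usage: UsageEntry[] = [
  { date: "2026-02-05", agent: "friday", tokens: 50000, cost: 0.8 },
  { date: "2026-02-05", agent: "friday", tokens: 30000, cost: 0.48 },
  { date: "2026-02-05", agent: "loki", tokens: 65000, cost: 1.06 },
];
console.log(JSON.stringify(aggregateCosts(usage), null, 2));
```

Treating costs.json as a derived cache rather than the source of truth means a corrupted aggregate can simply be regenerated from the logs.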
Eval Integration
Each agent can have evals (like evals/agents/calendar-assistant.yaml):
# .ralph/agents/friday/eval.yaml
name: friday-code-quality
cases:
- input: "Add a button component"
expected:
- creates component file
- exports named function (not arrow)
- includes basic test
Migration from Current Ralph
| Current | New |
|---|---|
| plans/MEMORY.md | .ralph/memory/CODEBASE.md |
| plans/CONTEXT.md | .ralph/memory/CONTEXT.md |
| plans/PROMPT.md | .ralph/agents/{name}/SOUL.md (per agent) |
| plans/*.prd.json | .ralph/work/{task-id}/task.json |
| plans/*.progress.txt | .ralph/work/{task-id}/thread.jsonl |
| plans/ralph.sh | .ralph/orchestrator (shell or node) |
Decisions
- CLI: Support both claude CLI and cursor agent CLI (user choice)
- Dashboard: Local only (served from orchestrator)
- Model selection: Cheap models for heartbeats/routing, expensive for creative work
- Git workflow: Git worktrees (see primer below)
- Cross-project teams: No — agents defined per-project
Git Worktrees Primer
Git worktrees let you check out multiple branches simultaneously in separate directories, all linked to the same repo. Perfect for parallel agent work.
Basic Commands:
# Create a worktree for an agent's branch
git worktree add .ralph/worktrees/friday feature/friday-dark-mode
# List all worktrees
git worktree list
# Remove a worktree when done
git worktree remove .ralph/worktrees/friday
# Prune stale worktree references
git worktree prune
Directory Structure with Worktrees:
my-project/ # Main worktree (main branch)
├── .ralph/
│ └── worktrees/
│ ├── friday/ # Friday's worktree (feature/friday-dark-mode)
│ │ ├── src/
│ │ └── ...
│ └── loki/ # Loki's worktree (content/loki-blog-post)
│ ├── src/
│ └── ...
How Agents Use Worktrees:
Agent claims task
↓
1. Create branch: git branch feature/{agent}-{task-slug}
2. Create worktree: git worktree add .ralph/worktrees/{agent} feature/{agent}-{task-slug}
3. Agent works in worktree (isolated from main)
4. Agent commits to their branch
5. When task complete: PR or merge to main
6. Cleanup: git worktree remove .ralph/worktrees/{agent}
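Steps 1, 2, and 6 need a branch name derived from the task. A minimal sketch that turns a task title into a slug and returns the git argument lists for setup and cleanup; `slugify`, `worktreePlan`, and the 40-character slug limit are assumptions, not existing Ralph code.

```typescript
// Lowercase, collapse non-alphanumerics to "-", cap the length, trim stray hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, 40)
    .replace(/^-+|-+$/g, "");
}

interface WorktreePlan {
  branch: string;
  path: string;
  setup: string[][];   // argument lists for `git`, run in order
  cleanup: string[][];
}

function worktreePlan(agent: string, taskTitle: string): WorktreePlan {
  const branch = `feature/${agent}-${slugify(taskTitle)}`;
  const path = `.ralph/worktrees/${agent}`;
  return {
    branch,
    path,
    setup: [
      ["branch", branch, "main"],         // step 1
      ["worktree", "add", path, branch],  // step 2
    ],
    cleanup: [["worktree", "remove", path]], // step 6
  };
}

console.log(worktreePlan("friday", "Add dark mode toggle to settings").branch);
```

Returning argument arrays instead of command strings lets the caller pass them to a process spawner without shell quoting concerns.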
Why Worktrees (vs regular branches):
| Approach | Problem |
|---|---|
| Shared branch | Agents step on each other's changes |
| Branch switching | Can only work on one branch at a time |
| Separate clones | Wastes disk space, syncing headaches |
| Worktrees | Parallel work, shared .git, isolated working dirs |
Limitations:
- Can't double-checkout — Same branch can't be in two worktrees simultaneously
- Cleanup required — Must remove worktrees when done or they pile up
- Some tools confused — IDE file watchers, some git GUIs don't handle worktrees well
- Merge conflicts — Still need to resolve when merging back to main
- Disk space — Each worktree is a full working copy (but shares .git objects)
Orchestrator Worktree Management:
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
// execFile takes arguments as an array, so agent and task names are never shell-interpreted
const git = (...args: string[]) => promisify(execFile)('git', args);
// When agent claims task
async function setupAgentWorktree(agent: string, taskId: string) {
  const branch = `feature/${agent}-${taskId}`;
  const worktreePath = `.ralph/worktrees/${agent}`;
  // Create branch from main
  await git('branch', branch, 'main');
  // Create worktree
  await git('worktree', 'add', worktreePath, branch);
  return worktreePath;
}
// When task complete
async function cleanupAgentWorktree(agent: string) {
  const worktreePath = `.ralph/worktrees/${agent}`;
  // Remove worktree
  await git('worktree', 'remove', worktreePath);
  // Optionally delete the branch after merge (the caller would need to pass the branch name in)
  // await git('branch', '-d', branch);
}
Merge Strategy Options:
# Option A: Direct merge (simple, but messy history)
git checkout main
git merge feature/friday-dark-mode
# Option B: Squash merge (clean history, loses granular commits)
git checkout main
git merge --squash feature/friday-dark-mode
git commit -m "feat: dark mode toggle (friday)"
# Option C: PR-based (best for review, needs GitHub CLI)
gh pr create --base main --head feature/friday-dark-mode
# Human reviews, approves, merges
Recommended Flow for Teams:
main ─────────────────────────────────────────────►
\ / \ /
friday-task-1 ─────► friday-task-2 ─►
\ /
loki-task-1 ─►
Each agent:
- Branches from latest main
- Works in isolation
- Squash merges back (or PR for review)
- Worktree cleaned up
Edge Case: Agent Needs Another Agent's Work
# Loki needs Friday's changes before they're merged
cd .ralph/worktrees/loki
# Worktrees share one repository, so Friday's local branch is visible here without a fetch
git merge feature/friday-dark-mode
# Or cherry-pick specific commits
Updated Directory Structure
my-project/
├── .ralph/
│ ├── config.json
│ ├── dashboard/
│ ├── agents/
│ ├── teams/
│ ├── state/
│ ├── work/
│ ├── memory/
│ ├── logs/
│ └── worktrees/ # Agent working directories
│ ├── friday/ # → feature/friday-{task}
│ └── loki/ # → content/loki-{task}
Reference Documents
Reference Document 1: Building Mission Control (AI Agent Squad)
Summary of @pbteja1998's guide on building a 10-agent AI team using Clawdbot/OpenClaw
The Problem
Every AI tool has the same issue: no continuity. Conversations start fresh, context from yesterday is gone, research gets lost in chat threads.
The goal: AI that works like a team, not a search box.
Core Architecture: Clawdbot (OpenClaw)
An open-source AI agent framework with three jobs:
- Connects AI to real world - file access, shell, web browsing, APIs
- Maintains persistent sessions - conversation history survives restarts
- Routes messages - Telegram, Discord, Slack, etc.
Sessions: The Key Concept
Each session has:
- Unique session key (e.g., agent:main:main)
- Independent conversation history (JSONL files on disk)
- Own model and tools
Sessions are independent - each agent is just a Clawdbot session with specialized config.
The Workspace
/home/usr/clawd/ ← Workspace root
├── AGENTS.md ← Operating manual
├── SOUL.md ← Agent personality
├── memory/
│ ├── WORKING.md ← Current task state
│ └── YYYY-MM-DD.md ← Daily notes
├── scripts/
└── config/
Multi-Agent Setup
10 Agents = 10 Sessions
| Agent | Role | Session Key |
|---|---|---|
| Jarvis | Squad Lead | agent:main:main |
| Shuri | Product Analyst | agent:product-analyst:main |
| Fury | Customer Researcher | agent:customer-researcher:main |
| Vision | SEO Analyst | agent:seo-analyst:main |
| Loki | Content Writer | agent:content-writer:main |
| Quill | Social Media | agent:social-media-manager:main |
| Wanda | Designer | agent:designer:main |
| Pepper | Email Marketing | agent:email-marketing:main |
| Friday | Developer | agent:developer:main |
| Wong | Documentation | agent:notion-agent:main |
The Heartbeat System
Agents wake every 15 minutes via cron (staggered schedule):
- :00 Pepper, :02 Shuri, :04 Friday, :06 Loki, :07 Wanda, :08 Vision, :10 Fury, :12 Quill
Each heartbeat:
- Load context (read WORKING.md)
- Check for @mentions and assigned tasks
- Scan activity feed
- Take action or report HEARTBEAT_OK
Why 15 minutes? 5 min = too expensive, 30 min = too slow.
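The staggered schedule can be expressed as one cron line per agent. A minimal sketch that expands an offset into an explicit minute list, which standard five-field cron accepts; `staggeredCron` is a hypothetical helper, not part of Clawdbot.

```typescript
// Builds a five-field cron expression that fires every `intervalMinutes`,
// shifted by `offsetMinutes` past the hour.
function staggeredCron(offsetMinutes: number, intervalMinutes = 15): string {
  if (60 % intervalMinutes !== 0) {
    throw new RangeError("interval must divide 60 evenly");
  }
  if (!Number.isInteger(offsetMinutes) || offsetMinutes < 0 || offsetMinutes >= intervalMinutes) {
    throw new RangeError("offset must be an integer in [0, interval)");
  }
  const minutes: number[] = [];
  for (let m = offsetMinutes; m < 60; m += intervalMinutes) minutes.push(m);
  return `${minutes.join(",")} * * * *`;
}

console.log(staggeredCron(0)); // Pepper: "0,15,30,45 * * * *"
console.log(staggeredCron(4)); // Friday: "4,19,34,49 * * * *"
```

Staggering spreads model calls across the quarter hour, so agents do not all hit rate limits or the shared state at the same moment.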
Mission Control: The Shared Brain
Convex database with 6 tables:
- agents - name, role, status, currentTaskId, sessionKey
- tasks - title, description, status, assigneeIds
- messages - taskId, fromAgentId, content, attachments
- activities - type, agentId, message
- documents - title, content, type, taskId
- notifications - mentionedAgentId, content, delivered
Agent Communication
Option 1: Direct session messaging
clawdbot sessions send --session "agent:seo-analyst:main" --message "Vision, review this?"
Option 2: Shared database (preferred) - all agents read/write to same Convex DB.
@Mentions & Thread Subscriptions
- Type @Vision → Vision notified on next heartbeat
- Type @all → everyone notified
- Interact with a task → auto-subscribed to all future comments
The SOUL System (Agent Personalities)
# SOUL.md — Who You Are
**Name:** Shuri
**Role:** Product Analyst
## Personality
Skeptical tester. Thorough bug hunter. Finds edge cases.
Think like a first-time user. Question everything.
## What You're Good At
- Testing features from user perspective
- Finding UX issues and edge cases
- Competitive analysis
Key insight: An agent "good at everything" is mediocre. Constraints focus them.
Memory Stack
- Session Memory (built-in) - JSONL conversation history
- Working Memory (/memory/WORKING.md) - current task state, read on wake
- Daily Notes (/memory/YYYY-MM-DD.md) - raw logs
- Long-term Memory (MEMORY.md) - curated important stuff
Golden Rule: If you want to remember something, write it to a file.
Task Lifecycle
- Inbox - new, unassigned
- Assigned - has owner(s), not started
- In Progress - being worked on
- Review - done, needs approval
- Done - finished
- Blocked - stuck
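A lifecycle like this is easiest to enforce as a transition table. A minimal sketch; the six statuses come from the list above, but the allowed moves (for example, review can bounce back to in progress, and done is terminal) are assumptions the source does not spell out.

```typescript
type Status = "inbox" | "assigned" | "in_progress" | "review" | "done" | "blocked";

// Assumed transition table: each status lists where a task may move next.
const transitions: Record<Status, Status[]> = {
  inbox: ["assigned"],
  assigned: ["in_progress", "blocked"],
  in_progress: ["review", "blocked"],
  review: ["done", "in_progress"], // reviewer can send work back
  blocked: ["assigned", "in_progress"],
  done: [],
};

function canMove(from: Status, to: Status): boolean {
  return transitions[from].includes(to);
}

console.log(canMove("inbox", "assigned"), canMove("inbox", "done"));
```

Rejecting invalid moves at write time keeps two agents from, say, both marking an unreviewed task as done.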
Daily Standup
Cron at 11:30 PM sends summary to Telegram:
- Completed today
- In progress
- Blocked items
- Needs review
- Key decisions
Lessons Learned
- Start smaller - get 2-3 agents solid before adding more
- Use cheaper models for routine work - heartbeats don't need expensive models
- Memory is hard - put everything in files, not "mental notes"
- Let agents surprise you - they'll contribute to unassigned tasks
Quick Start
npm install -g clawdbot
clawdbot init
clawdbot gateway start
# Add heartbeat
clawdbot cron add --name "agent-heartbeat" --cron "*/15 * * * *" \
--session "isolated" \
--message "Check for work. If nothing, reply HEARTBEAT_OK."
The Real Secret
Treat AI agents like team members: give them roles, memory, let them collaborate, hold them accountable.
Source: X Article by @pbteja1998 | Built on OpenClaw
Reference Document 2: Agents with Filesystems and Bash
Summary of Vercel's blog post by Ashka Stephen (Jan 9, 2026)
The Core Insight
Replace custom tooling with filesystem + bash. Sales call summarization agent went from ~$1.00 to ~$0.25 per call on Claude Opus 4.5, with improved output quality.
Why it works: LLMs are trained on massive amounts of code, so they have seen countless examples of navigating directories, grepping files, and managing state. If agents excel at filesystem ops for code, they can apply the same skills to any domain laid out as files.
How Agents Read Filesystems
Agent receives task
↓
Explores filesystem (ls, find)
↓
Searches for relevant content (grep, cat)
↓
Sends context + request to LLM
↓
Returns structured output
The agent runs in a sandbox: its reasoning is trusted, but the sandbox limits what it can actually do.
Why Filesystems Beat Vector Search
| Approach | Problem |
|---|---|
| Prompt stuffing | Hits token limits |
| Vector search | Imprecise for specific values |
| Filesystem | Structure matches domain, precise retrieval, minimal context |
Key advantages:
- Structure matches domain - hierarchies map to directories
- Retrieval is precise - grep -r "pricing objection" transcripts/ returns exact matches
- Context stays minimal - agent loads files on demand, not upfront
Domain Mapping Examples
Customer Support:
/customers/
/cust_12345/
profile.json
tickets/
ticket_001.md
ticket_002.md
conversations/
2024-01-15.txt
preferences.json
Document Analysis:
/documents/
/uploaded/
contract_abc123.pdf
/extracted/
contract_abc123.txt
/analysis/
contract_abc123/
summary.md
key_terms.json
risk_assessment.md
/templates/
contract_analysis_prompt.md
Sales Call Summary Agent Structure
gong-calls/
demo-call-001-companyname-product-demo.md
metadata.json
previous-calls/
demo-call-000-discovery-call.md
salesforce/
account.md
opportunity.md
contacts.md
slack/
slack-channel.md
research/
company-research.md
competitive-intel.md
playbooks/
sales-playbook.md
Agent explores like a codebase:
$ ls gong-calls/
$ cat gong-calls/metadata.json
$ grep -i "concern\|worried\|issue" gong-calls/*.md
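The same exact-match retrieval can be sketched in TypeScript. This is an in-memory stand-in, so it runs without a sandbox or real files; `grepFiles` and the sample file contents are hypothetical, and a real agent would run grep itself.

```typescript
interface Match { file: string; line: number; text: string; }

// Returns every line matching `pattern`, with its file name and 1-based line number.
function grepFiles(files: Record<string, string>, pattern: RegExp): Match[] {
  // Rebuild without the "g" flag so RegExp.test carries no state between lines
  const re = new RegExp(pattern.source, pattern.flags.replace("g", ""));
  const matches: Match[] = [];
  for (const file of Object.keys(files).sort()) {
    (files[file] ?? "").split("\n").forEach((text, i) => {
      if (re.test(text)) matches.push({ file, line: i + 1, text });
    });
  }
  return matches;
}

const calls: Record<string, string> = {
  "demo-call-001.md": "Intro and agenda\nThey were worried about pricing",
  "demo-call-000.md": "Main Concern: migration effort\nNext steps agreed",
};
for (const m of grepFiles(calls, /concern|worried|issue/i)) {
  console.log(`${m.file}:${m.line}: ${m.text}`);
}
```

Only the matching lines reach the model's context, which is the cost advantage the summary describes over loading whole transcripts.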
Why Bash + Filesystem
- Native model capabilities - grep, cat, find, awk are native ops, not bolted on
- Future-proof - as models improve at coding, agents improve automatically
- Debuggable - see exactly what files were read, what commands ran
- Secure - sandbox isolates execution
- Less code - no retrieval pipelines, just write files to directories
Tools
- AI SDK - tool execution and model calls
- bash-tool - sandboxed filesystem access
- Sales Call Summary template - full pattern example
The Punchline
"The future of agents might be surprisingly simple. Maybe the best architecture is almost no architecture at all. Just filesystems and bash."
Source: Vercel Blog
