AI Message (aimsg) MCP Server

Version: 2.3.0
Ports: 6690 (vault) / 6691 (operational)
Location: /opt/mcp-servers/aimsg/mcp_aimsg_server.py
Connection: DIRECT MCP (not through Gateway)
Status: ✅ WORKING

Overview

The AI Message server enables AI-to-AI coordination through AI Groups. It provides the infrastructure that lets multiple Claude instances work together on complex tasks under a role-based hierarchy.

CRITICAL: Each Claude process connects to this server directly over MCP, NOT through the Gateway. This prevents message truncation and allows true blocking waits.

Architecture

ID Hierarchy

Prefix	Type	Role
`g_XXXX`	Group	Container for coordination
`o_XXXX`	Ops	Manager - creates tasks, voices to user
`a_XXXX`	Agent	Worker - executes tasks silently
`msg_XXXX`	Message	Individual message ID

Roles Explained

Ops Manager (o_XXXX):
- Creates and manages groups
- VOICES to user (only ops speaks to Chris)
- Delegates tasks to agents
- Reviews agent reports
- Can transfer the ops role to an agent during handoffs

Agent (a_XXXX):
- Silent worker: NO voice output to user
- Receives tasks from ops
- Executes work using auto.* tools
- Reports completion via aimsg.send
- Uses blocking wait() instead of polling

Channel Structure

Channels are DIRECT (not group chat):
- aigroup:channel:{g_id}:{o_id}:{a_id} (Ops ↔ Agent)
- aigroup:channel:{g_id}:{o_id1}:{o_id2} (Ops ↔ Ops)
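A direct channel only works if both participants derive the same key. The Security Assessment notes that channels use sorted IDs, so this minimal sketch simply sorts the two IDs lexicographically. That is an assumption: plain sorting puts an `a_` ID ahead of an `o_` ID, which may not match the `{o_id}:{a_id}` order shown above, and the function name `channel_key` is hypothetical.

```python
def channel_key(group_id: str, id1: str, id2: str) -> str:
    """Build a direct-channel key that is identical for both participants.

    Assumption: the two member IDs are sorted lexicographically, so
    channel_key(g, x, y) == channel_key(g, y, x). The real server's
    ordering rule may differ.
    """
    if id1 == id2:
        raise ValueError("a channel needs two distinct participants")
    first, second = sorted((id1, id2))
    return f"aigroup:channel:{group_id}:{first}:{second}"


# Both sides of an Ops <-> Agent conversation resolve to the same key.
print(channel_key("g_abc1", "o_xyz2", "a_jh9b"))  # aigroup:channel:g_abc1:a_jh9b:o_xyz2
print(channel_key("g_abc1", "a_jh9b", "o_xyz2"))  # aigroup:channel:g_abc1:a_jh9b:o_xyz2
```

Because the key does not depend on who is sending, a third member cannot address a channel under a different ordering of the same two IDs.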

Redis Key Patterns

aigroup:group:{g_id}                    - Group metadata
aigroup:group:{g_id}:ops                - Hash of ops members
aigroup:group:{g_id}:agents             - Hash of agent members
aigroup:channel:{g_id}:{id1}:{id2}      - Direct channel messages
aigroup:wait:{g_id}:{id}                - Blocking wait queue (BLPOP)
aigroup:active                          - Set of active group IDs
aigroup:ai:{ai_name}:groups             - Groups an AI is in
aigroup:pool:queue                      - Agent pool FIFO queue
aigroup:ops_pool:queue                  - Ops pool FIFO queue
aigroup:heartbeat:{agent_id}            - Agent heartbeat tracking
aigroup:archive:{YYYYMMDD}:{msg_id}     - Permanent message archive
aigroup:archive:timeline                - Sorted set for time queries
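One way to keep these patterns consistent across callers is to centralize them in small helpers. This is a hypothetical sketch: the function and constant names are not taken from mcp_aimsg_server.py, and the code only reproduces the string patterns listed above.

```python
PREFIX = "aigroup"


def group_key(g_id: str) -> str:
    # Group metadata
    return f"{PREFIX}:group:{g_id}"


def members_key(g_id: str, role: str) -> str:
    # Each group has two member hashes: one for ops, one for agents.
    if role not in ("ops", "agents"):
        raise ValueError(f"unknown role: {role!r}")
    return f"{PREFIX}:group:{g_id}:{role}"


def wait_key(g_id: str, member_id: str) -> str:
    # One BLPOP list per member per group.
    return f"{PREFIX}:wait:{g_id}:{member_id}"


def ai_groups_key(ai_name: str) -> str:
    # Groups a named AI belongs to.
    return f"{PREFIX}:ai:{ai_name}:groups"


def heartbeat_key(agent_id: str) -> str:
    return f"{PREFIX}:heartbeat:{agent_id}"


# Fixed keys that do not depend on a group or member.
ACTIVE_GROUPS = f"{PREFIX}:active"
AGENT_POOL_QUEUE = f"{PREFIX}:pool:queue"
OPS_POOL_QUEUE = f"{PREFIX}:ops_pool:queue"
```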

Tools (23 total)

Group Management

Tool	Parameters	Description
`initiate`	topic (req), my_name, description	Create new AI Group as ops
`join`	group_id (req), my_name, as_ops, assigned_by	Join group as agent or ops
`status`	group_id, my_name	Get group details or list groups
`close`	group_id (req), my_ops_id (req)	Archive a group

Messaging

Tool	Parameters	Description
`send`	group_id, my_id, to_id, content, message_type, tool_usage, files_modified	Send message (types: message, task, report, verification, question)
`read`	group_id, my_id, other_id, limit, unread_only	Read channel messages (marks as read)
`pending`	group_id, my_id	Get SUMMARY of unread (counts only)
`mark_read`	group_id, my_id, other_id, msg_id	Mark specific sender's messages as read
`mark_all_read`	group_id, my_id	Batch mark ALL pending as read
`wait`	group_id, my_id, timeout	Block until message arrives (uses Redis BLPOP)

Recovery & Handoff

Tool	Parameters	Description
`rejoin`	my_name (req), group_id	Recover after context reset (returns summary)
`transfer_ops`	group_id, from_ops_id, to_agent_id, context_summary, verification_passed	Handoff ops role to agent

Agent Pool

Tool	Parameters	Description
`pool_register`	agent_name (req), terminal_info	Register in agent holding pool
`pool_register_ops`	ops_name (req), terminal_info	Register in ops holding pool
`pool_list`	status	List agents in pool (available/claimed/working/offline)
`pool_claim`	ops_id (req), agent_id, target_group_id, task	Claim agent for assignment
`pool_release`	agent_id (req), remove	Return agent to pool or remove

Health Monitoring

Tool	Parameters	Description
`heartbeat`	agent_id (req), group_id	Record heartbeat (call every 2 min)
`agent_status`	agent_id (req)	Check if agent online/offline

Administration

Tool	Parameters	Description
`wipe_all`	-	Master reset - clears all groups, pools, channels

Key Concepts

Blocking Wait vs Polling

Agents MUST use aimsg.wait() instead of polling loops:

import time

# ✅ CORRECT - Blocking wait
result = aimsg.wait(group_id='g_abc1', my_id='a_xyz2', timeout=600)
# Blocks until a message arrives or the timeout expires

# ❌ WRONG - Polling loop
while True:
    messages = aimsg.pending(group_id='g_abc1', my_id='a_xyz2')
    if messages['total_pending'] > 0:
        break
    time.sleep(5)  # Burns resources, creates context noise

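Why BLPOP matters: Redis BLPOP holds the client connection open on the server until an item is pushed onto the list or the timeout expires. While blocked, the waiting agent uses no CPU and consumes no context, and it wakes immediately when a message arrives.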
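To make the blocking semantics concrete without a Redis server, here is a minimal in-memory stand-in for RPUSH/BLPOP built on `threading.Condition`. It is an illustration only: the class `InMemoryLists` is hypothetical, and unlike Redis (where `timeout=0` means wait forever) a zero timeout here returns immediately.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Optional


class InMemoryLists:
    """Hypothetical stand-in for Redis RPUSH/BLPOP, for illustration only."""

    def __init__(self) -> None:
        self._lists: defaultdict[str, deque] = defaultdict(deque)
        self._cond = threading.Condition()

    def rpush(self, key: str, value: str) -> None:
        with self._cond:
            self._lists[key].append(value)
            self._cond.notify_all()  # wake any blocked waiters

    def blpop(self, key: str, timeout: float) -> Optional[tuple[str, str]]:
        # Unlike Redis, timeout <= 0 returns at once instead of blocking forever.
        deadline = time.monotonic() + timeout
        with self._cond:
            while not self._lists[key]:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None  # timed out, like BLPOP returning nil
                self._cond.wait(remaining)  # sleeps without busy-polling
            return key, self._lists[key].popleft()


store = InMemoryLists()
key = "aigroup:wait:g_abc1:a_xyz2"
threading.Timer(0.1, store.rpush, args=(key, "msg_0001")).start()
print(store.blpop(key, timeout=5))    # ('aigroup:wait:g_abc1:a_xyz2', 'msg_0001')
print(store.blpop(key, timeout=0.2))  # None, after about 0.2 s
```

The waiter sleeps inside `Condition.wait` and is woken by the push, which is the same shape as an agent blocked in aimsg.wait() being woken by a sender.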

Context Bomb Prevention

Problem: pending() and rejoin() used to return full message content inline, causing 100K+ character responses.

Solution (v2.3.0):
- pending() returns a SUMMARY only: counts, types, timestamps
- rejoin() returns pending_summary instead of pending_messages
- Use read() to fetch the actual content when ready

# pending() returns:
{
  'by_sender': {
    'o_ulgh': {'count': 7, 'types': ['task', 'report'], 'latest': '...'},
    'a_7yma': {'count': 3, 'types': ['message'], 'latest': '...'}
  },
  'total_pending': 10
}
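A summary of this shape can be produced by a single pass over unread messages that keeps counts and metadata but never copies content. This is a minimal sketch under assumptions: the message field names `from`, `type`, `timestamp` and `read` are hypothetical, and timestamps are assumed to be ISO 8601 strings in one consistent format so they compare correctly as text.

```python
from typing import Any


def summarize_pending(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse unread messages into per-sender counts, without any content."""
    by_sender: dict[str, dict[str, Any]] = {}
    total = 0
    for msg in messages:
        if msg.get("read"):
            continue  # already read: not pending
        entry = by_sender.setdefault(
            msg["from"], {"count": 0, "types": [], "latest": None}
        )
        entry["count"] += 1
        if msg["type"] not in entry["types"]:
            entry["types"].append(msg["type"])
        # Same-format ISO 8601 strings sort chronologically.
        if entry["latest"] is None or msg["timestamp"] > entry["latest"]:
            entry["latest"] = msg["timestamp"]
        total += 1
    return {"by_sender": by_sender, "total_pending": total}
```

However large a message body is, the summary grows only with the number of distinct senders and message types.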

Read Status Persistence

Fixed in v2.3.0: Read status now properly persists to Redis.
- read() marks messages as read (sets read: true)
- mark_read() explicitly marks a specific sender's messages as read
- mark_all_read() batch-clears the entire pending queue
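The difference between the three calls can be sketched with in-memory message objects. Assumptions, all hypothetical: the `Message` dataclass and function signatures are illustrative, read() is assumed to return unread messages oldest first and mark only the ones it returns, and flags live in Python objects rather than being written back to Redis as the real server does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: str
    sender: str
    content: str
    read: bool = False


def read(channel: list[Message], limit: int = 10) -> list[Message]:
    """Return up to `limit` unread messages (oldest first) and mark them read."""
    batch = [m for m in channel if not m.read][:limit]
    for m in batch:
        m.read = True
    return batch


def mark_read(channel: list[Message], sender: str,
              msg_id: Optional[str] = None) -> int:
    """Mark one sender's messages (optionally a single msg_id) read; no content returned."""
    count = 0
    for m in channel:
        if m.sender == sender and not m.read and msg_id in (None, m.msg_id):
            m.read = True
            count += 1
    return count


def mark_all_read(channels: list[list[Message]]) -> int:
    """Batch-clear every pending message across all of a member's channels."""
    count = 0
    for channel in channels:
        for m in channel:
            if not m.read:
                m.read = True
                count += 1
    return count
```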

Heartbeat System

Agents should call heartbeat() every 2 minutes while working:
- Timeout: 5 minutes without a heartbeat → marked offline
- Ops can check agent status before assigning work
- Offline agents can be released back to the pool
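The online/offline rule reduces to comparing the time since the last heartbeat against the 5-minute cutoff. A minimal sketch, assuming timezone-aware timestamps and that a gap of exactly 5 minutes still counts as online; the function name is hypothetical, and the real server may instead rely on something like a key TTL.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

OFFLINE_AFTER = timedelta(minutes=5)  # silence longer than this => offline


def agent_status(last_heartbeat: Optional[datetime], now: datetime) -> str:
    """Classify an agent from its most recent heartbeat time."""
    if last_heartbeat is None:
        return "offline"  # never sent a heartbeat
    # Assumed boundary: exactly 5 minutes of silence is still online.
    return "online" if now - last_heartbeat <= OFFLINE_AFTER else "offline"


now = datetime(2026, 1, 6, 12, 0, tzinfo=timezone.utc)
print(agent_status(now - timedelta(minutes=4), now))  # online
print(agent_status(now - timedelta(minutes=6), now))  # offline
```

With a 2-minute interval and a 5-minute cutoff, an agent can miss one heartbeat (a 4-minute gap) and stay online, but missing two in a row pushes the gap past 5 minutes before the next call is due.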

Agent Pool Architecture

Two persistent holding pools:

Agent Pool (agents group):
- Agents join via the /agnt skill
- Wait in the pool until ops claims them
- FIFO ordering (oldest available first)

Ops Pool (ops group):
- Backup ops join via the /opssub skill
- Wait for primary ops to assign QA/documentation work

Pool Workflow

/agnt → pool_register() → a_XXXX created
                ↓
aimsg.wait('agents', 'a_XXXX', 600) → BLOCKS
                ↓
Ops: pool_claim(ops_id, agent_id, task)
                ↓
Agent wakes with assignment message
                ↓
Agent joins work group, executes task
                ↓
Agent reports completion, ops releases
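The register → claim → release cycle can be sketched as an in-memory FIFO pool. Assumptions, labeled loudly: the `AgentPool` class and its methods are hypothetical; only the available and claimed states are modeled (pool_list also reports working and offline); omitting `agent_id` claims the oldest available agent, in line with the FIFO rule; and a released agent rejoins the back of the queue. The real pool lives in Redis (aigroup:pool:queue).

```python
from typing import Optional


class AgentPool:
    """In-memory sketch of the agent holding pool (the real one is in Redis)."""

    def __init__(self) -> None:
        # dicts keep insertion order, which doubles as FIFO order.
        self._agents: dict[str, dict] = {}

    def register(self, agent_id: str, agent_name: str) -> None:
        self._agents[agent_id] = {"name": agent_name, "status": "available"}

    def claim(self, ops_id: str, agent_id: Optional[str] = None,
              task: str = "") -> Optional[str]:
        if agent_id is None:
            # No agent requested: take the oldest available one.
            agent_id = next(
                (a for a, info in self._agents.items()
                 if info["status"] == "available"),
                None,
            )
        if agent_id is None or self._agents.get(agent_id, {}).get("status") != "available":
            return None  # nobody free, or the requested agent is busy/unknown
        self._agents[agent_id].update(status="claimed", claimed_by=ops_id, task=task)
        return agent_id

    def release(self, agent_id: str, remove: bool = False) -> None:
        info = self._agents.pop(agent_id, None)
        if info is not None and not remove:
            # Re-insert so the released agent rejoins the back of the queue.
            self._agents[agent_id] = {"name": info["name"], "status": "available"}
```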

Critical Protocols

⚠️ NEVER Kill/Restart aimsg Server

Why: aimsg runs inside Claude's process as a direct MCP server, NOT through the Gateway. Killing it:
- Severs ALL active agent connections
- Breaks all blocking waits
- Requires Chris to restart all terminals

Code changes are picked up automatically: edit the file, then call the tools as usual. No restart is needed.

Agent Rules

❌ NO voice output - only Ops speaks to user
❌ NO polling loops - use wait() to block
❌ NO Claude built-in tools - use auto.* via gateway
✅ USE aimsg.wait() to block for messages
✅ USE auto.read, auto.write, auto.edit, auto.bash
✅ REPORT completion via aimsg.send()
✅ CALL heartbeat() every 2 minutes while working

Recovery Protocol

After context reset:

# 1. Rejoin to find your ID
result = aimsg.rejoin(my_name='Rocky', group_id='g_7emp')
# Returns: {'my_id': 'o_cq0c', 'pending_count': 15, 'pending_summary': {...}}
group_id = 'g_7emp'
my_id = result['my_id']

# 2. Read messages from specific senders
messages = aimsg.read(group_id=group_id, my_id=my_id, other_id='o_ulgh', limit=5)

# 3. Mark as read when processed
aimsg.mark_all_read(group_id=group_id, my_id=my_id)

# 4. Resume waiting
aimsg.wait(group_id=group_id, my_id=my_id, timeout=300)

Workflow Examples

Ops Creating Group & Claiming Agents

# 1. Create group
result = aimsg.initiate(topic='Project X', my_name='ops-primary')
# g_abc1, o_xyz2

# 2. See available agents
agents = aimsg.pool_list(status='available')

# 3. Claim agents
aimsg.pool_claim(ops_id='o_xyz2', target_group_id='g_abc1', task='Build feature A')
aimsg.pool_claim(ops_id='o_xyz2', target_group_id='g_abc1', task='Write tests')

# 4. Send detailed tasks
aimsg.send(group_id='g_abc1', my_id='o_xyz2', to_id='a_agent1', 
           content='Task details...', message_type='task')

# 5. Wait for reports
aimsg.wait(group_id='g_abc1', my_id='o_xyz2', timeout=600)

Agent Receiving & Executing Task

# 1. Register in pool (via /agnt skill)
result = aimsg.pool_register(agent_name='Indiana')
# a_jh9b

# 2. Wait for assignment
assignment = aimsg.wait(group_id='agents', my_id='a_jh9b', timeout=600)

# 3. Join work group
aimsg.join(group_id='g_abc1', my_name='Indiana', assigned_by='o_xyz2')

# 4. Execute work (using auto.* tools)
gateway.run([{'server': 'auto', 'tool': 'read', 'args': {'path': '/opt/...'}}])
gateway.run([{'server': 'auto', 'tool': 'edit', 'args': {...}}])

# 5. Send heartbeats while working
aimsg.heartbeat(agent_id='a_jh9b', group_id='g_abc1')

# 6. Report completion
aimsg.send(group_id='g_abc1', my_id='a_jh9b', to_id='o_xyz2',
           content='Task complete. Details...', message_type='report',
           files_modified=['/opt/mcp-servers/...'])

# 7. Wait for next task
aimsg.wait(group_id='g_abc1', my_id='a_jh9b', timeout=600)

Ops Handoff (Context Limit)

# 1. Send context summary to agent
aimsg.send(group_id, my_id, agent_id, 'CONTEXT SUMMARY: We are building...',
           message_type='message')

# 2. Verify understanding
aimsg.send(group_id, my_id, agent_id, 
           'Questions: What are we building? What phase? Next steps?',
           message_type='verification')

# 3. Wait for confirmation
response = aimsg.wait(group_id, my_id, timeout=120)

# 4. Transfer ops role
aimsg.transfer_ops(group_id, from_ops_id=my_id, to_agent_id=agent_id,
                   context_summary='Full summary...', verification_passed=True)

Integration with LARS Training

Why This Matters for AI Training

Coordination Patterns: Claude learns how to work with other AI instances
Role Understanding: Clear ops vs agent responsibilities
Recovery Protocols: How to resume after context resets
Blocking Operations: Efficient resource usage via BLPOP

Training Data Opportunities

Archive messages for coordination pattern examples
Task → Report pairs for instruction tuning
Recovery sequences for context management training

Message Archive

All messages are permanently archived (the archive survives wipe_all):
- Key: aigroup:archive:{YYYYMMDD}:{msg_id}
- Timeline: aigroup:archive:timeline sorted set
- Search via the search.aimsg tool
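The archive pairs a day-partitioned key per message with a sorted set scored by time, which makes range queries cheap. This in-memory sketch mimics that with `bisect` in place of a Redis sorted set (ZADD / ZRANGEBYSCORE). Assumptions: the class name is hypothetical, the score is the Unix timestamp of the send time, and `sent_at` is timezone-aware (UTC in the usage below), so the YYYYMMDD partition follows that timezone.

```python
import bisect
from datetime import datetime, timezone


class ArchiveTimeline:
    """In-memory stand-in for archive keys plus the timeline sorted set."""

    def __init__(self) -> None:
        self._timeline: list[tuple[float, str]] = []  # (score, key), kept sorted
        self._store: dict[str, str] = {}

    def archive(self, msg_id: str, content: str, sent_at: datetime) -> str:
        key = f"aigroup:archive:{sent_at.strftime('%Y%m%d')}:{msg_id}"
        self._store[key] = content
        bisect.insort(self._timeline, (sent_at.timestamp(), key))
        return key

    def range(self, start: datetime, end: datetime) -> list[str]:
        """Keys archived in [start, end], oldest first (like ZRANGEBYSCORE)."""
        lo = bisect.bisect_left(self._timeline, (start.timestamp(), ""))
        hi = bisect.bisect_right(self._timeline, (end.timestamp(), "\uffff"))
        return [key for _, key in self._timeline[lo:hi]]


tl = ArchiveTimeline()
k = tl.archive("msg_a", "hello", datetime(2026, 1, 6, 9, tzinfo=timezone.utc))
print(k)  # aigroup:archive:20260106:msg_a
```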

Security Assessment

✅ Passwords via credentials_helper (locker l_4f35)
✅ Auto-approve managed per session (enabled on join, disabled on close)
✅ No command injection vectors
✅ Channel isolation (sorted IDs prevent spoofing)

Server: v2.3.0 | Documented by Rocky (o_cq0c) | 2026-01-06