Voice System v4.1 - Single Tool Architecture

Overview

The Voice system enables AI-to-human spoken communication via ElevenLabs TTS. Version 4.1 simplifies the architecture to a single voice tool with paragraph-based output.

Current Architecture (v4.1)

Voice MCP Server (v4.1.0)

Location: /opt/mcp-servers/voice/mcp_voice_server.py

Single Tool API:

gateway.run([{
    server: 'voice',
    tool: 'voice',
    args: {
        paragraphs: ['First paragraph', 'Second paragraph', ...]
    }
}])

Key Features:
- ONE tool instead of 6 (removed voice_200, voice_350, voice_500, voice_1500, voice_2500, voice_queue)
- paragraphs array - each string is a separate audio clip
- Max 500 chars per paragraph (auto-truncates if exceeded)
- All paragraphs generate in PARALLEL on the server
- Clips are sent to the browser sequentially for playback
- Uses a single ElevenLabs voice ID (no voice parameter needed)
- Phonetic conversion for numbers and dates
- Auto-saves to context.notes for session history
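
The truncate-then-generate flow in the feature list can be sketched roughly as below. This is an illustration under assumptions, not the server's actual code: synthesize is a hypothetical stand-in for the ElevenLabs request, truncation is assumed to be a plain hard cut, and MAX_CHARS simply mirrors the 500-character limit stated above.

```python
import asyncio

MAX_CHARS = 500  # per-paragraph limit stated in the feature list


def truncate(paragraph: str, limit: int = MAX_CHARS) -> str:
    """Cut a paragraph down to the limit (assumed: a plain hard cut)."""
    return paragraph[:limit]


async def synthesize(text: str) -> bytes:
    """Hypothetical placeholder for the real TTS request."""
    await asyncio.sleep(0)  # stands in for network latency
    return text.encode("utf-8")


async def generate_clips(paragraphs: list[str]) -> list[bytes]:
    """Start every TTS job at once; gather returns results in input order."""
    jobs = [synthesize(truncate(p)) for p in paragraphs]
    return await asyncio.gather(*jobs)


clips = asyncio.run(generate_clips(["Got it!", "x" * 600]))
```

Because asyncio.gather preserves input order, the clips can be generated concurrently and still be handed to the player in the order the paragraphs were written.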

Nexus Voice VS Code Extension (v1.7.1)

Location: /home/nexus/.config/systemd/user/.cache/voice-extension/

Features:
- Background WebSocket connection (stays connected even when browsing files)
- Auto-opens sidebar on voice message
- Audio queue for sequential playback with minimal gaps
- Mute button (🔊/🔇) - mutes volume, audio keeps playing in background
- Messages display even when muted
- Auto-detects voice server URL based on VS Code remote host

Mute Behavior:
- Mute sets volume to 0, does NOT pause
- Audio continues playing silently
- Queue keeps advancing
- Unmute restores volume - you hear wherever the stream currently is
- Like muting a TV - show keeps going
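
The mute semantics above can be modeled in a few lines. This is a toy sketch in Python for illustration only; the class and method names are hypothetical and say nothing about how the extension itself is written.

```python
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Toy model of the mute semantics: volume changes, playback does not."""

    def __init__(self) -> None:
        self.clips: deque = deque()
        self.volume = 1.0
        self.current: Optional[str] = None

    def enqueue(self, clip: str) -> None:
        self.clips.append(clip)

    def set_muted(self, muted: bool) -> None:
        # Mute only touches volume; it never pauses or clears the queue.
        self.volume = 0.0 if muted else 1.0

    def advance(self) -> Optional[str]:
        """Move to the next clip, regardless of mute state."""
        self.current = self.clips.popleft() if self.clips else None
        return self.current


queue = PlaybackQueue()
queue.enqueue("clip-1")
queue.enqueue("clip-2")
queue.advance()         # clip-1 is playing
queue.set_muted(True)   # volume drops to 0, nothing pauses
queue.advance()         # clip-2 becomes current while muted
queue.set_muted(False)  # volume restored; playback is already on clip-2
```

The point of the model is that set_muted and advance are independent: unmuting never rewinds to the clip that was playing when mute was pressed.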

Voice WebSocket Bridge

Location: /opt/mcp-servers/voice/voice_websocket_bridge.py
Port: 8765

Bridges HTTP POST requests from the MCP server to a WebSocket connection for the VS Code extension.
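
The bridging idea (one inbound POST, fanned out to every connected listener) can be sketched in-process as below. This is a model only: it uses asyncio queues in place of real HTTP and WebSocket connections, and the Bridge class, its method names, and the "text" payload field are all hypothetical.

```python
import asyncio
import json


class Bridge:
    """In-process stand-in for the bridge: one publish, many listeners."""

    def __init__(self) -> None:
        self.clients: set = set()

    def connect(self) -> asyncio.Queue:
        """Register a listener (in the real bridge, a WebSocket client)."""
        queue: asyncio.Queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    async def handle_post(self, body: bytes) -> int:
        """Forward one POSTed JSON payload to every connected listener."""
        message = json.loads(body)
        for queue in self.clients:
            await queue.put(message)
        return len(self.clients)


async def demo() -> tuple:
    bridge = Bridge()
    client = bridge.connect()
    delivered = await bridge.handle_post(b'{"text": "Got it!"}')
    return delivered, await client.get()


delivered, received = asyncio.run(demo())
```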

Usage Pattern

Lightning Response Pattern

// Quick acknowledgment first, then substance
gateway.run([{
    server: 'voice',
    tool: 'voice',
    args: {
        paragraphs: [
            'Got it!',  // Lightning response
            'Here is the detailed explanation...',  // Follow-up
            'And another point to consider...'  // More detail
        ]
    }
}])

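All three paragraphs are generated in parallel and played sequentially with minimal gaps, so the short acknowledgment is heard first while the longer clips queue up behind it.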
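
One way to prepare a paragraphs array for this pattern is to put the acknowledgment first and pack the rest of the text into chunks that respect the 500-character limit. The helper below is a hypothetical sketch; build_paragraphs and its sentence-splitting rule are assumptions, not part of the voice tool.

```python
import re

MAX_CHARS = 500  # per-paragraph limit of the voice tool


def build_paragraphs(ack: str, body: str, limit: int = MAX_CHARS) -> list[str]:
    """Return [ack, chunk, ...], packing sentences into chunks up to `limit` chars."""
    sentences = re.split(r"(?<=[.!?])\s+", body.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is passed through as-is
            # and left for the server's auto-truncation.
            current = sentence
    if current:
        chunks.append(current)
    return [ack, *chunks]


paragraphs = build_paragraphs("Got it!", "One. Two. Three.", limit=10)
```

With the tiny limit used here for demonstration, the first two sentences share a chunk and the third gets its own. The resulting list can be passed as the paragraphs argument of the voice tool.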

Legacy Reference (Archived)

Previous Tools (v3.x - REMOVED)

voice_200 - 200 char limit
voice_350 - 350 char limit
voice_500 - 500 char limit
voice_1500 - 1500 char limit
voice_2500 - 2500 char limit
voice_queue - Multiple messages
voice parameter for selecting different voices

These were replaced with the single voice tool in v4.0.

File Locations

MCP Server: /opt/mcp-servers/voice/mcp_voice_server.py
WebSocket Bridge: /opt/mcp-servers/voice/voice_websocket_bridge.py
VS Code Extension: /home/nexus/.config/systemd/user/.cache/voice-extension/
Built VSIX: nexus-voice-1.7.1.vsix

Redis Storage

Voice notes are stored in the Context environment (port 6620) under keys of the form: ctxt:{timestamp}:NOTE:voice:{session_id}
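
A small sketch of building and parsing keys in this format follows. The helper names are hypothetical, and the timestamp is assumed to be integer Unix seconds since the source does not specify its format; a timestamp containing colons would need a different parser.

```python
import time
from typing import Optional

KEY_TEMPLATE = "ctxt:{timestamp}:NOTE:voice:{session_id}"


def voice_note_key(session_id: str, timestamp: Optional[int] = None) -> str:
    """Fill the documented key template for one voice note."""
    if timestamp is None:
        timestamp = int(time.time())  # assumed format: Unix seconds
    return KEY_TEMPLATE.format(timestamp=timestamp, session_id=session_id)


def parse_voice_note_key(key: str) -> tuple:
    """Split a key back into (timestamp, session_id)."""
    prefix, timestamp, kind, source, session_id = key.split(":", 4)
    if (prefix, kind, source) != ("ctxt", "NOTE", "voice"):
        raise ValueError(f"not a voice note key: {key!r}")
    return timestamp, session_id


example_key = voice_note_key("abc", 1700000000)
```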