InWorld AI TTS Integration
Why InWorld AI
- Already working in Nexus voice pipeline
- Better voice quality than Kokoro/Whisper TTS
- Supports expression markup [happy], [sad], etc.
- Voice cloning available (Lena voice)
Current Setup
Voice MCP server at /opt/mcp-servers/voice/
- Uses InWorld as the primary provider
- ElevenLabs as fallback
- WebSocket delivery to the browser
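A minimal sketch of that provider order, assuming plain HTTP endpoints for each provider. The URLs and function name below are illustrative placeholders, not the actual server code:

import requests

# Provider order mirrors the setup above; URLs are hypothetical placeholders.
PROVIDERS = [
    ("inworld", "http://localhost:8001/tts"),
    ("elevenlabs", "http://localhost:8002/tts"),
]

def synthesize_with_fallback(text: str) -> bytes:
    """Try InWorld first; fall back to ElevenLabs if it fails."""
    last_error = None
    for name, url in PROVIDERS:
        try:
            resp = requests.post(url, json={"text": text}, timeout=30)
            resp.raise_for_status()
            return resp.content  # raw audio bytes from whichever provider answered
        except requests.RequestException as err:
            last_error = err  # remember the failure and try the next provider
    raise RuntimeError(f"all TTS providers failed: {last_error}")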
DO NOT USE
- Kokoro TTS (poor quality)
- Whisper TTS (poor quality)
- Any other TTS in the LiveKit stack
Integration Approach
When the LiveKit pipeline produces a text response:
1. Intercept the text output
2. Route it to our voice MCP server
3. voice.voice({paragraphs: ["response text"]})
4. WebSocket delivers the audio to the client
Code Pattern
# Instead of LiveKit's TTS:
#   response_audio = tts.synthesize(text)
# route the text to our gateway instead.
import requests

def speak_via_nexus(text: str) -> dict:
    """Call our voice MCP through the gateway (InWorld AI under the hood)."""
    response = requests.post(
        "http://nexus-gateway/voice",
        json={"paragraphs": [text]},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
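Usage sketch, assuming the LiveKit pipeline exposes its final text through a callback. The on_agent_text hook below is hypothetical, not a LiveKit API; the point is that the wiring reduces to handing that text to speak_via_nexus:

def on_agent_text(text: str) -> None:
    # Hand the pipeline's text to Nexus instead of the stack's own TTS.
    # Nothing is returned to LiveKit: the gateway synthesizes via InWorld
    # and pushes audio to the browser over the existing WebSocket.
    speak_via_nexus(text)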
Voice Characteristics
- Natural pauses at commas and periods
- Expression tags: [happy], [laughing], [sigh]
- Pronunciation: Use hyphens for unusual words (Marga-heat-a)
- Max 500 characters per paragraph (see the chunking sketch below)
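Long responses need to be split before they hit the 500-character cap. A minimal sketch, assuming splitting at sentence boundaries is acceptable; the helper name and regex are illustrative:

import re

MAX_PARAGRAPH_CHARS = 500  # per-paragraph cap noted above

def chunk_paragraphs(text: str) -> list[str]:
    """Split text into <=500-char paragraphs at sentence boundaries.

    Assumes no single sentence exceeds the cap.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > MAX_PARAGRAPH_CHARS:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Expression tags ride along inside the text itself:
# {"paragraphs": chunk_paragraphs("[happy] Great to see you! ...")}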