InWorld AI TTS Integration

Why InWorld AI

  • Already working in Nexus voice pipeline
  • Better voice quality than Kokoro/Whisper TTS
  • Supports expression markup [happy], [sad], etc.
  • Voice cloning available (Lena voice)

Current Setup

Voice MCP server at /opt/mcp-servers/voice/:

  • Uses InWorld as the primary provider
  • ElevenLabs as fallback
  • WebSocket delivery to the browser
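
The primary/fallback ordering can be sketched as follows. This is a minimal illustration only; inworld_tts() and elevenlabs_tts() are hypothetical stand-ins for the real provider clients inside the MCP server, not its actual API.

# Sketch of the primary/fallback ordering only; the two provider
# functions are hypothetical stand-ins, not the server's real API.
def inworld_tts(text: str) -> bytes:
    raise NotImplementedError  # real InWorld client call goes here

def elevenlabs_tts(text: str) -> bytes:
    raise NotImplementedError  # real ElevenLabs client call goes here

def synthesize(text: str) -> bytes:
    # Try InWorld first; on any failure, fall back to ElevenLabs.
    try:
        return inworld_tts(text)
    except Exception:
        return elevenlabs_tts(text)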

DO NOT USE

  • Kokoro TTS (poor quality)
  • Whisper TTS (poor quality)
  • Any other TTS in the LiveKit stack

Integration Approach

When the LiveKit pipeline produces a text response:

  1. Intercept the text output
  2. Route it to our voice MCP server
  3. Call voice.voice({paragraphs: ["response text"]})
  4. WebSocket delivers the audio to the client (see the receiver sketch after the Code Pattern)

Code Pattern

# Instead of LiveKit's built-in TTS:
# response_audio = tts.synthesize(text)

# Route to our gateway instead
import requests

def speak_via_nexus(text: str) -> dict:
    # Call our voice MCP through the gateway;
    # this uses InWorld AI under the hood.
    response = requests.post(
        "http://nexus-gateway/voice",
        json={"paragraphs": [text]},
        timeout=30,
    )
    response.raise_for_status()  # surface gateway errors early
    return response.json()
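
On the receiving end (step 4 above), the client consumes audio from the WebSocket. The sketch below assumes a ws://nexus-gateway/voice/stream endpoint that pushes raw binary audio chunks; the endpoint path and framing are illustrative assumptions, not the documented gateway API.

# Minimal sketch of a WebSocket audio receiver. The endpoint
# ws://nexus-gateway/voice/stream and the raw-chunk framing are
# assumptions for illustration, not the documented gateway API.
import asyncio
import websockets

async def receive_audio(out_path: str = "response.mp3") -> None:
    async with websockets.connect("ws://nexus-gateway/voice/stream") as ws:
        with open(out_path, "wb") as f:
            async for message in ws:
                if isinstance(message, bytes):  # binary frames carry audio
                    f.write(message)

if __name__ == "__main__":
    asyncio.run(receive_audio())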

Voice Characteristics

  • Natural pauses at commas and periods
  • Expression tags: [happy], [laughing], [sigh]
  • Pronunciation: Use hyphens for unusual words (Marga-heat-a)
  • Max 500 chars per paragraph
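
To stay under the 500-character paragraph limit, longer responses can be split at sentence boundaries before being sent. The helper below is an illustrative sketch, not part of the MCP API; note that a single sentence longer than the limit passes through unsplit.

# Illustrative helper (not part of the MCP API): splits long text into
# paragraphs of at most 500 characters, breaking at sentence boundaries.
import re

MAX_PARAGRAPH_CHARS = 500

def to_paragraphs(text: str) -> list[str]:
    paragraphs, current = [], ""
    # Split on sentence-ending punctuation, keeping the punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if current and len(current) + 1 + len(sentence) > MAX_PARAGRAPH_CHARS:
            paragraphs.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        paragraphs.append(current)
    return paragraphs

# Usage with the gateway helper above:
# requests.post("http://nexus-gateway/voice",
#               json={"paragraphs": to_paragraphs(long_text)})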